DEV Community: Minoltan Issack

4 Ways to Save Your AI Tokens 10x

Minoltan Issack — Sun, 31 May 2026 02:33:22 +0000

Token management is the silent cost killer in every AI workflow. Here’s how to outsmart it.

You’ve built the AI app. The prompts are clever, the outputs look great, and then the billing dashboard loads. Token costs are spiraling. If this sounds familiar, you’re not alone. As LLM-powered workflows scale, token consumption becomes the single biggest lever between a profitable AI product and an expensive side project.

My recent deep dive covered four practical strategies that can reduce your token usage by up to 10x without sacrificing output quality. This blog walks through each one, explains the underlying mechanics, and gives you a mental architecture for applying them in real production systems.

First, What Is a Token Really?

Before optimizing, you need to understand what you’re measuring. A token is not a word; it’s a chunk of text as the model sees it. Roughly speaking, one token ≈ 4 characters in English, or about ¾ of a word. The sentence “The quick brown fox” is about 4 to 5 tokens, depending on the tokenizer.
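The ≈4-characters rule is easy to turn into a quick budgeting helper. The sketch below is only a heuristic for English text, not a real tokenizer. For exact counts, use the tokenizer or token-counting endpoint your provider offers. The function name `estimate_tokens` is illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Heuristic for English text only: real tokenizers split text differently,
    so treat the result as a ballpark figure, not a billing number.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("The quick brown fox"))  # 19 characters -> 5
```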

The critical insight: you’re billed on both input AND output tokens. Your system prompt, the entire conversation history, any retrieved documents (RAG chunks), and the model’s reply: all of it counts. This is why large AI workflows can burn through budgets so fast: the context window fills up with tokens you don’t even realize you’re paying for.
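Because both directions are billed, the cost of one request comes down to two multiplications. The prices below are placeholders, not any provider’s actual rates. Substitute the per-million-token prices from your provider’s pricing page. The token counts are likewise made-up illustrations.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices (hypothetical): $3 per million input tokens, $15 per million output tokens.
# Input = 2,000-token system prompt + 3,000 tokens of history + 500 tokens of RAG context;
# output = a 400-token reply.
cost = request_cost(2_000 + 3_000 + 500, 400, 3.0, 15.0)
print(f"${cost:.4f} per request")  # $0.0225 per request
```

Multiply by your daily request volume, and it becomes clear why history and retrieved context, which are resent on every call, tend to dominate the bill.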

“It’s not about writing shorter prompts. It’s about writing smarter ones.”

1. Prompt Design — Say More With Less

The first and most impactful strategy is also the most underestimated: redesigning your prompts from scratch with token efficiency as a first-class constraint. Most prompts are written for human readability: full sentences, polite framing, repeated context. Models don’t need any of that.

The Over-Verbose Trap

Here’s what a typical “human-friendly” prompt looks like compared to a token-optimized one:

❌ VERBOSE (approx. 65 tokens)
-----------------------------------------------
Hello! I hope you're doing well. I have a task
for you today. I need you to please summarize
the following article for me in a way that is
easy to understand. Please keep it concise
and make sure to include the main points.
Here is the article: [article text]

✅ OPTIMIZED (approx. 18 tokens)
-----------------------------------------------
Summarize this article. Key points only.
3 bullets max. Be concise.

[article text]

That’s a 72% reduction in prompt overhead before even touching the content. Multiply this across thousands of API calls per day and you’re looking at enormous cost differences.

Key Prompt Design Principles

💡 Prompt Engineering Rules for Token Efficiency

Use imperative instructions — “Summarize in 3 bullets” not “Could you please provide a summary?”
Avoid pleasantries — Models ignore them; you still pay for them.
Specify output format upfront — “Respond only in JSON” prevents verbose explanations.
Cut redundant context — Don’t repeat info the model already has from earlier turns.
Use structured delimiters — XML tags or triple backticks clearly separate instructions from content, which reduces misreadings and the extra tokens spent correcting them.

Think of your prompt as a spec sheet, not a letter. Remove anything a machine doesn’t need to perform the task correctly.
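One way to make these rules stick is to generate prompts from a small template instead of writing them freehand. The sketch below puts the imperative instruction and the output format first, then wraps the content in XML-style tags. The function name and tag name are illustrative, not a standard.

```python
def build_prompt(task: str, output_format: str, content: str, tag: str = "document") -> str:
    """Assemble a terse, spec-style prompt.

    task:          imperative instruction, e.g. "Summarize this article."
    output_format: explicit format constraint, stated up front
    content:       the material to work on, fenced in XML-style tags so the
                   model can tell instructions apart from data
    """
    return f"{task}\n{output_format}\n\n<{tag}>\n{content}\n</{tag}>"


prompt = build_prompt(
    task="Summarize this article. Key points only.",
    output_format="3 bullets max. Be concise.",
    content="[article text]",
)
print(prompt)
```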

2. Prompt Caching — Pay Once, Reuse Many Times

Prompt caching is one of the most powerful and least-talked-about token-saving features available in modern LLM APIs (supported by Anthropic Claude, among others). The idea is simple: if a large part of your prompt stays the same across requests, cache it so you don’t pay full price to re-process it every single time.

When Should You Cache?

Caching pays off most when you have a large static prefix: a system prompt with detailed instructions, a few-shot example block, or a knowledge base document that gets appended to every request. If that prefix runs to 1,000–2,000+ tokens and you’re making dozens or hundreds of calls per hour, caching delivers immediate savings. Providers set a minimum cacheable length (on Anthropic’s API, at least 1,024 tokens, and more for some models), so short prompts may not qualify.
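On Anthropic’s Messages API, you opt into caching by marking the end of the static prefix with a `cache_control` block. The sketch below only builds the request body, so it runs without an API key. `LONG_SYSTEM_PROMPT`, the helper name, and the `max_tokens` value are placeholders. With the official `anthropic` SDK, you would send the result with `client.messages.create(**request)`.

```python
# Placeholder: in practice this is 1,000+ tokens of static instructions and examples.
LONG_SYSTEM_PROMPT = "You are a support assistant. <detailed static instructions go here>"


def build_cached_request(question: str, model: str = "claude-sonnet-4-5") -> dict:
    """Build a Messages API request body whose static system prompt is marked cacheable."""
    return {
        "model": model,
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to and including this block is the cached prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this per-request part changes between calls.
        "messages": [{"role": "user", "content": question}],
    }


request = build_cached_request("How do I reset my password?")
```

The response’s `usage` object reports `cache_creation_input_tokens` on the call that writes the cache and `cache_read_input_tokens` on later hits. Checking those fields is an easy way to confirm that caching is actually working.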

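In RAG (Retrieval-Augmented Generation) architectures, this is especially powerful when many queries hit the same documents. You still send the shared context (say, 1,000 tokens from a frequently queried document) with every request. After the first call, however, it is read from the cache at a fraction of the normal input price instead of being reprocessed, which is a game-changer for document Q&A systems. Retrieved chunks that change on every query don’t benefit, because caching only matches an identical prefix.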
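To see why repetition matters, compare the input cost of a shared prefix with and without caching. The multipliers below are assumptions taken from Anthropic’s published pricing for the default 5-minute cache at the time of writing: cache writes cost 1.25x the base input price, and cache reads cost 0.1x. Other providers and cache durations use different numbers. The sketch also ignores cache expiry.

```python
def prefix_cost(prefix_tokens: int, calls: int,
                write_multiplier: float = 1.25,
                read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (uncached, cached) cost of a shared prefix, in base-price token units.

    Assumes the first call writes the cache and every later call reads it
    (i.e. the cache never expires between calls).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    uncached = prefix_tokens * calls
    cached = (prefix_tokens * write_multiplier
              + prefix_tokens * read_multiplier * (calls - 1))
    return uncached, cached


uncached, cached = prefix_cost(prefix_tokens=1_000, calls=10)
print(f"uncached: {uncached:.0f}, cached: {cached:.0f}")  # uncached: 10000, cached: 2150
```

The first call costs slightly more than uncached, so caching only pays off once the same prefix is reused at least once within the cache lifetime.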

3. Model Selection — Right Tool, Right Job

This is the strategy that sounds obvious but is violated constantly in production: not every task needs your most powerful model. Using a frontier model (like Claude Opus or GPT-4o) for every single request is like hiring a senior architect to hang a picture frame. It works, but the cost-to-value ratio is terrible.

The Cascading Model Pattern

A powerful production architecture is the cascading router: a small, cheap model first evaluates the complexity of the incoming request. If it’s simple, it handles it directly. If it’s complex, it escalates to the frontier model. This gives you the economics of small models for the majority of traffic, with frontier quality reserved for the cases that truly need it.
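Here is a minimal sketch of the cascade. It assumes you supply three callables: a cheap `classify` step (often the small model itself, asked to answer “SIMPLE” or “COMPLEX”), a `small_model` call, and a `frontier_model` call. All three are hypothetical stand-ins for your own API wrappers. The stubs below exist only to make the sketch runnable.

```python
from typing import Callable


def cascade(request: str,
            classify: Callable[[str], str],
            small_model: Callable[[str], str],
            frontier_model: Callable[[str], str]) -> tuple[str, str]:
    """Route a request to the cheapest model that can handle it.

    Returns (model_used, answer). Anything not explicitly classified as
    SIMPLE is escalated, so classifier mistakes fail toward quality.
    """
    verdict = classify(request).strip().upper()
    if verdict == "SIMPLE":
        return "small", small_model(request)
    return "frontier", frontier_model(request)


# Hypothetical stubs for illustration only.
def classify_stub(text: str) -> str:
    return "SIMPLE" if len(text) < 80 else "COMPLEX"


def small_model_stub(text: str) -> str:
    return f"[small model answer to: {text}]"


def frontier_model_stub(text: str) -> str:
    return f"[frontier model answer to: {text}]"


model_used, answer = cascade("Is this review positive? 'Great product!'",
                             classify_stub, small_model_stub, frontier_model_stub)
print(model_used)  # small
```

Defaulting unrecognized verdicts to the frontier model trades some cost for safety. Flip that default if cost matters more to you than occasional quality misses.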

4. Output Discipline — Control What Comes Back

Most developers obsess over input tokens. Far fewer think about output tokens, the tokens the model generates in its response. This is a major blind spot, because output tokens are typically priced higher than input tokens, and a verbose model can silently drain your budget.

Without explicit constraints, models tend to be generous: they explain their reasoning, offer alternatives, add caveats, summarize what they just said, and generally say more than you asked for. Every one of those extra sentences is a billed token.

How to Constrain Output

❌ No Output Discipline:
# Result: model explains what JSON is, writes the JSON,
# then summarizes what it wrote. ~300 tokens.
"Extract the key data from this invoice."

✅ With Output Discipline:
# Result: pure JSON, nothing else. ~60 tokens.
"Extract from this invoice. Respond ONLY in valid JSON.
Schema: {vendor, date, amount, line_items[]}
No explanation. No preamble. No markdown."
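Asking for JSON is half the contract; the other half is rejecting anything that isn’t JSON. A small validator like the one below lets you retry or fail fast instead of silently accepting a chatty reply. It uses only the standard library, and its field names mirror the schema in the prompt above. The function name is illustrative.

```python
import json

REQUIRED_FIELDS = {"vendor", "date", "amount", "line_items"}


def parse_invoice_reply(reply: str) -> dict:
    """Parse a model reply that must be a bare JSON object with the invoice schema.

    Raises ValueError if the reply has a preamble, markdown fences, or missing fields.
    """
    try:
        data = json.loads(reply)  # fails on leading prose or markdown code fences
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not bare JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["line_items"], list):
        raise ValueError("line_items must be a list")
    return data


good = '{"vendor": "Acme", "date": "2026-05-01", "amount": 120.5, "line_items": []}'
print(parse_invoice_reply(good)["vendor"])  # Acme
```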

The max_tokens Parameter

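Beyond prompting, you have a hard lever: the max_tokens parameter in your API call. It caps how many output tokens a reply can use, and therefore how many you can be billed for. Note that it truncates rather than condenses: the model doesn’t plan a shorter answer, it simply stops at the limit. Set it aggressively for tasks where you know the output structure and the prompt already asks for something short. For a classification task that returns one of five labels, setting max_tokens: 10 is entirely reasonable.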

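// Sentiment classification: output is ONE word.
// Run as an ES module (e.g. sentiment.mjs) so top-level await works.
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const userText = "Arrived late, but the support team sorted it out quickly."; // example input

const response = await anthropic.messages.create({
  model: "claude-haiku-4-5", // small model
  max_tokens: 5, // hard cap on output tokens
  messages: [{
    role: "user",
    content: `Classify sentiment. Reply ONLY with:
POSITIVE, NEGATIVE, or NEUTRAL.

Text: "${userText}"`
  }]
});

console.log(response.content[0].text); // the label is in the first content block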
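Because `max_tokens` truncates rather than shortens, it is worth checking why generation stopped. Anthropic’s Messages API reports this in the response’s `stop_reason` field: `"end_turn"` for a natural finish and `"max_tokens"` when the cap was hit. The helper below works on a plain dict shaped like that response, so it runs without the SDK. The function name and label set are illustrative.

```python
VALID_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def read_label(response: dict) -> str:
    """Extract a sentiment label from a Messages-API-shaped response dict."""
    label = response["content"][0]["text"].strip().upper()
    if label in VALID_LABELS:
        return label  # a complete, valid label is fine even if the cap was reached
    if response.get("stop_reason") == "max_tokens":
        # The cap cut the reply off mid-word: raise the cap or tighten the prompt.
        raise ValueError("reply truncated by max_tokens")
    raise ValueError(f"unexpected label: {label!r}")


print(read_label({"stop_reason": "end_turn",
                  "content": [{"type": "text", "text": "POSITIVE"}]}))  # POSITIVE
```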

Output Formats That Save Tokens

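Structured output formats (JSON, CSV, short key-value lists) tend to be more token-efficient than prose, because they drop the connective phrasing, hedges, and restated context that a prose answer carries.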
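A quick way to check this for your own outputs is to compare the same facts in both shapes. The sketch reuses the rough ~4-characters-per-token rule from earlier, which is a heuristic rather than a tokenizer. The example strings are made up for illustration.

```python
import math


def rough_tokens(text: str) -> int:
    # ~4 characters per token: a heuristic for English, not a real tokenizer
    return math.ceil(len(text) / 4)


prose = ("Sure! Based on the review, I would say the overall sentiment is positive, "
         "the product category appears to be electronics, and the customer seems "
         "to rate it around 4 out of 5 stars.")
structured = "sentiment,category,rating\npositive,electronics,4"

print(rough_tokens(prose), rough_tokens(structured))
```

For real numbers, run both versions through your provider’s token counter. JSON with long key names can erase much of the saving, which is one reason to keep key names short.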

Putting It All Together: The Token-Efficient Stack

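These four strategies aren’t independent: they compound. In a production-grade, token-efficient pipeline, all four apply to every request: a spec-style prompt, a cached static prefix, a router that sends each task to the cheapest capable model, and an explicit output format backed by a max_tokens cap.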
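Here is a rough sketch of how the pieces fit together. The function below assembles one request using all four levers: a spec-style prompt, a cache-marked static prefix, a per-task model choice, and a per-task `max_tokens` cap. The task names, caps, and system prefix are placeholder assumptions, and the model IDs are simply the ones that appear elsewhere in this post. Routing here is by task type rather than by the cascading classifier, to keep the sketch short. The result is a request body you could pass to `client.messages.create(**request)` in Anthropic’s Python SDK.

```python
# Placeholder static prefix: in practice, long instructions plus few-shot examples.
SYSTEM_PREFIX = "You are a support assistant. <long, static instructions and few-shot examples>"

# Hypothetical per-task settings: which model tier, how many output tokens, what instruction.
TASKS = {
    "classify": {"model": "claude-haiku-4-5", "max_tokens": 5,
                 "instruction": "Classify sentiment. Reply ONLY with POSITIVE, NEGATIVE, or NEUTRAL."},
    "extract": {"model": "claude-haiku-4-5", "max_tokens": 300,
                "instruction": "Extract as JSON: {vendor, date, amount, line_items[]}. No prose."},
    "analyze": {"model": "claude-sonnet-4-5", "max_tokens": 1_000,
                "instruction": "Analyze the root cause. 5 bullets max."},
}


def build_request(task: str, content: str) -> dict:
    cfg = TASKS[task]  # 3. model selection: cheap model unless the task needs more
    return {
        "model": cfg["model"],
        "max_tokens": cfg["max_tokens"],  # 4. output discipline: hard cap per task
        "system": [{
            "type": "text",
            "text": SYSTEM_PREFIX,
            "cache_control": {"type": "ephemeral"},  # 2. cache the static prefix
        }],
        "messages": [{
            "role": "user",
            # 1. prompt design: imperative instruction first, then delimited content
            "content": f"{cfg['instruction']}\n\n<input>\n{content}\n</input>",
        }],
    }


req = build_request("classify", "Arrived late but works perfectly.")
print(req["model"], req["max_tokens"])
```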

Why Token Efficiency Is an Engineering Skill, Not a Hack

What I find most compelling about this framework is that it reframes token optimization not as “doing less” but as engineering precision. Just as a good software engineer writes code that is not just functional but efficient (minimal allocations, no unnecessary computations), a good AI engineer writes prompts and architectures that extract maximum value from every token.

The analogy that resonates with me: token management is to AI engineering what database query optimization is to backend engineering. You can build something that works without it. But if you want to build something that scales, you have to think about it from day one.

As AI models get cheaper over time, some of this becomes less critical. But the habits and patterns you build now (precise prompting, smart caching, model routing) will translate directly into better system design even as the underlying economics shift.

The 4-Point Takeaway

Prompt Design: Write prompts like specs, not letters. Cut pleasantries, redundancy, and verbose context. Every unnecessary word costs money at scale.
Prompt Caching: Identify the static prefix of your prompts (system instructions, few-shot examples, stable RAG context) and cache it. Pay full price once, then reuse it at a steep discount hundreds of times.
Model Selection: Build a routing layer. Route simple tasks (classification, extraction, summarization) to small fast models. Reserve frontier models for tasks where quality is non-negotiable.
Output Discipline: Tell the model exactly what format you want and set max_tokens aggressively. Output tokens are priced at a premium, so every verbose explanation is a cost you didn’t ask for.

Token efficiency is not about being cheap; it’s about being precise. The best AI engineers are the ones who know exactly what they need from a model, ask for exactly that, and get it back in exactly the right shape. That precision is the craft.

To stay informed on the latest technical insights and tutorials, connect with me on Medium and LinkedIn. For professional inquiries or technical discussions, please contact me via email. I welcome the opportunity to engage with fellow professionals and address any questions you may have.

From Prompt Engineering to Context Engineering: The AI Revolution You Need to Know About

Minoltan Issack — Tue, 26 May 2026 02:19:47 +0000

How the way we talk to AI is fundamentally changing — and why it matters for your future

The Day I Realized Prompts Weren’t Enough

Thousands of developers, data scientists, and AI engineers hit the same moment in 2025. We had gotten really good at prompt engineering, the art of asking AI the right questions in the right way. But something was missing. The AI could reason brilliantly, but it couldn’t see our world.

That’s when everything changed. Welcome to the era of Context Engineering.

What Actually Happened? The Shift Nobody Saw Coming

In July 2025, Gartner made a bold declaration: “context engineering is in, and prompt engineering is out,” predicting it will appear in 80% of AI tools by 2028. This wasn’t just another tech buzzword — it was a fundamental shift in how we architect AI systems.

But what does that actually mean?

Let me tell you a story.

The Restaurant Analogy: Understanding Context Engineering

Imagine you walk into a restaurant and tell the waiter: “I want something delicious.”

That’s prompt engineering. You gave an instruction, but the waiter has no context. They don’t know if you’re vegetarian, allergic to nuts, whether you prefer spicy food, if you’re here for a business lunch or a romantic dinner, or even what cuisine you typically enjoy.

Now imagine walking into a restaurant where:

The waiter knows your dietary preferences
They remember what you ordered last time
They can see the current menu and what’s available in the kitchen
They understand it’s your anniversary (from your reservation notes)
They know the budget range you typically work with
They have access to reviews of dishes from customers with similar tastes

That’s context engineering. Same request, completely different outcome.

Context engineering is the practice of architecting the entire information environment for AI agents — not just the prompt, but memory, tools, retrieval, and state. It’s about giving AI systems the situational awareness they need to act with relevance and precision.

The Five Layers of Context: Building the AI’s World

Think of context engineering like building a house for your AI to live in. You’re not just giving it instructions; you’re creating an entire environment. Here’s what goes into that environment:

1. The Memory Layer — What the AI Remembers

Just like you remember conversations with your friends, AI systems need memory. But not just any memory — structured, organized memory.

Short-term memory: What happened in this conversation?
Working memory: What am I actively thinking about right now?
Long-term memory: What do I know about this user, this company, this domain?

In 2026, hierarchical memory architectures have become a major focus, enabling models to process and remember vast amounts of information over extended interactions through layered memory systems.
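Here is a minimal sketch of the three memory tiers as plain Python structures. It assumes an in-process store; real systems typically back long-term memory with a database or vector store. The class and method names are hypothetical.

```python
from collections import deque


class AgentMemory:
    """Toy three-tier memory: short-term, working, and long-term."""

    def __init__(self, short_term_turns: int = 6):
        self.short_term = deque(maxlen=short_term_turns)  # recent turns; oldest drop off automatically
        self.working: dict[str, str] = {}                  # facts relevant to the current task
        self.long_term: dict[str, str] = {}                # durable facts about the user/company/domain

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append(f"{role}: {text}")

    def to_context(self) -> str:
        """Render all tiers as a compact context block for the next model call."""
        lines = ["[long-term]"] + [f"{k}: {v}" for k, v in self.long_term.items()]
        lines += ["[working]"] + [f"{k}: {v}" for k, v in self.working.items()]
        lines += ["[recent]"] + list(self.short_term)
        return "\n".join(lines)


mem = AgentMemory(short_term_turns=2)
mem.long_term["diet"] = "vegetarian"
mem.working["task"] = "recommend a dinner"
for i in range(3):
    mem.add_turn("user", f"message {i}")
print(mem.to_context())
```

The bounded `deque` is one simple way to keep short-term memory from growing without limit. Older turns would normally be summarized into long-term memory rather than discarded.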

2. The Knowledge Layer — What the AI Knows

This is where things get interesting. Instead of hoping the AI “knows” something from its training, you explicitly give it access to:

Your company’s internal documents
Industry-specific terminology
Product specifications
Historical data and patterns
Regulatory requirements
Best practices and guidelines

Think of it as building a custom library for your AI, filled with exactly the books it needs to do its job.

3. The Tool Layer — What the AI Can Do

Context isn’t just about information — it’s about capability. Modern AI systems need access to tools:

Can it query your database?
Can it send emails or create calendar events?
Can it fetch real-time data from APIs?
Can it execute code or run calculations?

The Model Context Protocol (MCP), now governed by the Agentic AI Foundation under the Linux Foundation, has become the universal standard for connecting AI agents to enterprise tools, with 97M+ monthly SDK downloads.

4. The Rules Layer — What the AI Should and Shouldn’t Do

This is about governance and guardrails:

What data can the AI access?
What actions require human approval?
What tone and style should it use?
What are the security and compliance requirements?
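Guardrails like these are easiest to enforce in code, outside the model. The sketch below checks a proposed action against an allow-list and flags the actions that need a human. The action names, environment names, and policy table are hypothetical examples, not a standard schema.

```python
from dataclasses import dataclass

# Hypothetical policy: which actions the agent may take, and which need sign-off.
ALLOWED_ACTIONS = {"read_order", "send_tracking_update", "issue_refund"}
NEEDS_APPROVAL = {"issue_refund"}


@dataclass
class Decision:
    allowed: bool
    needs_human: bool
    reason: str


def check_action(action: str, environment: str) -> Decision:
    """Decide whether an agent-proposed action may run, and under what conditions."""
    if action not in ALLOWED_ACTIONS:
        return Decision(False, False, f"{action} is not on the allow-list")
    if action in NEEDS_APPROVAL and environment == "production":
        return Decision(True, True, f"{action} requires human approval in production")
    return Decision(True, False, "ok")


print(check_action("issue_refund", "production"))
print(check_action("drop_table", "production").allowed)
```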

5. The State Layer — Where the AI Is Right Now

Context is dynamic. The AI needs to know:

What task is currently being performed?
What stage of the workflow are we in?
What just happened, and what comes next?
What’s the current environment (production, testing, etc.)?

The Architecture: How It All Works Together

Let me break down the architecture in a way that makes sense.

The Old Way: Prompt Engineering

Simple, linear, limited.

The New Way: Context Engineering

This is what Phil Schmid calls “context operations”: Context Offloading (moving information into external systems), Context Reduction (compressing old information), and dynamic context assembly.

Real-World Example: The Customer Support Revolution

Let’s see this in action with a real scenario.

User asks: “Why was my order delayed?”

With Prompt Engineering: AI generates a generic response about possible delivery delays, shipping times, etc.

With Context Engineering: The AI has access to:

Customer Data: Order #12345, shipped on May 10, expected May 13
Logistics Data: Weather delay in Memphis distribution center
Policy Knowledge: Standard compensation is 10% discount for delays over 2 days
Tools: Can check real-time tracking, issue refunds, send updated ETAs
Company Tone: Friendly, empathetic, solution-oriented

Result: “I see your order #12345 was affected by severe weather in our Memphis distribution center. It’s now scheduled to arrive tomorrow, May 14. Since this is beyond our standard delivery window, I’ve applied a 10% discount to your account. Would you like me to send a detailed tracking update to your email?”
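The difference comes from what gets assembled before the model is called. Below is a sketch of that assembly step, with the data sources stubbed as hard-coded values. In a real system, each would be a database query, an API call, or a retrieval step. The function name and tool names are hypothetical.

```python
def assemble_support_context(question: str, customer: dict, logistics: dict,
                             policy: str, tools: list[str], tone: str) -> str:
    """Build the context block the model sees, with one labeled section per source."""
    sections = {
        "Customer data": (f"Order #{customer['order_id']}, shipped {customer['shipped']}, "
                          f"expected {customer['expected']}"),
        "Logistics": logistics["note"],
        "Policy": policy,
        "Available tools": ", ".join(tools),
        "Tone": tone,
    }
    body = "\n".join(f"{name}: {value}" for name, value in sections.items())
    return f"{body}\n\nCustomer question: {question}"


context = assemble_support_context(
    question="Why was my order delayed?",
    customer={"order_id": 12345, "shipped": "May 10", "expected": "May 13"},
    logistics={"note": "Weather delay in Memphis distribution center"},
    policy="10% discount for delays over 2 days",
    tools=["check_tracking", "issue_refund", "send_eta_update"],  # hypothetical tool names
    tone="Friendly, empathetic, solution-oriented",
)
print(context)
```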

See the difference? Same question, completely different intelligence level.

The Four Operations: How to Do Context Engineering Right

Context engineering breaks down into four key operations:

1. Context Offloading

Move information out of prompts into structured external systems — databases, vector stores, knowledge graphs. Don’t stuff everything into a single prompt.

2. Context Reduction

Compress and summarize information intelligently. Use semantic search to find only what’s relevant. Prevent “context rot” where old, irrelevant information clutters the window.
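Here is a minimal sketch of context reduction: score candidate snippets against the query, then keep the best ones that fit a token budget. Plain word overlap stands in for semantic search (a real system would compare embeddings). The ~4-characters-per-token estimate is a rough heuristic. Both are simplifying assumptions, and the function names are illustrative.

```python
import math
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def reduce_context(query: str, snippets: list[str], token_budget: int) -> list[str]:
    """Keep the most query-relevant snippets that fit within token_budget."""
    q = _words(query)
    # Relevance = number of shared words (a crude stand-in for embedding similarity).
    scored = sorted(snippets, key=lambda s: len(q & _words(s)), reverse=True)
    kept, used = [], 0
    for s in scored:
        if not q & _words(s):
            break  # sorted by relevance, so every remaining snippet is irrelevant too
        cost = math.ceil(len(s) / 4)  # rough token estimate
        if used + cost <= token_budget:
            kept.append(s)
            used += cost
    return kept


docs = [
    "Order 12345 was delayed by weather in Memphis.",
    "Our office is closed on public holidays.",
    "Delayed orders over 2 days get a 10% discount.",
]
print(reduce_context("Why was my order delayed?", docs, token_budget=30))
```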

3. Context Injection

Dynamically assemble the right context at runtime based on the query. This is where RAG (Retrieval-Augmented Generation) systems shine.

4. Context Management

Version control your context. Test it. Govern it. Treat context as a first-class data product, not an afterthought.

Common Mistakes: What Not to Do

Let me save you some pain. Here are the mistakes everyone makes:

Context Dumping: Throwing everything into the prompt and hoping the AI figures it out. This is like giving someone a 500-page manual when they asked for a quick answer.
Static Context: Using the same context for every query. Context should be dynamic and query-specific.
No Governance: Giving the AI access to everything without proper access controls or audit trails.
Ignoring Memory: Treating every interaction as if it’s the first one. Users expect continuity.
Over-Engineering: Building complex context systems for simple tasks that don’t need them. Start simple, scale as needed.

The Tools of the Trade

If you’re getting into context engineering, here are the tools you should know:

LangChain & LlamaIndex: For building RAG pipelines and context management systems
Vector Databases (Pinecone, Weaviate, Qdrant): For semantic search and knowledge retrieval
MCP (Model Context Protocol): The emerging standard for connecting AI to enterprise tools
Prompt Flow & Haystack: For orchestrating complex context assembly workflows

The Future: Where We’re Headed

In 2026, the trend is toward “knowledge runtimes” that manage retrieval, verification, reasoning, access control, and audit trails as integrated operations — like how container orchestrators manage application workloads.

We’re also seeing the emergence of Cognitive AI architectures that formalize human-like memory models with discrete memory modules for short-term, working, and long-term memory.

The future isn’t about better prompts. It’s about better context architectures.

Your Takeaway: What You Should Do Next

Here’s my advice, whether you’re a developer, data scientist, business leader, or curious learner:

Start thinking in systems, not prompts: When you interact with AI, ask yourself: “What context does this system need to be truly intelligent?”
Learn the fundamentals: Understand RAG, vector databases, embedding models, and semantic search. These are the building blocks.
Experiment with context patterns: Try different ways of structuring and injecting context. There’s no one-size-fits-all solution.
Treat context as infrastructure: Organizations that treat context engineering as core infrastructure rather than an afterthought report dramatically different outcomes.
Stay updated: This field is evolving rapidly. Follow developments in MCP, agentic AI frameworks, and enterprise AI architectures.

What’s your experience with AI systems? Have you hit the limitations of prompt engineering? I’d love to hear your thoughts in the comments below.

If this article helped you understand context engineering, give it a clap 👏 and share it with someone who’s working with AI.

Why I Stopped Writing API Integrations and Started Using MCP

Minoltan Issack — Mon, 18 May 2026 02:23:31 +0000

The story of how every developer eventually hits the same wall — and what finally fixes it

Let’s Start From the Very Beginning

Forget AI for a moment.

You have a MySQL database. Inside it lives your company’s data — customer records, orders, inventory, whatever. You want to work with that data. So what do you do?

You open a MySQL client. You connect directly. You run queries. Simple.

This works perfectly — as long as you’re the only one who needs the data.

But then your frontend team needs the same data. Your mobile app needs it. Your analytics dashboard needs it. Your reporting tool needs it.

Now everyone is connecting directly to MySQL. Credentials are scattered everywhere. If the database schema changes, every single client breaks. There’s no security layer, no rate limiting, no caching.

Direct database connections don’t scale.

The API Layer Enters the Picture

So the smart engineering move is: you put an API in the middle.

You write a Python (or Node.js) backend. It connects to MySQL. It exposes clean endpoints. Now nobody talks to the database directly — they talk to the API.

This is much better. One database connection. One place to apply security, validation, and business logic. If the database changes, you update the API once and all clients keep working.

This pattern is so fundamental that every developer learns it early in their career. The API becomes the single source of truth between your data and the world.

Great. We’ve built solid engineering foundations. Now let’s bring AI into the picture.

Adding LLMs to the Mix

It’s 2024. Your company wants an AI assistant. You want it to answer questions about your data — the very data sitting in that MySQL database your API already serves.

So you think: let me connect my LLM to my API.

You start with one model. Let’s say Gemini.

You write a client file. It calls your Python API, gets data, formats it, and sends it to the Gemini API with your API key. The LLM reads the data and responds intelligently.

It works! Your AI assistant can now answer questions about your customer data.

You show it to the team. Everyone’s excited.

Then someone asks: “Can we also try Claude? I heard it’s better for reasoning.”

More LLMs, More Clients

Sure. You write another client file. This time for Claude.

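# claude_client.py

import anthropic
import requests

# Fetch data from our own Python API (an endpoint hardcoded into this client)
data = requests.get("http://localhost:8000/customers", timeout=10).json()

user_question = "Which customers placed more than one order last month?"  # example question

client = anthropic.Anthropic(api_key="YOUR_CLAUDE_KEY")  # placeholder; read from the environment in practice

# Claude-specific request shape: messages.create with a required max_tokens
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Here is our customer data: {data}\n\n{user_question}"
    }]
)
# Claude-specific response shape: a list of content blocks
print(response.content[0].text)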

Then the CTO says: “We should also benchmark against OpenAI.”

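# openai_client.py

import requests
from openai import OpenAI

# The same fetch, duplicated in every client file
data = requests.get("http://localhost:8000/customers", timeout=10).json()

user_question = "Which customers placed more than one order last month?"  # example question

client = OpenAI(api_key="YOUR_OPENAI_KEY")  # placeholder; read from the environment in practice

# OpenAI-specific request shape: chat.completions.create
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Here is our customer data: {data}\n\n{user_question}"
    }]
)
# OpenAI-specific response shape: a list of choices
print(response.choices[0].message.content)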

Now you have three client files. Each one:

Calls your Python API to fetch data
Formats the data for that specific LLM
Sends it to the LLM’s API with that model’s unique syntax
Parses the response in that model’s unique response format

Three files. Manageable. You handle it.

Things are working. But then your API grows.

The Problem Starts Here

Your Python API started simple. One file. A few endpoints.

But over time, the team adds features. New data sources. New business logic. New modules.

Six months later, your backend isn’t one file anymore.

It’s 25 Python files.

Now here’s the question that should keep you up at night:

Your three LLM clients — Gemini, Claude, OpenAI — which of these 25 files do they know about?

Only the ones you hardcoded into them when you wrote them.

The customers.py endpoint, maybe orders.py. Whatever you thought to include back then.

But forecasts.py? campaigns.py? leads.py? The client files have no idea those even exist.

The Real Nightmare: Every Client Needs to Be Updated

Let’s say you add a new module — contracts.py. It's important. Your LLM assistant should definitely be able to query it.

What do you have to do?

You open gemini_client.py. Add the new endpoint. Test it. Deploy. Then open claude_client.py. Add the same endpoint. Test it. Deploy. Then open openai_client.py. Same thing again.

Three files updated for one new backend module.

Now imagine this happening every week. Every sprint. Every time a new Python file gets added to the backend.

And what if you add a fourth LLM? A fifth? Maybe you want to try a local Ollama model. Or a fine-tuned internal model. Every new LLM means another client file that needs to be kept in sync with 25 (and growing) backend files.

This is the wall every team eventually hits. You started with a clean, sensible architecture — MySQL → API → LLM clients. But as the system grows, the number of connections explodes. You’re writing the same integration logic over and over. You’re updating files constantly. One missed update means your AI gives stale or incomplete answers.

Here’s how bad this gets in practice.

Each client knows a different subset of your backend. They’re out of sync with each other. They’re all out of date with the actual backend. And every new LLM you add starts at zero — it knows nothing until you manually wire it up.

This is called the N × M problem.

N = number of LLM clients (Gemini, Claude, OpenAI, Ollama, your custom model…)
M = number of backend modules (25 Python files, growing…)
N × M = the total number of integrations you have to write and maintain

3 LLMs × 25 files = 75 custom integration points to maintain. And that number only goes up.

There has to be a better way.

What If There Was One Standard?

Step back and think about what all these clients are actually doing. They’re all:

Discovering what data/tools are available
Fetching or invoking those capabilities
Passing results back to an LLM

The logic is identical. Only the format differs — because each LLM has its own proprietary way of describing tools and functions.

What if we created one universal standard for describing capabilities — and any LLM that speaks that standard could automatically discover and use all 25 of your backend modules?

What if, when you added contracts.py, you only had to register it in one place — and all your LLM clients instantly knew about it?

That’s the idea behind Model Context Protocol (MCP).

Enter MCP: One Standard to Connect Them All

MCP (Model Context Protocol) was introduced by Anthropic in November 2024. The concept is elegant: instead of each LLM client talking directly to your backend in its own custom way, you put a standardized server in the middle.

Your backend exposes its capabilities through the MCP Server. Every LLM client — whether it’s Claude, Gemini, OpenAI, or a local Ollama model — speaks to that one server in the same universal language.

Now when you add contracts.py:

You register it once in the MCP Server
All LLM clients automatically discover and use it
Zero updates to any client file

When you add a brand new LLM — Llama, Mistral, whatever comes next:

The new client just connects to the existing MCP Server
It immediately has access to all 25 modules
Zero integration code to write

The N × M problem collapses to N + M. You maintain your backend modules (M) separately and your LLM clients (N) separately. The MCP Server is the bridge that connects them all.
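Here is the arithmetic behind that claim, using the story’s numbers (3 LLM clients, 25 backend modules) plus a second, hypothetical, larger setup for comparison:

```python
def integration_points(n_clients: int, m_modules: int) -> tuple[int, int]:
    """Integrations to maintain without MCP (N x M) vs. with one MCP server (N + M)."""
    return n_clients * m_modules, n_clients + m_modules


for n, m in [(3, 25), (5, 40)]:
    without_mcp, with_mcp = integration_points(n, m)
    print(f"{n} clients x {m} modules: {without_mcp} without MCP, {with_mcp} with MCP")
# 3 clients x 25 modules: 75 without MCP, 28 with MCP
# 5 clients x 40 modules: 200 without MCP, 45 with MCP
```

The gap widens with every client or module you add, because the product grows multiplicatively while the sum grows additively.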

How MCP is Structured: The Three Players

MCP defines exactly three roles in every interaction:

The Host

The AI application your user interacts with. Claude Desktop, VS Code with Copilot, Cursor, or your own custom chatbot. The Host contains the LLM and manages the conversation.

The Client

Lives inside the Host. Acts as the translator — converts the LLM’s requests into the MCP protocol format (JSON-RPC 2.0), sends them to the Server, and brings responses back. Each Client has a 1:1 connection with one MCP Server, but a Host can run multiple Clients simultaneously.
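To make “MCP protocol format (JSON-RPC 2.0)” concrete, here is roughly what a client sends to list a server’s tools and then call one, after the initialization handshake (omitted here). The method names `tools/list` and `tools/call` come from the MCP specification. The tool name and arguments are borrowed from the `create_order` example later in this post, and the argument values are made up. The sketch only builds and serializes the messages; transport (stdio or HTTP) is left out.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # each request needs an ID not reused within the session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as an MCP client would send it."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. Discover what the server offers.
print(jsonrpc_request("tools/list"))
# 2. Invoke a tool by name with JSON arguments.
print(jsonrpc_request("tools/call", {
    "name": "create_order",
    "arguments": {"customer_id": "c-42", "items": ["sku-1"], "total": 19.99},
}))
```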

The Server

This is what you build. It wraps your backend capabilities — your 25 Python files — and exposes them through three standardized primitives: Resources, Tools, and Prompts.

The Three Primitives: Resources, Tools, and Prompts

This is the core design of MCP — and the mental model is beautifully simple.

Resources — Read This

Resources give the LLM read-only access to data. No side effects. No changes. Just information retrieval.

Going back to our story: your customers.py and reports.py modules — if an LLM just needs to read customer data to answer a question, you'd expose those as Resources.

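import json
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("crm")  # the MCP server wrapping your backend (Python SDK's FastMCP)

@mcp.resource("customers://list")  # read-only data, addressed by a URI
async def get_customers() -> str:
    data = await db.fetch_all_customers()  # db: your existing data layer (hypothetical)
    return json.dumps(data)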

Resources are perfect for RAG-style workflows. Instead of dumping your entire database into the prompt upfront, you expose data as addressable resources the LLM can fetch on demand — much more efficient.

Resources = Query, never modify

Tools — Do This

Tools are functions the LLM can invoke to take real actions. This is where MCP becomes truly powerful for agentic use cases:

Create a new order in the database
Send an email notification
Update an inventory count
Generate and export a report
Trigger a shipping webhook

Each Tool has a typed JSON Schema so the LLM knows exactly what arguments to pass. The LLM decides when a Tool is needed, emits a structured call, and the MCP Client routes it to the Server for execution.
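For a sense of what the LLM actually sees, here is roughly how a server describes a tool in its `tools/list` response: a name, a description, and an `inputSchema` written in JSON Schema. The sketch builds that description for the `create_order` tool and runs a minimal argument check before dispatch. Treating `items` as a list of strings is an assumption. A real server would use a full JSON Schema validator, and the Python SDK’s FastMCP generates the schema from type hints for you.

```python
CREATE_ORDER_TOOL = {
    "name": "create_order",
    "description": "Create a new order for a customer.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "items": {"type": "array", "items": {"type": "string"}},  # assumed item type
            "total": {"type": "number"},
        },
        "required": ["customer_id", "items", "total"],
    },
}

_PY_TYPES = {"string": str, "array": list, "number": (int, float)}


def check_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return problems with LLM-supplied arguments (empty list means OK). Minimal: top level only."""
    schema = tool["inputSchema"]
    problems = [f"missing: {name}" for name in schema["required"] if name not in arguments]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected: {name}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif not isinstance(value, _PY_TYPES[spec["type"]]) or isinstance(value, bool):
            problems.append(f"wrong type: {name}")
    return problems


print(check_arguments(CREATE_ORDER_TOOL, {"customer_id": "c-42", "items": ["sku-1"], "total": 19.99}))
print(check_arguments(CREATE_ORDER_TOOL, {"customer_id": 42, "items": ["sku-1"]}))
```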

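from mcp.server.fastmcp import FastMCP
mcp = FastMCP("crm")  # one server instance; reuse it for Resources and Tools

@mcp.tool()  # tool name defaults to the function name: create_order
async def create_order(customer_id: str, items: list, total: float) -> str:
    order = await db.insert_order(customer_id, items, total)  # db: your existing data layer (hypothetical)
    return f"Order {order.id} created successfully"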

Tools = Take action, produce side effects

Prompts — Use This Template

Prompts are reusable, parameterized templates that standardize common LLM workflows. Users select them explicitly.

For example: a “generate monthly sales report” prompt that takes month and year as parameters and assembles the perfect system message for that task — every time, consistently.
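A prompt like that is just a named template plus arguments. The sketch below renders the “generate monthly sales report” example into the message shape that an MCP `prompts/get` result carries: a description plus a list of role/content messages. The prompt wording and function name are illustrative. With the Python SDK, you would typically register such a function with a decorator rather than build the dict by hand.

```python
import calendar


def monthly_sales_report_prompt(month: int, year: int) -> dict:
    """Render the hypothetical 'monthly sales report' prompt for a given month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    period = f"{calendar.month_name[month]} {year}"
    text = (f"Generate the sales report for {period}. "
            "Sections: revenue, top 5 products, regional breakdown, changes vs. previous month. "
            "Use the orders and customers resources; state any missing data explicitly.")
    return {
        "description": f"Monthly sales report for {period}",
        "messages": [{"role": "user", "content": {"type": "text", "text": text}}],
    }


print(monthly_sales_report_prompt(5, 2026)["messages"][0]["content"]["text"])
```

Because the template lives on the server, every client that fetches it gets the same wording, which is the consistency this primitive is meant to provide.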

Prompts = Standardize, make repeatable

The golden rule: Resources query. Tools act. Prompts standardize.

Back to Our Story: The Before and After


The architecture went from a tangled web of N × M custom connections to a clean hub-and-spoke model. Your backend team works on the MCP Server. Your AI/client team works on the LLM integrations. They no longer need to constantly sync up every time something changes.

A Quick Example: How It Feels in Practice

Here’s the kind of conversation that becomes possible once everything is connected through MCP:

User: “Show me all customers who placed orders last month but haven’t received their shipment yet, then draft a follow-up email for each of them.”

Without MCP, you’d need to manually wire orders.py, customers.py, shipping.py, and a notification tool into whichever LLM client you're using.

With MCP, the LLM:

Calls the orders Resource → gets last month's orders
Calls the customers Resource → gets customer details
Calls the shipping Resource → checks shipment status
Filters the unshipped ones
Calls the draft_email Tool → generates personalized follow-ups
Reports back to you with a summary

All of this using standardized MCP calls. No custom glue code. And if tomorrow you want to run this same workflow with Claude instead of Gemini? Just point Claude’s client at the same MCP Server. Done.

The Bigger Picture

MCP was introduced by Anthropic in November 2024 and was inspired by the Language Server Protocol (LSP) — the standard that lets code editors like VS Code support dozens of programming languages without each language needing its own custom editor plugin.

MCP does the same thing for AI: instead of every LLM needing a custom plugin for every tool, there’s one protocol that all of them speak.

Since its release, it has been adopted by OpenAI, Google DeepMind, and a rapidly growing ecosystem of developer tools. In December 2025, Anthropic donated the protocol to the Linux Foundation, making it vendor-neutral and community-governed. Today (May 2026), there are 200+ community-built MCP servers for tools like GitHub, Slack, PostgreSQL, Stripe, Figma, and Docker.

It has moved from “interesting Anthropic experiment” to the de facto infrastructure standard for agentic AI systems.

Where to Start

If this story resonates with you and you’re ready to stop writing N × M integrations:

Browse the MCP server registry at modelcontextprotocol.io — there's likely already a server for the tools you use
Install Claude Desktop and connect a community MCP server to experience it as a user first
Build a simple MCP server using the Python SDK (pip install mcp) — wrap one of your existing API endpoints
Connect it to your preferred LLM client — the same server will work with Claude, GPT, or any local model

The documentation is clean, the SDKs are mature, and the community is extremely active.

Final Thoughts

The journey from a MySQL direct connection → REST API → LLM client → MCP isn’t just a technical evolution. It’s a story that every developer who works with AI will live through.

You’ll start simple. You’ll add more LLMs. You’ll add more backend modules. And one day you’ll look at your codebase and realize you’re maintaining 75 custom integration points just to keep three AI clients in sync with a growing backend.

That’s the moment MCP starts making complete sense.

It’s not a fancy new concept. It’s the same lesson we already learned with APIs — you don’t let every client talk directly to the database. You put a standard interface in the middle.

MCP is that standard interface. But for AI.

Why RAG is the Must-Have AI Skill in 2026: 11 Types Explained!

Minoltan Issack — Tue, 12 May 2026 02:23:49 +0000

If you’ve built an AI chatbot or LLM-powered application recently, you’ve probably hit this wall:

Ask ChatGPT, “Hey, what is my company policy?” and it either replies “I don’t know” or gives a generic answer about common company policies.

Your model gives beautiful, fluent answers… that are completely wrong. Or outdated. Or hallucinated.

Welcome to the context problem — and why Retrieval-Augmented Generation (RAG) has become the most critical architecture pattern in modern AI.

In this deep dive, we’ll explore the 11 types of RAG systems that are transforming how AI accesses, processes, and generates information. Whether you’re building customer support bots, enterprise search systems, or AI assistants, understanding these architectures isn’t optional anymore — it’s essential.

The Problem: Why LLMs Alone Aren’t Enough

Large Language Models are incredible. GPT-4, Claude, Gemini — they can write code, explain concepts, and hold conversations that feel eerily human.

But they have three fundamental limitations:

1. Knowledge Cutoff: They only know what they were trained on. Ask GPT-4 about something from last week? Blank stare.

2. Hallucination: When they don’t know something, they confidently make it up. Your legal chatbot citing non-existent case law? That’s hallucination.

3. No Access to Private Data: Your company’s internal documents, customer records, proprietary research: the model has never seen any of it.

This is where RAG comes in.

What Is RAG?

Retrieval-Augmented Generation is deceptively simple:

Instead of asking the LLM to answer from memory, you give it a search engine.

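Here’s the flow: a user asks a question, a retriever searches your knowledge base (say, your company’s financial reports) for the most relevant passages, those passages are added to the prompt, and the LLM writes its answer from them.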

The magic? The LLM doesn’t need to “know” your financials. It just needs to read and synthesize what the retriever found.

Think of it like this: The LLM is the smart analyst. The retriever is their research assistant.

Why RAG Matters in 2026

The RAG landscape has evolved dramatically. What worked in 2024 — basic vector search and prompt stuffing — is now considered “naive RAG.”

Here’s what changed:

Hybrid retrieval (combining semantic + keyword search) is now table stakes
Real-time data integration has moved from nice-to-have to mandatory
Multi-modal RAG (text + images + code) is becoming mainstream
Agentic RAG (where the model controls its own retrieval) is production-ready

In 2026, naive RAG is seen as a prototype at best and a liability at worst. The bottleneck has shifted from generation quality to retrieval precision.

If your retriever pulls three irrelevant paragraphs and misses the one critical sentence, even the best LLM will hallucinate.

Let’s dive into the 11 RAG architectures you need to know.

Type 1: Naive RAG — The Foundation

What It Is

The “Hello World” of RAG. The simplest possible implementation — embed your documents, store them, search by similarity, and feed results to an LLM.

How it works:

Convert your documents into embeddings (vector representations)
Store them in a vector database (Pinecone, Weaviate, Chroma)
When a query comes in, convert it to an embedding
Find the K most similar documents (cosine similarity)
Jam those documents into the LLM prompt
Generate answer

Why It Works

Embeddings capture the meaning of text, not just keywords. So when a user asks “How do I cancel my subscription?”, it can match documents that say “terminate your account” — even though the words are different. It’s fast, easy to set up, and effective for straightforward use cases.

When to Use It

Proof of concepts and quick demos
Small, homogeneous datasets (e.g., product documentation)
Low-stakes applications where occasional errors are acceptable

Real-World Example

A customer support chatbot that searches FAQs to answer common questions like “How do I reset my password?” Works well when questions match FAQ phrasing, fails when they don’t.

Limitation: If a user asks to compare Q3 2025 revenue vs. Q3 2024, vector search might return the wrong year’s data because the semantic distance between “2024” and “2025” is negligible to an embedding model. One wrong digit, one wrong answer.
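The six steps under “How it works” fit in a short Python sketch, shown here on the password-reset FAQ example. This is a minimal illustration under stated assumptions, not a production pipeline. `embed` is a hypothetical stand-in that counts words instead of calling a neural embedding model, so unlike real embeddings it cannot match “cancel” with “terminate”. `ToyVectorStore` is an in-memory list standing in for Pinecone, Weaviate, or Chroma.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    Real embeddings capture meaning; this toy version only captures shared words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, doc: str) -> None:
        # Steps 1-2: embed the document and store the vector next to its text
        self.items.append((embed(doc), doc))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Steps 3-4: embed the query and return the K most similar documents
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(query: str, docs: list[str]) -> str:
    # Step 5: put the retrieved documents into the prompt
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


store = ToyVectorStore()
for doc in [
    "To reset your password, open Settings and choose Reset password.",
    "Our office is closed on public holidays.",
    "Refunds are processed within 5 business days.",
]:
    store.add(doc)

question = "How do I reset my password?"
top_docs = store.search(question, k=1)
prompt = build_prompt(question, top_docs)
# Step 6: send `prompt` to the LLM of your choice
```

If you swap in a real embedding model and vector database, only the two helpers change. The control flow stays the same.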

Type 2: Advanced RAG with Re-Ranking

What It Is

Naive RAG with a second-stage precision filter. It casts a wide net first, then deeply scores each result to keep only the most relevant ones before sending them to the LLM.

How it works:

First pass: Cast a wide net with vector search — retrieve 50 candidates
Second pass: Run each candidate through a cross-encoder model that scores how well it actually answers the query
Keep only the top 5 highest-scoring results
Feed these to the LLM for answer generation
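Here is a minimal sketch of the two-stage pattern. Both scorers are hypothetical stand-ins. `first_stage` ranks documents by shared words as a cheap proxy for vector search. `pair_score` plays the cross-encoder: it reads the query and the document together and rewards matching word order. In a real system you would replace `pair_score` with an actual cross-encoder model, for example the `CrossEncoder` class in sentence-transformers.

```python
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


def first_stage(query: str, docs: list[str], n: int = 50) -> list[str]:
    """Fast, approximate recall: rank by the number of distinct shared words."""
    q = set(tokens(query))
    return sorted(docs, key=lambda d: len(q & set(tokens(d))), reverse=True)[:n]


def pair_score(query: str, doc: str) -> float:
    """Hypothetical cross-encoder stand-in: scores the (query, doc) pair jointly,
    rewarding shared words plus shared adjacent word pairs (i.e. word order)."""
    q, d = tokens(query), tokens(doc)
    word_overlap = len(set(q) & set(d)) / max(len(set(q)), 1)
    q_pairs, d_pairs = set(zip(q, q[1:])), set(zip(d, d[1:]))
    order_overlap = len(q_pairs & d_pairs) / max(len(q_pairs), 1)
    return word_overlap + order_overlap


def retrieve_and_rerank(query: str, docs: list[str], n: int = 50, k: int = 5) -> list[str]:
    candidates = first_stage(query, docs, n)                           # wide net
    candidates.sort(key=lambda d: pair_score(query, d), reverse=True)  # precise scoring
    return candidates[:k]                                              # keep the best k


docs = [
    "Item damaged? Refund for shipping is not available for any item.",
    "Refund for damaged item: return within 30 days.",
    "Store hours are 9 to 5.",
]
query = "refund for damaged item"
print(first_stage(query, docs, n=1))          # first pass alone picks the shipping note
print(retrieve_and_rerank(query, docs, k=1))  # re-ranking promotes the exact answer
```

Both of the first two documents share all four query words, so the cheap pass cannot tell them apart. The pair scorer sees that only the second one contains the phrase in order.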

Why It Works

Vector search is fast but approximate — it finds documents that are conceptually close, not necessarily the most precisely relevant. Cross-encoders are slower but far more accurate because they evaluate the query and document together as a pair, not separately. The two-stage approach gives you the best of both: speed from vector search, precision from re-ranking.

When to Use It

High-precision requirements (legal, medical, financial)
Queries that need exact matches (product codes, policy numbers, citations)
When retrieval quality directly impacts business outcomes

Real-World Example

A legal research tool where missing a relevant case citation could cost millions. The re-ranker ensures that when the model says “no relevant cases found,” it’s actually true — not just a gap in the initial retrieval pass.

Type 3: Hybrid Search RAG

What It Is

A retrieval system that runs two search methods in parallel — semantic (neural) search and keyword (BM25) search — then merges the results for better overall coverage.

How it works:

Semantic search: Embeds the query and finds conceptually similar documents
Keyword search (BM25): Finds documents with exact term matches
Both results are merged using Reciprocal Rank Fusion (RRF) — a scoring formula that combines rankings from both methods
The unified top results are sent to the LLM
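RRF itself is only a few lines. Each document earns 1 / (k + rank) from every result list it appears in, and those contributions are summed. The value k = 60 is the constant proposed in the original RRF paper and a common default. The document IDs below are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic = ["efficiency_report", "cost_review", "ml_strategy"]  # from vector search
keyword = ["sku_xj_2847b", "cost_review", "efficiency_report"]  # from BM25
fused = reciprocal_rank_fusion([semantic, keyword])
# Documents found by both methods rise to the top; single-method hits follow.
```

RRF only uses ranks, not raw scores. This means you never have to reconcile BM25 scores with cosine similarities, which live on different scales.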

Why It Works

Each search method has a blind spot:

Semantic search understands intent but can miss exact terms. Query: “reducing operational costs” → finds documents about “efficiency improvements” ✅
Keyword search catches specifics but misses meaning. Query: “Product Code XJ-2847B” → finds exact matches for that code ✅

Hybrid search covers both, making it reliable across a wide range of query types.

When to Use It

Enterprise search with mixed content (technical docs + marketing + internal wikis)
E-commerce (searches for product names, SKUs, specifications)
Regulatory/compliance (exact citations matter)

Real-World Example

An internal company search tool that needs to handle both “documents about our machine learning strategy” (conceptual) and “the Q3 2025 ML roadmap deck” (exact). Hybrid search handles both gracefully in a single pipeline.

Type 4: Query Decomposition RAG

What It Is

A RAG approach that breaks complex, multi-part questions into smaller, focused sub-queries — retrieves for each one separately — then synthesizes everything into a complete answer.

How it works:

The complex question is sent to an LLM with instructions to decompose it
The LLM identifies and generates atomic sub-questions
Documents are retrieved in parallel for each sub-question
All retrieved context is combined and passed to the LLM
A comprehensive, synthesized answer is generated

Why It Works

LLMs struggle when a single retrieval pass has to serve multiple information needs at once. A complex question like “Compare Q3 2025 revenue to Q3 2024 and explain the growth drivers” is actually three separate questions:

What was Q3 2025 revenue?
What was Q3 2024 revenue?
What drove the growth?

Decomposition makes each information need explicit, so retrieval is precise for each one.
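A minimal sketch of the decomposition flow follows, using the revenue question above. The `llm` and `retrieve` arguments are hypothetical callables that you would back with your own model client and retriever. The demo wires in fake versions so the control flow runs on its own. Sub-questions are retrieved in parallel with a thread pool. Each one stays labeled, so the final synthesis prompt can see which evidence answers which part.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

DECOMPOSE_PROMPT = (
    "Break the question below into the smallest independent sub-questions, "
    "one per line, with no other text.\n\nQuestion: {question}"
)


def decompose(question: str, llm: Callable[[str], str]) -> list[str]:
    """Ask the LLM for atomic sub-questions and parse one per line."""
    reply = llm(DECOMPOSE_PROMPT.format(question=question))
    return [line.strip(" -•") for line in reply.splitlines() if line.strip(" -•")]


def decomposed_context(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> str:
    sub_questions = decompose(question, llm)
    # Retrieve for every sub-question in parallel; map() preserves input order
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(retrieve, sub_questions))
    # Keep each sub-question next to its evidence for the synthesis step
    sections = [
        f"Sub-question: {sq}\n" + "\n".join(found)
        for sq, found in zip(sub_questions, results)
    ]
    return "\n\n".join(sections)


# Fake components for the demo (hypothetical; replace with real ones)
def fake_llm(prompt: str) -> str:
    return "- What was Q3 2025 revenue?\n- What was Q3 2024 revenue?\n- What drove the growth?"


def fake_retrieve(query: str) -> list[str]:
    return [f"[top document for: {query}]"]


context = decomposed_context(
    "Compare Q3 2025 revenue to Q3 2024 and explain the growth drivers",
    fake_llm,
    fake_retrieve,
)
# `context` plus the original question then goes to the LLM for the final answer
```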

When to Use It

Complex analytical questions
Comparative queries (X vs. Y, before vs. after)
Multi-step reasoning tasks
Research and investigation workflows

Real-World Example

A business intelligence assistant answering executive questions like: “What were our top 3 products by revenue last quarter, how do they compare to the previous year, and what are the emerging trends in each category?” Without decomposition, the retriever grabs random snippets. With decomposition, each part gets its own precise retrieval pass.

Type 5: Step-Back Prompting RAG

What It Is

A RAG technique that first poses a broader, more general version of the user’s question, then retrieves for both the general and specific versions, giving the LLM both the “why” and the “what.”

How it works:

User asks a specific question (e.g., “Why did Q4 2024 sales in the Northeast region drop?”)
The system generates a step-back question: “What factors generally affect regional sales performance?”
Retrieval runs for both the original and the step-back question
General context provides conceptual grounding; specific context provides the data
The LLM generates an answer that’s informed by both

Why It Works

Sometimes a specific question is too narrow — the retriever finds the data but the LLM lacks the conceptual framework to interpret it correctly. Step-back prompting solves this by giving the model both principles (from the general question) and specifics (from the original question), leading to better-reasoned answers.

When to Use It

Root cause analysis (“Why did X happen?”)
“Why” questions that need both principles and specifics
Educational and explanatory applications
Troubleshooting systems

Real-World Example

A technical support system answering: “Why is my API request failing with error 429?”

Step-back question: “What causes rate limiting errors in APIs?”
Combined retrieval finds both the rate limit policy AND the user’s specific usage pattern
Answer: “You hit the 1,000 req/hour limit. Your account made 1,247 requests in the last hour. Consider implementing exponential backoff.”

Without the step-back, the answer might just quote a policy number with no explanation of why it applies.
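The step-back flow can be sketched the same way, using the error-429 example above. As before, `llm` and `retrieve` are hypothetical callables. The prompt wording is an assumption, not a canonical step-back prompt. One design detail matters here: the general and specific searches often return overlapping documents. Repeats are therefore dropped before the two sections are assembled.

```python
from typing import Callable

STEP_BACK_PROMPT = (
    "Rewrite the question as a more general question about the principles "
    "behind it. Reply with the question only.\n\nQuestion: {question}"
)


def unique(docs: list[str], seen: set[str]) -> list[str]:
    """Return docs not already in `seen`, recording them as seen."""
    fresh = []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            fresh.append(doc)
    return fresh


def step_back_prompt(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> str:
    general_question = llm(STEP_BACK_PROMPT.format(question=question)).strip()
    seen: set[str] = set()
    general = unique(retrieve(general_question), seen)  # principles ("why")
    specific = unique(retrieve(question), seen)         # data ("what")
    return (
        "General background:\n" + "\n".join(general)
        + "\n\nSpecific facts:\n" + "\n".join(specific)
        + f"\n\nQuestion: {question}"
    )


# Fake components for the demo (hypothetical)
FAKE_INDEX = {
    "What causes rate limiting errors in APIs?": [
        "Rate limits cap requests per hour; exceeding them returns HTTP 429.",
    ],
    "Why is my API request failing with error 429?": [
        "Usage log: requests made by this account in the last hour.",
        "Rate limits cap requests per hour; exceeding them returns HTTP 429.",
    ],
}

prompt = step_back_prompt(
    "Why is my API request failing with error 429?",
    llm=lambda _: "What causes rate limiting errors in APIs?",
    retrieve=lambda q: FAKE_INDEX.get(q, []),
)
```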

Type 6: HyDE (Hypothetical Document Embeddings) RAG

What It Is

Instead of searching with the raw user query, HyDE first generates a hypothetical ideal answer and uses that to search the knowledge base — dramatically improving retrieval accuracy for vague or conversational queries.

How it works:

User asks: “What is the company’s remote work policy?”
The LLM generates a hypothetical answer: “The company allows employees to work remotely 3 days per week, requires in-office presence on Tuesdays and Thursdays…”
This hypothetical answer is embedded as a vector
The knowledge base is searched for documents similar to the hypothetical answer
The real retrieved documents are used to generate the actual, accurate answer

Why It Works

Queries and documents live in different “spaces” in an embedding model. Queries are short, casual, and vague. Documents are long, formal, and specific. Searching query-to-document has a natural mismatch.

HyDE bridges this gap: by generating a document-like text from the query, you’re searching in document space — and finding much closer matches.

When to Use It

Open-ended questions where query phrasing doesn’t match document phrasing
Conversational interfaces with casual language
Cross-lingual search (generate hypothesis in the target language)

Real-World Example

An HR chatbot where an employee asks: “Can I work from the beach?”

The actual policy document says: “Remote Work Policy: Employees may work from any location within their country of employment…”

Standard search fails because “work from the beach” doesn’t match “remote work policy.” HyDE generates a hypothetical policy-style answer, finds the real policy, and gives the correct response.
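The sketch below runs the HR example twice: once as a plain query search and once as a HyDE search. It reuses a word-count stand-in for embeddings. This is a hypothetical simplification; a real system would embed documents and hypothetical answers with the same neural model. The `llm` callable is faked with a fixed, made-up answer. What the sketch shows is the mechanism: the hypothetical answer shares the policy document’s vocabulary, so it lands next to the right document even though the casual query does not.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Hypothetical word-count stand-in for a neural embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(text: str, docs: list[str], k: int = 1) -> list[str]:
    vec = embed(text)
    return sorted(docs, key=lambda d: cosine(vec, embed(d)), reverse=True)[:k]


def hyde_search(query: str, docs: list[str], llm: Callable[[str], str], k: int = 1) -> list[str]:
    # 1. Let the LLM write a hypothetical answer (it may be factually wrong; that is fine)
    hypothetical = llm(f"Write a short policy-style passage that answers: {query}")
    # 2. Search with the hypothetical answer instead of the raw query
    return search(hypothetical, docs, k)


def fake_llm(prompt: str) -> str:
    # Hypothetical LLM output used only for this demo
    return ("Under the remote work policy, employees may work from "
            "any location approved by their manager.")


docs = [
    "Remote Work Policy: Employees may work from any location within their country of employment.",
    "Team offsite: we can meet at the beach.",
]
query = "Can I work from the beach?"

plain_hit = search(query, docs)                # matches the offsite note on "can", "the", "beach"
hyde_hit = hyde_search(query, docs, fake_llm)  # matches the actual policy
```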

Type 7: Agentic RAG

What It Is

RAG where the LLM is in control of the retrieval loop. Instead of a fixed one-shot search, the model decides when to search, what to search for, and whether it has enough information to answer.

How it works:

The agent receives the user’s question
It evaluates: “Do I have enough context to answer this?”
If not, it decides: “What should I search for?” and retrieves
It reviews what it found and decides whether to search again or proceed
This loop repeats until the agent is satisfied or hits a max iteration limit
A final, comprehensive answer is generated

Why It Works

Traditional RAG is a one-shot process: search once, generate once. Agentic RAG is iterative and self-directed. The model can refine its search based on what it finds, pull from multiple sources, recognize when it’s missing information, and stop early when the first retrieval was sufficient. This mirrors how a human researcher actually works.

When to Use It

Complex research questions requiring multiple information sources
Ambiguous queries that need multi-step investigation
Exploratory search where the answer path isn’t clear upfront
High-value decisions where accuracy is critical

Real-World Example

A financial analyst assistant answering: “Should we invest in renewable energy stocks given current policy trends?”

Agent’s thought process:

Search: “renewable energy policy 2026” → Retrieves recent legislation
Evaluate: “Need market data” → Search: “renewable energy stock performance”
Evaluate: “Need risk factors” → Search: “renewable energy investment risks”
Evaluate: “Sufficient context” → Generate comprehensive answer

Standard RAG would have answered with just the first search result.
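Here is a minimal sketch of the agent loop. It assumes the agent’s decisions come from a `plan_next_search` callable that returns the next query, or `None` once it judges the context sufficient. In a real system that callable would be an LLM prompted with the question and the context gathered so far. Here it is scripted to replay the renewable-energy example above. The `max_steps` cap is the iteration limit from the steps: without it, an agent that never feels satisfied could loop indefinitely.

```python
from typing import Callable, Optional


def agentic_rag(
    question: str,
    plan_next_search: Callable[[str, list[str]], Optional[str]],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    max_steps: int = 5,
) -> tuple[str, list[str]]:
    """Let the planner choose searches until it returns None or the cap is hit."""
    context: list[str] = []
    searches: list[str] = []
    for _ in range(max_steps):
        next_query = plan_next_search(question, context)
        if next_query is None:  # agent judges the context sufficient
            break
        searches.append(next_query)
        context.extend(retrieve(next_query))
    return answer(question, context), searches


# Scripted stand-in for the LLM-driven planner (hypothetical)
PLAN = [
    "renewable energy policy 2026",
    "renewable energy stock performance",
    "renewable energy investment risks",
]


def scripted_planner(question: str, context: list[str]) -> Optional[str]:
    # One document per search here, so len(context) == number of searches so far
    return PLAN[len(context)] if len(context) < len(PLAN) else None


result, searches = agentic_rag(
    "Should we invest in renewable energy stocks given current policy trends?",
    plan_next_search=scripted_planner,
    retrieve=lambda q: [f"[results for: {q}]"],
    answer=lambda q, ctx: f"Answer drawing on {len(ctx)} retrieved documents",
)
```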

Type 8: Multi-Modal RAG

What It Is

RAG extended beyond text to handle images, diagrams, tables, charts, and other visual content — so the system can retrieve and reason over the full richness of real-world documents.

How it works:

Documents are indexed along with their visual elements (charts, diagrams, photos)
Multi-modal embeddings (CLIP, ImageBind) represent both text and images in a shared vector space
When a user queries about a chart or diagram, the relevant image is retrieved alongside text
Both text and image are passed to a multi-modal LLM (GPT-4V, Gemini)
The LLM generates an answer grounded in both visual and textual context

Why It Works

Real-world knowledge isn’t just text. Architecture diagrams, product photos, medical scans, financial charts, code screenshots — text-only RAG is blind to all of this. Multi-modal embeddings allow the system to understand and retrieve visual content with the same precision as text, and multi-modal LLMs can reason over what they “see.”

When to Use It

Technical documentation with diagrams and schematics
E-commerce with product images
Medical or scientific applications with imagery
Education with visual learning materials
Design and creative workflows

Real-World Example

An engineering documentation assistant receives the query: “Show me the wiring diagram for the hydraulic pump system.”

Retrieves: The PDF page with both the explanatory text AND the actual wiring diagram
The multi-modal LLM can see the diagram and explain: “The main pump connects to the reservoir through valve V-12, as shown in the upper-right quadrant of the diagram…”

Text-only RAG would try to describe a diagram it never saw — useless for visual troubleshooting.

Type 9: Corrective RAG (CRAG) — Self-Correcting Retrieval

What It Is

RAG with a built-in quality control system. CRAG evaluates the retrieved documents before generating an answer, and takes corrective action — including falling back to a web search — when retrieval quality is poor.

How it works:

Standard retrieval pulls candidate documents
A lightweight retrieval evaluator (a fine-tuned T5-large in the original CRAG paper) scores each document
Based on confidence scores, trigger one of three actions:

Correct (confidence > threshold): Use retrieved docs directly
Ambiguous (medium confidence): Combine the refined retrieved docs with web search results
Incorrect (low confidence): Discard and search web instead

Apply decompose-then-recompose to filter irrelevant parts
Generate answer from corrected context
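The three-way routing can be sketched as a small function. The thresholds (0.7 and 0.3) are assumptions that you would tune on your own data. `evaluate` and `web_search` are hypothetical callables: in a CRAG setup the evaluator is a trained model, and web search would be a real search API. The decompose-then-recompose refinement step is left out to keep the routing logic visible.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    docs: list[str],
    evaluate: Callable[[str, str], float],
    web_search: Callable[[str], list[str]],
    upper: float = 0.7,
    lower: float = 0.3,
) -> tuple[str, list[str]]:
    """Route on the evaluator's confidence using the three CRAG actions."""
    scored = [(evaluate(query, doc), doc) for doc in docs]
    best = max((score for score, _ in scored), default=0.0)
    if best > upper:
        # Correct: at least one document is trusted; keep the trusted ones
        return "correct", [doc for score, doc in scored if score > upper]
    if best < lower:
        # Incorrect: every document looks irrelevant; discard them and use the web
        return "incorrect", web_search(query)
    # Ambiguous: keep the plausible internal documents and add web results
    return "ambiguous", [doc for score, doc in scored if score >= lower] + web_search(query)


# Hypothetical evaluator scores and web search for the demo
FAKE_SCORES = {"2023 ALL treatment overview": 0.5, "2019 staff newsletter": 0.05}


def fake_evaluate(query: str, doc: str) -> float:
    return FAKE_SCORES[doc]


def fake_web_search(query: str) -> list[str]:
    return ["[web result: 2025 clinical guidelines]"]


action, context = corrective_retrieve(
    "latest ALL treatment protocols for children under 5",
    list(FAKE_SCORES),
    fake_evaluate,
    fake_web_search,
)
# The best score (0.5) sits between the thresholds, so the action is "ambiguous"
```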

Why It Works

Standard RAG has a dangerous blind spot: it blindly trusts whatever the retriever returns. If the retriever pulls three irrelevant documents, standard RAG will confidently hallucinate based on bad context. CRAG adds self-awareness — the system knows when its own retrieval has failed and can correct course before it’s too late.

When to Use It

High-stakes applications where hallucination is unacceptable (medical, legal, financial)
Dynamic knowledge domains where the knowledge base can become outdated
Production systems that prioritize reliability over raw speed
Compliance-heavy industries requiring explainable, auditable decisions

Real-World Example

A medical diagnosis assistant is asked: “What are the latest treatment protocols for acute lymphoblastic leukemia in children under 5?”

Without CRAG: Retrieves general ALL treatment docs from 2023 → Generates an answer that misses new 2025 protocols → Dangerous, outdated advice
With CRAG:
Retrieves 2023 docs
Evaluator detects a temporal mismatch (query asks for “latest”)
Triggers “Ambiguous” → Web search finds 2025 clinical guidelines
Combines: general protocol + recent updates → Accurate, current answer

Type 10: Graph-RAG — Reasoning Over Relationships

What It Is

RAG that uses a knowledge graph instead of (or alongside) vector embeddings. Rather than retrieving isolated document chunks, Graph-RAG traverses the relationships between entities to answer questions that require connecting multiple facts.

How it works:

Indexing: Extract entities and relationships from documents → Build knowledge graph
Community detection: Use the Leiden algorithm to identify hierarchical communities
Summarization: Generate summaries at each community level
Query: Extract entities from user question
Traversal: Navigate graph structure to find connected information
Synthesis: LLM generates answer from graph-structured context

Why It Works

Vector embeddings excel at semantic similarity but are blind to relationships. When asked “How did COVID-19 impact supply chains in the semiconductor industry?”, vector search retrieves documents about COVID, semiconductors, and supply chains — but misses the connections between them. Graph-RAG encodes the entire chain: COVID → factory closures → chip shortage → auto industry, enabling true multi-hop reasoning.
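The traversal step is the core of the idea, and a breadth-first search over a tiny hand-built graph is enough to show it with the COVID-19 supply-chain chain. The graph, its relation labels, and the function names are hypothetical. A full Graph-RAG system would also run the extraction, community-detection, and summarization steps listed above. The returned path is turned into plain-language facts that can go straight into the LLM’s context.

```python
from collections import deque
from typing import Optional

# Hypothetical knowledge graph: node -> list of (relation, neighbor) edges
GRAPH: dict[str, list[tuple[str, str]]] = {
    "COVID-19": [("caused", "factory closures")],
    "factory closures": [("led to", "chip shortage")],
    "chip shortage": [("disrupted", "auto industry")],
    "auto industry": [],
}


def find_path(
    graph: dict[str, list[tuple[str, str]]], start: str, goal: str
) -> Optional[list[tuple[str, str, str]]]:
    """Breadth-first search returning the shortest chain of (head, relation, tail) facts."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no connection between the two entities


path = find_path(GRAPH, "COVID-19", "auto industry")
facts = [f"{head} {relation} {tail}." for head, relation, tail in path]
# One sentence per hop, ready to add to the prompt as context
```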

When to Use It

Multi-hop questions that require connecting facts across documents
Relationship-heavy domains (financial networks, biological pathways, social graphs)
Enterprise knowledge management with interconnected systems
Compliance and regulation (tracing policy impacts across departments)
Legal research (case law precedents and citation chains)

Real-World Example

A pharmaceutical research assistant is asked: “Which drugs targeting protein X have shown efficacy in disease Y, and are any currently in Phase 3 trials?”

Vector RAG would retrieve separate documents about the drug, the protein, and the disease. Graph-RAG traverses the relationship chain — drug → targets → protein → implicated in → disease → trial status — and surfaces the exact answer in one connected pass.

Type 11: Adaptive RAG — Dynamic Complexity Routing

What It Is

RAG that automatically selects the right retrieval strategy based on the complexity of each query — routing simple questions to fast paths and complex questions to deeper pipelines, instead of applying the same approach to everything.

How it works:

A query arrives
A classifier model analyzes the query and assigns a complexity level
The query is routed to the appropriate pipeline:

Simple (no retrieval): Answer directly from LLM knowledge
Medium (single-hop): Standard vector RAG with one retrieval pass
Complex (multi-hop): Advanced pipeline — agentic, graph, or multi-step RAG

The selected pipeline executes and generates the answer

Why It Works

Not all queries need deep retrieval. In many production systems, query complexity roughly breaks down as:

40–50%: Simple (answerable from model knowledge)
30–40%: Medium (needs single-hop retrieval)
10–20%: Complex (needs multi-hop reasoning)

Applying heavy retrieval to every query wastes compute, adds latency, and increases cost by 3–10x unnecessarily. Adaptive RAG matches compute to complexity — fast when possible, thorough when required.

When to Use It

High-volume production systems where latency and cost matter
Mixed-use assistants that handle both casual and deep analytical queries
Systems with variable query types (customer support + research + reporting)
Any application where speed and accuracy must both be optimized

Real-World Example

A company-wide AI assistant handles three queries in sequence:

“What does RAG stand for?” → Simple path → LLM answers directly from training knowledge. Zero retrieval cost.
“What was our company revenue in Q3 2025?” → Medium path → Single vector search retrieves the financial report. Fast and precise.
“How did our Q3 2025 revenue compare to competitors, and what market trends explain the difference?” → Complex path → Agentic multi-step retrieval across internal data, market reports, and news sources.

All three queries get the right level of effort — no more, no less.
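Here is a minimal routing sketch using the three example queries above. The `classify` function is a hypothetical keyword heuristic standing in for the trained classifier a production system would use. The pipelines are placeholder functions. What carries over is the shape of the router: classify once, then dispatch to exactly one pipeline.

```python
import re
from typing import Callable


def classify(query: str) -> str:
    """Hypothetical heuristic classifier; a real system would use a trained model."""
    q = query.lower()
    if re.search(r"\b(compare|versus|vs|difference|trends?)\b", q):
        return "complex"  # comparisons and trend questions need multi-hop retrieval
    if re.search(r"\b(our|we|my)\b", q):
        return "medium"   # mentions private data, so one retrieval pass is needed
    return "simple"       # general knowledge the model already has


def route(query: str, pipelines: dict[str, Callable[[str], str]]) -> str:
    return pipelines[classify(query)](query)


# Placeholder pipelines (hypothetical)
pipelines = {
    "simple": lambda q: "direct LLM answer",
    "medium": lambda q: "single vector search + LLM",
    "complex": lambda q: "agentic multi-step retrieval",
}
```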

Choosing the Right RAG Architecture

What’s your biggest RAG challenge?

Are you struggling with retrieval quality, dealing with multi-modal content, or trying to scale to millions of documents? Drop a comment — I read and respond to every one.

If this guide helped you, share it with your team. RAG is becoming table stakes for AI applications, and the teams that master it early have a massive competitive advantage.

This guide is based on the latest RAG research and production patterns as of May 2026.

Why Companies Will Stop Asking “Do You Know AI?” and Start Asking This Instead

Minoltan Issack — Sun, 10 May 2026 09:28:46 +0000

In just six months, the standard interview question won’t be “Can you use ChatGPT?” or “Do you know AI?” Instead, senior architects and hiring managers will look you in the eye and ask: “Can you architect an integrated system using MCP, RAG, and Agents?”

The IT job market is shifting from “AI users” to “AI architects.” If you want to stay relevant, you need to understand how these three pillars fit together. Let’s break it down through a simple story.

1. The Bridge Builder: Model Context Protocol (MCP)

Imagine you have a brilliant consultant (the AI) sitting in a locked room. He’s smart, but he can’t see your emails, he can’t check your local files, and he certainly can’t see your Slack messages.

MCP is the “Universal Connector.” It is an open standard that lets an AI application (the “Host,” such as Claude Desktop or VS Code) securely connect a model like Claude to “Servers” that expose your local files, Google Drive, or Slack.

The Architecture:

Host: Where you give input (e.g., Claude Desktop, VS Code).
Client: The middleman inside the host that manages the connection to a server (the host runs one client per server).
Server: The program that actually “knows” how to fetch data from a specific tool (e.g., a Google Drive Server).

2. The Expert Librarian: Retrieval-Augmented Generation (RAG)

Now that our consultant has a bridge to the outside world, he needs to be an expert on your specific business. If an employee asks, “How many vacation days do I have left?” the AI can’t guess. It needs to look at the company handbook.

RAG is the “Librarian.” Instead of retraining a massive AI model (which is expensive), you give it a specific document. The AI “retrieves” the exact paragraph needed and “generates” an answer based only on that trusted data.

The Architecture:

Ingestion: Your PDF/Doc is broken into small “chunks.”
Embedding: These chunks are turned into numbers (vectors) so the system can compare them by meaning.
Retrieval: When you ask a question, the system finds the most relevant “chunk” and gives it to the AI.

3. The Decision Maker: AI Agents

A bridge (MCP) and a library (RAG) are great, but someone needs to do the work. Imagine you say: “Prepare a sales report from my local files and email it to the CEO.”

The AI Agent is the “Manager.” It doesn’t just answer, it acts. It thinks: “First, I need to use the MCP bridge to get the data. Then, I’ll use RAG to understand the company’s reporting style. Finally, I’ll trigger the email tool to send it.”

The Architecture:

Perception: Receives the user’s goal.
Brain (LLM): Decides which tools (MCP/RAG) to call and in what order.
Action: Executes the tasks end-to-end.

The Big Picture: How They Fit Together

The future isn’t about choosing one, it’s about the Unified AI System.

MCP provides the Connection to your world.
RAG provides the Knowledge from your data.
Agents provide the Execution to get things done.

Summary

The era of simply “chatting” with AI is ending. We are entering the era of building systems that work for us. Whether you are a developer or a business lead, understanding this hierarchy — Connect (MCP), Inform (RAG), and Execute (Agents) — is the key to the next decade of your career.

To stay informed on the latest technical insights and tutorials, connect with me on Medium, LinkedIn and Dev.to. For professional inquiries or technical discussions, please contact me via email. I welcome the opportunity to engage with fellow professionals and address any questions you may have.

Understanding RAG: The Architecture That’s Revolutionizing AI Responses

Minoltan Issack — Sun, 10 May 2026 08:51:14 +0000

How Retrieval-Augmented Generation Combines the Best of Search and AI

Imagine you have a super-smart friend who has read every book in the world but hasn’t left the house in three years. If you ask him about a movie that came out last week, he might make up a story just to sound helpful — we call this a “hallucination.”

This happens because traditional AI has a knowledge cutoff; it only knows what it learned during its original training.

Now, imagine giving that same friend a high-speed internet connection and a library card. Before he answers your question, he quickly looks up the latest facts, finds the right page, and then explains it to you. That is Retrieval-Augmented Generation (RAG).

Instead of guessing from memory, the AI “retrieves” fresh data from your documents or the web and “augments” its answer with those facts. It turns a guessing game into an open-book exam, giving you answers grounded in sources you can check.

What is RAG?

Retrieval-Augmented Generation is a technique that enhances Large Language Models by connecting them to external knowledge sources. Instead of relying solely on the information learned during training, RAG systems can retrieve relevant information from external databases, documents, or APIs in real time and use that context to generate more accurate responses.

Think of it this way: A traditional LLM is like a brilliant professor who memorized everything years ago but hasn’t read any new research. A RAG system is like that same professor, but now they can quickly consult a library of the latest papers before answering your question.

Why Do We Need RAG?

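Traditional LLMs face several critical challenges: a knowledge cutoff (they only know what was in their training data), hallucination (they confidently make up answers when they don’t know), and no access to your private or proprietary data.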

RAG solves these problems by grounding AI responses in retrievable, verifiable external data.

The RAG Architecture: A Deep Dive

The RAG architecture consists of three main phases: Data Ingestion, Query Processing, and Response Generation. Let’s break down each component.

Phase 1: Data Ingestion Pipeline (Offline Process)

This happens before any user queries arrive — it’s the preparation phase.

Step 1: Data Collection

The system ingests data from various sources:

PDF documents, Word files, web pages, APIs, databases, internal documentation, research papers

Step 2: Text Chunking

Large documents are split into smaller, manageable chunks (typically 200–1000 tokens). Why? Because:

It’s more efficient to search through smaller pieces
LLMs have context window limits
Smaller chunks provide more precise retrieval

For example, a 50-page manual might be split into 200 chunks, each representing a specific section or concept.

Step 3: Embedding Generation

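This is where the magic happens. An embedding model (like OpenAI’s text-embedding-3, Cohere’s embeddings, or open-source models like Sentence-BERT) converts each text chunk into a vector: essentially a list of numbers, typically a few hundred to a few thousand dimensions (for example, 384 for all-MiniLM-L6-v2, 1536 for text-embedding-3-small, and 3072 for text-embedding-3-large).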

What are embeddings? Embeddings are numerical representations that capture the semantic meaning of text. Similar concepts have similar vector representations, even if they use different words.

For example:

“The customer wants a refund”
“User requesting money back”

These two sentences would have very similar embedding vectors because they express the same concept, even though they use different words.

Step 4: Vector Storage

These embeddings are stored in a vector database (like Pinecone, Weaviate, ChromaDB, or FAISS) that’s optimized for fast similarity searches. The database indexes these vectors so it can quickly find the most similar ones when queried.

Phase 2: Query Processing (Runtime)

This happens when a user asks a question.

Step 1: User Query

A user submits a question: “What is your refund policy for defective products?”

Step 2: Query Embedding

The exact same embedding model used during ingestion now converts the user’s query into a vector with the same dimensions.

This consistency is crucial — you must use the same embedding model for both ingestion and queries to ensure the vector spaces align.

Step 3: Similarity Search

The system performs a semantic similarity search in the vector database. It compares the query vector against the stored vectors (exhaustively for small collections, or approximately via an index for large ones) using mathematical distance metrics like:

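Cosine similarity: Measures the angle between vectors, ignoring their length
Euclidean distance: Measures the straight-line distance (smaller means more similar)
Dot product: Measures vector alignment scaled by vector length (equal to cosine similarity when vectors are normalized to unit length)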

The database returns the top K most similar chunks (typically 3–10 chunks).
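Here are the three metrics and the top-K step, written out in plain Python for clarity. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions. A vector database would also replace this exhaustive loop with an approximate index.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Angle only: scaling either vector does not change the score
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    # Straight-line distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Exhaustive search; vector databases approximate this with HNSW or IVF."""
    return sorted(index, key=lambda doc_id: cosine_similarity(query, index[doc_id]), reverse=True)[:k]


# Made-up 3-dimensional "embeddings" for illustration
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "defect_returns": [0.8, 0.0, 0.2],
}
query_vec = [1.0, 0.0, 0.0]
best = top_k(query_vec, index, k=2)
```

Compare `[1, 0]` with `[2, 0]`. The cosine similarity is 1.0 and the dot product is 2.0: the dot product grows with vector length, while the cosine does not. This is why many systems normalize embeddings, which makes the two metrics rank results identically.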

Step 4: Context Retrieval

The system retrieves the actual text content associated with the top matching vectors. These become the “retrieved context” that will augment the prompt.

Phase 3: Response Generation

Step 1: Prompt Augmentation

The system constructs an enhanced prompt that combines:

The retrieved context (relevant chunks from the knowledge base)
The user’s original query
Instructions for the LLM

Example augmented prompt:

Context:
[Chunk 1]: "Our refund policy states that defective products can be returned within 30 days..."
[Chunk 2]: "To process a refund for defective items, customers must provide proof of purchase..."
[Chunk 3]: "Shipping costs for defective product returns are covered by the company..."

User Question: What is your refund policy for defective products?

Instructions: Answer the user's question based solely on the provided context. If the context doesn't contain the information, say so.
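Assembling that prompt is simple string formatting. The function below is a minimal sketch that reproduces the layout of the example above. The function name and the choice to number chunks in retrieval order are assumptions.

```python
INSTRUCTIONS = (
    "Instructions: Answer the user's question based solely on the provided "
    "context. If the context doesn't contain the information, say so."
)


def build_augmented_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks, the user's question, and instructions."""
    context = "\n".join(f'[Chunk {i}]: "{chunk}"' for i, chunk in enumerate(chunks, start=1))
    return f"Context:\n{context}\n\nUser Question: {question}\n\n{INSTRUCTIONS}"


prompt = build_augmented_prompt(
    "What is your refund policy for defective products?",
    [
        "Our refund policy states that defective products can be returned within 30 days...",
        "Shipping costs for defective product returns are covered by the company...",
    ],
)
```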

Step 2: LLM Generation

The augmented prompt is sent to an LLM (GPT-4, Claude, Llama, Gemini, etc.). The model generates a response that:

Is grounded in the retrieved facts
Directly answers the user’s question
Uses natural, conversational language
Can cite specific sources

Step 3: Response Delivery

The final response is returned to the user, often with source citations showing which documents the information came from.

Key Components Explained

Embedding Models

These are specialized neural networks trained to convert text into meaningful numerical representations. Popular options include:

OpenAI Embeddings : text-embedding-3-small, text-embedding-3-large
Cohere Embeddings : embed-english-v3.0
Open Source : Sentence-Transformers, BGE, E5

The quality of your embeddings directly impacts retrieval accuracy.

Vector Databases

Specialized databases optimized for storing and searching high-dimensional vectors:

Pinecone : Managed, cloud-native
Weaviate : Open-source, feature-rich
ChromaDB : Developer-friendly, embeddable
FAISS : Similarity-search library from Meta (Facebook AI Research), very fast, but a library rather than a full database
Milvus : Scalable, enterprise-grade

These databases use algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) for approximate nearest neighbor search.
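To make approximate search concrete, here is a toy version of the IVF idea: vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets instead of every vector. This is an illustrative sketch, not FAISS's or any database's implementation. Real IVF indexes learn the centroids with k-means, whereas here they are passed in by hand. HNSW, which builds a navigable graph instead, is not shown.

```python
import math


class ToyIVFIndex:
    """Illustrative inverted-file (IVF) index; not a production implementation."""

    def __init__(self, centroids):
        # Real IVF indexes learn centroids with k-means; here they are supplied.
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _closest_buckets(self, vec, n):
        """Indices of the n centroids nearest to vec (Euclidean distance)."""
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, item_id, vec):
        """Store vec in the bucket of its nearest centroid."""
        self.buckets[self._closest_buckets(vec, 1)[0]].append((item_id, vec))

    def search(self, query, k=1, nprobe=1):
        """Scan only the nprobe nearest buckets and return the k closest item ids."""
        candidates = []
        for b in self._closest_buckets(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]
```

Raising `nprobe` scans more buckets, which improves recall at the cost of speed. With `nprobe` equal to the number of buckets, the search becomes exact. A vector sitting near a bucket boundary is exactly the case a low `nprobe` can miss.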

Chunking Strategies

How you split your documents matters:

Fixed-size chunking : Split every N tokens
Sentence-based : Split at sentence boundaries
Semantic chunking : Split based on topic changes
Overlapping chunks : Include overlap to preserve context
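Fixed-size chunking with overlap and sentence-based splitting each take only a few lines. The sketch below approximates tokens with whitespace-separated words. Real token counts come from the embedding model's tokenizer, and a production sentence splitter handles abbreviations and other edge cases that this naive regex does not.

```python
import re


def fixed_size_chunks(tokens, size, overlap=0):
    """Split a token list into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end; avoid a tiny trailing chunk
    return chunks


def sentence_chunks(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```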

Best Practices for Implementing RAG

Start with quality data : Clean, well-structured documents produce better results
Choose the right chunk size : Test different sizes (256, 512, 1024 tokens)
Use the same embedding model : Consistency between ingestion and query is crucial
Implement monitoring : Track retrieval quality and response accuracy
Add metadata filtering : Filter by date, source, category before semantic search
Test different retrieval strategies : Top-K, threshold-based, MMR (Maximal Marginal Relevance)
Optimize for your use case : Customer support needs different tuning than research applications
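Metadata filtering and MMR combine naturally: filter first, then pick results that are relevant to the query but not redundant with what is already selected. MMR scores each candidate as lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected). The sketch below assumes each document is a dict with hypothetical `id`, `vec`, and `meta` keys and uses cosine similarity. Frameworks ship their own MMR implementations, with their own parameter names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matches(doc, where):
    """Metadata pre-filter: every key/value in `where` must match doc["meta"]."""
    return all(doc["meta"].get(key) == value for key, value in (where or {}).items())


def mmr_score(query, doc, selected, lam):
    """lam * relevance - (1 - lam) * redundancy with already-selected docs."""
    relevance = cosine(query, doc["vec"])
    redundancy = max((cosine(doc["vec"], s["vec"]) for s in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr_search(query, docs, k=2, lam=0.5, where=None):
    """Filter by metadata, then greedily pick k docs by Maximal Marginal Relevance."""
    pool = [d for d in docs if matches(d, where)]
    selected = []
    while pool and len(selected) < k:
        best = max(pool, key=lambda d: mmr_score(query, d, selected, lam))
        selected.append(best)
        pool = [d for d in pool if d is not best]
    return [d["id"] for d in selected]
```

With `lam=1.0` the result is plain top-K, which can return two near-duplicate chunks; lowering it toward 0.5 swaps the near-duplicate for a more diverse chunk.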

Popular RAG Frameworks and Tools

Several frameworks make RAG implementation easier:

LangChain : Popular Python/JavaScript framework with extensive RAG support
LlamaIndex : Specialized in data ingestion and indexing for RAG
Haystack : Production-ready framework from Deepset
Semantic Kernel : Microsoft’s framework for AI orchestration
AutoGen : Multi-agent framework with RAG capabilities

Conclusion

Retrieval-Augmented Generation represents a fundamental shift in how we build AI applications. By combining the natural language capabilities of LLMs with the precision of information retrieval, RAG delivers responses that are more accurate, more current, and grounded in verifiable sources.

Whether you’re building a customer support chatbot, a research assistant, or an internal knowledge management system, understanding RAG architecture is essential. The pattern is elegant: convert everything to vectors, search for similar vectors, and augment your prompts with retrieved context.

As AI continues to integrate into more applications, RAG will likely become the standard approach for any system that needs to provide factual, up-to-date, and domain-specific information. The architecture is proven, the tools are mature, and the results speak for themselves.

The question isn’t whether to use RAG — it’s how to implement it most effectively for your specific use case.

AWS Cloud Practitioner Questions | Security & Encryption

Minoltan Issack — Tue, 14 Apr 2026 11:26:56 +0000

Question 1:

To enable In-flight Encryption (In-Transit Encryption), we need to have ........................

Answer (2) : "An HTTPS endpoint with an SSL certificate" is correct because HTTPS uses SSL/TLS to encrypt data in transit, and it cannot be set up without a certificate, which also verifies the server's identity. Options that lack this encryption do not provide in-flight encryption. Together, the certificate and HTTPS establish trust and keep data confidential and intact during transmission.

Question 2:

Server-Side Encryption means that the data is sent encrypted to the server.

Answer (2) : The statement is false. With Server-Side Encryption, the server encrypts the data after it receives it and stores it encrypted, so it is protected at rest. Protecting data while it is being sent is in-flight (in-transit) encryption, handled by protocols like TLS, and encrypting data before it leaves the client is client-side encryption. Combining in-flight and server-side encryption protects data both in transit and at rest.

Question 3:

In Server-Side Encryption, where do the encryption and decryption happen?

Answer (1): Both encryption and decryption happen on the server. With server-side encryption, the server performs both operations using a key it manages (or, with options such as S3's SSE-C, a key the client supplies with each request), so the client never encrypts or decrypts the data itself. The other options are incorrect because they involve the client performing encryption or decryption, which describes client-side encryption instead.

Question 4:

In Client-Side Encryption, the server must know our encryption scheme before we can upload the data.

Answer (1): In client-side encryption, the server acts as a "blind" storage provider and does not need to know the encryption scheme or keys to store the data. The data is fully encrypted before it leaves your device, ensuring the server only manages opaque blobs of information without any insight into the underlying cryptographic methods.

Question 5:

You need to create KMS Keys in AWS KMS before you are able to use the encryption features for EBS, S3, RDS …

Answer (2) : AWS provides managed keys that can be used for encryption without creating your own KMS keys. You only need to create custom keys if you have specific security requirements. The other options are incorrect because creating your own keys is optional, not mandatory, to enable encryption for services like EBS, S3, or RDS. AWS Managed Keys simplify the process and are ready to use. Therefore, creating KMS keys in advance is not a required step.

Question 6:

AWS KMS supports both symmetric and asymmetric KMS keys.

Answer (1): AWS KMS supports both symmetric and asymmetric keys. Symmetric keys are used for encryption and decryption with a single key. Asymmetric keys involve a key pair (RSA or ECC) used for encryption/decryption or signing/verification. The other option, "False," is incorrect because KMS indeed supports both types of keys. This allows flexible cryptographic operations for different security needs.

Question 7:

When you enable Automatic Rotation on your KMS Key, the backing key is rotated every ……………

Answer (2) : When Automatic Rotation is enabled on a KMS key, the backing key is rotated every year (365 days) by default. The "90 days" option is incorrect because AWS does not rotate keys that frequently by default. The other options, "2 years" and "3 years," are incorrect because they exceed the default rotation period of one year. This rotation frequency balances security and operational consistency.

Question 8:

You have an AMI that has an encrypted EBS snapshot using KMS CMK. You want to share this AMI with another AWS account. You have shared the AMI with the desired AWS account, but the other AWS account still can't use it. How would you solve this problem?

Answer (2) : Sharing the AMI does not share the KMS key that encrypts its snapshot. You must also modify the CMK's key policy so the other account is allowed to use the key; only then can it decrypt the snapshot and launch the AMI. (An AMI encrypted with the default AWS managed key cannot be shared at all, which is why a customer managed CMK is needed here.) The first option, "logout and login," is incorrect because refreshing credentials doesn't resolve key sharing issues. The third option, "you can't share an encrypted AMI," is incorrect because encrypted AMIs can be shared if the CMK permissions are properly configured. Sharing the CMK ensures the other account can decrypt and use the AMI.

Question 9:

You have created a Customer-managed CMK in KMS that you use to encrypt both S3 buckets and EBS snapshots. Your company policy mandates that your encryption keys be rotated every 6 months. What should you do?

Answer (1): Automatic rotation of a customer managed key runs once a year by default, which does not meet a 6-month policy, so you need to rotate the key manually (create a new key and point your alias and applications at it) on your own schedule. AWS Managed Keys aren't suitable either: they are rotated automatically every year, and you cannot change that schedule. Manually creating and rotating keys gives control over the exact 6-month schedule. The other options do not meet the specific 6-month rotation requirement. Note that AWS KMS has since added a configurable automatic rotation period (90 to 2,560 days) for customer managed keys, which can also satisfy a 6-month policy.

Question 10:

What should you use to control access to your KMS CMKs?

Answer (1) : KMS Key Policies directly define and control access permissions for each CMK. "KMS IAM Policy" is incorrect because IAM policies can grant access to a CMK only when the key policy allows it, so they are not the primary control. "AWS GuardDuty" is incorrect as it is a security threat detection service, not an access control tool. "KMS Access Control List (KMS ACL)" is incorrect because KMS does not support ACLs for controlling access. Key policies are the primary method for managing access to KMS CMKs.

Question 11:

You have a Lambda function used to process some data in the database. You would like to give your Lambda function access to the database password. Which of the following options is the most secure?

Answer (3): Storing the password as an encrypted environment variable and decrypting it at runtime keeps the sensitive data protected while still letting the Lambda function use it during execution. Embedding the password in the code is insecure because anyone with access to the code can read it. A plaintext environment variable is also insecure because the value is visible in the function's configuration. Encrypting it and decrypting it at runtime keeps the password protected at rest and exposes it only in memory during execution, which balances security and accessibility.
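In Python, the runtime side of that answer might look like the sketch below. It assumes the password was encrypted with a KMS key and stored base64-encoded in a hypothetical `DB_PASSWORD` environment variable, and that the function's execution role has `kms:Decrypt` on that key. The boto3 KMS client's `decrypt` call is real; the wrapper function is illustrative, and the client can be injected so the logic is testable without AWS. If the ciphertext was created with an encryption context, the same `EncryptionContext` must also be passed to `decrypt`.

```python
import base64
import os


def decrypt_env_secret(var_name, kms_client=None):
    """Decrypt a KMS-encrypted, base64-encoded value stored in an environment variable.

    var_name (e.g. the hypothetical "DB_PASSWORD") holds ciphertext, never plaintext.
    kms_client defaults to a boto3 KMS client; tests can pass a stub instead.
    """
    if kms_client is None:
        import boto3  # included in the AWS Lambda Python runtimes
        kms_client = boto3.client("kms")
    ciphertext = base64.b64decode(os.environ[var_name])
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")
```

Calling it once at module load (outside the handler) decrypts the value once per execution environment and keeps the plaintext only in memory.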

Question 12:

You have a secret value that you use for encryption purposes, and you want to store and track the values of this secret over time. Which AWS service should you use?

Answer (2): SSM Parameter Store allows secure storage of secrets (as SecureString parameters) with built-in version tracking, so you can see historical values. "AWS KMS" can rotate encryption keys but doesn't store or track different secret values over time. "Amazon S3" offers versioning and encryption but is not designed for secret management or auditing secret values. SSM Parameter Store's secure storage with version history makes it the best fit.

Question 13:

Your user-facing website is a high-risk target for DDoS attacks and you would like to get 24/7 support in case they happen and AWS bill reimbursement for the incurred costs during the attack. What AWS service should you use?

Answer (2): AWS Shield Advanced provides 24/7 access to the AWS Shield Response Team (SRT, formerly the DDoS Response Team) and cost protection, which gives you service credits for scaling charges caused by a DDoS attack. "AWS WAF" helps protect web applications from common web exploits but does not offer 24/7 DDoS support or billing reimbursement. "AWS Shield" (Shield Standard) provides basic DDoS protection but lacks the dedicated support and cost protection of Shield Advanced. "AWS DDoS OpsTeam" is not an AWS service; the appropriate service is AWS Shield Advanced.

Question 14:

You would like to externally maintain the configuration values of your main database, to be picked up at runtime by your application. What's the best place to store them to maintain control and version history?

Answer (4): SSM Parameter Store securely stores configuration values with version history, making it easy to update them, track changes, and read them at runtime. "Amazon DynamoDB" is a NoSQL database suitable for application data but isn't designed for configuration management or versioning. "Amazon S3" can store files and version data, but it's less ideal for sensitive configuration values because it lacks built-in secret management features. "Amazon EBS" provides block storage for EC2 instances and is not suitable for managing or versioning configuration data externally.

Question 15:

AWS GuardDuty scans the following data sources, EXCEPT …………….

Answer (4): AWS GuardDuty does not directly scan CloudWatch Logs data sources; it primarily analyzes other specific logs. "CloudTrail Logs" are monitored because they record API activity for security analysis. "VPC Flow Logs" document network traffic, which GuardDuty analyzes for suspicious activity. "DNS Logs" are also scanned since they help detect malicious domain requests. GuardDuty focuses on certain data sources, and CloudWatch Logs are not one of them.

Question 16:

You have a website hosted on a fleet of EC2 instances fronted by an Application Load Balancer. What should you use to protect your website from common web application attacks (e.g., SQL Injection)?

Answer (2): AWS WAF lets you create rules, attached directly to the Application Load Balancer, that block common web application attacks like SQL Injection and Cross-Site Scripting. "AWS Shield" provides protection against DDoS attacks but does not specifically target application-layer threats. "AWS Security Hub" is a centralized security management service and does not directly protect against web attacks. "AWS GuardDuty" detects malicious activity but is focused on threat detection rather than web application protection.

Question 17:

You would like to analyze OS vulnerabilities from within EC2 instances. You need these analyses to occur weekly and provide you with concrete recommendations in case vulnerabilities are found. Which AWS service should you use?

Answer (3) : Amazon Inspector automatically analyzes EC2 instances for OS vulnerabilities and provides detailed findings and recommendations. "AWS Shield" focuses on protecting against DDoS attacks and does not analyze OS vulnerabilities. "Amazon GuardDuty" detects threats and malicious activity but does not perform vulnerability assessments. "AWS Config" monitors configuration compliance but does not provide detailed vulnerability analysis or recommendations.

Question 18:

What is the most suitable AWS service for storing RDS DB passwords which also provides you automatic rotation?

Answer (1) : AWS Secrets Manager securely stores database passwords and provides automatic rotation (with native support for RDS credentials), reducing manual management. "AWS KMS" is a key management service and does not store or rotate passwords directly. "AWS SSM Parameter Store" can store passwords but lacks built-in automatic rotation. Secrets Manager is specifically designed for secret management and automated credential rotation.

Question 19:

Which AWS service allows you to centrally manage EC2 Security Groups and AWS Shield Advanced across all AWS accounts in your AWS Organization?

Answer (4): AWS Firewall Manager centrally manages security policies, including Security Groups and Shield Advanced protections, across the accounts in an AWS Organization. "AWS GuardDuty" detects security threats but does not centrally manage Security Groups or Shield. "AWS Config" monitors resource compliance and tracks configuration changes, but it does not manage or enforce security policies across accounts.

Question 20:

Which AWS service helps you protect your sensitive data stored in S3 buckets?

Answer (3) : Amazon Macie uses machine learning to discover, classify, and help protect sensitive data in S3 buckets. "AWS KMS" is a key management service that encrypts data but does not identify or classify sensitive information in S3. "Amazon GuardDuty" detects security threats but doesn't specifically identify sensitive data. "Amazon Shield" focuses on DDoS protection and does not analyze data stored in S3.

Question 21:

An online-payment company is using AWS to host its infrastructure. The frontend is created using VueJS and is hosted on an S3 bucket, and the backend is developed using PHP and is hosted on EC2 instances in an Auto Scaling Group. As their customers are worldwide, they use both CloudFront and Aurora Global Database to implement multi-region deployments that provide the lowest latency, availability, and resiliency. A new feature is required that gives customers the ability to store data encrypted in the database, and this data must not be disclosed, even to the company admins. The data should be encrypted on the client side and stored in an encrypted format. What do you recommend to implement this?

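Answer (1) : Aurora client-side encryption with KMS fits here: the data is encrypted on the client with a KMS key before it is written, so Aurora only ever stores ciphertext. "Using Lambda Client-side Encryption and KMS" is incorrect because Lambda is not designed for client-side encryption of database data. "Using Aurora Client-side Encryption and CloudHSM" is incorrect because, while CloudHSM provides hardware security, it is not specifically integrated for client-side encryption in this context. "Using Lambda Client-side Encryption and CloudHSM" is incorrect because Lambda alone doesn't handle client-side encryption for databases, and CloudHSM is not tailored for this use case.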

Question 22:

You have an S3 bucket that is encrypted with SSE-KMS. You have been tasked to replicate the objects to a target bucket in the same AWS region but with a different KMS Key. You have configured the S3 replication, the target bucket, and the target KMS key and it is still not working. What is missing to make the S3 replication work?

Answer (3): You need to configure permissions for both the source KMS key (kms:Decrypt) and the target KMS key (kms:Encrypt) so that S3 replication can access and use them properly. The other options are incorrect because replication is supported, no support ticket is needed, and the source and target keys do not have to be the same. Proper permissions are necessary for encryption and decryption during replication.

Question 23:

You have generated a public certificate using Let's Encrypt and uploaded it to ACM so you can attach it to an Application Load Balancer that forwards traffic to EC2 instances. As this certificate is generated outside of AWS, it does not support the automatic renewal feature. How would you be notified 30 days before this certificate expires so you can manually generate a new one?

Answer (2) : Using EventBridge lets you receive a notification 30 days before the certificate expires. Linking ACM to a third-party provider like Let's Encrypt does not provide automated notifications from AWS. Monthly expiration events or CloudWatch alarms won't reliably give you the warning exactly 30 days in advance. EventBridge is suitable for scheduled daily checks, ensuring proactive renewal alerts.
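Whatever triggers it daily, the check itself is simple date arithmetic. The sketch below is a hypothetical helper that a scheduled function could run. It assumes you already have the certificate's expiry time as a timezone-aware datetime (for example, the `NotAfter` field that ACM's `DescribeCertificate` returns).

```python
from datetime import datetime, timedelta, timezone

ALERT_WINDOW = timedelta(days=30)


def needs_renewal_alert(not_after, now=None, window=ALERT_WINDOW):
    """Return True when the certificate expires within `window` of `now`.

    not_after: the certificate's expiry as a timezone-aware datetime.
    Meant to be called once a day by a scheduled job (hypothetical setup).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= window
```

An already-expired certificate also returns True, because the remaining time is negative.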

Question 24:

You have created the main Edge-Optimized API Gateway in us-west-2 AWS region. This main Edge-Optimized API Gateway forwards traffic to the second level API Gateway in ap-southeast-1. You want to secure the main API Gateway by attaching an ACM certificate to it. Which AWS region are you going to create the ACM certificate in?

Answer (1) : us-east-1. An Edge-Optimized API Gateway endpoint is served through a CloudFront distribution, and ACM certificates used with CloudFront must be created (or imported) in us-east-1, regardless of the region where the API itself is deployed. "us-west-2" is incorrect because ACM certificates in this region cannot be used with CloudFront or an Edge-Optimized API Gateway. "ap-southeast-1" is incorrect for the same reason. "Both us-east-1 and us-west-2" is incorrect because only the us-east-1 certificate is needed.

Question 25:

You are managing an AWS Organization with multiple AWS accounts. Each account has a separate application with different resources. You want an easy way to manage Security Groups and WAF Rules across those accounts, as there was a security incident last week and you want to tighten up your resources. Which AWS service can help you to do so?

Answer (4) : AWS Firewall Manager allows centralized management of security policies, such as Security Groups and WAF rules, across multiple AWS accounts in an organization. It simplifies enforcement and updates, especially after security incidents.
Others are incorrect because:

AWS GuardDuty is primarily for threat detection, not policy management.
Amazon Shield provides DDoS protection but doesn't manage Security Groups or WAF rules.
Amazon Inspector assesses security vulnerabilities but doesn't handle centralized rule management.

To stay informed on the latest technical insights and tutorials, connect with me on Medium, LinkedIn, and Dev.to. For professional inquiries or technical discussions, please contact me via email. I welcome the opportunity to engage with fellow professionals and address any questions you may have. All blogs in this series will be optimized, fine-tuned, developed, and updated in a timely manner to reflect the latest AWS changes, exam updates, and real-world best practices.

AWS Cloud Practitioner Questions | RDS, Aurora, & ElastiCache

Minoltan Issack — Sun, 12 Apr 2026 08:11:16 +0000

Question 1:

Amazon RDS supports the following databases, EXCEPT:

Answer (1): Amazon RDS does not support MongoDB. RDS supports engines such as MySQL, PostgreSQL, MariaDB, Oracle, and Microsoft SQL Server, but MongoDB is not among its managed engines.

Question 2:

You're planning for a new solution that requires a MySQL database that must be available even in case of a disaster in one of the Availability Zones. What should you use?

Answer (3): Multi-AZ deployments in Amazon RDS automatically create a synchronous standby replica of your database in a different Availability Zone. This setup provides high availability and durability, ensuring that if one AZ experiences a failure or disaster, the database remains available in the other AZ without manual intervention. In contrast, Read Replicas are mainly used for scaling read operations rather than disaster recovery, as they are asynchronous and may not provide immediate failover support in case of an AZ failure. Enabling Multi-AZ is the recommended approach for disaster recovery within a single region to ensure continuous availability.

Question 3:

We have an RDS database that struggles to keep up with the demand of requests from our website. Our million users mostly read news, and we don't post news very often. Which solution is NOT adapted to this problem?

Answer (2): "RDS Multi-AZ" provides high availability and automatic failover in case of an Availability Zone failure, but the standby of a Multi-AZ DB instance does not serve read traffic, so it does not improve read performance. That makes it the solution NOT adapted to this read-heavy problem. "Read Replicas" scale read operations by spreading reads across additional instances, and "ElastiCache" improves read speed by caching frequently read data, so both help with a workload that is mostly reads.

Question 4:

You have set up read replicas on your RDS database, but users are complaining that upon updating their social media posts, they do not see their updated posts right away. What is a possible cause for this?

Answer (2) : Read Replicas use asynchronous replication, so there is a short lag between a write on the primary and its arrival on the replica. Reads served from a replica are therefore eventually consistent, and users may not see their updated posts right away. Multi-AZ provides high availability and automatic failover but doesn't improve read scalability, and ElastiCache speeds up reads by caching data but does not handle database replication or failover, so neither explains the symptom.
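A toy model makes the lag visible. This illustrates asynchronous replication in general, not how RDS implements it: writes land on the primary immediately and are queued for the replica, and `replicate()` stands in for the moment the replica catches up.

```python
class ToyAsyncReplica:
    """Illustrative model of asynchronous replication (not how RDS is implemented)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes the replica has not applied yet (the "lag")

    def write(self, key, value):
        """Writes are acknowledged by the primary without waiting for the replica."""
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply queued writes to the replica in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read_from_replica(self, key):
        return self.replica.get(key)
```

A common mitigation is to route a user's reads of data they just wrote to the primary instance and use replicas for everything else.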

Question 5:

Which RDS (NOT Aurora) feature when used does not require you to change the SQL connection string?

Answer (1): Multi-AZ maintains the same connection string because it automatically handles failover to the standby replica without requiring connection string changes. In contrast, Read Replicas have their own endpoints and DNS names, so applications need to be updated to connect to them directly. Multi-AZ provides high availability but not read scaling. Read Replicas support read scalability but require configuration changes in the application. Therefore, Multi-AZ does not require changes to the connection string.

Question 6:

Your application is running on a fleet of EC2 instances managed by an Auto Scaling Group behind an Application Load Balancer. Users have to constantly log back in, and you don't want to enable Sticky Sessions on your ALB because you fear it will overload some EC2 instances. What should you do?

Answer (3): Storing session data in ElastiCache allows multiple EC2 instances to access user sessions quickly and efficiently, supporting stateless application design. RDS could store session data but offers lower performance compared to ElastiCache, which is optimized for fast access. Using your own load balancer doesn't address session management and can lead to complexity. EBS volumes are not suitable for shared session storage across instances due to limitations and performance concerns. Therefore, ElastiCache is the best choice for managing user sessions without sticky sessions.

Question 7:

An analytics application is currently performing its queries against your main production RDS database. These queries run at any time of the day and slow down the RDS database which impacts your users' experience. What should you do to improve the users' experience?

Answer (1): Setting up a Read Replica allows analytics queries to run independently, so they won't slow down the main database. Multi-AZ is mainly for high availability and automatic failover, not for offloading read workloads. Running queries at night limits real-time performance and doesn't address ongoing query impacts during the day. Read Replicas improve performance by distributing read traffic, making the user experience better. The other options do not effectively handle the problem of heavy, ongoing query load.

Question 8:

You would like to ensure you have a replica of your database available in another AWS Region if a disaster happens to your main AWS Region. Which database do you recommend to implement this easily?

Answer (4): Aurora Global Database is designed for cross-region disaster recovery: it replicates your cluster to secondary AWS regions, and a secondary region can be promoted if the primary region fails. RDS Read Replicas can be created in another region, but that requires more setup and manual promotion, so it is not the easiest option. RDS Multi-AZ is for high availability within a single region and does not provide cross-region replication. Aurora Read Replicas within a cluster stay in the same region, so they do not provide multi-region capability. Aurora Global Database is the best option for multi-region disaster recovery.

Question 9:

How can you enhance the security of your ElastiCache Redis Cluster by allowing users to access your ElastiCache Redis Cluster using their IAM Identities (e.g., Users, Roles)?

Answer (2): Using IAM Authentication allows users to securely access ElastiCache Redis with their IAM identities, enabling fine-grained access control and auditability. Redis Authentication relies on a password, which is less integrated with AWS identity management. Security Groups control network traffic but do not handle user authentication directly. IAM Authentication is specifically designed for integrating AWS user identities with ElastiCache for better security. The other options do not provide direct IAM-based user access control.

Question 10:

Your company has a production Node.js application that is using RDS MySQL 5.6 as its database. A new application programmed in Java will perform some heavy analytics workload to create a dashboard on a regular hourly basis. What is the most cost-effective solution you can implement to minimize disruption for the main application?

Answer (2): Creating a Read Replica in a different AZ allows the analytics workload to run without affecting the main database's performance. This minimizes disruption for the primary application while handling heavy analytics separately. Enabling Multi-AZ only provides high availability and automatic failover, not workload separation. Running analytics on the source database could slow down the main application and cause performance issues. Using a cross-AZ Read Replica is the most cost-effective and suitable solution for this scenario.

Question 11:

You would like to create a disaster recovery strategy for your RDS PostgreSQL database so that in case of a regional outage the database can be quickly made available for both read and write workloads in another AWS Region. The DR database must be highly available. What do you recommend?

Answer (2): Creating a read replica in a different region gives you a copy of the database that can be promoted to a standalone, writable instance during a regional outage, and enabling Multi-AZ on that replica makes the DR database highly available. Enabling Multi-AZ on the main database improves availability within its region but does not protect against regional failures. Creating a read replica in the same region with Multi-AZ doesn't provide cross-region disaster recovery. The "Enable Multi-Region" option does not exist in RDS; cross-region protection is set up by creating a cross-region read replica. The correct approach is to create a read replica in the target region for effective disaster recovery.

Question 12:

You have migrated the MySQL database from on-premises to RDS. You have a lot of applications and developers interacting with your database. Each developer has an IAM user in the company's AWS account. What is a suitable approach to give access to developers to the MySQL RDS DB instance instead of creating a DB user for each one?

Answer (3): Enabling IAM Database Authentication allows developers to access the RDS MySQL instance using their IAM credentials, simplifying user management. It eliminates the need to create individual database users and passwords for each developer. By default, IAM users do not have direct access to RDS databases without this feature enabled. Using Amazon Cognito is primarily for user authentication in mobile or web applications, not for direct database access. The correct choice streamlines access control while maintaining security via IAM.

Question 13:

Which of the following statements is true regarding replication in both RDS Read Replicas and Multi-AZ?

Answer (2): Read Replicas use asynchronous replication, which allows data to be copied to the replica with a slight delay, suitable for scaling and offloading read traffic. Multi-AZ deployments use synchronous replication, ensuring data is written to both the primary and standby instances simultaneously for high availability. The other options incorrectly state both use asynchronous or synchronous replication, which is not accurate. Synchronous replication in Multi-AZ provides data consistency during failover. Therefore, the correct answer accurately reflects the different replication methods used.

Question 14:

How do you encrypt an unencrypted RDS DB instance?

Answer (3): The correct method is to create a snapshot, copy it with encryption enabled, and restore a new instance from the encrypted copy, because encryption cannot be enabled directly on an existing unencrypted RDS instance. The first option, encrypting directly from the console without snapshotting, is not possible because RDS does not support encrypting a running instance in place. The second option, stopping the database before snapshotting, is unnecessary; snapshots can be created while the database is running. Restoring from an encrypted snapshot applies encryption to the new instance, which is the correct approach. Keep in mind that the restored instance has its own endpoint, so applications must be switched over to it, which usually means a short cutover window.

Question 15:

For your RDS database, you can have up to ............ Read Replicas.

Answer (2): The correct answer is 15, which is the maximum number of Read Replicas allowed for an RDS database, providing scalable read capacity. The choice of 5 is too low and limits scalability unnecessarily. The option of 7 is also below the maximum limit, so it does not represent the highest possible replicas. The limit is set to 15 for most database engines, allowing significant read scaling. Therefore, 15 is the correct maximum number allowed by AWS.

Question 16:

Which RDS database technology does NOT support IAM Database Authentication?

Answer (2): Oracle does not support IAM Database Authentication, so it cannot use AWS IAM for database access. PostgreSQL and MySQL do support it, enabling centralized access management through IAM users and roles, which makes them incorrect choices for this question. Therefore, Oracle is the correct answer.

Question 17:

You have an un-encrypted RDS DB instance and you want to create Read Replicas. Can you configure the RDS Read Replicas to be encrypted?

Answer (1): No. A Read Replica of an unencrypted RDS DB instance is also unencrypted, and AWS does not let you encrypt a Read Replica after it has been created. To get encrypted Read Replicas, first encrypt the source database through the snapshot, encrypted copy, and restore procedure. Then create the replicas from the new encrypted instance.

Question 18:

An application running in production is using an Aurora Cluster as its database. Your development team would like to run a version of the application in a scaled-down application with the ability to perform some heavy workload on a need-basis. Most of the time, the application will be unused. Your CIO has tasked you with helping the team to achieve this while minimizing costs. What do you suggest?

Answer (3): Aurora Serverless automatically scales capacity up or down based on workload, making it cost-effective for infrequent and variable usage, which matches the team's needs. Using a global database is more suited for multi-region replication and not cost-efficient for small, infrequent workloads. An RDS database or running Aurora on EC2 would require maintaining resources constantly, increasing costs when the app is unused. Shutting down EC2 instances only addresses compute, not the database cost, and is less flexible than Aurora Serverless. Therefore, Aurora Serverless best minimizes costs while handling variable workloads.

Question 19:

How many Aurora Read Replicas can you have in a single Aurora DB Cluster?

Answer (3): An Aurora DB cluster supports up to 15 Aurora Read Replicas. The replicas share the cluster's storage volume with the primary instance, so adding one does not require copying the data, and any replica can be promoted if the primary fails.

Question 20:

Amazon Aurora supports both …………………….. databases.

Answer (2): Aurora is compatible with exactly two database engines: MySQL and PostgreSQL. MariaDB is not an Aurora engine. Oracle and MS SQL Server are proprietary databases with different architectures, so Aurora is not compatible with them. Therefore, "MySQL and PostgreSQL" is correct.

Question 21:

You work as a Solutions Architect for a gaming company. One of the games mandates that players are ranked in real-time based on their score. Your boss asked you to design then implement an effective and highly available solution to create a gaming leaderboard. What should you use?

Answer (4): ElastiCache for Redis with Sorted Sets is ideal for real-time ranking because it allows fast, in-memory updates and retrievals of ordered data, making leaderboards highly responsive and available. RDS for MySQL can store data, but it's slower for real-time updates and querying, which is critical for gaming leaderboards. Amazon Aurora provides high availability but isn't optimized for the ultra-low latency and real-time ranking needed here. ElastiCache for Memcached offers fast caching but lacks built-in support for ordered data types like Sorted Sets. Therefore, Redis Sorted Sets are the best fit for creating a highly available, real-time gaming leaderboard.
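To make the Sorted Set idea concrete, here is a minimal in-memory sketch of the operations a leaderboard relies on. The method names loosely mirror Redis ZADD, ZINCRBY, ZREVRANGE and ZREVRANK, but this is not a Redis client. It is a hypothetical illustration, and its tie-breaking (alphabetical by player name) is a choice made here that may not match Redis exactly. Redis keeps the ordering in O(log N) per update, while this list-based sketch pays O(N) per insert.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class Leaderboard:
    """Sketch of sorted-set leaderboard operations (conceptually like Redis)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}
        # Kept sorted by (-score, member): highest score first, ties by name.
        self._order: List[Tuple[float, str]] = []

    def add(self, member: str, score: float) -> None:  # like ZADD
        old = self._scores.get(member)
        if old is not None:
            # Remove the stale entry before re-inserting with the new score.
            self._order.pop(bisect.bisect_left(self._order, (-old, member)))
        self._scores[member] = score
        bisect.insort(self._order, (-score, member))

    def incr(self, member: str, delta: float) -> float:  # like ZINCRBY
        new_score = self._scores.get(member, 0) + delta
        self.add(member, new_score)
        return new_score

    def top(self, n: int) -> List[Tuple[str, float]]:  # like ZREVRANGE 0 n-1 WITHSCORES
        return [(member, -neg) for neg, member in self._order[:n]]

    def rank(self, member: str) -> Optional[int]:  # like ZREVRANK (0 = first place)
        score = self._scores.get(member)
        if score is None:
            return None
        return bisect.bisect_left(self._order, (-score, member))


board = Leaderboard()
board.add("alice", 1200)
board.add("bob", 950)
board.add("carol", 1200)
board.incr("bob", 400)
print(board.top(3))        # bob (1350) first, then alice and carol (1200)
print(board.rank("alice"))  # 1
```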

Question 22:

You need full customization of an Oracle Database on AWS. You would like to benefit from using the AWS services. What do you recommend?

Answer (2): RDS Custom for Oracle gives you full customization, including access to the underlying operating system and database configuration, while AWS still automates tasks such as backups. RDS for Oracle is a managed service with limited customization: it suits standardized use cases but does not give full control. Deploying Oracle on EC2 also gives complete customization, but you then manage the infrastructure and maintenance yourself and lose the managed-service features. RDS Custom balances control with reduced administrative overhead, so RDS Custom for Oracle is the best choice.

Question 23:

You need to store long-term backups for your Aurora database for disaster recovery and audit purposes. What do you recommend?

Answer (2): On-demand (manual) backups are retained until you delete them, so they can be kept as long as disaster recovery and audit requirements demand. Automated backups have a maximum retention period of 35 days, which is too short for long-term storage. Aurora Database Cloning creates a copy of the database for immediate use, not a long-term backup. Therefore, performing on-demand backups is the best option for long-term retention.

Question 24:

Your development team would like to perform a suite of read and write tests against your production Aurora database because they need access to production data as soon as possible. What do you advise?

Answer (4): Aurora Cloning quickly creates a separate copy of the database, using a copy-on-write protocol, that supports both reads and writes without impacting production. A Read Replica allows read-only access, so it cannot be used for write tests. Testing directly against the production database risks affecting live users and data integrity. Taking a DB Snapshot and restoring it works but is slower than cloning. Therefore, Aurora Cloning is the best choice.

Question 25:

You have 100 EC2 instances connected to your RDS database and you see that upon a maintenance of the database, all your applications take a lot of time to reconnect to RDS, due to poor application logic. How do you improve this?

Answer (4): RDS Proxy pools and shares database connections. Applications connect to the proxy rather than directly to the database, which reduces reconnection time during failovers or maintenance. Fixing all the applications is impractical and time-consuming. Disabling Multi-AZ removes high availability, risking longer downtime during failover. Enabling Multi-AZ improves availability but does not fix the applications' slow reconnection logic. Therefore, RDS Proxy is the best way to minimize disruption.

AWS Cloud Practitioner Questions | IAM Advanced

Minoltan Issack — Sun, 22 Mar 2026 06:01:39 +0000

Question 1:

You have strong regulatory requirements to only allow fully internally audited AWS services in production. You still want to allow your teams to experiment in a development environment while services are being audited. How can you best set this up?

Answer (3): By creating an AWS Organization with separate Organizational Units (OUs) for Prod and Dev, and applying a Service Control Policy (SCP) on the Prod OU, you effectively enforce compliance in your production environment while allowing flexibility for experimentation in development. This setup aligns with your regulatory requirements by ensuring only vetted services are accessible in production.

Question 2:

You are managing the AWS account for your company, and you want to give one of the developers access to read files from an S3 bucket. You have updated the bucket policy to this, but he still can't access the files in the bucket. What is the problem?

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowsRead",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:user/Dave"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::static-files-bucket-xxx"
    }]
}

Answer (3): s3:GetObject is an object-level action, but the Resource in this policy is the bucket ARN. That ARN matches the bucket itself and none of the objects in it. Changing the resource to "arn:aws:s3:::static-files-bucket-xxx/*" grants access to the individual files.
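A quick way to see why the original policy fails is to test whether each Resource pattern matches an object ARN. The sketch below uses a simplified, hypothetical matcher: `*` matches any run of characters and `?` matches one character. It approximates IAM's wildcard rules and is not the real policy engine. The object key `docs/report.pdf` is also hypothetical.

```python
import json
import re


def iam_wildcard_match(pattern: str, arn: str) -> bool:
    """Simplified IAM-style matching: '*' = any run of characters, '?' = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, arn) is not None


BUCKET = "static-files-bucket-xxx"
object_arn = f"arn:aws:s3:::{BUCKET}/docs/report.pdf"  # hypothetical object key

fixed_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowsRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/Dave"},
        "Action": "s3:GetObject",
        # Object-level action, so the resource must cover the objects, not the bucket.
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}

original_matches = iam_wildcard_match(f"arn:aws:s3:::{BUCKET}", object_arn)
fixed_matches = iam_wildcard_match(fixed_policy["Statement"][0]["Resource"], object_arn)
print(original_matches, fixed_matches)  # False True
print(json.dumps(fixed_policy, indent=2))
```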

Question 3:

You have 5 AWS Accounts that you manage using AWS Organizations. You want to restrict access to certain AWS services in each account. How should you do that?

Answer (2): AWS Organizations Service Control Policies (SCPs) are the most effective way to restrict access to specific AWS services across multiple accounts. They let you centrally set the maximum permissions available in every account in the organization.

Question 4:

Which of the following IAM condition keys can you use to allow API calls only to a specified AWS region?

Answer (4): The aws:RequestedRegion condition key holds the region an API request is sent to, so a policy can allow or deny calls based on that region. This makes it the right tool for restricting API calls to a specified AWS region.
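As a sketch of how the key is used, the policy below allows EC2 actions only when the request targets one region. The region `eu-west-1` and the `ec2:*` scope are hypothetical choices. The `condition_passes` helper is a toy evaluator for this single StringEquals condition, written only to illustrate the comparison; it is not how IAM evaluates policies.

```python
from typing import Any, Dict

ALLOWED_REGION = "eu-west-1"  # hypothetical: the one region you want to permit

# Allow EC2 actions, but only when the API request targets ALLOWED_REGION.
region_policy: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "ec2:*",
        "Resource": "*",
        "Condition": {"StringEquals": {"aws:RequestedRegion": ALLOWED_REGION}},
    }],
}


def condition_passes(policy: Dict[str, Any], request_context: Dict[str, str]) -> bool:
    """Toy check of the StringEquals condition above (not the real IAM engine)."""
    expected = policy["Statement"][0]["Condition"]["StringEquals"]
    return all(request_context.get(key) == value for key, value in expected.items())


print(condition_passes(region_policy, {"aws:RequestedRegion": "eu-west-1"}))  # True
print(condition_passes(region_policy, {"aws:RequestedRegion": "us-east-1"}))  # False
```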

Question 5:

When configuring permissions for EventBridge to configure a Lambda function as a target, you should use ………………….., but when you want to configure a Kinesis Data Stream as a target, you should use …………………..

Answer (2): For a Lambda target, a resource-based policy on the function grants EventBridge permission to invoke it. For a Kinesis Data Streams target, you use an identity-based policy attached to an IAM role that EventBridge assumes to write to the stream.

AWS Cloud Practitioner Questions | Networking & VPC

Minoltan Issack — Sat, 21 Feb 2026 10:35:24 +0000

Question 1:

What does this CIDR 10.0.4.0/28 correspond to?

Answer (1): The "/28" prefix leaves 32 - 28 = 4 host bits, so the block contains 2^4 = 16 IP addresses, from 10.0.4.0 to 10.0.4.15.
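Python's standard `ipaddress` module makes it easy to check this kind of CIDR arithmetic. The sketch below confirms the size and range of 10.0.4.0/28.

```python
import ipaddress

subnet = ipaddress.ip_network("10.0.4.0/28")

# 32 - 28 = 4 host bits, so the block holds 2**4 = 16 addresses.
print(subnet.num_addresses)          # 16
print(subnet.network_address)        # 10.0.4.0 (first address)
print(subnet.broadcast_address)      # 10.0.4.15 (last address)
print(2 ** (32 - subnet.prefixlen))  # 16, the same count computed by hand
```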

Question 2:

You have a corporate network of size 10.0.0.0/8 and a satellite office of size 192.168.0.0/16. Which CIDR is acceptable for your AWS VPC if you plan on connecting your networks later on?

Answer (2): 172.16.0.0/16 falls within a private IP address range and does not overlap either existing network, which is essential for routing once the networks are connected. It also respects the maximum VPC CIDR size of /16.

How to get the answer: A Step-by-Step Guide

1. Identify the "Taken" Space
First, look at the private IP ranges already in use. According to RFC 1918, there are three main blocks reserved for private networks:

10.0.0.0/8: (Used by your Corporate Network)
172.16.0.0/12: (Available)
192.168.0.0/16: (Used by your Satellite Office)

2. Apply the Rule of Non-Overlap
If you choose a VPC range that sits inside the 10.x.x.x or 192.168.x.x space, your routers won't know where to send a packet.

Example: If your VPC is 10.0.1.0/24 and your Corporate network is 10.0.0.0/8, the Corporate network contains the VPC range. When a computer in the office tries to talk to the VPC, it might think that IP address is just down the hall in the office rather than across the VPN/Direct Connect to AWS.

3. Select from the Remaining Private Space
Since the 10.x and 192.168.x blocks are occupied, the 172.16.0.0/12 block is the only remaining candidate. A VPC cannot use the whole /12, so pick a /16 from it. A common choice is 172.16.0.0/16, which provides 65,536 IP addresses, plenty for most VPC needs.

Note: A /12 is significantly larger than a /16. In networking, the smaller the prefix number, the larger the network, and a /12 contains sixteen /16 networks. AWS only accepts VPC CIDR blocks between /16 and /28, so it will reject 172.16.0.0/12.
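The three rules from this guide can be checked mechanically. The sketch below is a hypothetical helper built on Python's `ipaddress` module. It tests a candidate VPC CIDR against three things: the /16 to /28 size limit, the RFC 1918 private ranges, and the two networks already in use in the question.

```python
import ipaddress

# RFC 1918 private blocks.
RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Networks already in use, from the question.
EXISTING = {
    "corporate": ipaddress.ip_network("10.0.0.0/8"),
    "satellite": ipaddress.ip_network("192.168.0.0/16"),
}


def vpc_cidr_problems(candidate: str) -> list:
    """Return reasons a candidate VPC CIDR fails the guide's rules (empty = acceptable)."""
    net = ipaddress.ip_network(candidate)
    problems = []
    # AWS accepts VPC CIDR blocks between /16 (largest) and /28 (smallest).
    if not 16 <= net.prefixlen <= 28:
        problems.append(f"prefix /{net.prefixlen} is outside /16-/28")
    if not any(net.subnet_of(block) for block in RFC1918):
        problems.append("not inside an RFC 1918 private range")
    # Overlap with an existing network would make routing ambiguous.
    for name, used in EXISTING.items():
        if net.overlaps(used):
            problems.append(f"overlaps {name} network {used}")
    return problems


for cidr in ["10.0.1.0/24", "192.168.4.0/24", "172.16.0.0/12", "172.16.0.0/16"]:
    print(cidr, vpc_cidr_problems(cidr) or "OK")
```

Only 172.16.0.0/16 passes all three checks, which matches the answer above.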

Question 3:

You plan on creating a subnet and want it to have at least capacity for 28 EC2 instances. What's the minimum size you need to have for your subnet?

Answer (3): The minimum size you need is a /26. A /27 provides 32 total addresses, but after AWS takes its 5 reserved IPs, only 27 are usable. Since you need 28, you must move up to the next power of two, which is a /26.

The Calculation
If you need 28 instances, your total IP requirement is:

28 (for your EC2 instances)
+ 5 (AWS Reserved IPs)
= 33 Total IP addresses required.

Now, look at CIDR sizes, which work in powers of 2, to find the smallest block with at least 33 addresses. A /27 has 32 addresses, one too few, while a /26 has 64, so a /26 is the minimum.
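The same calculation generalizes to any instance count. The sketch below is a hypothetical helper: it adds AWS's 5 reserved addresses, then walks from /28 (the smallest subnet AWS allows) up to /16 until the block is large enough.

```python
AWS_RESERVED_PER_SUBNET = 5  # first four addresses + the last one in every subnet


def smallest_subnet_prefix(instances: int) -> int:
    """Largest prefix length (i.e. smallest subnet) that still fits `instances` hosts."""
    needed = instances + AWS_RESERVED_PER_SUBNET
    # AWS subnets range from /28 (16 addresses) up to /16 (65,536 addresses).
    for prefix in range(28, 15, -1):
        if 2 ** (32 - prefix) >= needed:
            return prefix
    raise ValueError("does not fit in a single /16 subnet")


print(smallest_subnet_prefix(28))  # 26: 28 + 5 = 33 > 32 (/27), so a /26 (64)
print(smallest_subnet_prefix(27))  # 27: 27 + 5 = 32 fits a /27 exactly
```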

Question 4:

Security Groups operate at the ................. level while NACLs operate at the ................. level.

Answer (1): Security Groups operate at the instance level while NACLs operate at the subnet level.

Question 5:

You have attached an Internet Gateway to your VPC, but your EC2 instances still don't have access to the internet. What is NOT a possible issue?

Answer (3): Security groups are stateful. If outbound traffic is allowed, the response traffic is automatically allowed back in. A missing inbound security group rule therefore cannot be what blocks your instances' internet access, so it is NOT a possible issue.

Question 6:

You would like to provide Internet access to your EC2 instances in private subnets with IPv4 while making sure this solution requires the least amount of administration and scales seamlessly. What should you use?

Answer (3): A NAT Gateway is the best option for giving EC2 instances in private subnets IPv4 internet access with minimal administration. AWS manages it, and it scales automatically with your traffic.

Why the other answers are wrong:

1. Egress-Only Internet Gateway (EOIGW)

The Flaw: Egress-Only IGWs are strictly for IPv6 traffic.
Why it fails here: The question specifically asks for IPv4 access. An EOIGW performs no address translation and handles only IPv6 traffic, so it cannot give IPv4 instances internet access.

2. NAT Instances

The Flaw: These are DIY (Do-It-Yourself) virtual machines.
Why it fails here:
High Administration: You are responsible for managing the EC2 instance, patching the OS, and configuring the NAT software (like iptables).
Poor Scaling: If your traffic exceeds the instance's bandwidth, you have to manually upgrade the instance size (vertical scaling) or set up a complex fleet (horizontal scaling). It does not scale "seamlessly" like a NAT Gateway does.
Single Point of Failure: Unless you set up a high-availability script, if that one instance crashes, your entire private subnet loses internet access.

Question 7:

VPC Peering has been enabled between VPC A and VPC B, and the route tables have been updated for VPC A. But, the EC2 instances cannot communicate. What is the likely issue?

Answer (2): In VPC Peering, both VPCs need route table entries pointing to the peering connection. VPC B's route tables were never updated, so traffic from VPC B has no route back to VPC A.

Question 8:

You have set up a Direct Connect connection between your corporate data center and your VPC A in your AWS account. You need to access VPC B in another AWS region from your corporate datacenter as well. What should you do?

Answer (3): A Direct Connect Gateway lets a single Direct Connect connection reach VPCs in multiple AWS regions. You can therefore access VPC B from your corporate data center over the existing connection.

Question 9:

When using VPC Endpoints, what are the only two AWS services that have a Gateway Endpoint available?

Answer (3): Amazon S3 and DynamoDB are the only AWS services that support a Gateway Endpoint. A Gateway Endpoint gives your VPC private access to the service without an internet gateway or NAT device. Other services are reached privately through Interface Endpoints.

Question 10:

AWS reserves 5 IP addresses each time you create a new subnet in a VPC. When you create a subnet with CIDR 10.0.0.0/24, the following IP addresses are reserved, EXCEPT ....................

Answer (4): AWS reserves the first four IP addresses (10.0.0.0 to 10.0.0.3) and the last one (10.0.0.255) in the subnet. 10.0.0.4 is therefore the first usable address and is not reserved.

The Reserved List for 10.0.0.0/24
For this specific subnet, the reserved addresses are:

10.0.0.0: Network address.
10.0.0.1: Reserved by AWS for the VPC router.
10.0.0.2: Reserved by AWS for mapping to Amazon Provided DNS.
10.0.0.3: Reserved by AWS for future use.
10.0.0.255: Network broadcast address (AWS does not support broadcast, but it reserves this address anyway).
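The list above follows a fixed rule: the first four addresses plus the last one. That rule can be computed for any subnet. The helper name below is hypothetical, and it uses only Python's `ipaddress` module.

```python
import ipaddress


def aws_reserved_addresses(cidr: str) -> list:
    """The five addresses AWS reserves in every IPv4 subnet: first four + last."""
    net = ipaddress.ip_network(cidr)
    first_four = [net.network_address + i for i in range(4)]
    return [str(ip) for ip in first_four + [net.broadcast_address]]


reserved = aws_reserved_addresses("10.0.0.0/24")
print(reserved)
# ['10.0.0.0', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.255']
print("10.0.0.4" in reserved)  # False: the first usable address
```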

Question 11:

You have 3 VPCs A, B, and C. You want to establish a VPC Peering connection between all the 3 VPCs. What should you do?

Answer (2): VPC Peering is not transitive, so each VPC must be peered directly with every other VPC. Connecting all three takes three peering connections: A-B, B-C, and A-C.
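Because peering is not transitive, a full mesh of n VPCs needs n(n-1)/2 connections. Each connection also needs routes on both sides, as in Question 7. A small sketch using `itertools.combinations` shows how quickly that grows.

```python
from itertools import combinations

vpcs = ["A", "B", "C"]

# Every pair of VPCs needs its own peering connection: n(n-1)/2 pairs.
peerings = list(combinations(vpcs, 2))
print(peerings)       # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(peerings))  # 3

ten_vpc_links = len(list(combinations(range(10), 2)))
print(ten_vpc_links)  # 45 connections for 10 VPCs
```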

Question 12:

How can you capture information about IP traffic inside your VPCs?

Answer (1): VPC Flow Logs capture information about the IP traffic going to and from network interfaces in your VPC. You can use them to monitor network activity and audit connections.

Question 13:

If you want a 500 Mbps Direct Connect connection between your corporate datacenter to AWS, you would choose a .................. connection.

Answer (2): A Hosted Connection supports sub-1 Gbps capacities such as 500 Mbps, whereas Dedicated Connections start at 1 Gbps. A Hosted Connection is therefore the right choice.

Question 14:

When you set up an AWS Site-to-Site VPN connection between your corporate on-premises datacenter and VPCs in AWS Cloud, what are the two major components you want to configure for this connection?

Answer (4): A Site-to-Site VPN connection needs two components. The Customer Gateway represents your on-premises VPN device, and the Virtual Private Gateway sits on the AWS side, attached to your VPC.

Question 15:

Your company has several on-premises sites across the USA. These sites are currently linked using private connections, but your private connections provider has been recently quite unstable, making your IT architecture partially offline. You would like to create a backup connection that will use the public Internet to link your on-premises sites, that you can failover in case of issues with your provider. What do you recommend?

Answer (2): AWS VPN CloudHub links multiple on-premises sites over the public Internet using a hub-and-spoke model. It can serve as a backup connection to fail over to when your private connection provider has issues.

Question 16:

You need to set up a dedicated connection between your on-premises corporate datacenter and AWS Cloud. This connection must be private, consistent, and traffic must not travel through the Internet. Which AWS service should you use?

Answer (3): AWS Direct Connect provides a dedicated, private connection between your on-premises datacenter and AWS. It delivers consistent performance, and its traffic does not pass through the public Internet.

Wrong Choices

1. AWS Site-to-Site VPN
Think of this as the "Fast and Affordable" alternative to Direct Connect. It creates an encrypted tunnel between your on-premises data center and your AWS VPC over the public Internet. That violates the requirement that traffic must not travel through the Internet.
2. AWS PrivateLink
PrivateLink is fundamentally different. It isn't a "network-to-network" connection; it is a "Service-to-Service" connection. It allows you to expose a specific service (like a database or a third-party API) to another VPC or on-premises network without ever using an Internet Gateway, NAT Gateway, or Peering.
4. Amazon EventBridge
EventBridge is often a "distractor" answer when you are asked about establishing a network connection. It is a serverless event bus that routes events between applications and AWS services. It works at the application level and provides no network connectivity of any kind, dedicated or otherwise.

Question 17:

Using a Direct Connect connection, you can access both public and private AWS resources.

Answer (1): True. A public virtual interface reaches public AWS resources such as Amazon S3. A private virtual interface reaches private resources such as EC2 instances in a VPC.

Question 18:

You want to scale up an AWS Site-to-Site VPN connection throughput, established between your on-premises data and AWS Cloud, beyond a single IPsec tunnel's maximum limit of 1.25 Gbps. What should you do?

Answer (3): AWS Transit Gateway supports equal-cost multi-path (ECMP) routing across multiple Site-to-Site VPN connections. Traffic is spread over several tunnels, so total throughput can exceed the 1.25 Gbps limit of a single IPsec tunnel.

Question 19:

You have a VPC in your AWS account that runs in dual-stack mode. You keep trying to launch an EC2 instance, but it fails. After further investigation, you find that you no longer have any IPv4 addresses available. What should you do?

Answer (3): Adding a new IPv4 CIDR block to your VPC, and creating a subnet from it, increases the number of available IPv4 addresses so you can launch the EC2 instance. This resolves the address depletion without changing your existing network configuration.

Question 20:

A web application backend is hosted on EC2 instances in private subnets fronted by an Application Load Balancer in public subnets. There is a requirement to give some of the developers access to the backend EC2 instances but without exposing the backend EC2 instances to the Internet. You have created a bastion host EC2 instance in the public subnet and configured the backend EC2 instances Security Group to allow traffic from the bastion host. Which of the following is the best configuration for bastion host Security Group to make it secure?

Answer (2): The bastion host's Security Group should allow inbound SSH (port 22) only from your company's public IP CIDR. Developers can then reach the backend EC2 instances through the bastion host, while the bastion is not open to the rest of the Internet.

Question 21:

A company has set up a Direct Connect connection between their corporate data center to AWS. There is a requirement to prepare a cost-effective secure backup connection in case there are issues with this Direct Connect connection. What is the most cost effective and secure solution you recommend?

Answer (3): A Site-to-Site VPN connection is the most cost-effective secure backup. It is encrypted, uses the public Internet instead of a second dedicated link, and keeps you connected to AWS if the primary Direct Connect connection fails.

Question 22:

Which AWS service allows you to protect and control traffic in your VPC from layer 3 to layer 7?

Answer (1): AWS Network Firewall protects and filters traffic in your VPC from layer 3 to layer 7.

Question 23:

A web application is hosted on a fleet of EC2 instances managed by an Auto Scaling Group. You are exposing this application through an Application Load Balancer. Both the EC2 instances and the ALB are deployed in a VPC with the CIDR 192.168.0.0/18. How do you configure the EC2 instances' security group to ensure only the ALB can access them on port 80?

Answer (3): Add an inbound rule on port 80 with the ALB's Security Group as the source. This ensures that only the Application Load Balancer can reach your EC2 instances. Using the VPC CIDR (192.168.0.0/18) as the source would instead admit any resource in the VPC.

AWS Cloud Practitioner Questions | High availability & Scalability

Minoltan Issack — Wed, 11 Feb 2026 09:59:53 +0000

Question 1:

Scaling an EC2 instance from r4.large to r4.4xlarge is called .....................

Correct Answer: (2) Scaling an EC2 instance from a smaller size (r4.large) to a larger one (r4.4xlarge) is an example of upgrading the resources of a single instance, which defines vertical scalability. This concept focuses on increasing the capacity of existing hardware rather than adding more instances.

Question 2:

Running an application on an Auto Scaling Group that scales the number of EC2 instances in and out is called .....................

Correct Answer: (1) An Auto Scaling Group adds or removes instances as demand changes, which is horizontal scaling: capacity grows by adding instances rather than by upgrading a single instance's resources.

Question 3:

Elastic Load Balancers provide a .......................

Correct Answer: (2) Elastic Load Balancers provide a constant endpoint for your application, allowing you to manage changes in the underlying infrastructure without affecting how your users connect to your services. This ensures reliability and accessibility, aligning with best practices in application scalability.

Question 4:

You are running a website on 10 EC2 instances fronted by an Elastic Load Balancer. Your users are complaining about the fact that the website always asks them to re-authenticate when they are moving between website pages. You are puzzled because it's working just fine on your machine and in the Dev environment with 1 EC2 instance. What could be the reason?

Correct Answer: (3) Sticky Sessions are not enabled on the Elastic Load Balancer. A user's requests may therefore be routed to different EC2 instances that do not hold the user's session data, prompting re-authentication. Enabling Sticky Sessions sends each user to the same instance, preserving session state as they navigate the website. In the Dev environment, every request reaches the single instance, so the problem never appears.

Question 5:

You are using an Application Load Balancer to distribute traffic to your website hosted on EC2 instances. It turns out that your website only sees traffic coming from private IPv4 addresses which are in fact your Application Load Balancer's IP addresses. What should you do to get the IP address of clients connected to your website?

Correct Answer: (2) Read the client IP address from the X-Forwarded-For header. The Application Load Balancer terminates the client connection, so your instances see the ALB's private IP as the source. The ALB passes the original client IP in this header, enabling accurate logging, analytics, and security measures.
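A minimal sketch of reading the header on the instance side follows. It assumes the ALB is the only proxy in front of your app and uses the default behavior of appending the connecting client's IP to the end of X-Forwarded-For. Entries further left may have been supplied by the client and can be forged, so the sketch counts back from the right. The function name and `trusted_proxies` parameter are hypothetical.

```python
def client_ip_from_xff(xff_header: str, trusted_proxies: int = 1) -> str:
    """Return the client IP recorded by the nearest `trusted_proxies` hops.

    Assumes each trusted proxy (here, the ALB) appends the IP that connected
    to it, so the right-most entries are the ones you can rely on.
    """
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    if len(hops) < trusted_proxies:
        raise ValueError("header has fewer entries than trusted proxies")
    return hops[-trusted_proxies]


print(client_ip_from_xff("203.0.113.7"))                # 203.0.113.7
print(client_ip_from_xff("198.51.100.1, 203.0.113.7"))  # 203.0.113.7 (left entry not trusted)
```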

Question 6:

You hosted an application on a set of EC2 instances fronted by an Elastic Load Balancer. A week later, users begin complaining that sometimes the application just doesn't work. You investigate the issue and found that some EC2 instances crash from time to time. What should you do to protect users from connecting to the EC2 instances that are crashing?

Correct Answer: (1) Health Checks let the Elastic Load Balancer monitor the health of your EC2 instances. The load balancer stops routing traffic to instances that are unhealthy or have crashed, so users only reach working instances.

Question 7:

You are working as a Solutions Architect for a company and you are required to design an architecture for a high-performance, low-latency application that will receive millions of requests per second. Which type of Elastic Load Balancer should you choose?

Correct Answer: (2) A Network Load Balancer handles millions of requests per second with ultra-low latency, making it the right choice for high-performance applications.

Question 8:

Application Load Balancers support the following protocols, EXCEPT:

Correct Answer: (3) Application Load Balancers are specifically designed to support application-layer protocols such as HTTP, HTTPS, and WebSocket, but do not support transport-layer protocols like TCP. This distinction is crucial for understanding how different load balancers operate based on the protocols they manage.

Question 9:

Application Load Balancers can route traffic to different Target Groups based on the following, EXCEPT:

Correct Answer: (1) Application Load Balancers do not route traffic based on geographic location; instead, they can route based on criteria like URL Path and Hostname. This distinction helps clarify how ALBs function in managing traffic efficiently.

Question 10:

Registered targets in a Target Group for an Application Load Balancer can be one of the following, EXCEPT:

Correct Answer: (2) Targets in an Application Load Balancer's Target Groups can be EC2 instances, private IP addresses, or Lambda functions, but not another load balancer.

Question 11:

For compliance purposes, you would like to expose a fixed static IP address to your end-users so that they can write firewall rules that will be stable and approved by regulators. What type of Elastic Load Balancer would you choose?

Correct Answer: (2) A Network Load Balancer has one static IP address per Availability Zone and lets you attach an Elastic IP to each. End-users get fixed IPs they can write into firewall rules for regulators to approve.

Question 12:

You want to create a custom application-based cookie in your Application Load Balancer. Which of the following you can use as a cookie name?

Correct Answer: (2) This name is valid for a custom application-based cookie because it is not one of the names the Application Load Balancer reserves for itself (AWSALB, AWSALBAPP, and AWSALBTG). The other options are reserved.

Question 13:

You have a Network Load Balancer that distributes traffic across a set of EC2 instances in us-east-1. You have 2 EC2 instances in us-east-1b AZ and 5 EC2 instances in us-east-1e AZ. You have noticed that the CPU utilization is higher in the EC2 instances in us-east-1b AZ. After more investigation, you noticed that the traffic is equally distributed across the two AZs. How would you solve this problem?

Correct Answer: (1) Enabling Cross-Zone Load Balancing makes the Network Load Balancer distribute traffic evenly across all registered EC2 instances in every Availability Zone, instead of evenly across the AZs. This balances CPU utilization among the instances.
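The arithmetic behind the uneven CPU load is short. The sketch below assumes, as the question states, that traffic splits equally between the two AZs when cross-zone load balancing is off. It then computes each instance's share of total traffic with exact fractions.

```python
from fractions import Fraction

instances_per_az = {"us-east-1b": 2, "us-east-1e": 5}


def share_per_instance(cross_zone: bool) -> dict:
    """Fraction of total traffic each instance in each AZ receives."""
    total = sum(instances_per_az.values())
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every registered instance gets an equal slice of all traffic.
            shares[az] = Fraction(1, total)
        else:
            # Each AZ gets an equal slice, then splits it among its own instances.
            shares[az] = Fraction(1, len(instances_per_az)) / count
    return shares


without = share_per_instance(cross_zone=False)
with_cz = share_per_instance(cross_zone=True)
print(without)  # us-east-1b: 1/4 per instance, us-east-1e: 1/10 per instance
print(with_cz)  # every instance: 1/7
```

Without cross-zone balancing, each us-east-1b instance carries 2.5 times the traffic of a us-east-1e instance (1/4 versus 1/10), which matches the higher CPU utilization in the question.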

Question 14:

Which feature in both Application Load Balancers and Network Load Balancers allows you to load multiple SSL certificates on one listener?

Correct Answer: (2) Server Name Indication (SNI) lets you bind multiple SSL certificates to a single listener on both Application Load Balancers and Network Load Balancers. The client names the hostname it wants during the TLS handshake, so multiple secure domains can share the same listener and IP address.

Question 15:

You have an Application Load Balancer that is configured to redirect traffic to 3 Target Groups based on the following hostnames: users.example.com, api.external.example.com, and checkout.example.com. You would like to configure HTTPS for each of these hostnames. How do you configure the ALB to make this work?

Correct Answer: (3) SNI lets you attach a separate SSL certificate for each hostname (users.example.com, api.external.example.com, checkout.example.com) to the same HTTPS listener. The ALB then presents the matching certificate for each domain.
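To make the SNI idea concrete, here is a simplified model of how a listener might pick a certificate from the hostname sent in the TLS handshake. It tries an exact match, then a single-label wildcard match, then falls back to the listener's default certificate. The real ALB selection algorithm also weighs client capabilities, so treat this as an illustration only. The function name and the certificate identifiers are hypothetical placeholders.

```javascript
// Simplified SNI certificate selection (illustration; not the ALB's actual algorithm).
// certificates: [{ arn, domains: [...] }], defaultCert: identifier used when nothing matches.
function selectCertificate(certificates, defaultCert, sniHostname) {
  const host = (sniHostname || "").toLowerCase();
  // 1. Exact hostname match
  const exact = certificates.find((c) => c.domains.includes(host));
  if (exact) return exact.arn;
  // 2. Wildcard match covering exactly one label: "*.example.com" covers "users.example.com"
  //    but not "api.external.example.com"
  const wildcard = certificates.find((c) =>
    c.domains.some((d) =>
      d.startsWith("*.") &&
      host.endsWith(d.slice(1)) &&
      host.split(".").length === d.split(".").length));
  if (wildcard) return wildcard.arn;
  // 3. No match (or no SNI sent): use the listener's default certificate
  return defaultCert;
}

const certs = [
  { arn: "cert-wildcard", domains: ["*.example.com"] },
  { arn: "cert-api", domains: ["api.external.example.com"] },
];
console.log(selectCertificate(certs, "cert-default", "users.example.com"));        // cert-wildcard
console.log(selectCertificate(certs, "cert-default", "api.external.example.com")); // cert-api
```

The second lookup shows why Question 15 needs more than one certificate: a `*.example.com` wildcard does not cover the two-level `api.external.example.com`.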

Question 16:

You have an application hosted on a set of EC2 instances managed by an Auto Scaling Group for which you configured both the desired and maximum capacity as 3. You have also created a CloudWatch Alarm that is configured to scale out your ASG when CPU Utilization reaches 60%. Your application suddenly receives huge traffic and is now running at 80% CPU Utilization. What will happen?

Correct Answer: (1) The maximum capacity of your Auto Scaling Group is 3, and it is already running 3 instances. The alarm goes into ALARM state, but the ASG cannot launch additional instances beyond that limit, however high CPU utilization climbs.

Question 17:

You have an Auto Scaling Group fronted by an Application Load Balancer. You have configured the ASG to use ALB Health Checks, and one EC2 instance has just been reported unhealthy. What will happen to the EC2 instance?

Correct Answer: (3) The Auto Scaling Group (ASG) uses the Application Load Balancer (ALB) health checks to monitor instance health. When the ALB marks an instance unhealthy, the ASG terminates it and launches a replacement to maintain the desired capacity.

Question 18:

Your boss asked you to scale your Auto Scaling Group based on the number of requests per minute your application makes to your database. What should you do?

Correct Answer: (1) Standard CloudWatch metrics do not capture how many requests per minute your application makes to the database. Publish that count as a CloudWatch custom metric, then create a CloudWatch Alarm on it that triggers scaling of your Auto Scaling Group.
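As a sketch of the custom-metric side of Question 18, the helper below builds the input object for CloudWatch `PutMetricData`. It follows the AWS SDK for JavaScript v3 `PutMetricDataCommand` input shape. The namespace `MyApp/Database`, the metric name `DatabaseRequestCount`, and the helper name are hypothetical choices, not AWS defaults.

```javascript
// Builds a PutMetricData input (AWS SDK v3 shape). Namespace and metric name are hypothetical.
function buildDbRequestMetric(asgName, requestCount, timestamp = new Date()) {
  return {
    Namespace: "MyApp/Database",
    MetricData: [
      {
        MetricName: "DatabaseRequestCount",
        // Dimension lets the alarm target the metric for one specific ASG
        Dimensions: [{ Name: "AutoScalingGroupName", Value: asgName }],
        Timestamp: timestamp,
        Value: requestCount,
        Unit: "Count",
      },
    ],
  };
}

// Sending it requires the SDK and AWS credentials, e.g.:
//   import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";
//   await new CloudWatchClient({}).send(new PutMetricDataCommand(buildDbRequestMetric("my-asg", 1250)));
```

An alarm on this metric using the `Sum` statistic over a 60-second period then tracks requests per minute.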

Question 19:

An application is deployed with an Application Load Balancer and an Auto Scaling Group. Currently, you manually scale the ASG and you would like to define a Scaling Policy that will ensure the average number of connections to your EC2 instances is around 1000. Which Scaling Policy should you use?

Correct Answer: (3) A Target Tracking Scaling policy automatically adjusts the number of EC2 instances in your Auto Scaling Group to keep a chosen metric (here, the average number of connections per instance) close to your target of 1000.
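A rough mental model of target tracking: capacity moves roughly in proportion to how far the metric is from the target, and always stays within the group's min and max. The sketch below encodes that proportional estimate. It is an assumption for illustration, not AWS's exact algorithm, which also applies warm-up, cooldown, and scale-in safeguards. The helper name is hypothetical.

```javascript
// Proportional estimate of the capacity a target tracking policy aims for (illustration only).
function estimateDesiredCapacity({ current, metricValue, target, min, max }) {
  const raw = Math.ceil((current * metricValue) / target);
  // Capacity never leaves [min, max]; the max bound is also why Question 16's ASG stays at 3
  return Math.min(max, Math.max(min, raw));
}

// 4 instances averaging 1500 connections each, target 1000 per instance
console.log(estimateDesiredCapacity({ current: 4, metricValue: 1500, target: 1000, min: 2, max: 10 })); // 6
```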

Question 20:

You have an ASG and a Network Load Balancer. The application on your ASG supports the HTTP protocol and is integrated with the Load Balancer health checks. You are currently using TCP health checks. You would like to migrate to HTTP health checks. What should you do?

Correct Answer: (2) The Network Load Balancer (NLB) supports HTTP health checks, so you can switch the target group's health check protocol from TCP to HTTP. HTTP checks verify that the application itself responds, not just that the port is open, which gives a more accurate picture of availability.

Question 21:

You have a website hosted on EC2 instances in an Auto Scaling Group fronted by an Application Load Balancer. Currently, the website is served over HTTP, and you have been tasked with configuring it to use HTTPS. You have created a certificate in ACM and attached it to the Application Load Balancer. What can you do to force users to access the website using HTTPS instead of HTTP?

Correct Answer: (2) Configure the Application Load Balancer to redirect HTTP to HTTPS with a redirect rule on the HTTP listener. Every request is then upgraded to an encrypted connection before it reaches your website.
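For reference, the redirect in Question 21 is a listener action of type `redirect`. The sketch below builds that action using the ELBv2 `Action`/`RedirectConfig` structure. Host, path, and query are left unset, so they carry over from the original request. The helper name and the listener ARN placeholder are hypothetical.

```javascript
// Default action for the HTTP (port 80) listener: redirect every request to HTTPS.
function httpsRedirectAction() {
  return {
    Type: "redirect",
    RedirectConfig: {
      Protocol: "HTTPS",
      Port: "443",            // ELBv2 expects the port as a string
      StatusCode: "HTTP_301", // permanent redirect; "HTTP_302" would be temporary
    },
  };
}

// Applying it with the AWS SDK v3 (listener ARN is a placeholder):
//   import { ElasticLoadBalancingV2Client, ModifyListenerCommand } from "@aws-sdk/client-elastic-load-balancing-v2";
//   await new ElasticLoadBalancingV2Client({}).send(new ModifyListenerCommand({
//     ListenerArn: "<http-listener-arn>", DefaultActions: [httpsRedirectAction()] }));
```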

AWS Use Cases | Enhanced Streak System for Game Portal with Leaderboards & Rewards

Minoltan Issack — Mon, 01 Dec 2025 17:04:50 +0000

Introduction to Streaks

A streak is a consecutive count of days (or actions) a user performs a specific activity without breaking the chain. Streaks are commonly used in:

Habit-tracking apps (e.g., Duolingo, Headspace)
Gaming (daily login rewards, consecutive wins)
Fitness apps (workout consistency)
E-learning platforms (daily learning goals)

How AWS Helps Implement Streaks

AWS provides serverless and scalable solutions to track streaks efficiently:

AWS Lambda → Runs streak logic (increment, reset, reward checks)
DynamoDB → Stores user streak data (last activity, current streak count)
API Gateway → Exposes APIs for frontend (web/mobile apps)
Amazon Cognito (Optional) → Handles user authentication
AWS CDK → Easy Deployment

Use Cases for Streaks & Implementation Steps

1. Daily Login Streaks (Gaming/Fitness Apps)

Goal: Reward users for logging in daily.

Implementation Steps:

1. Set Up DynamoDB Table

Table: UserStreak
Partition Key: userId (String)
Sort Key: streakType (String, e.g. "daily" or "game")
Attributes: currentStreak, lastLogin, longestStreak, freezeDaysRemaining

2. Create streakTrack Lambda Function

If the user already logged in today → skip
If the last login was yesterday → increment the streak
If the user missed a day → use a freeze day if one is available, otherwise reset the streak

import { UpdateItemCommand, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";
import { ddbClient } from "./client.js";

const TABLE_NAME = process.env.STREAK_TABLE_NAME;
const DAY_MS = 24 * 60 * 60 * 1000;

export const handler = async (event) => {
  try {
    const { userId } = JSON.parse(event.body || "{}");
    if (!userId) {
      return { statusCode: 400, body: JSON.stringify({ error: "userId is required" }) };
    }

    // Calendar days are compared as UTC "YYYY-MM-DD" strings
    const today = new Date().toISOString().split("T")[0];

    // ✅ Get current streak, longest streak and freeze days
    const { currentStreak, longestStreak, lastLogin, freezeDaysRemaining } = await getUserData(userId);

    // ✅ If already logged in today
    if (lastLogin === today) {
      return success({ message: "Already logged in today", currentStreak, freezeDaysRemaining });
    }

    // Days skipped since the last login (0 = logged in yesterday, null = first login ever)
    const missedDays = lastLogin
      ? Math.round((Date.parse(today) - Date.parse(lastLogin)) / DAY_MS) - 1
      : null;

    let newStreak = 1; // default: first login or broken streak
    let newFreeze = freezeDaysRemaining;
    let message = "Streak updated";

    // ✅ Case 1: Consecutive login (yesterday)
    if (missedDays === 0) {
      newStreak = currentStreak + 1;
    }
    // ✅ Case 2: Missed day(s) fully covered by freeze days → one freeze per missed day
    else if (missedDays > 0 && missedDays <= freezeDaysRemaining) {
      newStreak = currentStreak; // keep streak intact
      newFreeze = freezeDaysRemaining - missedDays;
      message = "Missed day covered by a freeze day";
    }
    // ✅ Case 3: Otherwise (first login, or not enough freeze days) the streak stays reset to 1

    const newLongest = Math.max(longestStreak, newStreak);

    // ✅ Update DB
    await updateUserData(userId, today, newStreak, newLongest, newFreeze);

    return success({
      message,
      currentStreak: newStreak,
      freezeDaysRemaining: newFreeze
    });

  } catch (err) {
    console.error("Error:", err);
    return { statusCode: 500, body: JSON.stringify({ error: err.message }) };
  }
};

// 🔹 Get user streak & freeze data (same item the freeze Lambda updates)
async function getUserData(userId) {
  const { Item } = await ddbClient.send(new GetItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ userId, streakType: "daily" }),
  }));

  if (!Item) return { currentStreak: 0, longestStreak: 0, lastLogin: null, freezeDaysRemaining: 0 };

  const data = unmarshall(Item);
  return {
    currentStreak: data.currentStreak || 0,
    longestStreak: data.longestStreak || 0,
    lastLogin: data.lastLogin || null,
    freezeDaysRemaining: data.freezeDaysRemaining || 0
  };
}

// 🔹 Update streak, longest streak and freeze count
async function updateUserData(userId, today, newStreak, newLongest, newFreeze) {
  await ddbClient.send(new UpdateItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ userId, streakType: "daily" }),
    UpdateExpression: "SET currentStreak = :cs, longestStreak = :ls, lastLogin = :dt, freezeDaysRemaining = :fd",
    ExpressionAttributeValues: marshall({
      ":cs": newStreak,
      ":ls": newLongest,
      ":dt": today,
      ":fd": newFreeze
    })
  }));
}

// 🔹 Helper success response
function success(body) {
  return {
    statusCode: 200,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: JSON.stringify(body)
  };
}

3. Create streakFreeze Lambda Function

import { GetItemCommand, UpdateItemCommand } from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";
import { ddbClient } from "./client.js";

const STREAK_TABLE_NAME = process.env.STREAK_TABLE_NAME;
const MAX_FREEZE_DAYS = 2; // maximum number of stored freeze days

export const handler = async (event) => {
    let input;
    try {
        input = validateAndParseInput(event.body);
    } catch (error) {
        // Malformed JSON or missing/invalid fields → client error
        return formatErrorResponse(400, error.message);
    }

    try {
        const { userId, action } = input;
        const freezeDaysRemaining = await getCurrentFreezeDays(userId);

        let newValue;
        if (action === "add") {
            // Reward: grant one freeze day, capped at MAX_FREEZE_DAYS
            if (freezeDaysRemaining >= MAX_FREEZE_DAYS) {
                return formatErrorResponse(400, `Maximum freeze days (${MAX_FREEZE_DAYS}) already reached`);
            }
            newValue = freezeDaysRemaining + 1;
        } else {
            // Manual use: spend one freeze day if any are left
            if (freezeDaysRemaining <= 0) {
                return formatErrorResponse(400, "No freeze days remaining");
            }
            newValue = freezeDaysRemaining - 1;
        }

        const updatedFreeze = await setFreezeDays(userId, newValue);

        return {
            statusCode: 200,
            headers: { "Access-Control-Allow-Origin": "*" },
            body: JSON.stringify({
                status: "success",
                freezeDaysRemaining: updatedFreeze
            })
        };

    } catch (error) {
        // DynamoDB or other unexpected failures → server error
        console.error("handler: ", error);
        return formatErrorResponse(500, error.message);
    }
};

function validateAndParseInput(body) {
    const payload = JSON.parse(body || "{}");
    const { userId, action = "add" } = payload; // "add" when no action is given

    if (!userId) {
        throw new Error("Missing required field: userId");
    }
    if (action !== "add" && action !== "use") {
        throw new Error('Invalid action: expected "add" or "use"');
    }

    return { userId, action };
}

async function getCurrentFreezeDays(userId) {
    const { Item } = await ddbClient.send(new GetItemCommand({
        TableName: STREAK_TABLE_NAME,
        Key: marshall({ userId, streakType: "daily" }),
        ProjectionExpression: "freezeDaysRemaining"
    }));

    return Item ? unmarshall(Item).freezeDaysRemaining || 0 : 0;
}

async function setFreezeDays(userId, newValue) {
    const { Attributes } = await ddbClient.send(new UpdateItemCommand({
        TableName: STREAK_TABLE_NAME,
        Key: marshall({ userId, streakType: "daily" }),
        // if_not_exists() seeds defaults for a brand-new item without
        // overwriting the streak of an existing one
        UpdateExpression: "SET freezeDaysRemaining = :newVal, " +
            "currentStreak = if_not_exists(currentStreak, :zero), " +
            "longestStreak = if_not_exists(longestStreak, :zero)",
        ExpressionAttributeValues: marshall({ ":newVal": newValue, ":zero": 0 }),
        ReturnValues: "ALL_NEW"
    }));
    return unmarshall(Attributes).freezeDaysRemaining;
}

function formatErrorResponse(statusCode, message) {
    return {
        statusCode,
        headers: { "Access-Control-Allow-Origin": "*" },
        body: JSON.stringify({ error: message })
    };
}

4. Set Up API Gateway

POST /streak/track → Triggers the streakTrack Lambda
POST /streak/freeze → Triggers the streakFreeze Lambda

5. Frontend Integration

Call API when user logs in
Display streak count
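A minimal client-side sketch of these two steps follows. `API_BASE_URL` is a hypothetical placeholder for your API Gateway stage URL, and `trackLoginStreak` / `renderStreak` are illustrative names. The `fetch` implementation is injectable so the call can be exercised without a network.

```javascript
// Hypothetical API Gateway stage URL; replace with your own deployment's URL.
const API_BASE_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod";

// Calls POST /streak/track for the logged-in user; fetchImpl defaults to the global fetch.
async function trackLoginStreak(userId, fetchImpl = fetch) {
  const res = await fetchImpl(`${API_BASE_URL}/streak/track`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId }),
  });
  if (!res.ok) {
    throw new Error(`Streak API returned ${res.status}`);
  }
  return res.json(); // { message, currentStreak, freezeDaysRemaining }
}

// Turns the API response into the text shown to the user.
function renderStreak({ currentStreak, freezeDaysRemaining }) {
  return `🔥 ${currentStreak}-day streak (freeze days left: ${freezeDaysRemaining})`;
}
```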

Example Explanation

Initial Conditions

currentStreak = 3
freezeDaysRemaining = 1
lastLogin = 2025-07-28

✅ Case 1: User logs in on 2025-07-29 (last login was the previous day)

Lambda receives event: { "userId": "1134" }
It checks lastLogin === yesterday (2025-07-28) → ✅ yes.
No freeze day is used.
currentStreak = 4, freezeDaysRemaining = 1
Response:

{
  "message": "Streak updated",
  "currentStreak": 4,
  "freezeDaysRemaining": 1
}

✅ Case 2: User skips 2025-07-29, logs in on 2025-07-30

Missed one day (2025-07-29)

Lambda checks: lastLogin = 2025-07-28, today = 2025-07-30
lastLogin !== yesterday, so normally streak would reset.
But freezeDaysRemaining > 0 → ✅ use one freeze.
currentStreak stays 3, freezeDaysRemaining = 0
Response:

{
  "message": "Missed day covered by a freeze day",
  "currentStreak": 3,
  "freezeDaysRemaining": 0
}

✅ Case 3: Continuing from Case 2, the user skips 2025-07-31 and logs in on 2025-08-01

Missed one day (2025-07-31) and has no freeze days left

Lambda checks: lastLogin = 2025-07-30, today = 2025-08-01
lastLogin !== yesterday, and freezeDaysRemaining = 0
No freeze day available → streak resets to 1
currentStreak = 1, freezeDaysRemaining = 0
Response:

{
  "message": "Streak updated",
  "currentStreak": 1,
  "freezeDaysRemaining": 0
}

✅ Case 4: User later earns a freeze day (via freeze API)

User calls /streak/freeze with { "userId": "1134", "action": "add" }
Freeze Lambda increments freezeDaysRemaining but caps it at 2.

{
  "status": "success",
  "freezeDaysRemaining": 1
}

✅ Case 5: User tries to manually use a freeze

Calls /streak/freeze with { "userId": "1134", "action": "use" }
Lambda checks: freezeDaysRemaining > 0 → ✅ yes, decreases by 1.
If already 0, returns error:

{ "error": "No freeze days remaining" }

🔥 How This Works Together

1. Streak Lambda

Auto-consumes freeze only when needed (user missed a day).
Never lets streak reset unnecessarily if freeze is available.

2. Freeze Lambda

Adds freeze days when rewarded.
Allows manual usage (optional) if needed.
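The streak rules are easiest to verify when the decision is pulled out of the Lambda handler. Below is a sketch of a pure helper (`nextStreakState` is a hypothetical name) that mirrors the streakTrack decision. It assumes UTC "YYYY-MM-DD" dates and that each missed day consumes one freeze day. It replays Cases 1 to 3 from the worked example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Pure daily-streak decision (no DynamoDB). Assumption: one freeze day covers one missed day.
function nextStreakState({ currentStreak, lastLogin, freezeDaysRemaining }, today) {
  if (lastLogin === today) {
    return { currentStreak, freezeDaysRemaining, message: "Already logged in today" };
  }
  const missedDays = lastLogin
    ? Math.round((Date.parse(today) - Date.parse(lastLogin)) / DAY_MS) - 1
    : null; // null = first login ever
  if (missedDays === 0) {
    return { currentStreak: currentStreak + 1, freezeDaysRemaining, message: "Streak updated" };
  }
  if (missedDays > 0 && missedDays <= freezeDaysRemaining) {
    return {
      currentStreak, // streak kept intact
      freezeDaysRemaining: freezeDaysRemaining - missedDays,
      message: "Missed day covered by a freeze day",
    };
  }
  return { currentStreak: 1, freezeDaysRemaining, message: "Streak updated" }; // streak reset
}

const start = { currentStreak: 3, lastLogin: "2025-07-28", freezeDaysRemaining: 1 };
console.log(nextStreakState(start, "2025-07-29"));               // Case 1: streak 4, freeze 1
const afterCase2 = nextStreakState(start, "2025-07-30");         // Case 2: streak 3, freeze 0
console.log(afterCase2);
console.log(nextStreakState({ ...afterCase2, lastLogin: "2025-07-30" }, "2025-08-01")); // Case 3: streak 1
```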

2. Consecutive Wins Streak (Gaming Leaderboards)

Goal: Track players’ winning streaks and reward top performers.
Implementation Steps:

1. DynamoDB Table

UserStreak (same table as the daily streak)
PK: userId
Sort Key: streakType ("game")
Attributes: currentWinStreak, maxWinStreak, lastWinDate

2. Lambda Function

After a game ends, check if the player won
Increment streak if last game was a win
Reset if lost

import { GetItemCommand, UpdateItemCommand } from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";
import { ddbClient } from "./client.js";

const TABLE_NAME = process.env.STREAK_TABLE_NAME;

export const handler = async (event) => {
  try {
    const { userId, won } = JSON.parse(event.body);

    if (!userId || typeof won !== "boolean") {
      return formatResponse(400, { error: "userId and won (true/false) are required" });
    }

    const today = new Date().toISOString().split("T")[0];
    const yesterday = new Date();
    yesterday.setDate(yesterday.getDate() - 1);
    const yesterdayStr = yesterday.toISOString().split("T")[0];

    // Get current game streak data
    const { currentWinStreak, maxWinStreak, lastWinDate } = await getGameStreak(userId);

    let newWinStreak = currentWinStreak;
    let newMaxWinStreak = maxWinStreak;

    if (won) {
      // Continue the streak if the last win was today or yesterday, otherwise start at 1
      newWinStreak = (lastWinDate === today || lastWinDate === yesterdayStr) ? currentWinStreak + 1 : 1;

      // Update max streak
      if (newWinStreak > maxWinStreak) {
        newMaxWinStreak = newWinStreak;
      }

      // Update DynamoDB
      await updateGameStreak(userId, today, newWinStreak, newMaxWinStreak);
    } else {
      // Player lost → reset current streak
      newWinStreak = 0;
      await updateGameStreak(userId, today, newWinStreak, maxWinStreak);
    }

    return formatResponse(200, {
      message: won ? "Game won streak updated" : "Game lost, streak reset",
      currentWinStreak: newWinStreak,
      maxWinStreak: newMaxWinStreak
    });

  } catch (err) {
    console.error("Error updating game streak:", err);
    return formatResponse(500, { error: err.message });
  }
};

// 🔹 Get current streak from DynamoDB
async function getGameStreak(userId) {
  const { Item } = await ddbClient.send(new GetItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ userId, streakType: "game" }),
    ProjectionExpression: "currentWinStreak, maxWinStreak, lastWinDate"
  }));

  if (!Item) {
    return { currentWinStreak: 0, maxWinStreak: 0, lastWinDate: null };
  }

  const data = unmarshall(Item);
  return {
    currentWinStreak: data.currentWinStreak || 0,
    maxWinStreak: data.maxWinStreak || 0,
    lastWinDate: data.lastWinDate || null
  };
}

// 🔹 Update streak in DynamoDB
async function updateGameStreak(userId, today, currentWinStreak, maxWinStreak) {
  await ddbClient.send(new UpdateItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ userId, streakType: "game" }),
    UpdateExpression: "SET currentWinStreak = :cws, maxWinStreak = :mws, lastWinDate = :ld",
    ExpressionAttributeValues: marshall({
      ":cws": currentWinStreak,
      ":mws": maxWinStreak,
      ":ld": today
    }),
    ReturnValues: "UPDATED_NEW"
  }));
}

// 🔹 Helper response formatter
function formatResponse(statusCode, body) {
  return {
    statusCode,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: JSON.stringify(body)
  };
}

✅ Example Flow
🟢 Case 1: User wins consecutive games

currentWinStreak: 2, maxWinStreak: 2
lastWinDate: 2025-07-30
today: 2025-07-31, won: true
Result: currentWinStreak = 3, maxWinStreak = 3

🔴 Case 2: User loses

won: false
Result: currentWinStreak = 0, maxWinStreak stays as it was.

Conclusion

Streaks are a powerful engagement tool, and AWS makes implementation easy:
✅ Serverless & Scalable (Lambda + DynamoDB)
✅ Real-Time Updates (API Gateway)
✅ Reward Integration (Lambda + DynamoDB)
✅ Cost-Effective (Pay-per-use pricing)

Next Steps:

Start with a basic daily login streak
Expand to game win streaks and habit tracking
Add rewards & leaderboards for higher engagement

Advanced Streak Features

1. Milestone Offers (Risk/Reward)

When users hit milestones (e.g., 7 days), give them a choice:

Option A: Continue safely (streak grows normally)
Option B: Gamble (“Break your streak now for 3x rewards!”)
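A milestone offer could be resolved with a small helper like the one below. This is a hypothetical sketch. The 7-day interval comes from the example above, the 3x multiplier comes from the offer text, and `BASE_REWARD` is an assumed value. Taking the gamble is modeled as resetting the streak to 0, so the next login starts a new streak at 1.

```javascript
const MILESTONE_INTERVAL = 7; // from the "7 days" example
const BASE_REWARD = 100;      // assumed reward units per milestone
const GAMBLE_MULTIPLIER = 3;  // "3x rewards"

function isMilestone(currentStreak) {
  return currentStreak > 0 && currentStreak % MILESTONE_INTERVAL === 0;
}

// choice "safe": keep the streak and take the base reward.
// choice "gamble": take 3x the reward now, but the streak is broken (reset to 0).
function resolveMilestone(currentStreak, choice) {
  if (!isMilestone(currentStreak)) {
    throw new Error("No milestone offer at this streak length");
  }
  if (choice === "gamble") {
    return { currentStreak: 0, reward: BASE_REWARD * GAMBLE_MULTIPLIER };
  }
  if (choice === "safe") {
    return { currentStreak, reward: BASE_REWARD };
  }
  throw new Error('choice must be "safe" or "gamble"');
}
```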

2. Smart Streak Logic

Tracks timezone-aware daily activity
Handles edge cases (midnight checks, server delays)
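The Lambdas above derive "today" with `new Date().toISOString().split("T")[0]`, which is a UTC date. A timezone-aware variant could compute the user's local calendar date from an IANA time zone instead. The sketch uses the standard `Intl.DateTimeFormat` API. Storing a time zone per user (for example, on the UserStreak item) is an assumption, not part of the original design.

```javascript
// Returns the calendar date "YYYY-MM-DD" as seen in the given IANA time zone.
function localDateKey(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")}`;
}

const instant = new Date("2025-07-31T23:30:00Z");
console.log(localDateKey(instant, "UTC"));              // 2025-07-31
console.log(localDateKey(instant, "Asia/Colombo"));     // 2025-08-01 (UTC+5:30)
console.log(localDateKey(instant, "America/New_York")); // 2025-07-31 (UTC-4 in summer)
```

The same instant falls on different days for different users. That is exactly the midnight edge case a UTC-only check gets wrong.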

3. Leaderboard Logic

Reward players who rank higher on the leaderboard
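One way to attach leaderboard rewards is sketched below. It assumes player rows (for example, the "game" UserStreak items) have already been fetched. Players are ranked by `maxWinStreak`, and each rank maps to a reward tier. The tier boundaries and reward amounts in `REWARD_TIERS` are hypothetical.

```javascript
// Hypothetical reward tiers by 1-based rank; the first tier whose maxRank covers the rank wins.
const REWARD_TIERS = [
  { maxRank: 1, reward: 500 },
  { maxRank: 3, reward: 250 },
  { maxRank: 10, reward: 100 },
];

// Ranks players by maxWinStreak (ties broken by userId for a stable order) and attaches rewards.
function rankWithRewards(players) {
  return [...players]
    .sort((a, b) => b.maxWinStreak - a.maxWinStreak || a.userId.localeCompare(b.userId))
    .map((p, i) => {
      const rank = i + 1;
      const tier = REWARD_TIERS.find((t) => rank <= t.maxRank);
      return { ...p, rank, reward: tier ? tier.reward : 0 };
    });
}
```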

For the CDK implementation: My Repository