🎯 Who this is for: Engineers preparing for AI/ML roles involving agent systems, LLM orchestration, or production AI pipelines. Whether you're interviewing at a startup or a FAANG company, these are the questions being asked in 2026.
Table of Contents
- Section 1: Fundamentals & Core Concepts (Q1-Q3)
- Section 2: Protocols & Architecture (MCP & A2A) (Q4-Q9)
- Section 3: Memory & Context Management (Q10-Q12)
- Section 4: RAG vs. Agents vs. Agentic RAG (Q13-Q15)
- Section 5: Multi-Agent Systems & Conflict Resolution (Q16-Q18)
- Section 6: Frameworks: LangGraph & CrewAI (Q19-Q23)
- Section 7: Tool Calling & Error Handling (Q24-Q26)
- Quick-Reference Cheat Sheet
Section 1: Fundamentals & Core Concepts
Q1: What is an AI Agent and how is it different from a regular Chatbot?
Definition: An AI Agent is a system that can perceive, reason, and take action autonomously, going well beyond text generation.
| Chatbot | AI Agent |
|---|---|
| Generates text responses only | Plans, uses tools, and executes actions |
| Little or no task state: at most, remembers the chat history | Stateful: tracks goals across multiple steps |
| Flow: User Query → Response | Flow: User Query → Plan → Tool Use → Execution → Response |
| Cannot call external APIs | Integrates with calendars, APIs, databases |
Real-world example:
Task: "Book me the cheapest flight to Berlin next Friday"
🤖 Chatbot: "You can check Google Flights or MakeMyTrip."
🦾 AI Agent:
1. Checks your Google Calendar for conflicts
2. Searches Skyscanner, Kayak, and Google Flights
3. Compares prices across airlines
4. Books the cheapest option
5. Sends a confirmation email
💡 Interview Tip: Lead with the Perceive → Reason → Act framework, then give a concrete before/after scenario. Interviewers want to see that you understand the behavioral difference, not just the definition.
Q2: What is ReAct (Reasoning + Acting)?
Definition: A prompting framework where the agent cycles through Thought → Action → Observation until the task is complete.
| Step | What Happens | Example |
|---|---|---|
| 1. Thought | Agent reasons about what to do next | "I need real-time weather data for Tokyo" |
| 2. Action | Calls a tool or API | weather_api(location="Tokyo") |
| 3. Observation | Receives and processes the result | 28°C, humidity 75%, no rain |
| 4. Repeat / Answer | Loops or delivers final response | "It's warm and humid, so no umbrella needed" |
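# Simplified ReAct pseudocode (llm, tools, and user_query are placeholders)
context = [user_query]
final_answer = None
while final_answer is None:
    thought = llm.think(context)
    action = llm.decide_action(thought)
    if action.is_final:  # the model decides it can answer
        final_answer = action.answer
    else:
        observation = tools.execute(action)
        context.extend([thought, action, observation])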
💡 Interview Tip: Walk through a concrete ReAct loop out loud. Pick a real task (weather, database query, stock lookup) and narrate each Thought/Action/Observation step. This shows you understand the loop, not just the acronym.
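To make the loop concrete, here is a minimal runnable sketch of Thought → Action → Observation. The model is replaced by a scripted stand-in (`scripted_llm`) and the `weather_api` tool returns canned data; both are assumptions for illustration, not part of any real SDK.

```python
from typing import Callable

# Hypothetical tool registry: one fake weather tool with canned data.
TOOLS: dict[str, Callable[[str], str]] = {
    "weather_api": lambda location: f"28°C, humidity 75%, no rain in {location}",
}

def scripted_llm(history: list[dict]) -> dict:
    """Stand-in for a model call: acts once, then answers from the observation."""
    observations = [step for step in history if step["type"] == "observation"]
    if not observations:
        return {"thought": "I need real-time weather data for Tokyo",
                "action": {"tool": "weather_api", "input": "Tokyo"}}
    return {"thought": "I have the data I need",
            "final_answer": f"Forecast: {observations[-1]['content']}"}

def react_loop(question: str, max_steps: int = 5) -> str:
    history: list[dict] = [{"type": "question", "content": question}]
    for _ in range(max_steps):  # hard cap so the loop always terminates
        step = scripted_llm(history)
        history.append({"type": "thought", "content": step["thought"]})
        if "final_answer" in step:
            return step["final_answer"]
        action = step["action"]
        observation = TOOLS[action["tool"]](action["input"])
        history.append({"type": "action", "content": action})
        history.append({"type": "observation", "content": observation})
    return "Stopped: step limit reached"
```

The `max_steps` cap is the important design choice: without it, a model that never produces a final answer would loop forever.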
Q3: What's the Difference Between Reactive and Proactive Agents?
| Reactive Agent | Proactive Agent |
|---|---|
| Waits for a user request to act | Acts autonomously based on goals or triggers |
| Example: Customer support bot that only replies when messaged | Example: Cloud monitor that detects 95% CPU and auto-scales without being asked |
| Simple and predictable | More powerful; prevents problems before they occur |
💡 Interview Tip: Mention that production agents are usually hybrid: reactive to user input while proactively monitoring their environment. This signals real-world architectural maturity.
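A hybrid agent can be sketched as one object with two entry points: one driven by user messages, one driven by environment signals. The class name, the 95% threshold, and the `scale_out` action label below are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class HybridAgent:
    cpu_threshold: float = 95.0  # assumed trigger level for the proactive path
    actions: list[str] = field(default_factory=list)

    def on_message(self, text: str) -> str:
        """Reactive path: runs only when a user sends something."""
        return f"Reply to: {text}"

    def on_metrics(self, cpu_percent: float) -> None:
        """Proactive path: runs on every metrics tick, with no user involved."""
        if cpu_percent >= self.cpu_threshold:
            self.actions.append(f"scale_out (cpu={cpu_percent})")
```

In a real deployment `on_metrics` would be called by a scheduler or event stream, while `on_message` sits behind the chat interface.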
Section 2: Protocols & Architecture (MCP & A2A)
Q4: What is MCP (Model Context Protocol) and why does it matter?
Definition: An open standard created by Anthropic, often called the "USB-C for AI." It gives AI models a single, universal way to connect to tools and data sources.
Without MCP:
Agent ──custom code──> Slack
Agent ──custom code──> GitHub
Agent ──custom code──> Google Drive
Agent ──custom code──> Postgres
With MCP:
Agent ──MCP──> [Slack | GitHub | Google Drive | Postgres | ...]
(one protocol, many tools)
Key benefits:
- ✅ Build Once, Connect Everywhere: one MCP server works with any MCP-compatible host
- ✅ No vendor lock-in: swap the underlying LLM without rewriting integrations
- ✅ Security by declaration: servers expose only what they explicitly declare
Q5: Explain the MCP Architecture
User
  │
  ▼
Host Application (Claude Desktop / VS Code / Cursor)
  │
  ▼
MCP Client  ◄──── manages connections, sends requests
  │
  ▼
MCP Server  ◄──── exposes Tools, Resources, Prompts
  │
  ▼
External Tool (GitHub API / Postgres / Slack)
| Layer | Role |
|---|---|
| Host | The app the user interacts with (e.g., Claude Desktop, VS Code) |
| Client | Lives inside the Host; each client holds a 1:1 connection to one server, and a host can run several clients |
| Server | Exposes capabilities to the client; can be local or remote |
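The client/server exchange above can be sketched in-process. MCP messages are JSON-RPC 2.0, and the `tools/list` and `tools/call` method names below follow the MCP specification as I understand it; the `add` tool, the helper names, and the direct function call standing in for a transport are assumptions, and the initialization handshake is omitted.

```python
import json
from typing import Optional

# Hypothetical tool table for a toy server; a real server would use an MCP SDK.
TOOLS = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: str(args["a"] + args["b"]),
    }
}

def _reply(request_id, result=None, error=None) -> str:
    """Build a JSON-RPC 2.0 response carrying either a result or an error."""
    body = {"jsonrpc": "2.0", "id": request_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)

def handle_request(raw: str) -> str:
    """Toy server side: answer one JSON-RPC request string."""
    request = json.loads(raw)
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        tools = [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
                 for n, t in TOOLS.items()]
        return _reply(request["id"], {"tools": tools})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _reply(request["id"], error={"code": -32602, "message": "Unknown tool"})
        text = tool["handler"](params.get("arguments", {}))
        return _reply(request["id"], {"content": [{"type": "text", "text": text}], "isError": False})
    return _reply(request["id"], error={"code": -32601, "message": "Method not found"})

def client_call(method: str, params: Optional[dict] = None, request_id: int = 1) -> dict:
    """Toy client side: build a request, hand it to the server, parse the response."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        request["params"] = params
    return json.loads(handle_request(json.dumps(request)))
```

A host would first call `client_call("tools/list")` to learn what the server declares, then pass those schemas to the model so it can request `tools/call`.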
Q6: What are the Three Core MCP Primitives?
| Primitive | Description | Example | Controlled By |
|---|---|---|---|
| Tools | Actions the model can trigger | send_email, run_sql_query | The Model |
| Resources | Data the app can read | DB tables, PDFs, schemas | The Application |
| Prompts | Reusable instruction templates | "Summarize this report" | The User |
Q7: What is the Agent-to-Agent (A2A) Protocol?
Definition: An open protocol by Google enabling agents to communicate, collaborate, delegate, and share work with each other.
MCP (Vertical)        A2A (Horizontal)
──────────────        ────────────────
    Agent             Agent A ◄──────► Agent B
      │                  │                │
      ▼                  ▼                ▼
    Tool               Worker           Worker
| | MCP | A2A |
|---|---|---|
| Direction | Vertical (Agent → Tool) | Horizontal (Agent ↔ Agent) |
| Purpose | Connect AI to tools/data | Connect multiple AI agents |
| Led by | Anthropic | Google |
| Analogy | Tool belt | Org chart |
💡 Interview Tip: Both MCP and A2A are needed in complex production systems. MCP gives the agent its tools; A2A lets agents delegate to each other. Frame them as complementary, not competing.
Q8: What is an Agent Card?
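Think of it as a LinkedIn profile for an AI Agent or, more technically, an OpenAPI spec for agent capabilities. The JSON below is a simplified illustration; the actual A2A Agent Card schema is richer (for example, skills are objects rather than plain strings).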
{
  "name": "FlightBookingAgent",
  "description": "Books flights, hotels, and car rentals",
  "skills": ["search_flights", "compare_prices", "book_ticket"],
  "endpoint": "https://agents.example.com/flight",
  "auth": { "type": "bearer" }
}
Purpose: Lets other agents discover capabilities and learn how to delegate tasks before collaborating, which enables autonomous agent discovery.
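Discovery can be sketched as a lookup over a list of cards. This uses the simplified card shape from the JSON above rather than the full A2A schema, and the second card (`InvoiceAgent`) is a made-up entry for illustration.

```python
from typing import Optional

# Simplified cards in the shape shown above; InvoiceAgent is hypothetical.
AGENT_CARDS = [
    {
        "name": "FlightBookingAgent",
        "description": "Books flights, hotels, and car rentals",
        "skills": ["search_flights", "compare_prices", "book_ticket"],
        "endpoint": "https://agents.example.com/flight",
        "auth": {"type": "bearer"},
    },
    {
        "name": "InvoiceAgent",
        "description": "Creates and sends invoices",
        "skills": ["create_invoice"],
        "endpoint": "https://agents.example.com/invoice",
        "auth": {"type": "bearer"},
    },
]

def find_agents_with_skill(cards: list[dict], skill: str) -> list[dict]:
    """Return every card that advertises the requested skill."""
    return [card for card in cards if skill in card.get("skills", [])]

def pick_endpoint(cards: list[dict], skill: str) -> Optional[str]:
    """Return the endpoint of the first matching agent, or None if nobody can help."""
    matches = find_agents_with_skill(cards, skill)
    return matches[0]["endpoint"] if matches else None
```

An orchestrator would call `pick_endpoint` before delegating, and fall back to a human or a default agent when it returns `None`.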
Q9: What is a Task in A2A and what are its lifecycle states?
A Task is the fundamental unit of work exchanged between agents.
Submitted ──► Working ──► Completed
                 │
                 ├──► Input Required ──► Working (resumed)
                 │
                 ├──► Failed
                 └──► Canceled
| State | Meaning |
|---|---|
| Submitted | Task created and received |
| Working | Agent actively processing |
| Input Required | Needs clarification (e.g., "Window or aisle seat?") |
| Completed | Finished successfully |
| Failed | Unrecoverable error |
| Canceled | Stopped by user or orchestrator |
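The lifecycle can be enforced with a small state machine. The transition table below is my reading of the diagram above, with one added assumption: a task may be canceled from any non-terminal state. The real protocol defines further states that are left out here.

```python
from enum import Enum

class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"

TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED}

# Allowed moves; terminal states have no outgoing transitions.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}

def is_terminal(state: TaskState) -> bool:
    return state in TERMINAL

def transition(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"Illegal transition: {current.value} -> {new.value}")
    return new
```

Rejecting illegal moves early (for example, reopening a completed task) keeps both agents' views of the task consistent.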
Section 3 β Memory & Context Management
Q10: What are the Different Types of Memory in AI Agents?
| Memory Type | Analogy | Description | Example |
|---|---|---|---|
| Short-term | RAM | In-context history; lost at session end | Follows the current thread |
| Long-term | Hard Disk | Stored in Vector DBs; persists across sessions | "Welcome back, Aman!" |
| Episodic | Diary | Records of specific past interactions | "Last week you asked about RAG" |
| Semantic | Textbook | General world/domain knowledge | "Python is a programming language" |
💡 Interview Tip: The RAM / Hard Disk analogy works well. Use it to make the distinction instantly clear, then layer in Vector DBs as the implementation detail.
Q11: How Do You Implement Long-Term Memory in an AI Chain?
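# 5-step long-term memory pattern
# Assumes `vector_db` is a placeholder vector-store client; upsert/query signatures vary by vendor
import uuid
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding  # the vector itself, not the response object

# Step 1: User has a conversation
user_input = "Tell me about LangGraph state management"
# Step 2: Embed the conversation
embedding = embed(user_input)
# Step 3: Store in Vector DB (one unique ID per memory, so entries are not overwritten)
vector_db.upsert(
    id=str(uuid.uuid4()),
    vector=embedding,
    metadata={"text": user_input, "timestamp": datetime.now(timezone.utc).isoformat()}
)
# Step 4: On next session, embed the new query and retrieve relevant context
new_query = "How do I persist state between runs?"  # example follow-up question
results = vector_db.query(
    vector=embed(new_query),
    top_k=5  # nearest neighbours by cosine similarity
)
# Step 5: Inject into prompt
prompt = f"Previous context: {results}\n\nUser: {new_query}"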
Key tools: Chroma (local dev), Pinecone (production), FAISS (self-hosted), Weaviate (hybrid search)
💡 Interview Tip: Name specific tools and mention cosine similarity search for retrieval. This signals hands-on experience vs. theoretical knowledge.
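To show what "cosine similarity search" does under the hood, here is a dependency-free sketch. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and `MemoryStore` is a hypothetical in-memory substitute for a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Hypothetical in-memory stand-in for a vector DB."""
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]
```

Production stores replace the linear scan in `search` with an approximate nearest-neighbour index, but the ranking criterion is the same.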
Q12: What is Memory Overflow and How Do You Solve It?
Problem: When conversation history exceeds the model's context window (e.g., 128k tokens), older context gets truncated, silently losing important state.
| Strategy | How It Works | Best For |
|---|---|---|
| Summarization | Compress older messages into a running summary | Long conversations with recurring themes |
| Relevance Filtering | Retrieve only memory similar to the current query | Domain-specific agents |
| Sliding Window | Keep only the last N turns in context | Chatbots with short-lived context |
| Tiered Memory | Hot → Warm (summarized) → Cold (archived) | Enterprise agents with long histories |
Tiered Memory Architecture:
┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ HOT MEMORY  │    │ WARM MEMORY      │    │ COLD MEMORY      │
│ (Last 20    │───►│ (Summarized,     │───►│ (Archived,       │
│  messages)  │    │  last 7 days)    │    │  vector-indexed) │
└─────────────┘    └──────────────────┘    └──────────────────┘
     Fast                Medium               Slow but vast
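The sliding-window and summarization strategies combine naturally: keep the most recent turns verbatim and fold everything older into one summary entry. In this sketch the default summarizer is a placeholder string; a real system would ask an LLM to compress the older messages. The function name and defaults are assumptions.

```python
from typing import Callable, Optional

def compact_history(
    messages: list[str],
    window: int = 3,
    summarize: Optional[Callable[[list[str]], str]] = None,
) -> list[str]:
    """Keep the last `window` messages verbatim; fold older ones into one summary line."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(messages) <= window:
        return list(messages)
    older, recent = messages[:-window], messages[-window:]
    if summarize is None:
        # Placeholder summarizer: stands in for an LLM summarization call.
        summarize = lambda msgs: f"[summary of {len(msgs)} earlier messages]"
    return [summarize(older)] + recent
```

Calling this before every model request keeps the prompt bounded at `window + 1` entries regardless of conversation length.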
Section 4: RAG vs. Agents vs. Agentic RAG
Q13: What is RAG and How is it Different from an AI Agent?
RAG flow (linear, read-only):
User Query ──► Retrieve Documents ──► Generate Answer
Agent flow (iterative, read-write):
User Query ──► Plan ──► Select Tool ──► Execute ──► Observe ──► Final Response
                 ▲_______________________________________| (loop until done)
| | RAG | AI Agent |
|---|---|---|
| Pattern | Linear, single-pass | Iterative loop |
| Capability | Retrieves and reads | Plans and acts |
| State | Stateless | Stateful |
| Best for | Static Q&A on documents | Multi-step tasks requiring action |
Q14: When Should You Use RAG, an Agent, or Agentic RAG?
| Approach | Use When | Example Task |
|---|---|---|
| RAG Only | Pure Q&A on static documents | "What is our refund policy?" |
| Agent Only | Task requires action, no docs needed | "Book a flight", "Send an email" |
| Agentic RAG | Need to search docs AND take action | "Check refund policy, then process the refund in the DB" |
Q15: What is Agentic RAG?
Basic RAG: Fixed single-pass retrieval. Retrieve top-K chunks. Answer.
Agentic RAG: The agent controls the retrieval strategy dynamically.
                 ┌────────────────────────────────┐
                 │        AGENTIC RAG LOOP        │
                 │                                │
User Query ─────►│  Route query to correct DB     │
                 │              │                 │
                 │  Retrieve relevant chunks      │
                 │              │                 │
                 │  Evaluate quality              │
                 │              │                 │
                 │  Poor? ──► Refine & retry      │
                 │              │                 │
                 │  Good? ──► Multi-hop if needed │
                 │              │                 │
                 │  Final answer                  │
                 └────────────────────────────────┘
Multi-hop example ("Process a refund for order #8821"):
- Find order #8821 in the Orders DB
- Retrieve the refund policy from Policy Docs
- Cross-reference the policy with the order details
- Call the Payments API to initiate the refund
💡 Interview Tip: Mentioning routing, quality evaluation, and multi-hop reasoning immediately separates your answer from candidates who only know basic RAG.
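The route, retrieve, evaluate, and retry steps can be sketched without any framework. Everything here is a toy assumption: the two in-memory stores, the digit-based router, keyword matching in place of vector search, and "non-empty" as the quality check. In a real system an LLM would do the routing, grading, and query rewriting.

```python
from typing import Callable

# Hypothetical in-memory "databases".
SOURCES = {
    "orders": {"8821": "Order 8821: delivered 3 days ago, total $40"},
    "policy": {"refund": "Refunds are allowed within 30 days of delivery"},
}

def route(query: str) -> str:
    """Toy router: queries containing digits go to the orders store."""
    return "orders" if any(ch.isdigit() for ch in query) else "policy"

def retrieve(source: str, query: str) -> list[str]:
    """Toy retrieval: keyword match instead of vector search."""
    return [text for key, text in SOURCES[source].items() if key in query.lower()]

def agentic_retrieve(
    query: str,
    refine: Callable[[str], str] = lambda q: q,
    max_attempts: int = 3,
) -> list[str]:
    """Route, retrieve, check quality, and retry with a refined query if nothing came back."""
    for _ in range(max_attempts):
        chunks = retrieve(route(query), query)
        if chunks:  # quality check: here "good" simply means non-empty
            return chunks
        query = refine(query)
    return []

def multi_hop(order_id: str) -> list[str]:
    """Two hops: fetch the order, then the policy that applies to it."""
    return agentic_retrieve(f"order {order_id}") + agentic_retrieve("refund policy")
```

The `max_attempts` cap matters for the same reason as in any agent loop: a query that can never be satisfied must eventually return empty rather than spin.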
Section 5: Multi-Agent Systems & Conflict Resolution
Q16: What are Multi-Agent Systems and Why are They Useful?
Definition: Multiple specialized agents collaborating on tasks too large or complex for a single agent.
Single Agent                 Multi-Agent System
────────────                 ──────────────────
One agent handles                Manager Agent
everything                            │ delegates
                             ┌────────┼────────┐
Jack-of-all-trades           ▼        ▼        ▼
= master of none         Researcher  Writer  Editor
                          (expert)  (expert) (expert)
Benefits: Specialization, Parallelism, Scalability, Fault Tolerance
Q17: Communication Patterns in Multi-Agent Systems
Sequential / Pipeline        Hierarchical
─────────────────────        ────────────
A ──► B ──► C                Manager
                             ├── Worker A
                             ├── Worker B
                             └── Worker C

Peer-to-Peer (A2A)           Broadcast
──────────────────           ─────────
A ◄──► B ◄──► C              A ──► B
                               ├──► C
                               └──► D
| Pattern | Best For |
|---|---|
| Sequential | Simple, ordered pipelines (Researcher → Writer → Editor) |
| Hierarchical | Complex branching workflows with auditability requirements |
| Peer-to-Peer | Dynamic delegation using A2A (agents discover each other) |
| Broadcast | Real-time data fan-out (market data → Trading + Risk + Reporting) |
Q18: How Do You Handle Conflicts When Agents Disagree?
| Strategy | How It Works | Best For |
|---|---|---|
| Voting / Majority | Majority opinion wins across N agents | Classification, labelling |
| Supervisor Agent | Master agent has final authority | High-stakes decisions |
| Debate & Judge | Agents argue positions; Judge agent picks winner | Open-ended reasoning |
| Confidence Scores | Highest-confidence agent is selected | Model ensembles |
| Human-in-the-Loop | Escalate to a human for the final call | Regulated/irreversible actions |
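These strategies are often layered rather than used alone. The sketch below tries a strict majority vote first, falls back to the single most confident agent, and escalates to a human when neither is decisive. The 0.8 confidence threshold and the `ESCALATE` sentinel are assumptions for illustration.

```python
from collections import Counter

ESCALATE = "ESCALATE_TO_HUMAN"

def resolve(votes: list[tuple[str, float]], min_confidence: float = 0.8) -> str:
    """votes holds (answer, confidence) pairs, one per agent."""
    if not votes:
        raise ValueError("need at least one vote")
    # Layer 1: strict majority (more than half of all agents agree).
    counts = Counter(answer for answer, _ in votes)
    top_answer, top_count = counts.most_common(1)[0]
    if top_count > len(votes) / 2:
        return top_answer
    # Layer 2: no majority, so trust the single most confident agent if it is sure enough.
    best_answer, best_confidence = max(votes, key=lambda vote: vote[1])
    if best_confidence >= min_confidence:
        return best_answer
    # Layer 3: nobody is decisive; hand the decision to a human.
    return ESCALATE
```

For irreversible actions you would typically skip layers 1 and 2 and escalate directly, as the table suggests.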
Section 6: Frameworks: LangGraph & CrewAI
Q19: What is LangGraph?
Definition: A Python library for building stateful, graph-based AI agents. It comes from the LangChain team and is designed for production-grade complexity.
| LangChain (Chains) | LangGraph |
|---|---|
| Linear execution only | Loops, branching, parallel nodes |
| No native state management | Shared State object across all nodes |
| No HITL built-in | Native checkpoint + pause/resume |
| Good for simple pipelines | Good for complex production workflows |
Q20: Nodes, Edges, and State in LangGraph
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# State: shared memory flowing through the graph
class AgentState(TypedDict):
    query: str
    retrieved_docs: list
    llm_response: str
    needs_retry: bool

# Nodes: functions that read State and return only the keys they update
# (vector_db, llm, and evaluate are placeholders for your own components)
def retrieval_node(state: AgentState) -> dict:
    docs = vector_db.search(state["query"])
    return {"retrieved_docs": docs}

def llm_node(state: AgentState) -> dict:
    response = llm.invoke(state["query"], context=state["retrieved_docs"])
    return {"llm_response": response}

def evaluator_node(state: AgentState) -> dict:
    quality = evaluate(state["llm_response"])
    return {"needs_retry": quality < 0.7}

# Edges: define execution flow (including conditional loops)
graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieval_node)
graph.add_node("generate", llm_node)
graph.add_node("evaluate", evaluator_node)
graph.add_edge(START, "retrieve")  # entry point
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", "evaluate")
# END is an imported constant, not the string "END".
# In production, also track a retry counter in State; otherwise the loop
# only stops when LangGraph's recursion limit is hit.
graph.add_conditional_edges(
    "evaluate",
    lambda s: "retrieve" if s["needs_retry"] else END
)
app = graph.compile()
Q21: What is Human-in-the-Loop (HITL) in LangGraph?
Definition: The ability to pause graph execution at a designated node and wait for human approval before continuing.
from langgraph.checkpoint.sqlite import SqliteSaver

# thread_id must sit under the "configurable" key
config = {"configurable": {"thread_id": "task_001"}}

# Recent langgraph-checkpoint-sqlite releases expose from_conn_string as a context manager
with SqliteSaver.from_conn_string("agent_state.db") as checkpointer:
    # `workflow` is the StateGraph built earlier; state is checkpointed as it runs
    graph = workflow.compile(
        checkpointer=checkpointer,
        interrupt_before=["send_email_node"]  # pause here for human approval
    )
    # Agent runs, then pauses before sending
    result = graph.invoke(state, config=config)
    # Human reviews and approves...
    human_approval = get_human_input()  # placeholder for your approval UI
    # Passing None resumes from the saved checkpoint
    if human_approval:
        graph.invoke(None, config=config)
Use for: Sending emails, processing refunds, financial transactions, deploying code β any irreversible or regulated action.
💡 Interview Tip: HITL is a top interview signal. Frame it as a safety and compliance feature: "For any action that is irreversible or involves money/data, we insert a human approval checkpoint before execution."
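The pause/checkpoint/resume idea does not depend on LangGraph. Here is a framework-free sketch that saves state to a JSON file before the irreversible step and only continues once approval arrives. The function names, the file-based checkpoint, and the refund payload are all assumptions for illustration.

```python
import json
import os
import tempfile

def run_until_approval(state: dict, path: str) -> str:
    """Do the safe steps, then checkpoint and stop before the irreversible one."""
    state = {**state, "draft": f"Refund of ${state['amount']} for order {state['order_id']}"}
    with open(path, "w") as fh:
        json.dump(state, fh)
    return "paused"

def resume(path: str, approved: bool) -> str:
    """Reload the checkpoint and either execute or abandon the pending action."""
    with open(path) as fh:
        state = json.load(fh)
    if not approved:
        return "rejected: nothing was executed"
    return f"executed: {state['draft']}"

def demo(approved: bool = True) -> str:
    """Run the whole pause/approve/resume cycle against a temporary checkpoint file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        run_until_approval({"order_id": "8821", "amount": 40}, path)
        return resume(path, approved)
```

Because the checkpoint lives outside the process, the approval can arrive minutes or days later, from a different worker, without losing the draft.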
Q22: What is CrewAI?
Definition: A Python framework for orchestrating role-based teams of AI agents. You declare agent identities in plain language, and CrewAI handles delegation, collaboration, and retry logic.
from crewai import Agent, Task, Crew, Process

# web_search_tool, pdf_reader_tool, and text_editor_tool are placeholders for your own tools
researcher = Agent(
    role="Senior Market Research Analyst",
    goal="Find the top 5 AI trends for Q3 2026",
    backstory="Expert in tech markets with 10 years experience",
    tools=[web_search_tool, pdf_reader_tool]
)
writer = Agent(
    role="Technical Content Writer",
    goal="Transform research into a compelling blog post",
    backstory="Specializes in making complex AI topics accessible",
    tools=[text_editor_tool]
)
research_task = Task(
    description="Research the top AI agent trends of Q3 2026",
    expected_output="A bullet list of the top 5 trends with sources",  # required by current CrewAI releases
    agent=researcher
)
write_task = Task(
    description="Write a 1500-word post based on the research",
    expected_output="A 1500-word blog post in markdown",
    agent=writer
)
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,  # Manager LLM coordinates
    manager_llm="gpt-4o"  # hierarchical mode needs a manager_llm or manager_agent; model name is an example
)
result = crew.kickoff()
Q23: Process Types in a Crew
| Process | How It Works | Best For |
|---|---|---|
| Sequential | Tasks run one after another in fixed order | Simple, linear pipelines |
| Hierarchical ⭐ | Manager LLM (or manager agent) assigns and reviews tasks dynamically | Complex production systems |
| Consensual | Agents collaborate as peers to reach agreement (listed by CrewAI as planned; check the docs before relying on it) | Research synthesis, balanced analysis |
⭐ Hierarchical suits complex production systems: it gives you auditability and dynamic task assignment. Note that CrewAI's own default is `Process.sequential`.
Section 7: Tool Calling & Error Handling
Q24: What is Tool Calling and How Does It Work?
⚠️ Critical clarification: The LLM never executes code. It decides which tool to call and outputs a structured JSON request. The host application runs the actual code.
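That split between deciding and executing can be sketched from the application's side. The JSON shape (`tool` plus `args`) mirrors the simplified format used in this article rather than any specific vendor's API, and `get_stock_price` returns canned data as a stand-in for a real market-data call.

```python
import json

def get_stock_price(ticker: str) -> dict:
    """Hypothetical tool with canned data; a real one would call a market-data API."""
    return {"ticker": ticker, "price": 211.34, "change": "+1.2%"}

# The application owns the registry; the model only ever sees names and schemas.
TOOL_REGISTRY = {"get_stock_price": get_stock_price}

def execute_tool_call(llm_output: str) -> str:
    """Parse the model's JSON request, run the matching function, return the observation."""
    call = json.loads(llm_output)
    func = TOOL_REGISTRY[call["tool"]]
    result = func(**call["args"])
    return json.dumps(result)  # sent back to the model as the tool result
```

The model's output is just text until `execute_tool_call` acts on it, which is why validation belongs in the application layer.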
Step 1: User ──► LLM
        "What's the AAPL stock price?"            (receives query + tool schema)
Step 2: LLM ──► Application
        { "tool": "get_stock_price",              (LLM decides, outputs JSON)
          "args": { "ticker": "AAPL" } }
Step 3: Application ──► External API
        runs get_stock_price(ticker="AAPL")       (application executes)
Step 4: External API ──► Application ──► LLM
        { "price": 211.34, "change": "+1.2%" }    (result returned as observation)
Step 5: LLM ──► User
        "AAPL is currently trading at $211.34,    (final answer)
         up 1.2% today."
Q25: Handling Errors and Hallucinated Tool Calls
Problem: LLM calls a tool that doesn't exist, passes wrong argument types, or generates malformed JSON.
from pydantic import ValidationError

# Assumes REGISTERED_TOOLS maps tool names to objects exposing a Pydantic `schema`
# model and an `execute` method; `context` is the running message list.
def safe_tool_call(tool_name: str, args: dict, context: list, max_retries: int = 3):
    # Layer 1: Tool name validation
    if tool_name not in REGISTERED_TOOLS:
        return {"error": f"Unknown tool: {tool_name}. Available: {list(REGISTERED_TOOLS.keys())}"}
    # Layer 2: Schema validation (Pydantic)
    tool_schema = REGISTERED_TOOLS[tool_name].schema
    try:
        validated_args = tool_schema(**args)
    except ValidationError as e:
        return {"error": f"Invalid arguments: {e}"}
    # Layer 3: Try/except with retry
    for attempt in range(max_retries):
        try:
            return REGISTERED_TOOLS[tool_name].execute(validated_args)
        except Exception as e:
            if attempt == max_retries - 1:
                # Layer 4: Graceful failure after max retries
                return {"error": f"Tool failed after {max_retries} attempts: {e}"}
            # Record the failure so the LLM can see it and self-correct
            context.append({"role": "tool", "content": f"Attempt {attempt + 1} failed: {e}"})
| Defense Layer | Implementation |
|---|---|
| Name Validation | Check tool name against registered tool list before execution |
| Schema Validation | Use Pydantic models or JSON Schema to verify argument types |
| Try / Except | Wrap every call; return structured errors back to LLM |
| Retry with Correction | Pass error as observation so LLM can self-correct |
| Max Retry Cap | Limit to 3 attempts; escalate or fail gracefully |
💡 Interview Tip: Mentioning Pydantic for schema validation and a max-retry cap (to prevent infinite loops) shows production awareness. Naive agents that retry forever are a real production problem.
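The third failure mode named in the problem statement, malformed JSON, is best caught before any of the layers above run. This standard-library sketch parses the model's raw output and returns a structured error that can be fed back as an observation. The `tool`/`args` keys follow this article's simplified format, and the function name is an assumption.

```python
import json

def parse_tool_call(raw: str, known_tools: set[str]) -> dict:
    """Return {'ok': True, 'tool', 'args'} or {'ok': False, 'error'} for the model to read."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"Malformed JSON: {exc.msg}"}
    if not isinstance(call, dict) or "tool" not in call:
        return {"ok": False, "error": "Expected an object with a 'tool' key"}
    if call["tool"] not in known_tools:
        return {"ok": False, "error": f"Unknown tool: {call['tool']}"}
    args = call.get("args", {})
    if not isinstance(args, dict):
        return {"ok": False, "error": "'args' must be an object"}
    return {"ok": True, "tool": call["tool"], "args": args}
```

Returning an error dictionary instead of raising keeps the agent loop alive, so the model gets a chance to emit a corrected call within the retry budget.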
Q26: What is Parallel Tool Calling and When Should You Use it?
Definition: Requesting multiple tool calls in a single LLM response and executing them simultaneously.
# Sequential (slow): 3 calls × ~3 seconds each = ~9 seconds total
weather = get_weather("Tokyo")       # 3s
stock = get_stock_price("AAPL")      # 3s
news = get_top_news("AI")            # 3s

# Parallel (fast): all run at once = ~3 seconds total
import asyncio

async def parallel_tools():
    weather, stock, news = await asyncio.gather(
        get_weather_async("Tokyo"),
        get_stock_price_async("AAPL"),
        get_top_news_async("AI")
    )
    return weather, stock, news

weather, stock, news = asyncio.run(parallel_tools())
| | Sequential | Parallel |
|---|---|---|
| Time | Sum of all latencies | Slowest single tool |
| Use when | Tool B depends on Tool A's output | Tools are independent of each other |
| Example | Get User ID → Get Orders for that ID | Get Weather + Stock + News |
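The latency claim in the table (sum of latencies vs. slowest single tool) can be checked with a runnable sketch. `fake_tool` simulates an I/O-bound call with `asyncio.sleep`; the tool names and the 0.1 s delay are arbitrary assumptions.

```python
import asyncio
import time

TOOL_NAMES = ("weather", "stock", "news")

async def fake_tool(name: str, delay: float) -> str:
    """Simulated I/O-bound tool call."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_sequential(delay: float) -> list[str]:
    # Each call waits for the previous one: total time is about 3 * delay.
    return [await fake_tool(name, delay) for name in TOOL_NAMES]

async def run_parallel(delay: float) -> list[str]:
    # gather schedules all calls at once: total time is about 1 * delay.
    return list(await asyncio.gather(*(fake_tool(name, delay) for name in TOOL_NAMES)))

def timed(runner, delay: float = 0.1) -> tuple[list[str], float]:
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(delay))
    return results, time.perf_counter() - start
```

Note that `asyncio.gather` preserves argument order in its results, so the parallel version returns the same list as the sequential one, only sooner.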
Quick-Reference Cheat Sheet
| Topic | Key Takeaway |
|---|---|
| Agent Core Loop | Perceive → Reason → Plan → Act → Observe (ReAct framework) |
| MCP vs A2A | MCP = Agent → Tool (vertical). A2A = Agent ↔ Agent (horizontal) |
| Memory Types | Short-term (RAM), Long-term (Vector DB), Episodic, Semantic |
| RAG vs Agent | RAG retrieves & reads. Agents plan & act. Agentic RAG does both. |
| LangGraph vs CrewAI | LangGraph = stateful graph workflows. CrewAI = role-based agent teams. |
| Tool Calling | LLM decides; Application executes. LLM never runs code directly. |
| Parallel Tools | Use when tools are independent. Sequential when there's a dependency. |
| Conflict Resolution | Voting, Supervisor, Debate, Confidence, Human-in-the-Loop |
| HITL | Pause + checkpoint for irreversible actions. Essential for safety & compliance. |
| Error Handling | Validate name → validate schema → try/except → retry (max 3) → escalate |
🎯 Top 5 Interview Tips
1. Use concrete examples.
For every concept, give a before/after real-world scenario (e.g., Chatbot vs. Agent booking a flight). Abstract definitions without examples are forgettable.
2. Name your tools.
Cite Pydantic, Chroma, Pinecone, LangGraph, CrewAI by name β it signals hands-on experience, not just theory.
3. Mention production concerns unprompted.
Bring up retry limits, Human-in-the-Loop, and fault tolerance before being asked. It shows you think about systems in production, not just proofs-of-concept.
4. Structure every answer the same way.
Definition → Key Distinction → Code/Example → When to use. This format is clear, complete, and easy to follow under pressure.
5. Connect MCP and A2A together.
Explicitly link them: "MCP handles tool integration; A2A handles agent collaboration, and you need both in a full multi-agent system." This shows system-level thinking.
Resources to Go Deeper
- Anthropic MCP Documentation
- Google A2A Protocol (GitHub)
- LangGraph Documentation
- CrewAI Documentation
Found this useful? Drop a ❤️ and share it with someone preparing for their next AI engineering interview. And if there's a question I missed, drop it in the comments below.