Avinash Hedaoo

Posted on Jun 29

The AI Agent Interview Master Guide: 26 Questions You Must Know in 2026

#ai #python #machinelearning #career

🎯 Who this is for: Engineers preparing for AI/ML roles involving agent systems, LLM orchestration, or production AI pipelines. Whether you're interviewing at a startup or a FAANG, these are the questions being asked in 2026.

📋 Table of Contents

Section 1 — Fundamentals & Core Concepts (Q1–Q3)
Section 2 — Protocols & Architecture (MCP & A2A) (Q4–Q9)
Section 3 — Memory & Context Management (Q10–Q12)
Section 4 — RAG vs. Agents vs. Agentic RAG (Q13–Q15)
Section 5 — Multi-Agent Systems & Conflict Resolution (Q16–Q18)
Section 6 — Frameworks: LangGraph & CrewAI (Q19–Q23)
Section 7 — Tool Calling & Error Handling (Q24–Q26)
Quick-Reference Cheat Sheet

Section 1 — Fundamentals & Core Concepts

Q1: What is an AI Agent and how is it different from a regular Chatbot?

Definition: An AI Agent is an intelligent system that can Perceive, Reason, and Take Action autonomously — going far beyond text generation.

Chatbot	AI Agent
Generates text responses only	Plans, uses tools, and executes actions
Stateless — each reply is isolated	Stateful — tracks goals across multiple steps
Flow: `User Query → Response`	Flow: `User Query → Plan → Tool Use → Execution → Response`
Cannot call external APIs	Integrates with calendars, APIs, databases

Real-world example:

Task: "Book me the cheapest flight to Berlin next Friday"

🤖 Chatbot: "You can check Google Flights or MakeMyTrip."

🦾 AI Agent:
  1. Checks your Google Calendar for conflicts
  2. Searches Skyscanner, Kayak, and Google Flights
  3. Compares prices across airlines
  4. Books the cheapest option
  5. Sends a confirmation email

💡 Interview Tip: Lead with the Perceive → Reason → Act framework, then give a concrete before/after scenario. Interviewers want to see you understand the behavioral difference, not just the definition.

Q2: What is ReAct (Reasoning + Acting)?

Definition: A prompting framework where the agent cycles through Thought → Action → Observation until the task is complete.

Step	What Happens	Example
1. Thought	Agent reasons about what to do next	"I need real-time weather data for Tokyo"
2. Action	Calls a tool or API	`weather_api(location="Tokyo")`
3. Observation	Receives and processes the result	`28°C, humidity 75%, no rain`
4. Repeat / Answer	Loops or delivers final response	"It's warm and humid — no umbrella needed"

# Simplified ReAct pseudocode
while not final_answer:
    thought = llm.think(context)
    action = llm.decide_action(thought)
    observation = tools.execute(action)
    context.append(thought, action, observation)

💡 Interview Tip: Walk through a concrete ReAct loop out loud. Pick a real task (weather, database query, stock lookup) and narrate each Thought/Action/Observation step. This shows you understand the loop, not just the acronym.

Q3: Reactive vs. Proactive Agents — What's the Difference?

Reactive Agent	Proactive Agent
Waits for a user request to act	Acts autonomously based on goals or triggers
Example: Customer support bot that only replies when messaged	Example: Cloud monitor that detects 95% CPU and auto-scales — nobody asked
Simple and predictable	More powerful; prevents problems before they occur

💡 Interview Tip: Always mention that production agents are Hybrid — reactive to user input but proactively monitoring their environment. This signals real-world architectural maturity.

Section 2 — Protocols & Architecture (MCP & A2A)

Q4: What is MCP (Model Context Protocol) and why does it matter?

Definition: An open standard created by Anthropic — often called the "USB-C for AI." It gives AI models a single, universal way to connect to tools and data sources.

Without MCP:

Agent ──custom code──> Slack
Agent ──custom code──> GitHub  
Agent ──custom code──> Google Drive
Agent ──custom code──> Postgres

With MCP:

Agent ──MCP──> [Slack | GitHub | Google Drive | Postgres | ...]
               (one protocol, infinite tools)

Key benefits:

✅ Build Once, Connect Everywhere — one MCP server works with any MCP-compatible host
✅ No vendor lock-in — swap the underlying LLM without rewriting integrations
✅ Security by declaration — servers expose only what they explicitly declare

Q5: Explain the MCP Architecture

User
 │
 ▼
Host Application (Claude Desktop / VS Code / Cursor)
 │
 ▼
MCP Client  ◄──── manages connections, sends requests
 │
 ▼
MCP Server  ◄──── exposes Tools, Resources, Prompts
 │
 ▼
External Tool (GitHub API / Postgres / Slack)

Layer	Role
Host	The app the user interacts with (e.g., Claude Desktop, VS Code)
Client	Lives inside the Host; manages connections to one or more servers
Server	Exposes capabilities to the client; can be local or remote

Q6: What are the Three Core MCP Primitives?

Primitive	Description	Example	Controlled By
Tools	Actions the model can trigger	`send_email`, `run_sql_query`	The Model
Resources	Data the app can read	DB tables, PDFs, schemas	The Application
Prompts	Reusable instruction templates	"Summarize this report"	The User

Q7: What is the Agent-to-Agent (A2A) Protocol?

Definition: An open protocol by Google enabling agents to communicate, collaborate, delegate, and share work with each other.

MCP (Vertical)          A2A (Horizontal)
─────────────           ────────────────
Agent                   Agent A ◄──────► Agent B
  │                        │                │
  ▼                         ▼                ▼
Tool                    Worker           Worker

	MCP	A2A
Direction	Vertical (Agent ↔ Tool)	Horizontal (Agent ↔ Agent)
Purpose	Connect AI to tools/data	Connect multiple AI agents
Led by	Anthropic	Google
Analogy	Tool belt	Org chart

💡 Interview Tip: Both MCP and A2A are needed in complex production systems. MCP gives the agent its tools; A2A lets agents delegate to each other. Frame them as complementary, not competing.

Q8: What is an Agent Card?

Think of it as a LinkedIn profile for an AI Agent — or more technically, an OpenAPI spec for agent capabilities.

{
  "name": "FlightBookingAgent",
  "description": "Books flights, hotels, and car rentals",
  "skills": ["search_flights", "compare_prices", "book_ticket"],
  "endpoint": "https://agents.example.com/flight",
  "auth": { "type": "bearer" }
}

Purpose: Allows other agents to discover capabilities and understand how to delegate tasks before collaborating — enabling true autonomous agent discovery.

Q9: What is a Task in A2A and what are its lifecycle states?

A Task is the fundamental unit of work exchanged between agents.

Submitted ──► Working ──► Completed
                │
                ├──► Input Required ──► Working (resumed)
                │
                ├──► Failed
                └──► Canceled

State	Meaning
Submitted	Task created and received
Working	Agent actively processing
Input Required	Needs clarification (e.g., "Window or aisle seat?")
Completed	Finished successfully
Failed	Unrecoverable error
Canceled	Stopped by user or orchestrator

Section 3 — Memory & Context Management

Q10: What are the Different Types of Memory in AI Agents?

Memory Type	Analogy	Description	Example
Short-term	RAM	In-context history; lost at session end	Follows the current thread
Long-term	Hard Disk	Stored in Vector DBs; persists across sessions	"Welcome back, Aman!"
Episodic	Diary	Records of specific past interactions	"Last week you asked about RAG"
Semantic	Textbook	General world/domain knowledge	"Python is a programming language"

💡 Interview Tip: The RAM / Hard Disk analogy lands every time. Use it to make the distinction instantly clear, then layer in Vector DBs as the implementation detail.

Q11: How Do You Implement Long-Term Memory in an AI Chain?

# 5-step long-term memory pattern

# Step 1: User has a conversation
user_input = "Tell me about LangGraph state management"

# Step 2: Embed the conversation
embedding = openai.embeddings.create(
    input=user_input,
    model="text-embedding-3-small"
)

# Step 3: Store in Vector DB
vector_db.upsert(
    id=session_id,
    vector=embedding,
    metadata={"text": user_input, "timestamp": now()}
)

# Step 4: On next session, retrieve relevant context
results = vector_db.query(
    vector=new_embedding,
    top_k=5  # cosine similarity search
)

# Step 5: Inject into prompt
prompt = f"Previous context: {results}\n\nUser: {new_query}"

Key tools: Chroma (local dev), Pinecone (production), FAISS (self-hosted), Weaviate (hybrid search)

💡 Interview Tip: Name specific tools and mention cosine similarity search for retrieval. This signals hands-on experience vs. theoretical knowledge.

Q12: What is Memory Overflow and How Do You Solve It?

Problem: When conversation history exceeds the model's context window (e.g., 128k tokens), older context gets truncated — silently losing important state.

Strategy	How It Works	Best For
Summarization	Compress older messages into a running summary	Long conversations with recurring themes
Relevance Filtering	Retrieve only memory similar to the current query	Domain-specific agents
Sliding Window	Keep only the last N turns in context	Chatbots with short-lived context
Tiered Memory	Hot → Warm (summarized) → Cold (archived)	Enterprise agents with long histories

Tiered Memory Architecture:
┌─────────────┐    ┌──────────────────┐    ┌──────────────────────┐
│  HOT MEMORY │    │   WARM MEMORY    │    │    COLD MEMORY       │
│  (Last 20   │───►│  (Summarized,    │───►│  (Archived,          │
│   messages) │    │   last 7 days)   │    │   vector-indexed)    │
└─────────────┘    └──────────────────┘    └──────────────────────┘
     Fast                Medium                    Slow but vast

Section 4 — RAG vs. Agents vs. Agentic RAG

Q13: What is RAG and How is it Different from an AI Agent?

RAG flow (linear, read-only):

User Query ──► Retrieve Documents ──► Generate Answer

Agent flow (iterative, read-write):

User Query ──► Plan ──► Select Tool ──► Execute ──► Observe ──► Final Response
                 ▲__________________________|  (loop until done)

	RAG	AI Agent
Pattern	Linear, single-pass	Iterative loop
Capability	Retrieves and reads	Plans and acts
State	Stateless	Stateful
Best for	Static Q&A on documents	Multi-step tasks requiring action

Q14: RAG vs. Agent vs. Agentic RAG — When to Use What?

Approach	Use When	Example Task
RAG Only	Pure Q&A on static documents	"What is our refund policy?"
Agent Only	Task requires action, no docs needed	"Book a flight", "Send an email"
Agentic RAG	Need to search docs AND take action	"Check refund policy, then process the refund in the DB"

Q15: What is Agentic RAG?

Basic RAG: Fixed single-pass retrieval. Retrieve top-K chunks. Answer.

Agentic RAG: The agent controls the retrieval strategy dynamically.

                    ┌──────────────────────────────────┐
                    │         AGENTIC RAG LOOP          │
                    │                                    │
User Query ────────►│  Route query to correct DB        │
                    │       │                            │
                    │  Retrieve relevant chunks          │
                    │       │                            │
                    │  Evaluate quality                  │
                    │       │                            │
                    │  Poor? ──► Refine & retry          │
                    │       │                            │
                    │  Good? ──► Multi-hop if needed     │
                    │       │                            │
                    │  Final answer                     │
                    └──────────────────────────────────┘

Multi-hop example — "Process a refund for order #8821":

Find order #8821 in the Orders DB
Retrieve the refund policy from Policy Docs
Cross-reference policy with order details
Call the Payments API to initiate the refund

💡 Interview Tip: Mentioning routing, quality evaluation, and multi-hop reasoning immediately separates your answer from candidates who only know basic RAG.

Section 5 — Multi-Agent Systems & Conflict Resolution

Q16: What are Multi-Agent Systems and Why are They Useful?

Definition: Multiple specialized agents collaborating on tasks too large or complex for a single agent.

Single Agent 😓              Multi-Agent System 🚀
──────────────               ─────────────────────
One agent handles            ┌─────────────────────┐
  everything                 │   Manager Agent      │
                             └──────────┬──────────┘
Jack-of-all-trades                      │ delegates
  = master of none           ┌──────────┼──────────┐
                             ▼          ▼           ▼
                         Researcher  Writer      Editor
                         (expert)   (expert)   (expert)

Benefits: Specialization → Parallelism → Scalability → Fault Tolerance

Q17: Communication Patterns in Multi-Agent Systems

Sequential / Pipeline          Hierarchical
──────────────────             ────────────
A ──► B ──► C                  Manager
                                 │ ├── Worker A
                                 │ ├── Worker B
                                 └── Worker C

Peer-to-Peer (A2A)             Broadcast
──────────────────             ─────────
A ◄──► B ◄──► C                A ──► B
                                 ├──► C
                                 └──► D

Pattern	Best For
Sequential	Simple, ordered pipelines (Researcher → Writer → Editor)
Hierarchical	Complex branching workflows with auditability requirements
Peer-to-Peer	Dynamic delegation using A2A (agents discover each other)
Broadcast	Real-time data fan-out (market data → Trading + Risk + Reporting)

Q18: How Do You Handle Conflicts When Agents Disagree?

Strategy	How It Works	Best For
Voting / Majority	Majority opinion wins across N agents	Classification, labelling
Supervisor Agent	Master agent has final authority	High-stakes decisions
Debate & Judge	Agents argue positions; Judge agent picks winner	Open-ended reasoning
Confidence Scores	Highest-confidence agent is selected	Model ensembles
Human-in-the-Loop	Escalate to a human for the final call	Regulated/irreversible actions

Section 6 — Frameworks: LangGraph & CrewAI

Q19: What is LangGraph?

Definition: A Python library for building stateful, graph-based AI agents — an extension of LangChain designed for production-grade complexity.

LangChain (Chains)	LangGraph
Linear execution only	Loops, branching, parallel nodes
No native state management	Shared State object across all nodes
No HITL built-in	Native checkpoint + pause/resume
Good for simple pipelines	Good for complex production workflows

Q20: Nodes, Edges, and State in LangGraph

from langgraph.graph import StateGraph
from typing import TypedDict

# State: shared memory flowing through the graph
class AgentState(TypedDict):
    query: str
    retrieved_docs: list
    llm_response: str
    needs_retry: bool

# Nodes: functions that modify State
def retrieval_node(state: AgentState) -> AgentState:
    docs = vector_db.search(state["query"])
    return {"retrieved_docs": docs}

def llm_node(state: AgentState) -> AgentState:
    response = llm.invoke(state["query"], context=state["retrieved_docs"])
    return {"llm_response": response}

def evaluator_node(state: AgentState) -> AgentState:
    quality = evaluate(state["llm_response"])
    return {"needs_retry": quality < 0.7}

# Edges: define execution flow (including conditional loops)
graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieval_node)
graph.add_node("generate", llm_node)
graph.add_node("evaluate", evaluator_node)

graph.add_edge("retrieve", "generate")
graph.add_edge("generate", "evaluate")
graph.add_conditional_edges("evaluate", 
    lambda s: "retrieve" if s["needs_retry"] else "END"
)

Q21: What is Human-in-the-Loop (HITL) in LangGraph?

Definition: The ability to pause graph execution at a designated node and wait for human approval before continuing.

from langgraph.checkpoint.sqlite import SqliteSaver

# Save state to checkpoint store before pausing
checkpointer = SqliteSaver.from_conn_string("agent_state.db")

graph = workflow.compile(
    checkpointer=checkpointer,
    interrupt_before=["send_email_node"]  # pause here for human approval
)

# Agent runs, then pauses before sending
result = graph.invoke(state, config={"thread_id": "task_001"})

# Human reviews and approves...
human_approval = get_human_input()

# Graph resumes from exact checkpoint
if human_approval:
    graph.invoke(None, config={"thread_id": "task_001"})

Use for: Sending emails, processing refunds, financial transactions, deploying code — any irreversible or regulated action.

💡 Interview Tip: HITL is a top interview signal. Frame it as a safety + compliance feature: "For any action that is irreversible or involves money/data, we insert a human approval checkpoint before execution."

Q22: What is CrewAI?

Definition: A Python framework for orchestrating role-based teams of AI agents. You declare agent identities in plain language — CrewAI handles delegation, collaboration, and retry logic.

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Market Research Analyst",
    goal="Find the top 5 AI trends for Q3 2026",
    backstory="Expert in tech markets with 10 years experience",
    tools=[web_search_tool, pdf_reader_tool]
)

writer = Agent(
    role="Technical Content Writer",
    goal="Transform research into a compelling blog post",
    backstory="Specializes in making complex AI topics accessible",
    tools=[text_editor_tool]
)

research_task = Task(
    description="Research the top AI agent trends of Q3 2026",
    agent=researcher
)

write_task = Task(
    description="Write a 1500-word post based on the research",
    agent=writer
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical  # Manager LLM coordinates
)

result = crew.kickoff()

Q23: Process Types in a Crew

Process	How It Works	Best For
Sequential	Tasks run one after another in fixed order	Simple, linear pipelines
Hierarchical ⭐	Manager LLM assigns and reviews tasks dynamically	Complex production systems
Consensual	Agents collaborate as peers to reach agreement	Research synthesis, balanced analysis

⭐ Hierarchical is the production default — it gives you auditability and dynamic task assignment.

Section 7 — Tool Calling & Error Handling

Q24: What is Tool Calling and How Does It Work?

⚠️ Critical clarification: The LLM never executes code. It decides which tool to call and outputs a structured JSON request. The host application runs the actual code.

Step 1: User ──────────────────────────────────► LLM
        "What's the AAPL stock price?"           (receives query + tool schema)

Step 2: LLM ───────────────────────────────────► Application
        { "tool": "get_stock_price",              (LLM decides, outputs JSON)
          "args": { "ticker": "AAPL" } }

Step 3: Application ───────────────────────────► External API
        runs get_stock_price(ticker="AAPL")       (application executes)

Step 4: External API ──────────────────────────► Application ──► LLM
        { "price": 211.34, "change": "+1.2%" }   (result returned as observation)

Step 5: LLM ───────────────────────────────────► User
        "AAPL is currently trading at $211.34,    (final answer)
         up 1.2% today."

Q25: Handling Errors and Hallucinated Tool Calls

Problem: LLM calls a tool that doesn't exist, passes wrong argument types, or generates malformed JSON.

def safe_tool_call(tool_name: str, args: dict, max_retries: int = 3):

    # Layer 1: Tool name validation
    if tool_name not in REGISTERED_TOOLS:
        return {"error": f"Unknown tool: {tool_name}. Available: {list(REGISTERED_TOOLS.keys())}"}

    # Layer 2: Schema validation (Pydantic)
    tool_schema = REGISTERED_TOOLS[tool_name].schema
    try:
        validated_args = tool_schema(**args)
    except ValidationError as e:
        return {"error": f"Invalid arguments: {e}"}

    # Layer 3: Try/except with retry
    for attempt in range(max_retries):
        try:
            result = REGISTERED_TOOLS[tool_name].execute(validated_args)
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                # Layer 4: Graceful failure after max retries
                return {"error": f"Tool failed after {max_retries} attempts: {str(e)}"}
            # Feed error back to LLM for self-correction
            context.append({"role": "tool", "content": f"Attempt {attempt+1} failed: {e}"})

Defense Layer	Implementation
Name Validation	Check tool name against registered tool list before execution
Schema Validation	Use Pydantic models or JSON Schema to verify argument types
Try / Except	Wrap every call; return structured errors back to LLM
Retry with Correction	Pass error as observation so LLM can self-correct
Max Retry Cap	Limit to 3 attempts; escalate or fail gracefully

💡 Interview Tip: Mentioning Pydantic for schema validation and a max-retry cap (to prevent infinite loops) shows production awareness. Naive agents that retry forever are a real production problem.

Q26: Parallel Tool Calling — What is it and When Should You Use it?

Definition: Requesting multiple tool calls in a single LLM response and executing them simultaneously.

# Sequential (slow): 3 calls × ~3 seconds each = ~9 seconds total
weather = get_weather("Tokyo")        # 3s
stock   = get_stock_price("AAPL")     # 3s
news    = get_top_news("AI")          # 3s

# Parallel (fast): all run at once = ~3 seconds total
import asyncio

async def parallel_tools():
    weather, stock, news = await asyncio.gather(
        get_weather_async("Tokyo"),
        get_stock_price_async("AAPL"),
        get_top_news_async("AI")
    )
    return weather, stock, news

	Sequential	Parallel
Time	Sum of all latencies	Slowest single tool
Use when	Tool B depends on Tool A's output	Tools are independent of each other
Example	Get User ID → Get Orders for that ID	Get Weather + Stock + News

🗒️ Quick-Reference Cheat Sheet

Topic	Key Takeaway
Agent Core Loop	Perceive → Reason → Plan → Act → Observe (ReAct framework)
MCP vs A2A	MCP = Agent ↔ Tool (vertical). A2A = Agent ↔ Agent (horizontal)
Memory Types	Short-term (RAM) → Long-term (Vector DB) → Episodic → Semantic
RAG vs Agent	RAG retrieves & reads. Agents retrieve & act. Agentic RAG does both.
LangGraph vs CrewAI	LangGraph = stateful graph workflows. CrewAI = role-based agent teams.
Tool Calling	LLM decides; Application executes. LLM never runs code directly.
Parallel Tools	Use when tools are independent. Sequential when there's a dependency.
Conflict Resolution	Voting → Supervisor → Debate → Confidence → Human-in-the-Loop
HITL	Pause + checkpoint for irreversible actions. Safety & compliance essential.
Error Handling	Validate name → validate schema → try/except → retry (max 3) → escalate

🎯 Top 5 Interview Tips

1. Use concrete examples.
For every concept, give a before/after real-world scenario (e.g., Chatbot vs. Agent booking a flight). Abstract definitions without examples are forgettable.

2. Name your tools.
Cite Pydantic, Chroma, Pinecone, LangGraph, CrewAI by name — it signals hands-on experience, not just theory.

3. Mention production concerns unprompted.
Bring up retry limits, Human-in-the-Loop, and fault tolerance before being asked. It shows you think about systems in production, not just proofs-of-concept.

4. Structure every answer the same way.
Definition → Key Distinction → Code/Example → When to use — this format is clear, complete, and easy to follow under pressure.

5. Connect MCP and A2A together.
Explicitly link them: "MCP handles tool integration; A2A handles agent collaboration — you need both in a full multi-agent system." This shows system-level thinking.

Resources to Go Deeper

Found this useful? Drop a ❤️ and share it with someone preparing for their next AI engineering interview. And if there's a question I missed — drop it in the comments below.

DEV Community