DEV Community

Joud Awad

48/60 Days System Design Questions

Your AI agent just got a user message: “Book me a flight to Dubai next Friday.”

The LLM has access to 12 tools: search_flights, get_user_preferences, check_calendar, book_flight, send_confirmation, get_weather…

How does the agent decide which tools to call, in what order, and when to stop?

A) ReAct loop — model reasons step-by-step, emits a “thought” then picks one tool at a time, observes output, repeats until it self-decides it’s done

B) Parallel tool calling — model emits ALL required tool calls in a single response, executes them concurrently, feeds all results back in one context update

C) Forced function schema — you lock the model into a strict JSON schema per turn; it can’t produce free text, only structured tool calls you defined

D) Planner-executor split — a lightweight planner LLM creates a tool call DAG upfront, a separate executor runs the graph, results flow back to planner only at checkpoints

Pick one — A, B, C, or D — and tell me why you’d use it in production.

Full breakdown in the comments. 👇

Top comments (7)

Joud Awad

A) ReAct loop: correct for most cases

ReAct (Reason + Act) is the dominant production pattern for a reason.

The model emits a “Thought:” explaining what it’s doing, then an “Action:” with a single tool call, then observes the result before deciding what’s next.

Why it works:

• Each step is observable and debuggable

• The model can bail out, retry, or change direction after each tool result

• Works even with tools that have side effects — you can gate dangerous calls

• The underlying tool-calling loop is supported by the OpenAI and Anthropic APIs and by frameworks such as LangChain and LlamaIndex

The trap: ReAct is sequential. If you need get_user_preferences AND check_calendar AND search_flights all before book_flight, you’re making 3 round trips when you could make 1.
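
A minimal sketch of that Thought / Action / Observation loop. The model is replaced by a hypothetical `scripted_model` stub and the two tools are stand-ins, so nothing here is a real provider API; a real agent would call an LLM where the stub is.

```python
from typing import Callable

# Hypothetical stub tools; real ones would call external services.
def search_flights(destination: str) -> str:
    return f"Found flight F123 to {destination}"

def book_flight(flight_id: str) -> str:
    return f"Booked {flight_id}"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM call (assumption): picks the next step
    based on how many observations are already in the history."""
    seen = sum(1 for line in history if line.startswith("Observation:"))
    if seen == 0:
        return {"thought": "I need flight options first.",
                "action": "search_flights", "input": "Dubai"}
    if seen == 1:
        return {"thought": "F123 fits, so book it.",
                "action": "book_flight", "input": "F123"}
    return {"thought": "Booking is confirmed.",
            "action": "finish", "input": "Flight F123 booked."}

def react_loop(user_message: str, max_steps: int = 5) -> tuple[str, list[str]]:
    history = [f"User: {user_message}"]
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        step = scripted_model(history)
        history.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":  # the model decides it is done
            return step["input"], history
        # One tool per iteration: act, then record the observation.
        observation = TOOLS[step["action"]](step["input"])
        history.append(f"Action: {step['action']}({step['input']!r})")
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached.", history

answer, trace = react_loop("Book me a flight to Dubai next Friday.")
```

Each tool call costs one full pass through the loop, which is the sequential round-trip cost described above. The `max_steps` cap is one simple way to bound a loop whose stopping point is otherwise left to the model.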

Joud Awad

B) Parallel tool calling: right when dependencies allow it

Modern LLMs (GPT-4o, Claude 3.5+) can emit multiple tool calls in a single response.

If the tools are independent — run them in parallel. Massive latency win.

Example: get_user_preferences + check_calendar have no dependency on each other. Call both at once. Then feed both results back before calling search_flights.

The trap: not all tools are independent. If Tool B needs the output of Tool A and both are called in the same batch, B runs with missing or guessed inputs. The model doesn’t “know” B failed because A hadn’t run yet; it just gets a bad result and may hallucinate a fix.

Production pattern: use parallel calling for the fan-out phase, ReAct for the sequential decision phase.
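
A rough sketch of the fan-out phase using `asyncio.gather`. The tool bodies, return shapes, and the `plan_trip` helper are hypothetical stand-ins; only the two independent calls run concurrently, and the dependent `search_flights` call waits for both results.

```python
import asyncio

# Hypothetical async tools; the sleeps stand in for network latency.
async def get_user_preferences(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"seat": "aisle"}

async def check_calendar(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"friday_free": True}

async def search_flights(destination: str, seat: str) -> list[str]:
    await asyncio.sleep(0.05)
    return [f"F123 to {destination} ({seat})"]

async def plan_trip(user_id: str, destination: str) -> dict:
    # Fan-out: these two calls do not depend on each other,
    # so they run concurrently and return together.
    prefs, calendar = await asyncio.gather(
        get_user_preferences(user_id),
        check_calendar(user_id),
    )
    if not calendar["friday_free"]:
        return {"status": "conflict", "flights": []}
    # Sequential step: search_flights needs the preferences result.
    flights = await search_flights(destination, prefs["seat"])
    return {"status": "ok", "flights": flights}

result = asyncio.run(plan_trip("u1", "Dubai"))
```

With these stub delays the two independent calls take roughly one delay rather than two, which is where the latency win comes from.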

Joud Awad

C) Forced function schema: the senior engineer trap

Looks like control. Actually limits you.

When you force strict JSON schema mode, you lose the model’s ability to reason before calling.

This breaks down fast when:

• The user’s intent is ambiguous (should the agent call search_flights or clarify_intent?)

• A tool fails and the model needs to decide whether to retry or give up

• You need conditional logic between steps

Use schema enforcement for output validation, not tool dispatch. Validate what comes back from the model — don’t handcuff the model going in…
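
One way to read "validate what comes back" is a check that runs after the model has responded freely. The sketch below uses only the standard library. The `TOOL_SPECS` table and the `{"tool": ..., "arguments": ...}` message shape are assumptions made for illustration, not any provider's format.

```python
import json

# Hypothetical specs: tool name -> {argument name: required Python type}
TOOL_SPECS = {
    "search_flights": {"destination": str, "date": str},
    "book_flight": {"flight_id": str},
}

def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Check a tool call the model already produced; returns (ok, reason)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_SPECS:
        return False, f"unknown tool: {tool!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    spec = TOOL_SPECS[tool]
    for name, expected in spec.items():
        if not isinstance(args.get(name), expected):
            return False, f"argument {name!r} must be {expected.__name__}"
    extra = set(args) - set(spec)
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, "ok"

good = validate_tool_call(
    '{"tool": "search_flights", "arguments": {"destination": "Dubai", "date": "Friday"}}'
)
bad = validate_tool_call(
    '{"tool": "search_flights", "arguments": {"destination": "Dubai"}}'
)
```

On failure, the reason string can be fed back to the model as an observation so it can decide whether to retry, clarify, or give up.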

Joud Awad

D) Planner-executor split 🏗️: right at scale, overkill for most

This is the architecture behind production agentic pipelines like Devin, AutoGPT-style systems, and enterprise agent frameworks.

The planner builds a DAG of tool calls upfront. The executor runs it. The planner only re-engages at checkpoints or on failure.

Benefits:

• Parallelism by default (the DAG has clear dependency edges)

• Cost-efficient (the powerful planner model runs less)

• Auditable — you can inspect the plan before running it

The trap: the planner has to get the full plan right before execution starts. If the world changes mid-run (API returns unexpected data, user updates their request), you need replanning. That’s a whole second problem.

Most teams reach for this too early. Start with ReAct. Add parallel tool calling when latency hurts. Only build planner-executor when you have complex multi-step workflows with 10+ tools and real parallelism requirements…
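
To make the executor half concrete, here is a small sketch using Python's standard `graphlib.TopologicalSorter`. The hard-coded `PLAN` stands in for what a planner model would emit, and `run_tool` is a hypothetical stub. Each batch holds tools whose dependencies are all satisfied, so a real executor could run a batch concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: tool name -> set of tools it depends on.
PLAN = {
    "get_user_preferences": set(),
    "check_calendar": set(),
    "search_flights": {"get_user_preferences", "check_calendar"},
    "book_flight": {"search_flights"},
    "send_confirmation": {"book_flight"},
}

def run_tool(name: str, inputs: dict[str, str]) -> str:
    """Stub executor step (assumption): a real one would call the tool."""
    return f"{name} done"

def execute_plan(plan: dict[str, set[str]]) -> tuple[list[list[str]], dict[str, str]]:
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    results: dict[str, str] = {}
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every tool whose deps are done
        batches.append(ready)
        for name in ready:
            inputs = {dep: results[dep] for dep in plan[name]}
            results[name] = run_tool(name, inputs)
            sorter.done(name)  # unblocks tools that depend on this one
    return batches, results

batches, results = execute_plan(PLAN)
```

The plan is an inspectable data structure before anything runs, which is the auditability benefit listed above. Replanning on failure is deliberately left out of this sketch.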

c0d3l0v3r

I think A is the correct approach, because planning the steps and then acting on them is the core concept behind agentic AI systems.

B -> Calls all the tools at once, which requires more computing power per request; we would be putting our system under too much load.

C -> This hinders the ability of the gen AI to be gen AI.

D -> Might work, but it looks very complicated, and the dependencies would be difficult to maintain.

Richard Smith

The "start with ReAct, add complexity when you actually need it" advice is solid. Always tempting to over-engineer agent pipelines too early.