From One Tool to a Plan — Multi-Step Agents with NVIDIA NIM

#nvidia #ai #python #tutorial

In Part 5 we gave the model a list of tools and let it pick one. Ask the time, it calls the clock. Ask about the AI Club, it calls the retriever. That's already an agent — but a shallow one. Every question got answered in a single tool call.

Real questions aren't like that. "How many days until the next AI Club meeting?" has no single tool that answers it. The model has to search the knowledge base to learn the club meets on Thursday, then do date math on "Thursday" to count the days. Two tools, in order, where the second one can't run until the first one comes back.

That's the jump this post makes: from picking a tool to running a plan. The pattern has a name — ReAct, for Reason + Act — and it's the loop underneath almost every agent framework you'll meet later. We build it in plain Python on the same hosted NIM endpoint, and we print the trace so you can watch the model think.

I'm B Torkian, NVIDIA Developer Champion at USC. Part 6 of the series.

What you're adding

User question
  → NIM call (with tools schema)
  → model calls a tool       (Act)
  → your code runs it, returns the result   (Observe)
  → NIM call again — model reads the result and decides:
        another tool?  →  loop
        done?          →  final answer       (Reason)
  → repeat until answered or you hit the step cap

Part 5 had this exact loop — but the demo questions only ever went around it once. Part 6 changes two things so it goes around multiple times on purpose:

A third tool that depends on another tool's output, so a single call can't finish the job.
A visible trace, so the multi-step reasoning shows up as control flow you can read.

The chat call from Part 1, the retriever from Part 2, and the refusal fallback from Parts 1 and 3 all carry forward unchanged.

What "multi-step" actually means here

A one-shot tool call looks like this:

Q: When does the AI Club meet?
model → search_campus_info("AI Club meeting") → "every Thursday at 5 PM" → A: Thursdays at 5 PM.

A multi-step plan looks like this:

Q: How many days until the next AI Club meeting?
model → search_campus_info("AI Club meeting day") → "every Thursday"
model reads that, then → days_until_weekday("Thursday") → "in 5 days, on June 18"
model reads that → A: The next meeting is this Thursday, June 18 — 5 days away.

Nothing in the framework changed. The same loop runs twice instead of once, because the model decided — after seeing the first result — that it needed a second tool. The intelligence is in the model choosing the sequence; your job is to give it good tools and a loop that doesn't fall over.
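To see "the same loop runs twice" without spending any API calls, here is a minimal offline sketch. The SCRIPT list and the FAKE_TOOLS results are hypothetical stand-ins I made up for illustration; in the real agent those decisions come back from the NIM call and the results come from your tools.

```python
# A scripted stand-in for the model: each entry is what it "decides" on that turn.
# Hypothetical data for illustration only; a real run gets these from the NIM call.
SCRIPT = [
    {"tool": "search_campus_info", "args": {"query": "AI Club meeting day"}},
    {"tool": "days_until_weekday", "args": {"weekday": "Thursday"}},
    {"answer": "The next meeting is Thursday, 5 days away."},
]

# Canned tool results so the sketch runs offline.
FAKE_TOOLS = {
    "search_campus_info": lambda query: "The AI Club meets every Thursday.",
    "days_until_weekday": lambda weekday: f"The next {weekday} is in 5 day(s).",
}

def run_scripted_loop(script, tools, max_steps=5):
    """Walk the loop: act, observe, then ask the 'model' again."""
    trace = []
    for step, decision in enumerate(script[:max_steps], start=1):
        if "answer" in decision:            # no tool requested -> done
            return decision["answer"], trace
        result = tools[decision["tool"]](**decision["args"])
        trace.append((step, decision["tool"], result))
    return "step limit reached", trace

answer, trace = run_scripted_loop(SCRIPT, FAKE_TOOLS)
for step, name, result in trace:
    print(f"step {step}: {name} -> {result}")
print(answer)
```

The loop body never changes between the first and second pass. Only the decision coming back is different, which is the point of the pattern.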

Step 1 — Carry the setup forward

You need the client, MODEL, the knowledge_base, and retrieve_context from Parts 1, 2, and 5. The Colab notebook has a compact prerequisite cell; the standalone part6_react_agent.py defines everything from scratch so it runs on its own.

We stay on meta/llama-3.3-70b-instruct, the same bump we made in Part 5. It matters even more here: choosing one tool is forgiving, but sequencing tools (search first, calculate second) is where the smaller model from Parts 1–4 loses the plot. Same hosted endpoint; only the model string differs from Parts 1–4.

MODEL = "meta/llama-3.3-70b-instruct"
LOCAL_TZ = "America/Los_Angeles"   # so "today" is consistent across the tools

Step 2 — Three tools, one of which forces chaining

The clock and the retriever you already know. The new one is days_until_weekday — and it's deliberately useless on its own. It needs a weekday as input, and the only way to learn the right weekday is to search the knowledge base first.

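import json                               # used by the agent loop in Step 4
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Same order as datetime.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]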

def get_current_time(timezone: str = LOCAL_TZ) -> str:
    try:
        zone = ZoneInfo(timezone)
    except Exception:
        zone = ZoneInfo("UTC")
    return datetime.now(zone).strftime("%A, %B %d, %Y at %I:%M %p %Z")

def search_campus_info(query: str) -> str:
    return retrieve_context(query, k=3)   # the Part 2 retriever, reused

def days_until_weekday(weekday: str) -> str:
    target = weekday.strip().capitalize()
    if target not in WEEKDAYS:
        return f"'{weekday}' is not a valid weekday."
    today = datetime.now(ZoneInfo(LOCAL_TZ))
    delta = (WEEKDAYS.index(target) - today.weekday()) % 7
    date_str = (today + timedelta(days=delta)).strftime("%B %d, %Y")
    if delta == 0:
        return f"Today is {target} ({date_str}) — that is 0 days away."
    return f"The next {target} is in {delta} day(s), on {date_str}."

That days_until_weekday dependency on search_campus_info is the whole lesson. It's what turns "call a tool" into "make a plan."
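The date math inside days_until_weekday is one line of modular arithmetic, and it's easier to trust once you've run it on a fixed date. This is a sketch, not the repo's code: days_until is a hypothetical variant that takes today as a parameter so the result doesn't depend on when you run it, and June 13, 2026 is just a date I picked because it falls on a Saturday.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def days_until(weekday: str, today: date) -> int:
    """Days from `today` to the next `weekday` (0 if today is that day)."""
    # date.weekday() uses Monday=0 ... Sunday=6, matching the WEEKDAYS order.
    return (WEEKDAYS.index(weekday) - today.weekday()) % 7

# Assumed example date: Saturday, June 13, 2026 (weekday index 5).
today = date(2026, 6, 13)
delta = days_until("Thursday", today)         # (3 - 5) % 7 == 5
print(delta, today + timedelta(days=delta))   # 5 2026-06-18
print(days_until("Saturday", today))          # 0, same day
print(days_until("Sunday", today))            # 1, wraps forward
```

The % 7 is what keeps the answer in the range 0 to 6 when the target day comes earlier in the week than today.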

Step 3 — Describe the tools, and hint at the order

The schema is what the model reads to decide what to call. For a multi-step agent, the descriptions should hint at sequence, not just purpose. Notice the last line of days_until_weekday:

tools = [
    {"type": "function", "function": {
        "name": "search_campus_info",
        "description": "Search the USC campus knowledge base for facts about "
                       "clubs, labs, workshops, office hours, tutoring, and the "
                       "NVIDIA Developer Program. Use this to find WHEN or WHERE "
                       "something happens. Always call this for any USC fact.",
        "parameters": {"type": "object",
            "properties": {"query": {"type": "string",
                "description": "The USC campus question or search phrase."}},
            "required": ["query"]},
    }},
    {"type": "function", "function": {
        "name": "get_current_time",
        "description": "Get the current date, day of week, and time. Use this when "
                       "the answer depends on what day or time it is right now.",
        "parameters": {"type": "object",
            "properties": {"timezone": {"type": "string",
                "description": "IANA time zone, e.g. America/Los_Angeles."}}},
    }},
    {"type": "function", "function": {
        "name": "days_until_weekday",
        "description": "Calculate how many days from today until the next given "
                       "weekday. Use this AFTER you know which day an event happens. "
                       "You usually have to call search_campus_info first.",
        "parameters": {"type": "object",
            "properties": {"weekday": {"type": "string",
                "description": "A weekday name, e.g. Monday, Thursday."}},
            "required": ["weekday"]},
    }},
]

available_tools = {
    "search_campus_info": search_campus_info,
    "get_current_time": get_current_time,
    "days_until_weekday": days_until_weekday,
}
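The schema list and the available_tools registry are written by hand in two places, so they can drift apart. Here is one way you might catch that before the model does. check_tools is a hypothetical helper, not something from the repo or the OpenAI client, and the demo schema below is deliberately wrong so the check has something to report.

```python
import inspect

def check_tools(tools: list, registry: dict) -> list:
    """Return a list of mismatches between tool schemas and Python functions."""
    problems = []
    for entry in tools:
        spec = entry["function"]
        name = spec["name"]
        func = registry.get(name)
        if func is None:
            problems.append(f"{name}: no function registered")
            continue
        accepted = set(inspect.signature(func).parameters)
        declared = set(spec["parameters"].get("properties", {}))
        for param in sorted(declared - accepted):
            problems.append(f"{name}: schema declares unknown parameter '{param}'")
        for param in spec["parameters"].get("required", []):
            if param not in declared:
                problems.append(f"{name}: required '{param}' missing from properties")
    return problems

# Tiny hypothetical example: the schema says 'day', the function takes 'weekday'.
def days_until_weekday(weekday: str) -> str:
    return f"stub for {weekday}"

demo_tools = [{"type": "function", "function": {
    "name": "days_until_weekday",
    "description": "Stub schema for the check.",
    "parameters": {"type": "object",
                   "properties": {"day": {"type": "string"}},
                   "required": ["day"]},
}}]

problems = check_tools(demo_tools, {"days_until_weekday": days_until_weekday})
print(problems)
```

Run against the real tools and available_tools from this step, an empty list would mean every schema name maps to a function that accepts the declared parameters.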

"You usually have to call search_campus_info first" is prompt engineering aimed at the model's planner. Vague tool docs produce an agent that calls things in the wrong order or skips a step.

Step 4 — The ReAct loop, with the trace turned on

Same skeleton as Part 5, with three things worth slowing down for: a bigger step budget, a printed trace, and tool execution wrapped so a bad call can't crash the loop.

SYSTEM_PROMPT = (
    "You are a USC campus assistant that solves questions step by step using tools. "
    "Work in a loop: think about what you still need, call ONE tool to get it, read "
    "the result, then decide whether you can answer or need another tool. Many "
    "questions need more than one tool — to find how many days until an event, first "
    "search for the day it happens, then call days_until_weekday with that day. "
    "Base your final answer strictly on tool results. If the tools cannot answer, "
    "reply exactly: I don't have that information — check with the USC AI Club."
)

MAX_STEPS = 5   # multi-step questions need more room than Part 5's cap of 3

def run_agent(question: str, verbose: bool = True) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for step in range(1, MAX_STEPS + 1):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=tools,
            tool_choice="auto", temperature=0.2, max_tokens=400,
        )
        message = response.choices[0].message
        messages.append(message.model_dump(exclude_none=True))

        if not message.tool_calls:            # model is done → final answer
            return message.content or ""

        for tool_call in message.tool_calls:  # run every tool it asked for
            name = tool_call.function.name
            try:
                arguments = json.loads(tool_call.function.arguments or "{}")
            except json.JSONDecodeError:
                arguments = {}

            if name not in available_tools:
                result = f"Tool '{name}' is not available."
            else:
                try:
                    result = available_tools[name](**arguments)
                except Exception as exc:       # a bad call must not kill the agent
                    result = f"Tool '{name}' failed: {exc}"

            if verbose:
                print(f"  step {step} · acting  -> {name}({json.dumps(arguments)})")
                print(f"  step {step} · observe <- {result}")

            messages.append({"role": "tool", "tool_call_id": tool_call.id,
                             "name": name, "content": str(result)})

    return "I reached the step limit before finishing — try asking a narrower question."

What changed from Part 5, and why:

MAX_STEPS = 5 — a one-shot loop can stop at 3. A planner needs room to search, calculate, and sometimes correct itself. Keep the cap small and visible; an agent with no hard stop will occasionally spiral.
The trace — printing acting -> and observe <- each iteration is the single most useful debugging habit for agents. When an agent misbehaves, it's almost always because it called the wrong tool or read the result wrong, and the trace shows you exactly which.
try/except around the tool call — the model writes the arguments, which means the model can write bad arguments. Catch it and hand the error back as a tool result; the agent will usually recover on the next step instead of crashing your program.
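The error-handling point is easy to test in isolation, with no model involved. The sketch below pulls the same logic into a hypothetical safe_call helper (my name, not the repo's) and feeds it the kind of arguments a model might get wrong. The stub tool and the book_room name are made up for the demo.

```python
import json

def safe_call(registry: dict, name: str, raw_arguments: str) -> str:
    """Run a tool by name and always return a string the model can read."""
    try:
        arguments = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError:
        arguments = {}
    if not isinstance(arguments, dict):     # e.g. the model sent a bare list
        arguments = {}
    if name not in registry:
        return f"Tool '{name}' is not available."
    try:
        return str(registry[name](**arguments))
    except Exception as exc:                # bad arguments become an observation
        return f"Tool '{name}' failed: {exc}"

def days_until_weekday(weekday: str) -> str:   # stub for the demo
    return f"stub result for {weekday}"

registry = {"days_until_weekday": days_until_weekday}
print(safe_call(registry, "days_until_weekday", '{"weekday": "Thursday"}'))
print(safe_call(registry, "days_until_weekday", '{"day": "Thursday"}'))   # wrong key
print(safe_call(registry, "book_room", "{}"))                              # unknown tool
```

All three calls return a string. The second and third return an error message that goes back to the model as the tool result, so it has a chance to fix the call on the next step.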

Step 5 — Run it and read the trace

for question in [
    "How many days until the next USC AI Club meeting?",  # search -> days_until_weekday
    "Is the USC GPU lab open right now?",                 # clock + search, then reason
    "When does the USC AI Club meet?",                    # one tool is enough
    "What is the campus wifi password?",                  # nothing to find — refuse
]:
    print(f"Q: {question}")
    print(f"A: {run_agent(question, verbose=True)}\n")

What you should see in the trace:

Days until the meeting — two steps: search_campus_info returns "every Thursday," then days_until_weekday("Thursday") returns the count. The model only answers after the second observation.
Is the lab open right now — the model pulls the current day and hour from get_current_time, the posted hours (Mon–Fri, 10 AM–6 PM) from search_campus_info, then reasons about whether now is inside that window.
When does the club meet — one search, done. A good agent doesn't pad its plan with tools it doesn't need.
Wifi password — it searches, finds nothing, and falls back to the refusal line. The Part 3 refusal pattern still holds, now inside a multi-step loop.

Model behavior isn't perfectly deterministic — some runs take a slightly different path. That's worth seeing too: the trace lets you watch the variance instead of guessing about it.
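For the lab question, the model does the last step (is now inside the posted window?) in its head. If you want to check its answer against the trace, the same reasoning is a few lines of Python. This is only a sketch for checking: is_open is a hypothetical helper, the hours are the Mon–Fri, 10 AM–6 PM from the example above, and treating closing time as already closed is my assumption.

```python
from datetime import datetime, time

OPEN_DAYS = range(0, 5)                    # Monday=0 ... Friday=4
OPENS, CLOSES = time(10, 0), time(18, 0)   # 10 AM to 6 PM, from the example above

def is_open(now: datetime) -> bool:
    """True if `now` is a weekday and inside the posted hours."""
    # Assumption: the lab counts as closed at exactly 6:00 PM.
    return now.weekday() in OPEN_DAYS and OPENS <= now.time() < CLOSES

print(is_open(datetime(2026, 6, 18, 14, 30)))  # Thursday 2:30 PM -> True
print(is_open(datetime(2026, 6, 13, 14, 30)))  # Saturday 2:30 PM -> False
print(is_open(datetime(2026, 6, 18, 18, 0)))   # Thursday 6:00 PM -> False
```

If the model's answer disagrees with this check for the time shown in the trace, the problem is in its reasoning step, not in the tools.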

Step 6 — What you actually built

The assistant can now reason across steps:

Workshop 1 gave it a brain (the chat call).
Workshop 2 gave it memory of facts (retrieval).
Workshop 3 gave it judgment (guardrails).
Workshop 4 gave it portability (hosted or local).
Workshop 5 gave it hands (one tool call).
Workshop 6 gave it a plan (chaining tools in a loop).

This is the architecture under LangGraph, CrewAI, AutoGen, and the rest. They add state machines, retries, sub-agents, and dashboards — but the center is the loop you just wrote: call the model with tools, run what it asks for, feed the result back, repeat. Common next steps:

More tools — a calendar, a ticketing API, a web search, a code sandbox.
A real planner that writes the full step list before any tool fires, instead of deciding one step at a time.
Memory across turns so the agent remembers what it already looked up.
Observability — that acting/observe trace, but logged and searchable. Production agents live or die on it.
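As a first step toward that last item, the printed trace can become structured events with very little code. This is a minimal sketch under my own assumptions: record_step and to_jsonl are hypothetical helpers, and the field names are ones I chose, not a standard.

```python
import json
import time

def record_step(trace: list, step: int, tool: str, arguments: dict, result: str) -> dict:
    """Append one act/observe pair to the trace as a plain dict."""
    event = {"ts": time.time(), "step": step, "tool": tool,
             "arguments": arguments, "result": result}
    trace.append(event)
    return event

def to_jsonl(trace: list) -> str:
    """Serialize the trace as JSON Lines: one event per line."""
    return "\n".join(json.dumps(event) for event in trace)

trace = []
record_step(trace, 1, "search_campus_info", {"query": "AI Club meeting day"}, "every Thursday")
record_step(trace, 2, "days_until_weekday", {"weekday": "Thursday"}, "in 5 day(s)")
print(to_jsonl(trace))

# Searching the log is then ordinary Python: which steps used the date tool?
print([e["step"] for e in trace if e["tool"] == "days_until_weekday"])
```

In run_agent, a call like record_step would sit where the two verbose print lines are now, and the JSON Lines output could go to a file instead of the console.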

If you take one thing from the whole series: an LLM is a normal Python function with a weird interior, and an agent is a while loop around it. You own the loop. The model just fills in the blanks.

Get the code

Repo: github.com/torkian/nvidia-nim-workshop
One-click Colab: Open part6_react_agent.ipynb
Local Python: part6_react_agent.py in the repo (python3 part6_react_agent.py after pip install -r requirements.txt).

MIT licensed. I run this at USC — fork it, swap the knowledge base and the tools for your school, your club, your project.