AI Agents Explained with Code: How Modern GenAI Systems Actually Work
From simple prompts to tool-using AI agents — a hands-on Python example.
In the early days of Generative AI, we were enamored with chatbots. We sent a text string to a model, and it returned a text string. This request-response loop is powerful for creative writing or answering trivia, but it is fundamentally reactive. It cannot "do" anything outside the boundaries of its training data.
The industry has since moved from simple chatbots to AI agents. If a chatbot is a brain in a jar, an agent is a brain with hands.
Why Chatbots Are Limited
Traditional Large Language Models (LLMs) are static. Once a model is trained, its knowledge is frozen in time. Furthermore, an LLM alone cannot interact with your private database, check the current weather, or execute a script on your local machine.
When you ask a standard chatbot, "What is the stock price of Apple right now?", it will likely apologize and say it doesn't have real-time access. This is a limitation of the interface, not necessarily the intelligence. To solve this, we need a system that can observe its environment, reason about what it needs, and use external tools to fill the gaps.
What AI Agents Really Are
An AI agent is a system that uses an LLM as its central reasoning engine to complete a goal by interacting with the world.
Unlike a standard program where every logic path is hard-coded with if/else statements, an agent is autonomous. You give it a high-level objective, and the agent decides which steps to take, which tools to call, and how to interpret the results of those actions.
The fundamental difference is the shift from explicit programming to goal-oriented orchestration.
Core Components of an AI Agent
To understand an agent, we can break it down into four primary modules (a minimal skeleton follows the list):
The Brain (LLM): The reasoning core that parses instructions and decides on the next action.
Planning: The ability to break a complex goal (e.g., "Research this company and write a summary") into smaller, manageable sub-tasks.
Memory: Short-term memory (the conversation history) and long-term memory (retrieving documents or past experiences).
Tools (Actuators): A set of functions the agent can call, such as a web search, a database query, or a calculator.
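These four modules can be sketched as a single structure. The skeleton below is purely illustrative; the field names are my own shorthand, not a standard API:

from dataclasses import dataclass, field

# Illustrative skeleton only: maps the four modules onto one structure.
@dataclass
class AgentSkeleton:
    llm: object                                  # The Brain: any callable reasoning engine
    plan: list = field(default_factory=list)     # Planning: pending sub-tasks
    memory: list = field(default_factory=list)   # Memory: history and retrieved context
    tools: dict = field(default_factory=dict)    # Tools: name -> callable actuators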
Simple Python Example: An Agent Using Tools
The following code demonstrates a "ReAct" (Reasoning + Acting) agent. This is a common pattern where the agent thinks, takes an action, observes the result, and repeats until the task is done.
In this example, we define a simple agent that has access to a mock "Database" tool and a "Calculator" tool.
import json  # In a real system, you would json.loads() the model's raw string output

# --- 1. Tool Definitions ---
# These are the 'hands' of our agent.

def get_user_data(username):
    """Retrieves account balance from a mock database."""
    db = {
        "alice": {"balance": 1500},
        "bob": {"balance": 2800}
    }
    return db.get(username.lower(), "User not found")

def calculate_tax(amount, rate=0.15):
    """Calculates tax on a specific amount."""
    return amount * rate

# Map tool names to actual functions
AVAILABLE_TOOLS = {
    "get_user_data": get_user_data,
    "calculate_tax": calculate_tax
}

# --- 2. The Agent Core ---
class SimpleAgent:
    def __init__(self, system_prompt):
        self.system_prompt = system_prompt
        self.memory = []  # Running history of thoughts and observations

    def construct_prompt(self, user_input):
        # We define a strict format for the agent to follow
        prompt = f"{self.system_prompt}\n\nAvailable Tools:\n"
        for name, func in AVAILABLE_TOOLS.items():
            prompt += f"- {name}: {func.__doc__}\n"
        prompt += f"\nUser Goal: {user_input}\n"
        prompt += "Your response must be in JSON format with two keys: 'thought' and 'action'.\n"
        prompt += "If an action is needed, 'action' should be {'name': 'tool_name', 'parameters': {...}}.\n"
        prompt += "If the goal is met, 'action' should be {'name': 'final_answer', 'parameters': {'answer': '...'}}.\n"
        return prompt

    def run(self, user_goal):
        print(f"Goal: {user_goal}")
        # In a real system, this prompt would be sent to an LLM API on every turn.
        # Here we mock the LLM's 'Thought' and 'Action' cycle with canned responses.
        prompt = self.construct_prompt(user_goal)

        # Turn 1: Agent decides to fetch user data
        turn_1_output = {
            "thought": "I need to find Bob's balance to calculate his tax.",
            "action": {"name": "get_user_data", "parameters": {"username": "bob"}}
        }
        self.execute_turn(turn_1_output)

        # Turn 2: Agent receives balance and decides to calculate tax
        turn_2_output = {
            "thought": "Bob's balance is 2800. Now I will calculate the 15% tax.",
            "action": {"name": "calculate_tax", "parameters": {"amount": 2800}}
        }
        self.execute_turn(turn_2_output)

        # Turn 3: Final answer
        turn_3_output = {
            "thought": "I have the tax amount (420.0). I can now provide the final answer.",
            "action": {"name": "final_answer", "parameters": {"answer": "Bob's estimated tax is 420.0."}}
        }
        self.execute_turn(turn_3_output)

    def execute_turn(self, llm_response):
        thought = llm_response['thought']
        action = llm_response['action']
        print(f"\nThought: {thought}")

        if action['name'] == "final_answer":
            print(f"RESULT: {action['parameters']['answer']}")
        else:
            tool_name = action['name']
            params = action['parameters']
            result = AVAILABLE_TOOLS[tool_name](**params)
            print(f"Action: Called {tool_name} with {params}")
            print(f"Observation: {result}")
            # Store the observation so future turns can reason over it
            self.memory.append({"thought": thought, "observation": result})

# --- 3. Execution ---
SYSTEM_INSTRUCTION = "You are a financial assistant that uses tools to answer user queries accurately."
agent = SimpleAgent(SYSTEM_INSTRUCTION)
agent.run("Calculate the tax for Bob's account balance.")
Step-by-Step Explanation of the Code
The Toolset
We start by defining plain Python functions, each with a docstring. In a real-world scenario, the LLM reads these docstrings to understand what each tool does. This is how the agent "learns" its capabilities dynamically.
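In practice, you can generate those tool descriptions automatically from the functions themselves. Here is a minimal sketch using Python's built-in inspect module; the manifest format is an illustrative choice, not any particular provider's schema:

import inspect

def describe_tool(func):
    """Builds a simple tool manifest entry from a function's signature and docstring."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": [p.name for p in sig.parameters.values()]
    }

# Example: describe_tool(calculate_tax)
# -> {'name': 'calculate_tax', 'description': 'Calculates tax on a specific amount.',
#     'parameters': ['amount', 'rate']}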
The JSON Interface
Reliability is one of the biggest challenges in agent development. By forcing the model to output JSON with a thought and an action key, we create a structured bridge between the probabilistic nature of the AI and the deterministic nature of our code.
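Enforcing that bridge means parsing the model's output defensively rather than trusting it. A minimal sketch (returning None and letting the caller re-prompt is one illustrative recovery policy):

import json

def parse_agent_response(raw_text):
    """Parses the LLM's raw output, rejecting anything that isn't valid structured JSON."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None  # Caller can re-prompt the model with a correction message
    if "thought" not in data or "action" not in data:
        return None  # Missing required keys: treat as a malformed turn
    return data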
The Execution Loop
The agent follows a loop (sketched in code after this list):
Reason: The LLM analyzes the goal and current state.
Act: It selects a tool and provides arguments.
Observe: The Python code executes the tool and returns the result to the LLM.
Repeat: The LLM uses the new "Observation" to decide the next move.
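Put together, and reusing AVAILABLE_TOOLS and the parse_agent_response helper from above, the loop looks roughly like this. The call_llm parameter is a hypothetical stand-in for whatever model API you use, and max_turns is an arbitrary safety cap:

def run_react_loop(agent, user_goal, call_llm, max_turns=10):
    """Generic Reason-Act-Observe loop; call_llm is a hypothetical model client."""
    prompt = agent.construct_prompt(user_goal)
    for _ in range(max_turns):
        response = parse_agent_response(call_llm(prompt))   # Reason
        if response is None:
            continue  # Malformed output: burn the turn and retry
        action = response["action"]
        if action["name"] == "final_answer":
            return action["parameters"]["answer"]
        result = AVAILABLE_TOOLS[action["name"]](**action["parameters"])  # Act
        prompt += f"\nObservation: {result}\n"               # Observe: feed result back
    return "Stopped: turn limit reached."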
Real-World Use Cases
AI agents are moving into production environments in several key areas:
Autonomous Coding Agents: Tools that can read an entire codebase, find a bug, write a fix, run the tests, and submit a pull request.
Customer Support Orchestrators: Agents that don't just chat but actually check order status in a CRM and issue refunds through a payment API.
Data Analysis: Agents that can write and execute SQL queries to generate reports based on natural language questions.
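As a concrete illustration of the last case, the SQL capability can be a single constrained tool. This sketch uses Python's built-in sqlite3 module; the database path and the SELECT-only guard are deliberately simple assumptions:

import sqlite3

def run_sql_query(query, db_path="analytics.db"):
    """Executes a read-only SQL query and returns the rows as a list."""
    if not query.strip().lower().startswith("select"):
        return "Rejected: only SELECT statements are allowed."
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()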
Common Mistakes and Misconceptions
One common mistake is building a monolithic agent. Developers often try to create one "super-agent" that has 50 different tools. This leads to "tool confusion," where the LLM struggles to pick the right one. A better approach is to build a team of specialized agents—one for database access, one for calculation, and one for final formatting.
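The routing layer for such a team can be a plain dispatch table. In this sketch, classify_intent is a hypothetical helper standing in for a real intent classifier or a second LLM call:

def route_request(user_goal, specialist_agents, classify_intent):
    """Dispatches a goal to a specialist agent; classify_intent is a hypothetical helper."""
    intent = classify_intent(user_goal)  # e.g. "database", "calculation", "formatting"
    agent = specialist_agents.get(intent)
    if agent is None:
        return "No specialist available for this request."
    return agent.run(user_goal)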
Another misconception is that agents are deterministic. Even with structured output, an agent might decide on a different path each time it runs. Robust error handling and "guardrail" functions are essential to ensure the agent doesn't enter an infinite loop or perform dangerous actions.
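Guardrails can be ordinary Python checks wrapped around every tool call. A minimal sketch, reusing AVAILABLE_TOOLS from the example above (the turn cap and allowlist values are illustrative policies, not recommendations):

ALLOWED_TOOLS = {"get_user_data", "calculate_tax"}  # Explicit allowlist
MAX_TURNS = 10                                      # Hard stop against infinite loops

def guarded_call(tool_name, params, turn_count):
    """Refuses tool calls that exceed the turn budget or fall outside the allowlist."""
    if turn_count >= MAX_TURNS:
        raise RuntimeError("Guardrail: turn limit exceeded, aborting run.")
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"Guardrail: '{tool_name}' is not an approved tool.")
    return AVAILABLE_TOOLS[tool_name](**params)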
Conclusion
AI agents represent a fundamental shift in how we build software. We are moving away from writing every line of logic ourselves and toward designing systems that can reason through problems. By providing the LLM with a clear structure, specialized tools, and a feedback loop, we can build applications that are more flexible and capable than anything we have seen before.
As the line between human architecting and AI execution blurs, the most valuable skill for a developer is no longer just writing the code, but designing the environment in which an agent can successfully operate. This requires a new mental model: one where the boundary between a bug in the code and a failure in the agent's reasoning is managed through rigorous evaluation and robust system design.