DEV Community

Sanskriti


I Baked a Football Cake and It Taught Me About Building AI Agents

I recently baked a football cake, and it helped me realize that AI agents work just like layered desserts. Here's how flavors, molds, and icing map to agentic design.

Football mold red velvet and vanilla birthday cake

The code example below breaks user goals into actionable steps and executes them using either custom tools or LLM reasoning. A regex extracts numbered steps from the LLM's plan output; each step is then executed by matching keywords like "search" or "compute" to the appropriate tool, falling back to LLM reasoning when no tool applies.
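Before wiring it into the agent, the parsing idea can be sketched on a hardcoded plan string (the sample plan below is invented; the regex is the same one defined in the full tutorial):

```python
import re

# Same step-extraction pattern the agent uses
STEP_REGEX = re.compile(
    r"(?:^|\s)(?:\*\*)?(?:Step\s*)?(\d+)[\.\):\- ]+(.*)", re.IGNORECASE
)

sample_plan = """Here is the plan:
1. Search for NFL quarterback stats
2. Compute the average passer rating
3. Summarize the findings"""

# Keep only lines the pattern recognizes as numbered steps
steps = [
    m.group(2).strip()
    for line in sample_plan.split("\n")
    if (m := STEP_REGEX.search(line.strip()))
]
print(steps)
# ['Search for NFL quarterback stats', 'Compute the average passer rating', 'Summarize the findings']
```

Note that the preamble line ("Here is the plan:") contains no digit, so it is silently dropped.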

Just like a custom cake has layers of flavor, structure, and decoration, an AI agent has its own stack.

It uses the Llama 3 model served through Ollama and defines two simple custom tools:

  1. search_tool() - simulates a search engine by returning mock results.
  2. compute_tool() - simulates a computation by returning a placeholder result.

Implementing the agent architecture:

  • Base model: The sponge layer. I used Llama 3 served locally through Ollama; it handles basic reasoning.
import requests
import json

# -------------------------------------------------------
# Ollama LLM Wrapper
# -------------------------------------------------------
class OllamaLLM:
    def __init__(self, model="llama3"):
        self.model = model

    def __call__(self, prompt: str) -> str:
        """Send a prompt to a local Ollama instance."""
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": self.model, "prompt": prompt, "stream": False}
        )
        text = json.loads(resp.text).get("response", "")
        return text

# -------------------------------------------------------
# Base Agent 
# -------------------------------------------------------
class AgentCore:
    def __init__(self, llm):
        self.llm = llm

    def reason(self, prompt):
        return self.llm(prompt)
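One nice property of this layering: AgentCore only needs a callable, so you can test it offline with a stub instead of a running Ollama server (the stub below is my addition, not part of the tutorial):

```python
# AgentCore from the tutorial: it just delegates to whatever callable it holds
class AgentCore:
    def __init__(self, llm):
        self.llm = llm

    def reason(self, prompt):
        return self.llm(prompt)

# Hypothetical stub standing in for OllamaLLM; no server required
def stub_llm(prompt: str) -> str:
    return f"stub answer for: {prompt}"

agent = AgentCore(stub_llm)
print(agent.reason("Why add layers?"))
# stub answer for: Why add layers?
```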
  • Tool execution: The icing and decor are the search and compute tools.
# -------------------------------------------------------
# Local Tools
# -------------------------------------------------------
def search_tool(query: str) -> dict:
    return {
        "tool": "search",
        "query": query,
        "results": [
            {"title": "Top NFL QBs 2024", "eff": 98.1},
            {"title": "Quarterback Rankings", "eff": 95.6},
        ],
    }


def compute_tool(task: str) -> dict:
    return {
        "tool": "compute",
        "task": task,
        "result": 42,  # we pretend the tool computed something important
    }

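Calling the mock tools directly shows the shape of the data each later layer receives (a quick sanity check, not part of the original listing):

```python
# Mock tools from the tutorial
def search_tool(query: str) -> dict:
    return {
        "tool": "search",
        "query": query,
        "results": [
            {"title": "Top NFL QBs 2024", "eff": 98.1},
            {"title": "Quarterback Rankings", "eff": 95.6},
        ],
    }

def compute_tool(task: str) -> dict:
    return {"tool": "compute", "task": task, "result": 42}

out = search_tool("Search for top NFL QBs")
print(out["tool"], len(out["results"]))            # search 2
print(compute_tool("Compute averages")["result"])  # 42
```

Because both tools return plain dicts, the agent can pass their output straight into the formatted response without any extra serialization.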
  • Prompt logic: Like flavor choices, the prompt defines behavior. This layer adds step parsing and tool execution.
# -------------------------------------------------------
# Agent prompt and structured tool execution
# -------------------------------------------------------
class StructuredAgent(AgentCore):
    # relies on STEP_REGEX, the module-level pattern defined in the full listing

    def parse_steps(self, plan: str):
        """Extract step lines starting with numbers."""
        lines = plan.split("\n")
        steps = []
        for line in lines:
            match = STEP_REGEX.search(line.strip())
            if match:
                cleaned = match.group(2).strip()
                steps.append(cleaned)
        return steps

    def execute_step(self, step: str):
        step_lower = step.lower()

        if "search" in step_lower:
            return search_tool(step)

        if "calculate" in step_lower or "compute" in step_lower:
            return compute_tool(step)

        # fallback: let the model reason
        return self.reason(step)

    def run(self, goal: str):
        PLAN_PROMPT = f"""You are a task decomposition engine.
Your ONLY job is to break the user's goal into a small set of concrete, functional steps.
Your outputs MUST stay within the domain of the user’s goal.  
If the goal references football, metrics, or sports, remain in that domain only.

RULES:
- Only return steps directly needed to complete the user’s goal.
- Do NOT invent topics, examples, reviews, or unrelated domains.
- Do NOT expand into full explanations.
- No marketing language.
- No creative writing.
- No assumptions beyond the user's exact goal.
- No extra commentary.

FORMAT:
1. <short step>
2. <short step>
3. <short step>

User goal: "{goal}"
"""
        plan = self.llm(PLAN_PROMPT)
        steps = self.parse_steps(plan)

        outputs = []
        for step in steps:
            outputs.append({
                "step": step,
                "output": self.execute_step(step)
            })
        return outputs
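This layer can also be exercised offline with a stub LLM that returns a canned plan. The sketch below condenses the classes above into a self-contained script (the canned plan and goal are invented):

```python
import re

STEP_REGEX = re.compile(
    r"(?:^|\s)(?:\*\*)?(?:Step\s*)?(\d+)[\.\):\- ]+(.*)", re.IGNORECASE
)

# Minimal stand-ins for the tutorial's mock tools
def search_tool(query): return {"tool": "search", "query": query}
def compute_tool(task): return {"tool": "compute", "task": task}

class StructuredAgent:
    def __init__(self, llm):
        self.llm = llm

    def parse_steps(self, plan):
        return [m.group(2).strip()
                for line in plan.split("\n")
                if (m := STEP_REGEX.search(line.strip()))]

    def execute_step(self, step):
        s = step.lower()
        if "search" in s:
            return search_tool(step)
        if "calculate" in s or "compute" in s:
            return compute_tool(step)
        return self.llm(step)  # fallback: model reasoning

    def run(self, goal):
        plan = self.llm(f"Plan for: {goal}")
        return [{"step": s, "output": self.execute_step(s)}
                for s in self.parse_steps(plan)]

canned_plan = "1. Search for QB stats\n2. Compute averages"
agent = StructuredAgent(llm=lambda prompt: canned_plan)
results = agent.run("Compare QB efficiency")
for r in results:
    print(r["step"], "->", r["output"]["tool"])
# Search for QB stats -> search
# Compute averages -> compute
```

The stub confirms the routing: "Search …" steps hit the search tool and "Compute …" steps hit the compute tool, with no network call involved.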
  • User-facing output: The final taste. This layer formats the agent's results into a polished, user-facing response.
# -------------------------------------------------------
# User Facing Agent (Formatted Output Layer)
# -------------------------------------------------------
class FinalAgent(StructuredAgent):
    def respond(self, goal: str):
        results = self.run(goal)

        formatted = "\n".join(
            f"- **{r['step']}** → {r['output']}"
            for r in results
        )

        return (
            f"## Result for goal: *{goal}*\n\n"
            f"{formatted}\n"
        )


Takeaway

Whether you're baking or writing code, structure matters. Think in layers. And if you ever need a sweet analogy to explain AI agents, try cake. Got a dev-inspired dessert metaphor? Drop it in the comments and let's make tech tasty.

Tutorial

System Requirements:
Python version: 3.11.6
Ollama: Install and run Ollama locally to serve the Llama 3 model.
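If Ollama isn't set up yet, the typical steps look like this (commands assume Ollama is already installed via the official installer from ollama.com; adjust for your platform):

```shell
# Download the Llama 3 weights (one-time)
ollama pull llama3

# Start the local server if it isn't already running (default port 11434)
ollama serve

# Smoke test against the same endpoint the code uses
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Say hi", "stream": false}'
```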

import requests
import json
import re

# Parser that detects common numbered step styles, including:
# 1. Do X
# 1) Do X
# Step 1: Do X
# **Step 1:** Do X
# - Step 1: Do X
# (word-only styles like "Step One:" contain no digit and are not matched)

STEP_REGEX = re.compile(
    r"(?:^|\s)(?:\*\*)?(?:Step\s*)?(\d+)[\.\):\- ]+(.*)", re.IGNORECASE
)
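A quick sanity check of the pattern against the simpler styles above (the bolded `**Step 1:**` form also matches, though its captured text keeps a stray `**`):

```python
import re

STEP_REGEX = re.compile(
    r"(?:^|\s)(?:\*\*)?(?:Step\s*)?(\d+)[\.\):\- ]+(.*)", re.IGNORECASE
)

styles = ["1. Do X", "1) Do X", "Step 1: Do X", "- Step 1: Do X"]
captured = [STEP_REGEX.search(s).group(2) for s in styles]
print(captured)  # every style yields 'Do X'
```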



# -------------------------------------------------------
# Ollama LLM Wrapper
# -------------------------------------------------------
class OllamaLLM:
    def __init__(self, model="llama3"):
        self.model = model

    def __call__(self, prompt: str) -> str:
        """Send a prompt to a local Ollama instance."""
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": self.model, "prompt": prompt, "stream": False}
        )
        text = json.loads(resp.text).get("response", "")
        return text

# -------------------------------------------------------
# Base Agent 
# -------------------------------------------------------
class AgentCore:
    def __init__(self, llm):
        self.llm = llm

    def reason(self, prompt):
        return self.llm(prompt)

# -------------------------------------------------------
# Local Tools
# -------------------------------------------------------
def search_tool(query: str) -> dict:
    return {
        "tool": "search",
        "query": query,
        "results": [
            {"title": "Top NFL QBs 2024", "eff": 98.1},
            {"title": "Quarterback Rankings", "eff": 95.6},
        ],
    }


def compute_tool(task: str) -> dict:
    return {
        "tool": "compute",
        "task": task,
        "result": 42,  # we pretend the tool computed something important
    }



# -------------------------------------------------------
# Agent prompt and structured tool execution
# -------------------------------------------------------
class StructuredAgent(AgentCore):

    def parse_steps(self, plan: str):
        """Extract step lines starting with numbers."""
        lines = plan.split("\n")
        steps = []
        for line in lines:
            match = STEP_REGEX.search(line.strip())
            if match:
                cleaned = match.group(2).strip()
                steps.append(cleaned)
        return steps

    def execute_step(self, step: str):
        step_lower = step.lower()

        if "search" in step_lower:
            return search_tool(step)

        if "calculate" in step_lower or "compute" in step_lower:
            return compute_tool(step)

        # fallback: let the model reason
        return self.reason(step)

    def run(self, goal: str):
        PLAN_PROMPT = f"""You are a task decomposition engine.
Your ONLY job is to break the user's goal into a small set of concrete, functional steps.
Your outputs MUST stay within the domain of the user’s goal.  
If the goal references football, metrics, or sports, remain in that domain only.

RULES:
- Only return steps directly needed to complete the user’s goal.
- Do NOT invent topics, examples, reviews, or unrelated domains.
- Do NOT expand into full explanations.
- No marketing language.
- No creative writing.
- No assumptions beyond the user's exact goal.
- No extra commentary.

FORMAT:
1. <short step>
2. <short step>
3. <short step>

User goal: "{goal}"
"""
        plan = self.llm(PLAN_PROMPT)
        steps = self.parse_steps(plan)

        outputs = []
        for step in steps:
            outputs.append({
                "step": step,
                "output": self.execute_step(step)
            })
        return outputs


# -------------------------------------------------------
# User Facing Agent (Formatted Output Layer)
# -------------------------------------------------------
class FinalAgent(StructuredAgent):
    def respond(self, goal: str):
        results = self.run(goal)

        formatted = "\n".join(
            f"- **{r['step']}** → {r['output']}"
            for r in results
        )

        return (
            f"## Result for goal: *{goal}*\n\n"
            f"{formatted}\n"
        )


# -------------------------------------------------------
# Test Cases
# -------------------------------------------------------
if __name__ == "__main__":
    agent = FinalAgent(llm=OllamaLLM("llama3"))

    tests = [
        "Compare NFL quarterback efficiency metrics and summarize insights.",
        "Search for top training drills for youth football players.",
        "Compute a simple metric and explain how you'd structure the process.",
    ]

    for i, t in enumerate(tests, 1):
        print("=" * 70)
        print(f"TEST {i}: {t}")
        print(agent.respond(t))
        print()

Sample Output

======================================================================
TEST 1: Compare NFL quarterback efficiency metrics and summarize insights.

Result for goal: Compare NFL quarterback efficiency metrics and summarize insights.

  • Gather data on NFL quarterback statistics → Here are some key statistics for NFL quarterbacks, gathered from various sources including Pro-Football-Reference.com, ESPN, and NFL.com:

Passing Statistics

  • Career Passing Yards: + Tom Brady: 73,517 yards (most in NFL history) + Drew Brees: 72,503 yards + Peyton Manning: 71,940 yards
  • Career Touchdowns: + Tom Brady: 624 touchdowns + Drew Brees: 571 touchdowns + Aaron Rodgers: 462 touchdowns
  • Interceptions: + Brett Favre: 336 interceptions (most in NFL history) + Eli Manning: 244 interceptions + Philip Rivers: 234 interceptions
  • Completion Percentage: + Aaron Rodgers: 65.5% completion percentage (highest in NFL history) + Drew Brees: 64.7% + Tom Brady: 63.4%

Rushing Statistics

  • Career Rushing Yards: + Cam Newton: 5,442 yards + Russell Wilson: 3,911 yards + Michael Vick: 3,844 yards
  • Career Rushing Touchdowns: + Cam Newton: 85 rushing touchdowns (most among QBs) + Russell Wilson: 44 rushing touchdowns + Michael Vick: 36 rushing touchdowns

Other Statistics

  • Win-Loss Record: + Tom Brady: 230-72 regular season record (best among QBs) + Drew Brees: 208-115 regular season record + Peyton Manning: 208-141 regular season record
  • Playoff Wins: + Tom Brady: 32 playoff wins (most in NFL history) + Joe Montana: 23 playoff wins + Terry Bradshaw: 20 playoff wins

Note: These statistics are accurate as of the end of the 2020 NFL season and may change over time.

I hope this helps! Let me know if you have any specific questions or if there's anything else I can help with.

  • Identify relevant efficiency metrics (e.g., passer rating, yards per attempt) → Here are some common efficiency metrics used to evaluate quarterbacks:
  1. Passer Rating: A quarterback's cumulative performance based on completion percentage, yards per attempt, touchdowns, and interceptions. * Formula: (Completions - Attempts + Touchdowns * 5 + Interceptions * -2) / Attempted Passes
  2. Yards Per Attempt (YPA): Measures a quarterback's average yardage gained per pass attempt.
  3. Completion Percentage: The percentage of passes completed out of total attempts.
  4. Touchdown-to-Interception Ratio: Evaluates a quarterback's ability to score touchdowns compared to throwing interceptions.
  5. Red Zone Efficiency: Measures a quarterback's success in scoring touchdowns within the opponent's 20-yard line (e.g., red zone).
  6. Third Down Conversion Percentage: Assesses a quarterback's ability to convert third-down plays into first downs.
  7. Fourth Quarter Points Per Game: Evaluates a quarterback's performance in critical, late-game situations.
  8. Adjusted Net Yards Per Attempt (ANY/A): A more advanced metric that adjusts for opponent strength and incorporates additional factors like sacks and fumbles.

Other efficiency metrics used to evaluate quarterbacks include:

  1. Quarterback Wins: A simple measure of a quarterback's contribution to their team's win-loss record.
  2. Passer Rating Average per Game: A variation of the traditional passer rating formula, adjusted for games played.
  3. Sack Rate: Measures a quarterback's frequency of being sacked relative to their total pass attempts.
  4. Fumble Rate: Evaluates a quarterback's tendency to fumble the ball relative to their total pass attempts.

Keep in mind that each metric has its strengths and weaknesses, and no single efficiency metric can fully capture a quarterback's performance.

  • Calculate averages and rankings for each quarterback → {'tool': 'compute', 'task': 'Calculate averages and rankings for each quarterback', 'result': 42}
