Over the past week I built a multi-agent AI system that autonomously scans the internet for bargains, estimates the true value of products using three different pricing techniques, and pushes a notification straight to my phone the moment it finds a deal worth acting on. Along the way I picked up a ton of practical, transferable lessons about agentic AI architecture, RAG, fine-tuning vs. prompting, tool calling, and shipping a real (if scrappy) product with a Gradio front end.
This post is my write-up of the whole journey, with the code that made it work.
The Big Picture: A Team of Specialist Agents
The system, nicknamed "The Price Is Right", is built around the idea that no single model is the best at everything. Instead of one giant prompt, the architecture splits the problem into focused agents that each do one job well, coordinated by a planning agent:
- Scanner Agent – trawls deal RSS feeds and uses a cheap LLM to pick the 5 most promising, well-described deals.
- Specialist Agent – a small open-source model (Llama 3.2) that I fine-tuned specifically to estimate product prices, deployed serverlessly on Modal.
- Frontier Agent – a frontier model (GPT-5.1) doing price estimation with RAG, pulling similar products from a vector database for context.
- Ensemble Agent – combines the Specialist, Frontier, and a classic neural network into a single weighted estimate.
- Messaging Agent – sends a push notification to my phone via Pushover when a great deal is found.
- Autonomous Planning Agent – the "brain" that ties everything together using function/tool calling, deciding what to scan, what to estimate, and when to notify.
- Deal Agent Framework – the orchestration layer that runs the whole loop and persists memory between runs.
- Gradio UI – a live dashboard showing deals as they're discovered, plus a 3D visualization of the underlying product vector store.
Here's how the five days of building this broke down.
Day 1: Deploying Models to the Cloud with Modal
The first lesson was about infrastructure: how do you run a model (especially a fine-tuned open-source LLM) without managing your own GPU server? The answer here was Modal, a serverless platform for running Python functions in the cloud — including on GPUs.
The "hello world" of Modal is refreshingly simple. You define an App, an Image (essentially a container spec with pip dependencies), and decorate a normal Python function with @app.function:
# hello.py
import modal
from modal import Image

# Setup
app = modal.App("hello")
image = Image.debian_slim().pip_install("requests")

# Hello!
@app.function(image=image)
def hello() -> str:
    import requests
    response = requests.get("https://ipinfo.io/json")
    data = response.json()
    city, region, country = data["city"], data["region"], data["country"]
    return f"Hello from {city}, {region}, {country}!!"

# New - added thanks to student Tue H.!
@app.function(image=image, region="eu")
def hello_europe() -> str:
    import requests
    response = requests.get("https://ipinfo.io/json")
    data = response.json()
    city, region, country = data["city"], data["region"], data["country"]
    return f"Hello from {city}, {region}, {country}!!"
What I loved here is the region="eu" parameter — with one keyword argument you can pin where in the world your function actually executes, which matters for latency, data residency, and sometimes cost.
Calling it locally vs. remotely is just as simple from a notebook:
from hello import app, hello, hello_europe

with app.run():
    reply = hello.local()  # runs on your machine

with app.run():
    reply = hello.remote()  # runs on Modal's cloud
Running a Real LLM on a GPU
The next step up is running an actual language model. This is where Modal's GPU support and Secrets come in — you don't want to hardcode your Hugging Face token, so you register it once in Modal's dashboard under a name (e.g. huggingface-secret) and reference it in code:
# llama.py
import modal
from modal import Image

# Setup
app = modal.App("llama")
image = Image.debian_slim().pip_install("torch", "transformers", "accelerate")
secrets = [modal.Secret.from_name("huggingface-secret")]
GPU = "T4"
MODEL_NAME = "meta-llama/Llama-3.2-3B"

@app.function(image=image, secrets=secrets, gpu=GPU, timeout=1800)
def generate(prompt: str) -> str:
    # Heavy imports live inside the function so they only run in the cloud container
    from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

    set_seed(42)
    inputs = tokenizer.encode(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(inputs, max_new_tokens=5)
    return tokenizer.decode(outputs[0])
A few things clicked for me here:
- gpu="T4" is all it takes to request a GPU-backed container. No CUDA driver wrangling, no Dockerfile.
- timeout=1800 matters because the first call has to download and load model weights; that cold start can take minutes.
- Everything that needs the GPU stack (the transformers imports, the tokenizer, the model) is imported or loaded inside the function body, so it only happens in the cloud container, not on my laptop.
From Ephemeral Apps to a Deployed Pricer Service
The really important conceptual jump on Day 1 was going from an ephemeral app (with app.run(): ..., which spins up and tears down for a single call) to a deployed app:
uv run modal deploy -m pricer_service
Once deployed, the service runs independently of my notebook, and I can call it from anywhere just by referencing it by name:
import modal
Pricer = modal.Cls.from_name("pricer-service", "Pricer")
pricer = Pricer()
reply = pricer.price.remote("Quadcast HyperX condenser mic, connects via usb-c to your computer for crystal clear audio")
print(reply)
This is essentially how you'd put a fine-tuned model "behind an API" for a production system — and it's the foundation for the Specialist Agent, which wraps this exact deployed pricer.
There's also a nice optimization here: by default a Modal container scales down to zero when idle, so the first call after inactivity can take ~30 seconds to wake up. If you're willing to spend a few extra credits, you can keep a container warm:
import modal
Pricer = modal.Cls.from_name("pricer-service", "Pricer")
pricer = Pricer()
pricer.update_autoscaler(scaledown_window=1200) # stay warm for 20 minutes
Takeaway: Modal turns "deploy a fine-tuned model as a microservice" into a one-line decorator and a one-line CLI command. The mental model — write a normal Python function, decorate it, deploy it, call it like a remote object — is something I'll reuse for any future "specialist model as a service" project.
Day 2: RAG, the Frontier Agent, and an Ensemble of Pricers
Day 2 was about a different way to make a frontier model (GPT-5.1) better at a narrow task — Retrieval Augmented Generation (RAG) — and then about combining multiple pricing strategies into one.
Building the Vector Store
The first ingredient is a local, open-source sentence embedding model, which turns text into a 384-dimensional vector capturing its meaning:
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Pass in a list of texts, get back a numpy array of vectors
vector = encoder.encode(["A proficient AI engineer who has almost reached the finale of AI Engineering Core Track!"])[0]
print(vector.shape) # (384,)
These vectors get stored — along with the product description and metadata (category, price) — in a Chroma vector database, batched 1,000 items at a time across hundreds of thousands of products:
collection_name = "products"
existing_collection_names = [collection.name for collection in client.list_collections()]

if collection_name not in existing_collection_names:
    collection = client.create_collection(collection_name)
    for i in tqdm(range(0, len(train), 1000)):
        documents = [item.summary for item in train[i: i+1000]]
        vectors = encoder.encode(documents).astype(float).tolist()
        metadatas = [{"category": item.category, "price": item.price} for item in train[i: i+1000]]
        ids = [f"doc_{j}" for j in range(i, i+1000)]
        ids = ids[:len(documents)]  # the final batch may be shorter than 1,000
        collection.add(ids=ids, documents=documents, embeddings=vectors, metadatas=metadatas)

collection = client.get_or_create_collection(collection_name)
Visualizing the Vector Space
One of the most satisfying moments was reducing those 384-dimensional vectors down to 3D with t-SNE and seeing the products cluster by category — electronics in one corner, musical instruments in another:
from sklearn.manifold import TSNE
import plotly.graph_objects as go
tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=2, color=colors, opacity=0.7),
    text=[f"Category: {c}<br>Text: {d[:50]}..." for c, d in zip(categories, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=1200, height=800,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()
It's one thing to be told "embeddings capture semantic similarity" — it's another to literally see the same product categories form tight clusters in 3D space.
Using Retrieval to Ground a Frontier Model
The actual RAG technique is simple once the vector store exists: for a new product, find its 5 nearest neighbours, and stuff their descriptions and prices into the prompt as context before asking GPT-5.1 to estimate the price:
def find_similars(item):
    # vector() is a helper defined elsewhere in the notebook (not shown here);
    # presumably it embeds the item's text with the encoder above
    vec = vector(item)
    results = collection.query(query_embeddings=vec.astype(float).tolist(), n_results=5)
    documents = results['documents'][0][:]
    prices = [m['price'] for m in results['metadatas'][0][:]]
    return documents, prices

def make_context(similars, prices):
    message = "For context, here are some other items that might be similar to the item you need to estimate.\n\n"
    for similar, price in zip(similars, prices):
        message += f"Potentially related product:\n{similar}\nPrice is ${price:.2f}\n\n"
    return message

def messages_for(item, similars, prices):
    message = f"Estimate the price of this product. Respond with the price, no explanation\n\n{item.summary}\n\n"
    message += make_context(similars, prices)
    return [{"role": "user", "content": message}]

def gpt_5__1_rag(item):
    documents, prices = find_similars(item)
    response = completion(model="gpt-5.1", messages=messages_for(item, documents, prices), reasoning_effort="none", seed=42)
    return response.choices[0].message.content
This became the heart of the Frontier Agent.
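The retrieval step above boils down to a nearest-neighbour search over embedding vectors. As a rough illustration of the idea only, here is a pure-Python sketch that uses cosine similarity on made-up 3-dimensional vectors. The helper names, the toy store, and the choice of cosine similarity are all assumptions for the example; the distance Chroma actually uses depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, k=2):
    """Return the k (name, price) pairs whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda row: cosine_similarity(query, row["vector"]), reverse=True)
    return [(row["name"], row["price"]) for row in ranked[:k]]

# Toy 3-dimensional "embeddings", invented for illustration (the real ones have 384 dimensions)
toy_store = [
    {"name": "usb microphone", "price": 89.0, "vector": [0.9, 0.1, 0.0]},
    {"name": "studio microphone", "price": 149.0, "vector": [0.8, 0.2, 0.1]},
    {"name": "cordless drill", "price": 79.0, "vector": [0.0, 0.1, 0.9]},
]

neighbours = nearest([1.0, 0.1, 0.0], toy_store, k=2)
print(neighbours)  # the two microphones rank ahead of the drill
```

The prices attached to those neighbours are what end up in the prompt as context, which is why the quality of the embedding space matters so much for the final estimate.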
The Ensemble: Combining Three Very Different Models
The biggest "aha" of Day 2 was realizing that three completely different approaches to the same problem — estimate a product's price — could be blended into something better than any one of them alone:
- RAG + GPT-5.1 (gpt_5__1_rag): frontier model with retrieved context
- The fine-tuned Specialist running on Modal (specialist): small model, fine-tuned specifically on this task
- A classic Deep Neural Network trained on the embeddings from Week 6 (deep_neural_network)
import re

def get_price(reply):
    # Strip currency formatting, then take the first number in the reply
    reply = reply.replace("$", "").replace(",", "")
    match = re.search(r"[-+]?\d*\.\d+|\d+", reply)
    return float(match.group()) if match else 0

def specialist(item):
    return pricer.price.remote(item.summary)

def ensemble(item):
    price1 = get_price(gpt_5__1_rag(item))
    price2 = specialist(item)
    price3 = deep_neural_network(item)
    return price1 * 0.8 + price2 * 0.1 + price3 * 0.1
The weighting (0.8 / 0.1 / 0.1) was chosen because, when evaluated against held-out test data, the RAG-based frontier model was the strongest individual predictor, but the other two still nudged the final estimate in a useful direction. This is essentially a tiny, hand-weighted ensemble: a fixed weighted average, rather than a true mixture-of-experts, which would learn a gating function to decide how much to trust each model per input. The idea generalizes to most "estimate a number from text" problems: get a few independent estimators, then blend.
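One way weights like these might be found is a small grid search over held-out data, scoring each candidate blend by mean absolute error. The sketch below is not the procedure used for the 0.8 / 0.1 / 0.1 split; the helper names and the toy numbers are invented purely to show the mechanics.

```python
from itertools import product

def mean_absolute_error(predictions, truths):
    return sum(abs(p - t) for p, t in zip(predictions, truths)) / len(truths)

def blend(weights, estimates):
    """Weighted sum of one row of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

def best_weights(rows, truths, step=0.1):
    """Try every weight triple on a grid that sums to 1; keep the lowest-error one."""
    steps = round(1 / step)
    best = None
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = (i * step, j * step, (steps - i - j) * step)
        error = mean_absolute_error([blend(weights, row) for row in rows], truths)
        if best is None or error < best[1]:
            best = (weights, error)
    return best

# Made-up held-out data: each row is (frontier, specialist, neural network) estimates
held_out_estimates = [(110, 80, 130), (240, 300, 200), (45, 30, 60), (580, 700, 500)]
held_out_truths = [100.0, 250.0, 40.0, 600.0]

weights, error = best_weights(held_out_estimates, held_out_truths)
print(weights, error)
```

Because the grid includes the three "use one model only" corners, the blended error can never be worse than the best single model on the data it was tuned on. Whether that holds on fresh data is a separate question, so weights found this way should be checked on a split that was not used for tuning.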
By the end of Day 2, all three of these had been wrapped into proper agent classes — FrontierAgent, NeuralNetworkAgent, and EnsembleAgent — each exposing a simple .price(description) method, ready to be called by higher-level orchestration.
Day 3: Scanning the Web and Pushing Notifications
Day 2 answered "given a deal, how much is it really worth?". Day 3 answered the question that has to come first: "where do the deals come from in the first place, and how do I find out about a good one without staring at a screen?"
The Scanner Agent
The Scanner Agent subscribes to deal RSS feeds, scrapes the raw listings, and then asks a cheap LLM (openai/gpt-oss-20b:free via OpenRouter) to pick the 5 best-described deals — specifically ones where the price is unambiguous, since deal sites love phrases like "$50 off" which describe the discount, not the price.
The prompt design here was a small lesson in itself — being explicit about edge cases massively improves reliability:
SYSTEM_PROMPT = """You identify and summarize the 5 most detailed deals from a list, by selecting deals that have the most detailed, high quality description and the most clear price.
Respond strictly in JSON with no explanation, using this format. You should provide the price as a number derived from the description. If the price of a deal isn't clear, do not include that deal in your response.
Most important is that you respond with the 5 deals that have the most detailed product description with price. It's not important to mention the terms of the deal; most important is a thorough description of the product.
Be careful with products that are described as "$XXX off" or "reduced by $XXX" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price.
"""
USER_PROMPT_PREFIX = """Respond with the most promising 5 deals from this list, selecting those which have the most detailed, high quality product description and a clear price that is greater than 0.
You should rephrase the description to be a summary of the product itself, not the terms of the deal.
Remember to respond with a short paragraph of text in the product_description field for each of the 5 items that you select.
Be careful with products that are described as "$XXX off" or "reduced by $XXX" - this isn't the actual price of the product. Only respond with products when you are highly confident about the price.
Deals:
"""
USER_PROMPT_SUFFIX = "\n\nInclude exactly 5 deals, no more."
Combined with structured output (Pydantic models via .chat.completions.parse(... response_format=DealSelection ...)), this guarantees the agent returns exactly the shape of data the rest of the pipeline expects — no brittle JSON-parsing of free text required.
The Messaging Agent and Pushover
The last piece of Day 3 was closing the loop with the outside world: push notifications. Pushover makes this almost embarrassingly easy — register an app, get a user key and an API token, and send a notification with a single HTTP POST:
pushover_user = os.getenv('PUSHOVER_USER')
pushover_token = os.getenv('PUSHOVER_TOKEN')
pushover_url = "https://api.pushover.net/1/messages.json"
def push(message):
    print(f"Push: {message}")
    payload = {"user": pushover_user, "token": pushover_token, "message": message}
    requests.post(pushover_url, data=payload)
This got wrapped into a MessagingAgent with a .notify(description, deal_price, estimated_value, url) method — turning "we found a great deal" into "your phone buzzes."
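The text of the notification itself is just string formatting. The MessagingAgent code is not shown here, so the helper below is a hypothetical sketch of how a message might be composed from the same four arguments. The function name, the wording, and the 60-character truncation are all assumptions.

```python
def format_deal_message(description: str, deal_price: float, estimated_value: float, url: str) -> str:
    """Build a short push message; long descriptions are truncated to stay readable."""
    discount = estimated_value - deal_price
    short = description if len(description) <= 60 else description[:57] + "..."
    return f"Deal! ${deal_price:.2f} (est. ${estimated_value:.2f}, save ${discount:.2f}): {short} {url}"

message = format_deal_message("HyperX mic", 50.0, 120.0, "https://example.com/x")
print(message)
```

Keeping the formatting in a plain function like this makes it easy to check the message without sending a real notification.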
Takeaway: Agentic systems feel magical, but a lot of the magic is just plumbing — RSS feeds in, structured LLM output, push notifications out. Getting the plumbing rock-solid (and the prompts very explicit about edge cases) is what makes the "intelligent" part trustworthy.
Day 4: The Autonomous Planning Agent — Teaching an LLM to Use Tools
This was, for me, the most conceptually important day. Up to this point, every agent was called explicitly by my code: "now run the scanner," "now run the ensemble," "now send a notification." Day 4 flips that around — the LLM itself decides what to do and in what order, by calling tools.
Step 1: Fake Tools, Real Concepts
Before wiring up the real agents, the notebook builds three fake functions just to understand the tool-calling loop:
def scan_the_internet_for_bargains() -> str:
    """ This tool scans the internet for great deals and gets a curated list of promising deals """
    print("Fake function to scan the internet - this returns a hardcoded set of deals")
    return test_results.model_dump_json()

def estimate_true_value(description: str) -> str:
    """
    This tool estimates the true value of a product based on a text description of it
    """
    print(f"Fake function to estimate true value of {description[:20]}... - this always returns $300")
    return f"Product {description} has an estimated true value of $300"

def notify_user_of_deal(description: str, deal_price: float, estimated_true_value: float, url: str) -> str:
    """
    This tool notifies the user of a great deal, given a description of it, the price of the deal, and the estimated true value
    """
    print(f"Fake function to notify user of {description} which costs {deal_price} and estimate is {estimated_true_value}")
    return "notification sent ok"
Each tool also needs a JSON Schema describing its name, description, and parameters — this is what actually gets sent to the LLM so it knows what's available and how to call it:
scan_function = {
    "name": "scan_the_internet_for_bargains",
    "description": "Returns top bargains scraped from the internet along with the price each item is being offered for",
    "parameters": {
        "type": "object",
        "properties": {},
        "required": [],
        "additionalProperties": False
    }
}

notify_function = {
    "name": "notify_user_of_deal",
    "description": "Send the user a push notification about the single most compelling deal; only call this one time",
    "parameters": {
        "type": "object",
        "properties": {
            "description": {"type": "string", "description": "The description of the item itself scraped from the internet"},
            "deal_price": {"type": "number", "description": "The price offered by this deal scraped from the internet"},
            "estimated_true_value": {"type": "number", "description": "The estimated actual value that this is worth"},
            "url": {"type": "string", "description": "The URL of this deal as scraped from the internet"}
        },
        "required": ["description", "deal_price", "estimated_true_value", "url"],
        "additionalProperties": False
    }
}

tools = [{"type": "function", "function": scan_function},
         {"type": "function", "function": estimate_function},
         {"type": "function", "function": notify_function}]
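The tools list refers to estimate_function, whose schema isn't reproduced above. By analogy with the other two, it would plausibly look something like the sketch below. The description strings are my guesses rather than the original wording; only the function name and its single description parameter come from the estimate_true_value signature. It would need to be defined before the tools list is built.

```python
# Hypothetical schema for the estimate tool, modelled on the two schemas shown above
estimate_function = {
    "name": "estimate_true_value",
    "description": "Given the description of an item, estimate how much it is actually worth",
    "parameters": {
        "type": "object",
        "properties": {
            "description": {"type": "string", "description": "The description of the item to be estimated"},
        },
        "required": ["description"],
        "additionalProperties": False,
    },
}
```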
Step 2: The Agent Loop
The real magic is this loop. The LLM is given the tools and a goal; if it decides to call a tool, the code executes the real Python function and feeds the result back in — and this repeats until the model is satisfied:
import inspect
import json

def handle_tool_call(message):
    """
    Actually call the tools associated with this message
    """
    results = []
    for tool_call in message.tool_calls:
        tool_name = tool_call.function.name
        raw_args = json.loads(tool_call.function.arguments)
        tool = globals().get(tool_name)
        if tool:
            # Some models (especially smaller free ones) sometimes return
            # stray/invalid keys (like "") in the arguments JSON, even for
            # functions that take no parameters. Filter to only the keys
            # the function actually accepts.
            valid_params = set(inspect.signature(tool).parameters.keys())
            arguments = {k: v for k, v in raw_args.items() if k in valid_params}
            result = tool(**arguments)
        else:
            result = {}
        results.append({"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id})
    return results
system_message = "You find great deals on bargain products using your tools, and notify the user of the best bargain."
user_message = """
First, use your tool to scan the internet for bargain deals. Then for each deal, use your tool to estimate its true value.
Then pick the single most compelling deal where the price is much lower than the estimated true value, and use your tool to notify the user.
Then just reply OK to indicate success.
"""
messages = [{"role": "system", "content": system_message}, {"role": "user", "content": user_message}]
done = False
while not done:
    response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    if response.choices[0].finish_reason == "tool_calls":
        message = response.choices[0].message
        results = handle_tool_call(message)
        messages.append(message)
        messages.extend(results)
    else:
        done = True

response.choices[0].message.content
A subtlety that's easy to miss but really matters in practice: smaller, free-tier models sometimes hallucinate extra arguments in their tool calls (like an empty-string key "" for a function that takes no parameters at all). The fix — filtering raw_args down to only the parameters the function's signature actually accepts via inspect.signature — is the kind of defensive coding that's invisible until you're debugging a mysterious TypeError at 11pm.
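To see the failure and the fix in isolation, here is a small self-contained sketch. The stand-in tool and the filter_arguments helper are written for this example only; the stray-key payload mimics the kind of arguments described above.

```python
import inspect
import json

def scan_the_internet_for_bargains() -> str:
    """Stand-in tool that takes no parameters."""
    return "scanned"

def filter_arguments(tool, raw_args: dict) -> dict:
    """Keep only the keys that appear in the tool's signature."""
    valid_params = set(inspect.signature(tool).parameters)
    return {k: v for k, v in raw_args.items() if k in valid_params}

raw_args = json.loads('{"": ""}')  # a stray empty-string key, as a small model might emit

try:
    scan_the_internet_for_bargains(**raw_args)
except TypeError as error:
    print(f"Unfiltered call fails: {error}")

safe_args = filter_arguments(scan_the_internet_for_bargains, raw_args)
result = scan_the_internet_for_bargains(**safe_args)
print(result)
```

Note that this filter silently drops unknown keys. That is the right trade-off for a stray "" key, but it would also hide a misspelled real parameter, so logging the dropped keys may be worth considering.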
Step 3: Swap Fakes for Reality
Once the loop works with fake functions, the swap to the real AutonomousPlanningAgent is almost anticlimactic — same loop, same tool schemas, but scan_the_internet_for_bargains now really calls the ScannerAgent, estimate_true_value really calls the EnsembleAgent, and notify_user_of_deal really calls the MessagingAgent:
DB = "products_vectorstore"
client = chromadb.PersistentClient(path=DB)
collection = client.get_or_create_collection('products')
from agents.autonomous_planning_agent import AutonomousPlanningAgent
agent = AutonomousPlanningAgent(collection)
agent.plan()
Takeaway: Tool/function calling turns an LLM from "a thing that writes text" into "a thing that orchestrates other systems." The hard part isn't the API call — it's (a) writing tight descriptions so the model picks the right tool, and (b) writing tolerant glue code, because the model will occasionally send malformed arguments.
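In the same spirit of tolerant glue code, the agent loop itself can be given a safety cap so that a model which keeps requesting tools cannot spin forever. The sketch below uses plain dictionaries and hypothetical stand-in functions (fake_model, stuck_model, fake_execute_tools) instead of a real API client, so the message shapes are simplified and are not the OpenAI response objects.

```python
def run_agent_loop(call_model, execute_tools, messages, max_turns=10):
    """Run the tool-calling loop, but give up after max_turns model calls."""
    for _ in range(max_turns):
        reply = call_model(messages)
        if reply["finish_reason"] != "tool_calls":
            return reply["content"]
        messages.append(reply)
        messages.extend(execute_tools(reply))
    raise RuntimeError(f"Agent did not finish within {max_turns} turns")

# Hypothetical stand-ins so the sketch runs without an API key
def fake_model(messages):
    """Asks for a tool until it has seen two tool results, then answers."""
    tool_results = [m for m in messages if m.get("role") == "tool"]
    if len(tool_results) < 2:
        return {"role": "assistant", "finish_reason": "tool_calls", "content": None, "tool_name": "scan"}
    return {"role": "assistant", "finish_reason": "stop", "content": "OK"}

def stuck_model(messages):
    """Always asks for another tool call, to exercise the cap."""
    return {"role": "assistant", "finish_reason": "tool_calls", "content": None, "tool_name": "scan"}

def fake_execute_tools(reply):
    return [{"role": "tool", "content": f"result of {reply['tool_name']}"}]

answer = run_agent_loop(fake_model, fake_execute_tools, [{"role": "user", "content": "find a deal"}])
print(answer)
```

Passing the model call and the tool executor in as arguments also makes the loop easy to test with fakes before any real agent is wired in.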
Day 5: Bringing It All Together — Framework, Memory, and a Live UI
The final day was about productionizing: wrapping everything in a reusable framework with persistent memory, colored logs, and a Gradio dashboard that updates in real time.
The Deal Agent Framework
DealAgentFramework is the top-level orchestrator. It owns the Chroma client, lazily creates the PlanningAgent, and — critically — persists discovered deals to memory.json so the system remembers what it's already found across restarts:
import os
import sys
import logging
import json
from typing import List
from dotenv import load_dotenv
import chromadb
from agents.planning_agent import PlanningAgent
from agents.deals import Opportunity
from sklearn.manifold import TSNE
import numpy as np

load_dotenv(override=True)

# Colors for logging
BG_BLUE = "\033[44m"
WHITE = "\033[37m"
RESET = "\033[0m"

# Colors for plot
CATEGORIES = [
    "Appliances",
    "Automotive",
    "Cell_Phones_and_Accessories",
    "Electronics",
    "Musical_Instruments",
    "Office_Products",
    "Tools_and_Home_Improvement",
    "Toys_and_Games",
]
COLORS = ["red", "blue", "brown", "orange", "yellow", "green", "purple", "cyan"]


def init_logging():
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "[%(asctime)s] [Agents] [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S %z",
    )
    handler.setFormatter(formatter)
    root.addHandler(handler)


class DealAgentFramework:
    DB = "products_vectorstore"
    MEMORY_FILENAME = "memory.json"

    def __init__(self):
        init_logging()
        client = chromadb.PersistentClient(path=self.DB)
        self.memory = self.read_memory()
        self.collection = client.get_or_create_collection("products")
        self.planner = None

    def init_agents_as_needed(self):
        if not self.planner:
            self.log("Initializing Agent Framework")
            self.planner = PlanningAgent(self.collection)
            self.log("Agent Framework is ready")

    def read_memory(self) -> List[Opportunity]:
        if os.path.exists(self.MEMORY_FILENAME):
            with open(self.MEMORY_FILENAME, "r") as file:
                data = json.load(file)
            opportunities = [Opportunity(**item) for item in data]
            return opportunities
        return []

    def write_memory(self) -> None:
        data = [opportunity.model_dump() for opportunity in self.memory]
        with open(self.MEMORY_FILENAME, "w") as file:
            json.dump(data, file, indent=2)

    @classmethod
    def reset_memory(cls) -> None:
        data = []
        if os.path.exists(cls.MEMORY_FILENAME):
            with open(cls.MEMORY_FILENAME, "r") as file:
                data = json.load(file)
        truncated = data[:2]
        with open(cls.MEMORY_FILENAME, "w") as file:
            json.dump(truncated, file, indent=2)

    def log(self, message: str):
        text = BG_BLUE + WHITE + "[Agent Framework] " + message + RESET
        logging.info(text)

    def run(self) -> List[Opportunity]:
        self.init_agents_as_needed()
        logging.info("Kicking off Planning Agent")
        result = self.planner.plan(memory=self.memory)
        logging.info(f"Planning Agent has completed and returned: {result}")
        if result:
            self.memory.append(result)
            self.write_memory()
        return self.memory

    @classmethod
    def get_plot_data(cls, max_datapoints=2000):
        client = chromadb.PersistentClient(path=cls.DB)
        collection = client.get_or_create_collection("products")
        result = collection.get(
            include=["embeddings", "documents", "metadatas"], limit=max_datapoints
        )
        vectors = np.array(result["embeddings"])
        documents = result["documents"]
        categories = [metadata["category"] for metadata in result["metadatas"]]
        colors = [COLORS[CATEGORIES.index(c)] for c in categories]
        tsne = TSNE(n_components=3, random_state=42, n_jobs=-1)
        reduced_vectors = tsne.fit_transform(vectors)
        return documents, reduced_vectors, colors


if __name__ == "__main__":
    DealAgentFramework().run()
A few patterns I want to remember from this file:
- Lazy initialization (init_agents_as_needed): spinning up the full agent stack (which includes loading models and connecting to vector stores) is expensive, so it only happens once, on first use.
- Memory as a flat JSON file: no database needed for a project at this scale. memory.json is literally a list of Opportunity objects (a deal + an estimated value + a discount), serialized via Pydantic's model_dump().
- reset_memory as a classmethod: a clean way to "rewind" the demo to a known state (it keeps only the first two saved deals) without touching the running instance.
- Colored terminal logging via raw ANSI escape codes (BG_BLUE, WHITE, RESET): a small touch, but it makes the live log stream from multiple agents much easier to scan visually.
Reformatting ANSI Colors for the Browser
Speaking of colors — the terminal uses ANSI escape codes, but the Gradio UI renders HTML. log_utils.py is a tiny but clever bridge between the two: it maps each ANSI color combination to a CSS hex color and swaps the escape codes for <span style="color: ..."> tags:
# Foreground colors
RED = '\033[31m'
GREEN = '\033[32m'
YELLOW = '\033[33m'
BLUE = '\033[34m'
MAGENTA = '\033[35m'
CYAN = '\033[36m'
WHITE = '\033[37m'
# Background color
BG_BLACK = '\033[40m'
BG_BLUE = '\033[44m'
# Reset code to return to default color
RESET = '\033[0m'
mapper = {
BG_BLACK+RED: "#dd0000",
BG_BLACK+GREEN: "#00dd00",
BG_BLACK+YELLOW: "#dddd00",
BG_BLACK+BLUE: "#0000ee",
BG_BLACK+MAGENTA: "#aa00dd",
BG_BLACK+CYAN: "#00dddd",
BG_BLACK+WHITE: "#87CEEB",
BG_BLUE+WHITE: "#ff7800"
}
def reformat(message):
    for key, value in mapper.items():
        message = message.replace(key, f'<span style="color: {value}">')
    message = message.replace(RESET, '</span>')
    return message
Every agent in the system logs its activity with a different color (set in its own __init__), so when this gets rendered in the browser, you can visually tell at a glance which agent is talking — the planner, the scanner, the frontier agent, etc. — without reading a single word.
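One thing reformat does not do is escape HTML special characters in the log text itself, so a message containing "<" or "&" would be passed to the browser as markup. A possible hardening, shown here as a sketch with a cut-down mapper and a hypothetical function name (reformat_escaped), is to escape the text first and only then swap in the span tags:

```python
import html

BG_BLUE = "\033[44m"
WHITE = "\033[37m"
RESET = "\033[0m"

# Cut-down version of the mapper above, with a single color combination
mapper = {BG_BLUE + WHITE: "#ff7800"}

def reformat_escaped(message: str) -> str:
    """Escape HTML in the log text, then replace ANSI codes with span tags."""
    message = html.escape(message)  # ANSI escape sequences are left untouched
    for key, value in mapper.items():
        message = message.replace(key, f'<span style="color: {value}">')
    return message.replace(RESET, "</span>")

line = BG_BLUE + WHITE + "[Agent Framework] price < $50 & in stock" + RESET
rendered = reformat_escaped(line)
print(rendered)
```

Escaping has to happen before the replacement, otherwise the generated span tags would be escaped too.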
The Gradio UI: From Static Mockup to Live Dashboard
The UI was built up in layers, which I think is a great way to learn Gradio:
Layer 1 — just get something on screen:
with gr.Blocks(title="The Price is Right", fill_width=True) as ui:
    with gr.Row():
        gr.Markdown('<div style="text-align: center;font-size:24px">The Price is Right - Deal Hunting Agentic AI</div>')
    with gr.Row():
        gr.Markdown('<div style="text-align: center;font-size:14px">Autonomous agent framework that finds online deals, collaborating with a proprietary fine-tuned LLM deployed on Modal, and a RAG pipeline with a frontier model and Chroma.</div>')

ui.launch(inbrowser=True)
Layer 2 — add a live data table backed by application state:
with gr.Blocks(title="The Price is Right", fill_width=True) as ui:
    initial_deal = Deal(product_description="Example description", price=100.0, url="https://cnn.com")
    initial_opportunity = Opportunity(deal=initial_deal, estimate=200.0, discount=100.0)
    opportunities = gr.State([initial_opportunity])

    def get_table(opps):
        return [[opp.deal.product_description, opp.deal.price, opp.estimate, opp.discount, opp.deal.url] for opp in opps]

    with gr.Row():
        opportunities_dataframe = gr.Dataframe(
            headers=["Description", "Price", "Estimate", "Discount", "URL"],
            wrap=True,
            column_widths=[4, 1, 1, 1, 2],
            row_count=10,
            col_count=5,
            max_height=400,
        )

    ui.load(get_table, inputs=[opportunities], outputs=[opportunities_dataframe])

ui.launch(inbrowser=True)
A small but important Gradio version note from this layer: in Gradio v5, the height parameter for Dataframe was renamed to max_height — exactly the kind of breaking change that's easy to lose an hour to if you don't know to look for it.
Layer 3 — wire up real agents and make rows clickable:
agent_framework = DealAgentFramework()
agent_framework.init_agents_as_needed()

with gr.Blocks(title="The Price is Right", fill_width=True) as ui:
    ...
    def do_select(opportunities, selected_index: gr.SelectData):
        row = selected_index.index[0]
        opportunity = opportunities[row]
        agent_framework.planner.messenger.alert(opportunity)
    ...
    opportunities_dataframe.select(do_select, inputs=[opportunities], outputs=[])

ui.launch(inbrowser=True)
The Final Application: Streaming Logs + Background Agent Run + 3D Plot
The fully assembled price_is_right.py brings everything together: a background thread runs the agent framework's run() loop, a queue.Queue-based logging handler streams log lines into the UI in (near) real time, and a 3D Plotly visualization of the product vector store sits alongside the deal table:
import logging
import queue
import threading
import time

import gradio as gr
from deal_agent_framework import DealAgentFramework
from log_utils import reformat
import plotly.graph_objects as go
from dotenv import load_dotenv

load_dotenv(override=True)


class QueueHandler(logging.Handler):
    def __init__(self, log_queue):
        super().__init__()
        self.log_queue = log_queue

    def emit(self, record):
        self.log_queue.put(self.format(record))


def html_for(log_data):
    output = "<br>".join(log_data[-18:])
    return f"""
    <div id="scrollContent" style="height: 400px; overflow-y: auto; border: 1px solid #ccc; background-color: #222229; padding: 10px;">
    {output}
    </div>
    """


def setup_logging(log_queue):
    handler = QueueHandler(log_queue)
    formatter = logging.Formatter(
        "[%(asctime)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S %z",
    )
    handler.setFormatter(formatter)
    logger = logging.getLogger()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


class App:
    def __init__(self):
        self.agent_framework = None

    def get_agent_framework(self):
        if not self.agent_framework:
            self.agent_framework = DealAgentFramework()
        return self.agent_framework

    def run(self):
        with gr.Blocks(title="The Price is Right", fill_width=True) as ui:
            log_data = gr.State([])

            def table_for(opps):
                return [
                    [
                        opp.deal.product_description,
                        f"${opp.deal.price:.2f}",
                        f"${opp.estimate:.2f}",
                        f"${opp.discount:.2f}",
                        opp.deal.url,
                    ]
                    for opp in opps
                ]

            def update_output(log_data, log_queue, result_queue):
                initial_result = table_for(self.get_agent_framework().memory)
                final_result = None
                while True:
                    try:
                        message = log_queue.get_nowait()
                        log_data.append(reformat(message))
                        yield log_data, html_for(log_data), final_result or initial_result
                    except queue.Empty:
                        try:
                            final_result = result_queue.get_nowait()
                            yield log_data, html_for(log_data), final_result or initial_result
                        except queue.Empty:
                            if final_result is not None:
                                break
                            time.sleep(0.1)

            def get_plot():
                documents, vectors, colors = DealAgentFramework.get_plot_data(max_datapoints=800)
                fig = go.Figure(
                    data=[
                        go.Scatter3d(
                            x=vectors[:, 0],
                            y=vectors[:, 1],
                            z=vectors[:, 2],
                            mode="markers",
                            marker=dict(size=2, color=colors, opacity=0.7),
                        )
                    ]
                )
                fig.update_layout(
                    scene=dict(
                        xaxis_title="x", yaxis_title="y", zaxis_title="z",
                        aspectmode="manual",
                        aspectratio=dict(x=2.2, y=2.2, z=1),
                        camera=dict(eye=dict(x=1.6, y=1.6, z=0.8)),
                    ),
                    height=400,
                    margin=dict(r=5, b=1, l=5, t=2),
                )
                return fig

            def do_run():
                new_opportunities = self.get_agent_framework().run()
                return table_for(new_opportunities)

            def run_with_logging(initial_log_data):
                log_queue = queue.Queue()
                result_queue = queue.Queue()
                setup_logging(log_queue)

                def worker():
                    result_queue.put(do_run())

                thread = threading.Thread(target=worker)
                thread.start()
                for log_data, output, final_result in update_output(initial_log_data, log_queue, result_queue):
                    yield log_data, output, final_result

            def do_select(selected_index: gr.SelectData):
                opportunities = self.get_agent_framework().memory
                row = selected_index.index[0]
                opportunity = opportunities[row]
                self.get_agent_framework().planner.messenger.alert(opportunity)

            with gr.Row():
                opportunities_dataframe = gr.Dataframe(
                    headers=["Deals found so far", "Price", "Estimate", "Discount", "URL"],
                    wrap=True, column_widths=[6, 1, 1, 1, 3],
                    row_count=10, col_count=5, max_height=400,
                )
            with gr.Row():
                with gr.Column(scale=1):
                    logs = gr.HTML()
                with gr.Column(scale=1):
                    plot = gr.Plot(value=get_plot(), show_label=False)

            ui.load(run_with_logging, inputs=[log_data], outputs=[log_data, logs, opportunities_dataframe])
            timer = gr.Timer(value=300, active=True)
            timer.tick(run_with_logging, inputs=[log_data], outputs=[log_data, logs, opportunities_dataframe])
            opportunities_dataframe.select(do_select)

        ui.launch(share=False, inbrowser=True)


if __name__ == "__main__":
    App().run()
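A side note on the listing above: setup_logging is called at the start of every run, and each call adds a fresh QueueHandler to the root logger without removing the previous one. Over many timer ticks the old handlers keep filling queues that nothing reads any more. One possible tidy-up, sketched below, is to attach the handler only for the duration of a run. The queue_logging helper is a hypothetical name introduced for this illustration and is not part of the original file; the formatter is also simplified.

```python
import logging
import queue
from contextlib import contextmanager


class QueueHandler(logging.Handler):
    """Same idea as the handler in the listing: push formatted records onto a queue."""

    def __init__(self, log_queue):
        super().__init__()
        self.log_queue = log_queue

    def emit(self, record):
        self.log_queue.put(self.format(record))


@contextmanager
def queue_logging(log_queue, level=logging.INFO):
    """Hypothetical helper: attach a QueueHandler to the root logger for one run only."""
    handler = QueueHandler(log_queue)
    handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s"))
    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(level)
    try:
        yield handler
    finally:
        # Detach so repeated runs do not pile up handlers feeding abandoned queues.
        root.removeHandler(handler)
        root.setLevel(previous_level)
```

Under this assumption, the worker would wrap its call as `with queue_logging(log_queue): result_queue.put(do_run())` instead of calling setup_logging up front.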
The two patterns I most want to carry forward from this file:
- Background work + streaming UI via a generator. run_with_logging is a Python generator hooked up as a Gradio event handler. It kicks off a worker thread, then repeatedly yields updated state, so the UI refreshes live while a slow agentic process runs instead of freezing for the whole duration.
- gr.Timer for autonomous operation. A Timer set to 300 seconds means the whole "scan → estimate → notify" cycle re-runs automatically every 5 minutes, turning a notebook experiment into something that genuinely behaves like a background agent.
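The first pattern can be tried without Gradio at all. The sketch below is a stripped-down illustration, not the code from price_is_right.py: slow_job and stream_updates are hypothetical names, and the "agent run" is faked with short sleeps. It keeps the same shape, though: a worker thread reports progress through one queue and its result through another, while a generator polls both and yields a snapshot after every change.

```python
import queue
import threading
import time


def slow_job(log_queue):
    """Stand-in for the agent run (hypothetical): reports progress, then returns a result."""
    for step in ("scanning", "estimating", "notifying"):
        log_queue.put(step)
        time.sleep(0.05)
    return ["deal-1"]


def stream_updates(job=slow_job, poll_interval=0.01):
    """Yield (logs_so_far, result_or_None) while a worker thread runs the job."""
    log_queue, result_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=lambda: result_queue.put(job(log_queue)), daemon=True
    )
    worker.start()

    logs, result = [], None
    while True:
        try:
            # Drain log messages first so the UI sees progress as it happens.
            logs.append(log_queue.get_nowait())
            yield list(logs), result
        except queue.Empty:
            try:
                result = result_queue.get_nowait()
                yield list(logs), result
            except queue.Empty:
                if result is not None:
                    break  # result already delivered and nothing left to show
                time.sleep(poll_interval)


if __name__ == "__main__":
    for logs, result in stream_updates():
        print(logs, result)
```

A framework that accepts generator handlers, as Gradio does, can render each yielded snapshot as it arrives. One limitation of this simple loop: if the job raises or returns None, the generator never finishes, so a real version would want a timeout or an error sentinel.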
What I'd Tell Past-Me Before Starting This
A few cross-cutting lessons that apply far beyond this specific project:
- Specialize, then orchestrate. Each agent (Scanner, Specialist, Frontier, Ensemble, Messaging, Planner) does one narrow thing well. The "intelligence" of the overall system comes from composition, not from any single giant prompt.
- RAG is cheap leverage. Adding 5 similar examples with known prices to a prompt turned out to meaningfully improve a frontier model's estimates — for the cost of a vector lookup.
- Ensembling beats picking a favorite. Rather than agonizing over which pricing approach is "best," a weighted blend of three approaches (0.8 / 0.1 / 0.1) outperformed any single one on the held-out test set.
- Tool calling needs defensive code. The model will send slightly malformed arguments sometimes, especially smaller/free models. Filter arguments against the real function signature before calling it.
- Memory is just a JSON file (for now). Don't reach for a database until you actually need one: persisting a List[Opportunity] to memory.json was enough to give the system continuity across restarts.
- Generators + threads make agentic UIs feel alive. A background worker thread plus a generator-based Gradio handler is a lightweight way to show "the agent is thinking" in real time.
- Small UX details (colored logs, a live 3D plot, clickable rows) make an agentic system feel trustworthy — you can see what it's doing and why, which matters enormously when the system is making autonomous decisions on your behalf.
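To make the tool-calling lesson concrete, here is a minimal sketch of filtering a model's arguments against the real function signature. The names notify_user_of_deal and safe_call are hypothetical and chosen for illustration; the only library calls are json.loads and inspect.signature from the standard library.

```python
import inspect
import json


def notify_user_of_deal(description: str, deal_price: float,
                        estimated_true_value: float, url: str) -> str:
    """Hypothetical tool function standing in for a real notification tool."""
    return f"{description}: ${deal_price:.2f} vs ${estimated_true_value:.2f} ({url})"


def safe_call(func, raw_arguments: str):
    """Parse a model's JSON arguments and drop any keys the function does not accept."""
    arguments = json.loads(raw_arguments)
    accepted = inspect.signature(func).parameters
    filtered = {name: value for name, value in arguments.items() if name in accepted}
    return func(**filtered)


if __name__ == "__main__":
    # The model added a "confidence" key that the tool never asked for.
    raw = json.dumps({
        "description": "Widget",
        "deal_price": 10,
        "estimated_true_value": 25.5,
        "url": "https://example.com",
        "confidence": "high",
    })
    print(safe_call(notify_user_of_deal, raw))
```

This only guards against extra keys. A missing required argument still raises TypeError, and invalid JSON still raises json.JSONDecodeError, so a production loop would probably catch both and report the error back to the model.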
Putting it all together, DealAgentFramework().run() now quietly: scans deal feeds, filters to the 5 best-described deals, estimates each one's true value via an ensemble of three models, picks the single best opportunity, saves it to memory, and — if it's a great deal — buzzes my phone. All while a live dashboard shows exactly what's happening and why.