Isaac Hagoel

Read This Before Building AI Agents: Lessons From The Trenches

Key Takeaways

  • 🛠️ Hybridize: Combine LLMs with traditional code for reliability and creativity.
  • 🧩 Specialize: Use multiple agents to avoid complexity thresholds.
  • 📐 Structure: Enforce outputs with Zod/Pydantic schemas to reduce hallucinations.
  • 🔍 Agentic RAG: Let agents control retrieval for dynamic workflows.
  • ⚡ Optimize: Balance token usage, speed, and quality with parallel tool calls.

Over the last few months, I've been diving deep into the world of AI agents. What started as side projects and general curiosity has evolved into actual work projects. This means I'm in the process of crossing over from hobbyist to pro (by definition, you're not a pro until you get paid to do whatever it is you're doing!) and from toy apps to ones with real users.

I'm quite early in my journey and still have so much to learn, yet I'm surprised by how many challenges I've encountered despite reading blogs, watching videos, etc. There are insights that aren't widely shared yet, and this post aims to fill that gap.

My Agent Building Journey (A Brief Overview)

For context, here's what I've worked on so far:

Toy Projects

  • LinkedIn Job Finder: Split tasks between Playwright scripts (link scraping) and agents (job rating). Initially tried one agent for everything, then realized specialized agents worked better.
  • QA Automation: Separated test planning (agent) from execution (code). One agent creates test plans by analyzing web pages; code then spawns a second agent to execute tests.
  • Custom Framework: Built after exploring existing frameworks like CrewAI and finding their abstractions didn't match my needs. This exploration helped me discover which abstractions actually make sense for my use cases.

Work Projects (Limited Details for Confidentiality)

  • Integration Into Pre-existing Codebases: Several POCs integrating LLMs into existing apps.
  • Product Analytics Assistant: An internal tool leveraging agentic RAG and other tools, implemented from scratch. Now launching as a closed beta.

What is an AI Agent?

An AI agent orchestrates multiple LLM calls to make decisions, use tools, and achieve goals—beyond single prompts. It's code that wraps around LLM calls, allowing the AI to determine its own path toward a goal rather than just generating a response to a single prompt.

With that said, in real-world applications, I've found that making single LLM calls for specific tasks is often quite practical. I prefer to think of these as 'single-step, tool-less agents' since this mental model is more useful than drawing an artificial distinction between agents and LLM calls. Most apps need a mix of both approaches.
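To make the definition concrete, here's a minimal sketch of the loop at the heart of most agents. The `llm.complete`, `executeTool`, and `buildInstructions` helpers are hypothetical placeholders, not any specific SDK:

// A minimal agent loop sketch (hypothetical `llm.complete`, `executeTool`, `buildInstructions`)
async function runAgent(goal, tools) {
  const messages = [{ role: "system", content: buildInstructions(goal, tools) }];

  while (true) {
    const response = await llm.complete({ messages, tools });
    messages.push(response.message); // record the assistant turn

    // The model decides each step: call tools, or finish with an answer
    if (!response.toolCalls?.length) {
      return response.message.content; // goal reached
    }
    for (const call of response.toolCalls) {
      const result = await executeTool(call.name, call.args);
      messages.push({ role: "tool", name: call.name, content: JSON.stringify(result) });
    }
  }
}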


Why and When Agents?

When Agents Excel

LLMs can do things that normal code simply can't - tasks for which there is no conventional algorithm:

  • Generating creative content ("make this text child-friendly")
  • Making subjective judgments with nuance (e.g., grading job postings based on fuzzy preferences)
  • Extracting meaning from unstructured data (e.g., key takeaways from documents)
  • Adaptive control flow - Instead of coding rigid "if" conditions for every situation, you provide guidelines and the LLM adapts

They also bring unique benefits when used to mimic traditional ML systems, like recommenders, because they can handle any text or images without predefined patterns and without needing to train on your data.
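For example, grading a job posting against fuzzy preferences takes a single LLM call and no training data. A minimal sketch, again assuming a hypothetical `llm.complete` client:

// Sketch: subjective judgment with one LLM call (hypothetical `llm.complete` client)
async function rateJobPosting(posting, preferences) {
  const response = await llm.complete({
    messages: [{
      role: "user",
      content: `Rate this job posting from 1 to 10 against my preferences.
Preferences: ${preferences}
Posting: ${posting}
Reply with only the number.`
    }]
  });
  return Number(response.message.content.trim());
}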

The Golden Rule: Code When Possible

Always ask: "Can I code this without losing functionality?"

  • ✅ Traditional code for mechanical tasks (scraping, loops).
  • 🤖 Agents for reasoning/adaptation.

Whenever you consider giving a task to an agent, first ask yourself: "Can I code this whole thing or some part of it without losing functionality?"

If the answer is yes, code it and leave to the agent only what normal code cannot do. Traditional code is orders of magnitude more:

  • Performant
  • Accurate
  • Predictable
  • Testable
  • Cost-effective (no token charges)

I learned this lesson the hard way with my LinkedIn job finder. Initially, I asked an agent with browser capability to visit LinkedIn and collect job links. The performance was poor and the agent got confused by the virtual list within a scrolling container. Eventually, I replaced this with a simple Playwright script for link collection, making the system much faster, more accurate, and cheaper.

The tradeoff? The Playwright approach might break if the page markup changes. But for mechanical tasks like data collection, web scraping, or file operations, traditional code is almost always superior.
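For the curious, the replacement can be as small as this sketch (the selector and scrolling heuristics are illustrative guesses, not LinkedIn's actual markup):

import { chromium } from "playwright";

// Sketch of the Playwright replacement; selector and scroll counts are illustrative
async function collectJobLinks(searchUrl) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(searchUrl);

  // Scroll a few times so the virtual list renders more items
  for (let i = 0; i < 10; i++) {
    await page.mouse.wheel(0, 2000);
    await page.waitForTimeout(500);
  }

  // Collect every job link currently in the DOM
  const links = await page.$$eval("a[href*='/jobs/view/']", (anchors) =>
    anchors.map((a) => a.href)
  );

  await browser.close();
  return [...new Set(links)]; // de-duplicate
}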

Similarly, for control flow, if you need to loop through documents, don't tell an agent to "for each document in this list do X." Instead, use a normal loop and spawn an agent for each document (potentially in parallel for efficiency).
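In code, that looks like a plain loop that owns the control flow and hands each item to a fresh agent run (assuming a `summarizerAgent` defined like the agents shown later):

// The loop lives in regular code; each document gets its own focused agent run
async function summarizeDocuments(documents) {
  return await Promise.all(
    documents.map((doc) =>
      summarizerAgent.invoke({ task: "Extract key takeaways", document: doc })
    )
  );
}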

Hybrid Is Best

The most powerful agentic applications combine LLMs with conventional code in a synergistic way:

  • Traditional code handles deterministic tasks, data processing, and integration
  • LLM agents handle understanding, reasoning, creativity, and adaptation

Our Hypothetical Example: A Marketing Email System

Throughout this post, I'll use a hypothetical marketing email platform as an example. This system creates personalized product recommendations and illustrates many key patterns.

The architecture consists of four specialized agents:

  1. Data Collector Agent - Gathers customer information from databases and public sources
  2. Product Selector Agent - Analyzes customer data to recommend relevant products
  3. Writer Agent - Creates personalized email content using brand templates
  4. Reviewer/Editor Agent - Ensures quality control and requests revisions

This structure demonstrates how agents can collaborate while maintaining focused responsibilities, which brings us to our first critical insight.

Note on Code Examples: All code examples in this post are highly simplified for clarity and illustrative purposes. They're meant to convey concepts rather than provide production-ready implementations.
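As a preview of how the four agents are wired together by plain code (the agent definitions follow in later sections; the three-round revision cap is an illustrative choice, not a rule):

// Plain orchestration code wires the four specialized agents together
async function sendMarketingEmail(customerId) {
  const customerData = await dataCollectorAgent.invoke({ customerId });
  const recommendations = await productSelectorAgent.invoke({ customerData });

  let email = await writerAgent.invoke({ customerData, recommendations });

  // Review loop, capped so a picky reviewer can't spin forever
  for (let round = 0; round < 3; round++) {
    const review = await reviewerAgent.invoke({ email });
    if (review.approved) break;
    email = await writerAgent.invoke({ customerData, recommendations, feedback: review.feedback });
  }

  return email;
}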


Critical Insights for Building Effective Agents

1. Respect Complexity Thresholds

Every model fails past a certain complexity threshold. When you cross this threshold, the model struggles to follow instructions and hallucinations increase exponentially.

When developers ask "Why do I need multiple agents?" I know they haven't built real agent systems yet. Once you hit a complexity threshold, you have three options:

  1. Reduce requirements (simplify the task)
  2. Upgrade to a better model (usually more expensive, and the ceiling is whatever the best available model can do)
  3. Split the task across specialized agents

It's not unlike people - there's a limit to how many instructions a person can follow effectively and how many tools they can wield.

Example from our marketing system:

In our marketing email system, a single agent trying to handle data collection, product selection, writing, and reviewing would struggle. Breaking this into specialized agents creates much more reliable results.

Before (Single Agent):

// Prone to inconsistency and hallucinations
const marketingAgent = new Agent({
  name: "Marketing Email Agent",
  instructions: `Handle collecting customer data, selecting products, writing emails, 
                and reviewing content quality, maintaining brand voice...`,
  tools: [customerDataTool, linkedinTool, interactionHistoryTool, 
          productCatalogTool, templateTool, emailSender],
});

After (Specialized Agents):

// Data collection is now focused and structured
const dataCollectorAgent = new Agent({
  name: "Data Collector Agent",
  instructions: `Gather relevant customer data from internal systems and public sources.
                Focus on professional background, interests, and past interactions...`,
  tools: [customerDbTool, linkedinTool, interactionHistoryTool],
});

// Additional specialized agents would follow a similar pattern

This approach succeeds because:

  • Each agent excels at a narrower, well-defined task
  • Prompts can be shorter and more focused
  • Error recovery is simpler (one malfunctioning agent doesn't derail everything)
  • Each step can be optimized, tested, and reused independently

Consider using multiple specialized agents when:

  • The task can be naturally broken into subtasks
  • Different tasks require different types of reasoning
  • The prompt would otherwise become unwieldy
  • You need to maintain different states for different parts of the process

2. Structured Outputs Are Non-Negotiable

Structured outputs (JSON schemas) completely transform agent development. This feature allows you to specify an exact output format and guarantees the model returns data as specified.

The benefits are both obvious and subtle:

  • Expected benefit: No more begging the model for correct formatting or retrying on malformed outputs.
  • Unexpected benefit: Schemas can force the model to follow specific reasoning patterns and make better decisions.

Example: Product Selector Agent Schema

const productRecommendationSchema = z.object({
  customerSummary: z.string().describe("Brief summary of customer needs based on data"),
  recommendedProducts: z.array(z.object({
    productId: z.string(),
    productName: z.string(),
    category: z.string(),
    relevanceScore: z.number().min(1).max(10).describe("How relevant for this customer (1-10)"),
    justification: z.string().describe("Detailed reasoning for why this product matches customer needs"),
    sellingPoints: z.array(z.string()).describe("Key points to emphasize in marketing")
  })),
  fallbackRecommendations: z.array(z.object({
    productId: z.string(),
    productName: z.string(),
    category: z.string(),
    reasonForInclusion: z.string().describe("Why this is included as a fallback option")
  })).optional().describe("Secondary recommendations if primary ones don't resonate")
});

This schema does more than format data—it forces the agent to think deeply about product selection:

  1. The justification field requires detailed reasoning for each recommendation
  2. The relevanceScore field forces ranking and prioritization
  3. The sellingPoints array ensures usable content for the Writer Agent

By requiring this structured approach, we get more thoughtful, consistent recommendations rather than superficial matches. The model must actually think through its choices, and the output is trivially parsed (JSON.parse) and handed on to regular code (or other agents) for further processing.
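If your framework doesn't validate the response for you, the same schema doubles as a runtime guard. A minimal sketch (`llmResponseText` stands in for the raw model output):

// Validate the raw model output against the schema before trusting it
const raw = JSON.parse(llmResponseText);                       // throws on malformed JSON
const recommendation = productRecommendationSchema.parse(raw); // throws on schema violations

// Downstream code can now rely on the structure
const topPick = recommendation.recommendedProducts
  .sort((a, b) => b.relevanceScore - a.relevanceScore)[0];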

Key takeaway: 📐 Structured outputs make calling an LLM/agent akin to calling any other remote API.

3. Language Choices: TypeScript vs. Python

|                    | TypeScript                                      | Python                                                                     |
| ------------------ | ----------------------------------------------- | -------------------------------------------------------------------------- |
| Static Type System | ✅ First-class, enforced at compile time         | 🟡 Optional type hints, not enforced                                        |
| Runtime Validation | ✅ Zod                                           | ✅ Pydantic                                                                 |
| Async Support      | ✅ Native                                        | 🟡 Requires asyncio                                                         |
| ML Libraries       | 🟡 Growing                                       | ✅ Dominant                                                                 |
| JSON Handling      | ✅ Native                                        | 🟡 Requires import json                                                     |
| Developer Tools    | ✅ Excellent                                     | ✅ Good                                                                     |
| Package Management | ✅ npm/yarn with package.json and lock files     | 🟡 pip with requirements.txt (no lock by default) or Poetry/Pipenv (with locks) |

Both languages are excellent choices for agent development, with different strengths:

Python has established itself as the primary language in the AI/ML ecosystem:

  • Rich ecosystem of ML and AI libraries with first-class support
  • Most agent frameworks and tutorials are Python-first
  • Excellent data processing capabilities
  • Familiar to data scientists and ML practitioners
  • Strong support through libraries like Pydantic for structured outputs

TypeScript offers compelling advantages for software engineers:

  • Static typing helps prevent runtime errors and enables better tooling
  • First-class support for structured outputs via Zod integration
  • Native JSON handling simplifies working with API responses
  • Robust async/await pattern for managing concurrent operations
  • Unified language for both frontend and backend development

My personal preference leans toward TypeScript because:

// Example: Type safety with runtime validation using Zod
const productRecommendationSchema = z.object({
  customerSummary: z.string(),
  recommendedProducts: z.array(z.object({
    productId: z.string(),
    relevanceScore: z.number().describe("a number between 1 and 10"),
    justification: z.string()
  }))
});

// TypeScript automatically infers the correct type
type ProductRecommendation = z.infer<typeof productRecommendationSchema>;

// You get autocompletion and type checking when working with validated data
function processRecommendation(rec: ProductRecommendation) {
  // Access properties with confidence - TypeScript knows the structure
  const topProduct = rec.recommendedProducts.sort((a, b) => 
    b.relevanceScore - a.relevanceScore
  )[0];

  return `Top recommendation: ${topProduct.justification}`;
}

This combination of static typing, runtime validation, and excellent tooling significantly improves developer confidence when working with complex agent systems.

Choose based on your team's expertise and specific requirements rather than following any single recommendation.

4. Prompt Engineering Is Real Engineering

Most developers who haven't built agent systems joke about prompt engineering not being "real" engineering. This misconception disappears quickly once you try to build production-grade agents.

Unlike casual ChatGPT conversations, where you can refine through back-and-forth clarification, production agents need carefully crafted prompts that handle diverse situations without human intervention. As your apps get more ambitious, your prompts will inflate to monstrous lengths that dwarf the actual user query.

Your prompt must include:

  • Domain terminology definitions
  • Tool usage guidelines
  • Strategies you want the agent to follow
  • Output format requirements
  • Error handling instructions

Example: Writer Agent Prompt (Partial)

### WRITER AGENT INSTRUCTIONS ###

Your task is to create highly personalized marketing emails that convert. You will be provided with:
1. Customer profile data
2. Product recommendations with relevance scores and selling points
3. Brand voice guidelines and templates

## BRAND VOICE RULES:
- Friendly but professional, never pushy
- Avoid hyperbole ("best ever", "amazing") in favor of specific benefits
- Use active voice and concise sentences
- Address the customer by name at least once, but no more than twice
- Each paragraph should be 2-3 sentences maximum for readability

## EMAIL STRUCTURE REQUIREMENTS:
- Subject line: Clear value proposition, 30-60 characters, no exclamation points
- Opening: Acknowledge a specific detail from customer data to establish relevance
- Body: Focus on 1-2 top products only, emphasizing only the 3 most relevant selling points
- Call to action: ONE clear next step, using benefit-focused language
- Signature: Include personalized note if customer has history with specific representative

## PROCESS STEPS:
1. Review customer data completely before writing anything
2. Select template that best matches product category
3. Customize template with specific customer details and product benefits
4. Review against brand voice rules
5. If customer is enterprise-level (>250 employees), emphasize ROI and strategic benefits
6. If customer is SMB (<250 employees), emphasize ease of implementation and quick wins

## CRITICAL GUIDELINES:
- NEVER mention pricing unless specifically included in the product recommendation
- ALWAYS check that product names are correctly used (exact spelling and capitalization)
- If customer has previously purchased from us, acknowledge this with gratitude
- NEVER exceed 200 words total for the email body

My advice:

  1. Be explicit and detailed - spell out everything, don't assume the model knows your preferences
  2. Iterate through testing - refine based on agent behavior across diverse queries
  3. Structure prompts logically - separate sections for terminology, process, examples, etc.
  4. Include diverse examples - covering edge cases and common scenarios

This careful crafting of prompts takes significant time and iteration—real engineering work that directly affects system performance.
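One practical way to keep such monster prompts maintainable is to assemble them from named sections kept as separate constants, so each part can be edited, reviewed, and tested independently. A sketch of the idea (the section constants are assumed to hold the text shown above):

// Build the monster prompt from independently maintained sections
const writerPromptSections = {
  "BRAND VOICE RULES": BRAND_VOICE_RULES,                 // constants holding the text above
  "EMAIL STRUCTURE REQUIREMENTS": EMAIL_STRUCTURE_RULES,
  "PROCESS STEPS": PROCESS_STEPS,
  "CRITICAL GUIDELINES": CRITICAL_GUIDELINES,
};

const writerInstructions = [
  "Your task is to create highly personalized marketing emails that convert.",
  ...Object.entries(writerPromptSections).map(
    ([title, text]) => `## ${title}:\n${text}`
  ),
].join("\n\n");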

5. Context Window as a Whiteboard

Most people don't realize that LLMs are stateless. Each call to an LLM is entirely new—the model has no memory of previous interactions. The illusion of continuous conversation in tools like ChatGPT comes from including the entire conversation history with each new request.

This has major implications for agent development:

The context window is the maximum amount of text (measured in tokens) the model can "see" at once. Modern models have large windows (128K-1M tokens), but you still face several challenges:

  1. Statelessness: Each time you call an LLM, you must resend the entire history, not just the latest message
  2. Performance overhead: Larger context = slower processing and higher costs
  3. Token-per-minute (TPM) limits: API rate limits often restrict how much text you can send per minute

For example, OpenAI's 30,000 tokens-per-minute limit for tier 1 customers means you'll never utilize more than ~25% of a 128K token context window, even if you only make one request per minute.
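To see challenge #1 (statelessness) concretely: the "conversation" is just an array your code grows and resends in full on every call (a hypothetical `llm.complete` client again):

// The model remembers nothing; your code carries the memory
const history = [{ role: "system", content: agentInstructions }];

async function ask(userMessage) {
  history.push({ role: "user", content: userMessage });
  const reply = await llm.complete({ messages: history }); // full history, every time
  history.push({ role: "assistant", content: reply.message.content });
  return reply.message.content; // token cost grows with each turn
}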

Strategies for token management:

  • 🧼 Prune: Remove irrelevant history.
  • 🎯 Focus: Only include critical tool outputs.
  • 🔀 Parallelize: Batch tool calls to reduce roundtrips, as in the example below.

// Example: Optimizing token usage with parallel tool calls
async function researchCustomer(customerId) {
  // Request multiple tool calls in one go (in reality - the agent will provide this array if instructed to do so)
  const toolCalls = [
    { tool: "fetchCustomerData", args: { customerId } },
    { tool: "getInteractionHistory", args: { customerId } },
    { tool: "analyzeIndustryTrends", args: { industry: "retail" } }
  ];

  // Execute all tool calls in parallel
  const toolResults = await Promise.all(toolCalls.map(call => 
    executeToolCall(call.tool, call.args)
  ));

  // Send all results to agent at once, reducing round-trips
  return await customerAnalysisAgent.invoke({
    task: "Analyze customer for product recommendations",
    toolResults
  });
}

This approach significantly reduces the number of back-and-forth exchanges, improving performance and latency while potentially making more efficient use of your tokens-per-minute (TPM) quota by bundling multiple operations into fewer API calls.

6. Advanced RAG: Beyond Basic Retrieval

Retrieval-Augmented Generation (RAG) has become a cornerstone technique for agents with access to external knowledge. However, there's a significant gap between basic implementations and truly effective RAG systems.

Traditional RAG vs. Agentic RAG

|             | Traditional RAG    | Agentic RAG                   |
| ----------- | ------------------ | ----------------------------- |
| Control     | Code-driven        | Agent-driven                  |
| Flexibility | Static queries     | Dynamic, multi-step retrieval |
| Use Case    | Predictable needs  | Exploratory tasks             |

When I first started with RAG, I held two misconceptions:

  1. Embeddings are just fancy keyword matching - I thought embeddings were simple hash functions for basic text matching. In reality, they capture complex semantic relationships between concepts.

  2. Just stuff everything in the context window - I believed that if my knowledge base could fit in the context window, I should include everything. This degrades performance by forcing the model to filter signal from noise.

Traditional RAG is implemented like this:

// 1. Code creates search query from customer data
const searchQuery = `${customerData.industry} email templates for ${productRecommendations[0].category}`;

// 2. Search for relevant email templates
const searchResults = await searchMarketingEmails(searchQuery, 3);

// 3. Add results to the agent's prompt
const writerAgent = new Agent({
  instructions: `Create personalized emails based on customer data and product recommendations.

                Here are some successful examples to draw inspiration from:
                ${formatSearchResults(searchResults)}`,
  outputSchema: emailSchema
});

This works when retrieval needs are predictable. But it has limitations:

  • The retrieval happens once, before the agent starts working
  • The search query is predetermined by your code
  • The agent can't request more information as it discovers new directions

Agentic RAG addresses these limitations by giving the search capability directly to the agent:

// Example: Agent with search tool
const writerAgent = new Agent({
  name: "Writer Agent",
  instructions: `Create personalized marketing emails based on customer data and product recommendations.
                Use the emailSearch tool to find inspiration from successful past campaigns.`,
  tools: [MarketingTools.emailSearch], // Agent can search whenever it wants
  outputSchema: emailSchema
});

This allows the agent to:

  1. Make multiple searches with different queries as understanding evolves
  2. Refine searches based on intermediate results
  3. Search for different aspects (industry language, effective CTAs, etc.)
  4. Decide when enough information has been gathered

Hybrid Analytics: RAG + SQL

Even agentic RAG has limitations. Vector similarity can't:

  1. Aggregate information across documents
  2. Detect patterns and trends
  3. Answer questions requiring numerical analysis

To overcome these limitations, I combine RAG with analytical tools—particularly SQL access to structured versions of the same data:

// SQL tool for product performance analytics
const sqlQueryTool = createTool({
  name: "runSqlQuery",
  description: "Run SQL queries against marketing performance database",
  argsObject: z.object({
    query: z.string().describe("SQL query to execute")
  }),
  execute: async ({ query }) => {
    // Safety checks would happen here
    return await executeQueryAgainstDatabase(query);
  }
});

// Agent with both RAG and SQL capabilities
const productSelectorAgent = new Agent({
  instructions: `Analyze customer data to recommend products.
                Use SQL for trend analysis across segments.
                Use productSearch for detailed product information.
                Combine insights from both for optimal recommendations.`,
  tools: [MarketingTools.productSearch, sqlQueryTool],
  outputSchema: productRecommendationSchema
});

With this combination, the agent might:

  1. Use SQL to analyze which product categories perform best for healthcare companies with 100-500 employees:
   SELECT 
     product_category,
     AVG(conversion_rate) as avg_conversion,
     COUNT(*) as purchase_count
   FROM purchase_history
   WHERE customer_industry = 'Healthcare' 
     AND customer_size BETWEEN 100 AND 500
   GROUP BY product_category
   ORDER BY avg_conversion DESC
   LIMIT 5
  2. Then use RAG to find specific products within those categories that match the customer's unique needs

This hybrid approach is particularly valuable for large product catalogs where browsing everything would be impractical.

7. The Four Control Flow Patterns in Agentic Applications

After building several agent systems, I've discovered four distinct control flow patterns, each with different implications:

1. Code → Code (Traditional Programming)

Standard function calls with predetermined inputs and outputs. Predictable, testable, efficient, but lacks adaptability.
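For symmetry with the patterns below, a trivial illustrative example (names are made up):

// Deterministic code calling deterministic code - no LLM involved
function buildAudienceSegment(customers) {
  return customers
    .filter((c) => c.industry === "Healthcare")
    .map((c) => ({ id: c.id, email: c.email }));
}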

2. Code → Agent (Outsourcing Decisions)

Code invokes an agent, temporarily handing over control. The agent performs multiple reasoning steps before returning.

// Example: Code calling an agent for a specific decision
async function generateMarketingCampaign(targetAudience, products) {
  // Control passes to the agent until it returns
  const emailTemplate = await marketingAgent.invoke({
    targetAudience,
    products,
    goal: "Generate a personalized marketing email"
  });

  // Control returns to our code
  return {
    template: emailTemplate,
    timestamp: new Date(),
    audience: targetAudience
  };
}

This pattern works well for discrete tasks where the agent makes complex decisions but doesn't need ongoing dialogue. It's like calling a specialized API.

3. Agent → Code (Tool Use)

An agent controls the flow and calls code functions (tools) as needed. The agent decides which tools to use and how to interpret results.

// Example: Agent using tools
const researchAgent = new Agent({
  instructions: `Analyze customer data and recommend products.`,
  tools: [
    {
      name: "fetchCustomerData",
      description: "Retrieve customer purchase history",
      execute: async (customerId) => {
        // Regular code fetching from database
        return await database.customers.findById(customerId);
      }
    },
    {
      name: "analyzeSpendingPatterns",
      description: "Analyze spending patterns by category",
      execute: async (purchaseHistory) => {
        // Regular code performing analysis
        return calculateSpendingBreakdown(purchaseHistory);
      }
    }
  ],
  outputSchema: recommendationSchema
});

This pattern enables the agent to leverage capabilities beyond its training data. The agent remains in control but can delegate specific tasks to conventional code.

4. Agent → Agent (Delegation & Orchestration)

One agent delegates sub-tasks to other agents, creating complex feedback loops and agent hierarchies.

// Example: Reviewer agent with delegation to writer
const reviewerAgent = new Agent({
  instructions: `Review marketing emails for quality and brand consistency.
                 If changes are needed, use writerAgent to request revisions.`,
  tools: [complianceCheckerTool],
  delegates: [
    {
      name: "writerAgent",
      description: "Revises emails based on feedback",
      agent: writerAgent
    }
  ],
  outputSchema: reviewSchema
});

This enables iterative refinement with agents working together. In our marketing system, the Reviewer can identify issues and delegate revisions to the Writer, potentially going through several rounds before approval.

An advanced version is the "orchestrator" pattern, where a high-level agent coordinates multiple specialized agents. While powerful, I recommend using this pattern sparingly as each delegation level increases complexity.

The right control flow depends on your specific needs:

  • Code-to-code for deterministic logic
  • Code-to-agent for discrete decisions/operations
  • Agent-to-code for flexible execution with tools
  • Agent-to-agent for creative processes requiring feedback

In practice, sophisticated applications often combine these patterns strategically.


Conclusion & Next Steps

After several months of building AI agents, I've come to appreciate both their transformative potential and the practical challenges they present. The insights shared in this post represent hard-won lessons that have dramatically improved the quality, reliability, and performance of my agent-based systems.

Building effective agents isn't just about having access to powerful LLMs - it's about thoughtful architecture, careful prompt design, and strategic combination of AI with traditional software engineering principles. The most successful agentic applications aren't those that rely solely on the intelligence of the models, but those that create synergistic systems where conventional code and AI complement each other's strengths.

Key takeaways for anyone embarking on their agent-building journey:

  1. Respect complexity thresholds - Use multiple specialized agents rather than one that tries to do everything
  2. Leverage structured outputs - They transform reliability and enable sophisticated reasoning patterns
  3. Design thoughtful tool ecosystems - Simple, composable tools enable flexible agent workflows
  4. Invest time in prompt engineering - The quality of your prompts directly impacts agent performance
  5. Balance speed vs. quality - Understand the tradeoffs and optimize for your specific use case
  6. Master RAG techniques - Move beyond basic retrieval to agentic RAG and hybrid analytical approaches
  7. Choose control flow patterns wisely - Match patterns to your application's needs and complexity level

The field of AI agents is evolving rapidly, and we're still in the early days of understanding best practices. What's clear is that building effective agents isn't just about the AI models - it's about the entire system architecture and how we combine AI capabilities with traditional software engineering.

As LLMs continue to advance, I anticipate that the line between conventional code and agent-based systems will blur further. The distinctions between the four control flow patterns may become less pronounced as we develop new paradigms that seamlessly integrate deterministic and AI-driven components.

For developers approaching this space, I encourage you to start small, focus on well-defined problems, and iterate rapidly. The most valuable insights come from deploying systems that tackle real problems and observing how they perform in practice.

I hope the lessons shared in this post help you avoid some of the pitfalls I encountered and accelerate your journey toward building powerful, reliable agent-based applications. The road ahead is exciting, and I look forward to seeing how collectively we'll push the boundaries of what's possible with AI agents.

This post is a living document. As I learn more, I'll update it with new insights or write a follow-up. Have tips to share? Let's collaborate!
