DEV Community: Aimen Kerrour

I Built an AI Marketing Ads Generator with VEO3 That Creates Videos in Minutes

Aimen Kerrour — Sun, 27 Jul 2025 12:40:31 +0000

If you’ve been following the AI space or just scrolling through Twitter in the last couple of days, you’ve probably seen those Google VEO3-generated ads popping up everywhere. And for good reason. These things look amazing — and if no one told you, you’d never guess they were made by AI. Marketing videos that used to cost hundreds of thousands of dollars are now being made for literally under a dollar using AI.

As someone who doesn’t know much about marketing or design 😅, I was curious to see what I could come up with myself. So I built a simple app that lets anyone start generating these AI videos in seconds.

▶️ See the ad I generated in seconds for “Mercedes F1 Car Launch” — made entirely with Google VEO3. You won’t believe it’s AI:

So if you've been curious about the AI video generation space but haven't had a chance to build with it yet, this tutorial is for you 😅. I'll walk you through exactly how to built an AI video generation system that creates professional marketing ads in minutes

Here's what you'll learn:
• VEO3 prompt engineering - The secret structured formats that unlock Google's VEO3 hidden potential
• Cost-effective AI ad videos - How to create $10,000+ quality videos for under $1 each
• Automated marketing ads workflow - Build your own app that generates marketing ads without the hassle

The best part? You don't need to be a video expert or have a massive budget. Just willingness to experiment with this new AI tools that are literally changing the game right now.

Let's dive in 🚀

🚀 Want to try it out? Test the live demo: VEO3 Ads Generator

🤔 New to Google's VEO3 video generation? Check my previous post: Turn Any Topic into Viral AI Videos Using Google's VEO3 Model

🎬 What Are These Viral VEO3 Ads?

If you've been on Twitter lately, you've probably seen these incredibly realistic marketing videos popping up everywhere:

IKEA furniture assembly ads that look professionally shot // Detect dark theme var iframe = document.getElementById('tweet-1946530151434428639-463'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1946530151434428639&theme=dark" }
Tesla car commercials with cinematic quality // Detect dark theme var iframe = document.getElementById('tweet-1947055581778563570-383'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1947055581778563570&theme=dark" }
Sneaker launch videos that rival Nike's production value // Detect dark theme var iframe = document.getElementById('tweet-1947007723024928771-568'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1947007723024928771&theme=dark" }

The secret? These aren't traditional video productions. They're AI-generated using Google's VEO3 model, and they're created using specially formatted prompts that unlock VEO3's hidden potential.

🔍 The Magic Behind Structured Prompts

What makes these ads so effective isn't just VEO3 itself - it's the structured prompt format that creators discovered. Instead of simple text descriptions, these prompts use markup languages like YAML and JSON:

# YAML Format Example
scene: "Close-up of hands assembling IKEA furniture"
character: "Person in casual clothing, focused expression"
environment: "Modern living room, natural lighting"
camera: "Handheld, intimate perspective"

{
  "scene": "Luxury car driving through mountain roads",
  "character": "Confident driver, hands on steering wheel",
  "lighting": "Golden hour, dramatic shadows",
  "audio": "Engine purr, wind through windows"
}

This structured approach works because VEO3 was trained on the entire internet - and the internet runs on structured data formats like these. When you feed VEO3 a properly formatted prompt, it understands exactly what you want.

Why Choose Google VEO3 for Marketing Ad Generation?

Cost-effective: Create professional ads for under $1 vs $10,000+ traditional production
Speed: Generate marketing videos in minutes instead of weeks
Scalability: Produce multiple video variations instantly
Accessibility: No video editing skills required - though you need to be a little creative 😅

🛠️ Building The Ads Video Generation

I built a simple system that automates the entire video generation workflow. Here's how it works:

1️⃣ Prompt Library Management

I started by diving deep into Twitter to collect any good prompts I could find from the community (Thanks to everyone for sharing them publicly!). There are tons of people experimenting with VEO3 designs right now, so I probably missed a lot of amazing prompts out there!

To make them all work together in a standard format, I converted everything into JSON and compiled a small prompt library that we can use as reference for our ad creative generation:

# Each prompt is a Python dict with structured data
amazon_gamer_room = {
    "name": "Amazon Gamer Room Setup",
    "scene": "Close-up of hands unboxing gaming gear",
    "character": "Gamer in casual hoodie, excited expression",
    "environment": "Modern bedroom, RGB lighting, gaming setup",
    "camera": "Handheld, first-person perspective",
    "lighting": "Colorful RGB ambiance, dramatic shadows",
    "audio": "Unboxing sounds, excited breathing"
}

📚 Want to see all the prompts? Check out the complete VEO3 prompt library on GitHub - there are prompts for IKEA, Tesla, Nike, Jeep, and many more!

2️⃣ AI-Powered VEO3 Prompt Generation

The core of the system is an AI agent that takes a brand brief and transforms it using inspiration from the prompt library:

async def generate_veo3_video_prompt(ad_idea: str, inspiration_prompt: str):
    """Generate a VEO3-optimized prompt based on user idea and inspiration"""

    system_prompt = """
    Edit a structured prompt object (JSON) based on the provided user creative idea.  

    # Instructions:
    - Adapt the inspiration prompt to the user creative direction including style, room, background, elements, motion, ending, text, keywords
    - Do not modify any part of the structure unless the user explicitly requests it.
    - Maintain original structure and intention unless asked otherwise.

    # **Output**: Your response should be in the following structure:

    {
        "title": "Title of the video",
        "prompt": "Prompt for the video"
    }
    """

    user_message = f"""
    Create a VEO3 prompt for this ad idea: {ad_idea}

    Follow and get instructions directly from this prompt: 
    {inspiration_prompt}
    """

    result = await ainvoke_llm(
        model="gpt-4.1",
        system_prompt=system_prompt,
        user_message=user_message,
        temperature=0.3,
        response_format=VideoDetails  # Structured output
    )

    return result

💡 Pro tip: I use structured output to make the LLM return both the ad title and the video prompt in a consistent format:

class VideoDetails(TypedDict):
    title: str = Annotated[str, "Title of the video"]
    prompt: str = Annotated[str, "Prompt for the video"]

This ensures we always get clean, structured data that's easy to work with.

3️⃣ VEO3 Integration via Kie AI

There are lots of ways to use Google's VEO3 model currently, but the best one I found is Kie AI. They offer great pricing and don't require a subscription, so it's perfect for testing:

def start_video_generation(prompt: str, aspect_ratio: str = "16:9", model: str = "veo3_fast"):
    """Submit video generation request to Kie AI"""

    url = "https://api.kie.ai/v1/video/generate"

    payload = {
        "prompt": prompt,
        "model": model,
        "aspect_ratio": aspect_ratio
    }

    headers = {
        "Authorization": f"Bearer {os.environ['KIE_API_TOKEN']}",
        "Content-Type": "application/json"
    }

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        return response.json().get("taskId")
    else:
        raise Exception(f"Failed to start generation: {response.text}")

📌 What we need to provide to Kie AI:

Prompt: The structured video description we generated
Model: Either veo3_fast (cheaper/faster) or veo3 (higher quality/more expensive)
Aspect ratio: 16:9 for horizontal or 9:16 for vertical videos

4️⃣ Automated Video Generation

The system handles the complete workflow automatically:

async def run_workflow(inputs):
    """Complete video generation workflow"""
    try:
        # 1. Generate optimized prompt
        video_details = await generate_veo3_video_prompt(
            inputs["ad_idea"], 
            inputs["inspiration_prompt"]
        )

        # 2. Submit to VEO3
        taskid = start_video_generation(
            video_details.prompt,
            inputs["aspect_ratio"],
            inputs["model"]
        )

        # 3. Wait for completion
        result = wait_for_completion(taskid)

        # 4. Log and return results
        if result.get("status") == "completed":
            video_url = result.get("response", {}).get("resultUrls", [])[0]
            return {
                "title": video_details.title,
                "prompt": video_details.prompt,
                "video_url": video_url
            }

    except Exception as e:
        print(f"Workflow error: {str(e)}")
        return None

🎨 Streamlit App

I know some people just want to test the power of the model and don't care about the code 😄, so I built a Streamlit web app that allows to get started easily, you just need to setup your API keys and you're good to go:

# Streamlit app for easy video generation
st.title("🎬 AI Video Generator")

# User inputs
video_idea = st.text_area("Describe your video concept:")
selected_prompt = st.selectbox("Choose prompt inspiration:", prompt_options)
aspect_ratio = st.selectbox("Aspect ratio:", ["16:9", "9:16"])
model = st.selectbox("Model:", ["veo3_fast", "veo3"])

if st.button("🎬 Generate Video"):
    with st.spinner("Generating your video..."):
        result = asyncio.run(run_workflow({
            "ad_idea": video_idea,
            "inspiration_prompt": selected_prompt['prompt'],
            "aspect_ratio": aspect_ratio,
            "model": model
        }))

        if result:
            st.success("✅ Video generated successfully!")
            st.video(result["video_url"])

💰 Cost Breakdown

The economics are incredible:

VEO3 Fast: ~$0.40 per video (or $2 per video with the higher quality VEO3)
GPT-4 for prompt generation: ~$0.02 per request
Total cost per video: under $0.50

Compare this to traditional video production costs of $10,000-$100,000+ and the value proposition is obvious.

📊 Keeping Track of Everything

One thing I found super helpful was that all the videos and prompts generated are automatically saved to an Excel file, so you can review them whenever you want. This creates a nice history of:

Video titles and prompts that were generated
Direct links to the video files
Generation timestamps for tracking
Model settings used (VEO3 fast vs VEO3)
Aspect ratios chosen

This way, you can easily go back and see what worked well, reuse successful prompts, or share specific videos with others. It's like having a personal library of all your AI-generated content!

🔗 Try It Yourself

If you want to try it right now, you can use the free app I deployed here:
👉 Try AI Marketing video generation with VEO3

But if you prefer to check out the code and run it on your own machine, here’s how to get started:

1- Clone the repository from 🌐 GitHub

2- Install the required dependencies:

cd viral-ai-vids/video-ads-generation
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3- Create a .env file and add your API keys:

First, create an account on Kie AI. as I said before, their platform currently offers the best pricing options for accessing the VEO3 model. Once registered, generate your API key from your Kie AI dashboard.

For AI models, I usually use OpenRouter to access various LLMs in one place — but you can use any other provider like OpenAI, Claude, or DeepSeek, since LangChain supports them all.

Now, create a .env file and add your API keys:

KIE_API_TOKEN=your_kie_ai_token_here
OPENROUTER_API_KEY=your_openrouter_key_here  # For LLM access

4- Run the script directly:

python main.py

5- Or if you prefer a more user-friendly interface, run the Streamlit app:

streamlit run streamlit_app.py

The app will open in your browser where you can:

📝 Add your API keys
💡 Enter your video idea
🎨 Choose prompt inspiration from the library
⚙️ Select VEO3 model and aspect ratio
🎬 Generate professional marketing videos
📥 Download the video

🚀 What This Means for Everyone

We're witnessing a massive shift in content creation. The ability to create professional-quality video content for under a dollar changes everything:

Small businesses can now compete with major brands without huge budgets
Content creators can experiment freely without financial constraints
Marketing agencies can scale video services without massive overhead

And we're just getting started! Right now we're limited to 8-second videos, but that's temporary. The models are improving rapidly, and the community keeps discovering new creative techniques daily.

The real exciting part? This is just VEO3. Imagine what's coming next!

🎬 Ready to Get Started?

The barrier to entry is incredibly low - just a laptop, some API credits, and curiosity. While others are still figuring out how to use these tools manually, you could be generating professional marketing videos at scale.

The question isn't whether AI will transform marketing - it's whether you'll be part of that transformation from the beginning.

What do you think about this approach? Have you experimented with Google VEO3? Share your experiences in the comments below!

Turn Any Topic into Viral AI Videos Using Google’s VEO3 model

Aimen Kerrour — Sun, 22 Jun 2025 23:55:13 +0000

I've been hearing a lot about these new AI video models that look incredibly realistic lately. People are actually creating entire TikTok channels based on them, and they look surprisingly good! I never really tried them myself, so I wanted to check them out. I decided to start with Google's VEO3 model that everyone seems to be talking about.

Instead of manually creating prompts and videos one by one, I developed an AI automation that will take any topic you have and turn it into banger videos. This way, I could quickly test the model's capabilities and see if it lives up to the hype.

By the end of this tutorial, you'll learn how to:

Access and use Google’s VEO3 model
Craft optimized prompts for Google VEO3
Turn a simple topic into high-quality AI videos!

Let's get started! 🚀

🔥 Get Code from Github now!

🤖 What is Google VEO3?

VEO3 is Google's latest text-to-video AI model that's been creating waves across the internet. Released just a few weeks ago, it represents a giant leap forward in AI-generated video quality. The videos it produces look shockingly realistic—to the point where they're often indistinguishable from actual footage at first glance.

What makes VEO3 special compared to earlier models?

Realistic human characters with natural expressions, movements, and speech patterns
Consistent environments that maintain physical properties throughout the video
Proper lighting and physics that create a believable scene
High-resolution output with smooth motion and transitions

The current limitation is that videos are limited to 8 seconds in length, but even within that constraint, the results are impressive. You've probably seen some VEO3-generated videos on social media already — those viral clips feature crazy realistic characters like the Yeti vlogger look almost too good to be AI-generated.

Don’t take my word for it — check it out yourself: 👉 @yetivloglife on TikTok

YEP!, that’s an AI-generated Yeti running a vlog, with over 356K followers and videos reaching 16 million+ views.

⚙️ How It Works?

1️⃣ Generate Video Ideas

The first step in our automation pipeline is to generate creative video ideas from a simple topic input.
Instead of manually brainstorming concepts, I created an AI agent that handles this heavy lifting.

It takes a topic (like "Alien food critic reviewing Earth cuisine") and transforms it into structured, creative video concepts, complete with a catchy caption and environmental context.

GENERATE_IDEAS_PROMPT = """
You are an AI designed to generate 1 immersive, realistic idea based on a user-provided topic. Your output must be formatted as a JSON array (single line) and follow all the rules below exactly.

## RULES:

- Only return 1 idea at a time.
- The user will provide a key topic (e.g. "urban farming," "arctic survival," "street food in Vietnam").

### The Idea must:
- Be under 13 words.
- Describe an interesting and viral-worthy moment, action, or event related to the provided topic.
- Can be as surreal as you can get, doesn't have to be real-world!
- Involves a character.

...
"""

async def generate_video_ideas(topic: str, count: int = 1):
    print(f"Generating ideas for topic: '{topic}'...")
    user_message = f"Generate {count} creative video ideas about: {topic}"

    # Use the AI invocation function with structured output
    result = await ainvoke_llm(
        model="gpt-4.1-mini",
        system_prompt=GENERATE_IDEAS_PROMPT,
        user_message=user_message,
        response_format=IdeasList,
        temperature=0.7
    )
    return result.ideas

The LLM returns these ideas in a structured format using Pydantic models, which makes it easy to process the results

For example, when I input the topic "Alien food critic reviewing Earth cuisine" it might generate an idea like:

Idea: "Tentacled alien grimaces tasting hot sauce for first time"
Environment: "Retro diner, neon lights, zoomed close-up, documentary style"
Caption: "Alien reviewer tries Earth's spiciest sauce! 👽 #alien #foodcritic #spicyfood #tastetest"

This approach gives us a continuous stream of fresh, creative ideas that would be perfect for viral short-form videos.

2️⃣ Generate VEO3 Video Prompt

Once we have a creative idea, we need to transform it into a specialized prompt that works well with VEO3. This isn't as simple as passing the raw idea to the model — VEO3 performs best with highly detailed, structured prompts that follow certain patterns.

That's why I created another AI agent specifically designed to craft optimized VEO3 prompts:

async def generate_veo3_video_prompt(idea: str, environment: str):
    user_message = f"""
    Create a V3 prompt for this idea: {idea}
    Environment context: {environment}
    """

    # Use the AI invocation function
    result = await ainvoke_llm(
        model="gpt-4.1-mini",
        system_prompt=GENERATE_VIDEO_SCRIPT_PROMPT,
        user_message=user_message,
        temperature=0.7
    )
    return result

The GENERATE_VIDEO_SCRIPT_PROMPT contains detailed instructions for creating cinematic, hyper-realistic video prompts. Here's a snippet of what it tells the AI:

## REQUIRED STRUCTURE (FILL IN THE BRACKETS BELOW):

[Scene paragraph prompt here]

- **Main character:** [description of character]
- **They say:** [insert one line of dialogue, fits the scene and mood].
- **They** [describe a physical action or subtle camera movement, e.g. pans the camera, shifts position, glances around].
- **Time of Day:** [day / night / dusk / etc.]
- **Lens:** [describe lens]
- **Audio:** (implied) [ambient sounds, e.g. lion growls, wind, distant traffic, birdsong]
- **Background:** [brief restatement of what is visible behind them]

The prompt engineering is quite specific and follows patterns that work well with VEO3. For example, it instructs the AI to create selfie-style framing, include just one character (never named), specify a single line of dialogue, and describe physical actions and camera movements.

3️⃣ Video Generation with Kie AI

For the actual video generation, there are multiple providers out there, but in this tutorial, I'm using Kie AI, which offers pay-as-you-go access to the VEO3 model at very reasonable prices. This is perfect for testing without committing to Google's expensive subscription.

The Kie AI API generates videos in a three-step process:

📤 Submit a request including the AI model and video prompt
⏳ Wait for the video to be generated (which can take several minutes)
🔗 Retrieve the result with the video URL

Their API docs makes it very easy to use:

def start_video_generation(prompt: str):
    try:
        # Prepare the payload for VEO3
        payload_json = json.dumps({"prompt": prompt, "model": "veo3"})

        # Set up headers with API token
        headers = {
            'Content-Type': 'application/json',
            'Accept': 'application/json',
            'Authorization': f'Bearer {os.getenv("KIE_API_TOKEN")}'
        }

        # Make the API request
        url = f"https://api.kie.ai/api/v1/veo/generate"
        response = requests.post(url, headers=headers, data=payload_json)

        # Extract task ID from the response
        if response.status_code == 200:
            data = response.json()
            if "taskId" in data.get("data", {}):
                task_id = data["data"]["taskId"]
                print(f"Successfully submitted to Kie AI. Task ID: {task_id}")
                return task_id

        print(f"Error submitting to Kie AI: {response.text}")
        return None

    except Exception as e:
        print(f"Error submitting to Kie AI: {str(e)}")
        return None

After submitting the request, we receive a request id and we need to wait for the video to be generated. VEO3 can take several minutes to render a video, so I created a wait function that periodically checks the status:

def wait_for_completion(task_id: str, timeout_minutes: int = 10):
    start_time = time.time()
    timeout_seconds = timeout_minutes * 60

    while True:
        # Check if we've exceeded the timeout
        elapsed = time.time() - start_time
        if elapsed > timeout_seconds:
            print(f"Timeout reached after {elapsed:.1f} seconds")
            return {"error": "Timeout reached", "status": "timeout"}

        # Get the current status
        result = get_video_status(task_id)
        print(result)
        status = result.get("status", "").lower()

        # If completed or error, return the result
        if status == "completed":
            print(f"Kie AI video generation completed successfully")
            return result
        elif status == "failed":
            print(f"Kie AI video generation failed: {result.get('error')}")
            return result

        # Wait before checking again (adjust as needed)
        time.sleep(30)

Finally, once the video is generated, the Kie AI API returns a direct URL to the generated video, which can be viewed in any browser or embedded in a website.

4️⃣ Save Generated Videos

To keep track of all the videos we generate, I used a simple Excel sheet. It includes the video ideas, captions, prompts, and of course, the video links.
This makes it easy to review everything later and share the videos with others.

Now that we've explained how everything works, I'm sure you want to test it out and generate your own crazy ideas 🤯!
But first, let's talk quickly about the cost 💰 of using the VEO3 model.

💰 Cost of Using VEO3

The VEO3 model is seriously impressive and delivers insane outputs, but all that magic usually comes with a hefty price tag 💸.

To use it directly with a Google account, you'd need to pay a steep $200/month subscription fee 😬.

Luckily, Kie AI is currently the best platform for using VEO3 — offering both access and affordability. They give you two model options:

VEO3 Full Quality — $0.25/sec → about $2 for an 8-second video
VEO3 Fast — just $0.05/sec → a crazy low $0.40 for 8 seconds 😍

We’ll be using the VEO3 Fast model — it’s perfect for testing, playing around, and unleashing creative ideas without breaking the bank.

The quality is slightly lower than the full model, but honestly, it still looks really damn good.

Hopefully, as the tech continues to evolve and competition ramps up, we’ll see even better pricing!

🚀 Try It Out

If you want to try this out for yourself, here's how to get started:

1- Clone the repository from 🌐 GitHub

2- Install the required dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3- Create a .env file and add your API keys:

First, create an account on Kie AI. Their platform currently offers the best pricing options for accessing the VEO3 model. Once registered, generate your API key from your Kie AI dashboard.

For AI models, I usually use OpenRouter to access various LLMs in one place — but you can use any other provider like OpenAI, Claude, or DeepSeek, since Langchain supports them all.

Now, create a .env file and add your API keys:

KIE_API_TOKEN=your_kie_ai_token_here
OPENROUTER_API_KEY=your_openrouter_key_here  # For LLM access

4- Finally, update the main script with your topic of interest and set the number of videos you want to generate (start with 1 to test things out):

async def main():
    # Change this to whatever topic you'd like to explore
    topic = "Dog playing piano in a jazz club"  # Your creative topic here
    await run_workflow(topic, count=1)

5- Run the script:

python main.py

The script does all the work for you — it will:

💡 Generate video ideas automatically
✍️ Create VEO3 prompts based on those ideas
🚀 Submit prompts to Kie AI video generation
⏳ Wait for the videos to be generated
📁 Save everything — including video URLs — to an Excel file (videos.xlsx)
🔗 Click the URLs to view your videos in any web browser

So with just a single run, you’ll get a bunch of high-quality VEO3 videos ready to go — no back-and-forth, no wasted time.

▶️ Here’s an example I got for “meteorologist woman chasing tornado live on air” — generated entirely through the automation:
📽️ Watch the video

🔧 Improvements

I found this AI video generation field really fascinating and plan to continue exploring it. Here are some improvements I'm considering for future versions:

Experimenting with different VEO3 model variants for optimal quality/cost balance
Adding a way to directly upload the generated videos to YouTube or TikTok

And who knows — maybe I’ll even launch my own viral channel with some crazy, original concept like the Yeti vlogger, but with my own twist! 🔥😄

🌎 Use Cases

The output from VEO3 model is really impressive and will be huge for many applications:

Content creation: Generate videos for social media, websites, and presentations
Marketing materials: Create promotional videos and advertisements
Educational content: Produce instructional and explanatory videos
Prototyping: Rapid video concept development and testing
Creative projects: Artistic and experimental video generation
Business presentations: Professional video content for meetings and pitches

Thanks to platforms like Kie AI offering more affordable access, we're already seeing wider adoption. Early adopters who master these tools now will have a significant advantage as the technology continues to evolve.

🎯 Conclusion

Google’s VEO3 is a major leap in AI video generation 🎥. The videos it produces are incredibly realistic and open the door to a whole new level of creativity for content creators, marketers, educators, and beyond.

While it used to be pretty expensive for casual use 💸, platforms like Kie AI are making it way more accessible — especially with the VEO3 Fast model that delivers solid quality at a fraction of the cost.

The automation I shared makes it even easier to experiment with VEO3 — handling everything from idea generation to prompt writing to video creation in one streamlined workflow ⚙️.

Have you tried VEO3 or any other AI video tools?
💬 What kind of videos would you create with this tech? Let me know in the comments!

P.S. If you found this helpful, consider following me for more AI tutorials and experiments.

How to Scrape Unlimited Google Maps Leads Using AI

Aimen Kerrour — Mon, 16 Jun 2025 12:25:50 +0000

Finding quality leads for your business is often expensive and time-consuming. Commercial lead generation tools can cost hundreds of dollars monthly, and manual research takes forever.

In this guide, I'll walk you through building an AI-powered lead generation tool that scrapes business data from Google Maps at a fraction of the cost of commercial solutions. You'll learn how to extract business information, enhance it with AI scraping, and compile everything into ready-to-use lead lists — all for about $0.2 per 1000 leads! 💰

💡 See it in action!

By the end, you'll have a powerful tool to generate leads for any business niche in any location. Whether you're running a marketing agency, doing sales outreach, or conducting market research, this tool will save you both time and money.

Let's dive in! 🔥

💡 Access the project in my Github repository now!

🤔 Why Build Another Lead Generation Tool?

When I decided to start selling AI solutions for businesses, I needed up-to-date contact information and business details to conduct a successful outreach campaign. During my first look, I immediately noticed two clear options:

Having a subscription to a database provider like Apollo or Zoominfo, but for me this was too expensive as most of them charge $50-300/month
The obvious second option is Manual research which is completely free but for a good outreach at scale, it's incredibly time-consuming

I made some research and found some good scrapers on Apify like Apify's Google Maps scrapers or even Apollo Lead scraper which offer a pay-per-use more affordable pricing model costing from $2-10 per 1000 results.

This seemed good enough for me, but when I looked at the Apify Google Maps scraper, I noticed they were using a simple logic in the background - they get business details from the Google Maps API and then crawl each business details page to get more contact information. As I had experience with the Serper API, I thought why not try to build my own local business scraper tool.

That's why I created this tool — to give you:

🔹 Comprehensive data extraction - Not just names and addresses, but emails and social profiles too
🔹 Cost-efficiency - About $0.2 per 1000 leads (50x cheaper than alternatives!)
🔹 Full customization - Search any business type in any location
🔹 AI-powered enhancement - Uses LLMs to intelligently extract contact information or other business details
🔹 Ready-to-use format - Export directly to Excel for immediate use in your campaigns

🛠️ How The Tool Works

Our lead generator follows a three-step process that combines traditional API calls with AI-powered data enrichment:

1️⃣ Initial Data Collection

First, we use the Serper Maps API to collect basic business information from Google Maps.

This is handled by our search_places function, which formats and sends the API request:

def search_places(query, coords, num_pages=1):
    payload = []
    lat, lon = coords['lat'], coords['lon']

    # Create payload for each page
    for page in range(1, num_pages + 1):
        payload.append({
            "q": query,
            "ll": f"@{lat},{lon},13z", # Format the location string for Serper API
            "page": page
        })

    headers = {
        'X-API-KEY': os.getenv("SERPER_API_KEY"),
        'Content-Type': 'application/json'
    }

    # Send request and return results
    response = requests.post(
        "https://google.serper.dev/maps", 
        headers=headers, 
        data=json.dumps(payload)
    )

Also as you can notice, the Serper API requires specific longitude and latitude of the place we want to search for, so we can't directly provide a city name like "New York" for example. To get around this, we use the free Nominatim API which returns the coordinates of a place/city from its name. This is done using the get_coordinates function:

def get_coordinates(city):
    """
    Convert a city name to latitude and longitude coordinates.

    Args:
        city (str): Name of the city to geocode

    Returns:
        tuple: (latitude, longitude) if successful, (None, None) if not
    """
    try:
        response = requests.get(
            "https://nominatim.openstreetmap.org/search", 
            params={"q": city, "format": "json"}, 
            headers={"User-Agent": USER_AGENTS[2]}
        )
        data = response.json()
        if data:
            return {"lat": data[0]['lat'], "lon": data[0]['lon']}
        else:
            return None
    except Exception as e:
        print(f"Error getting coordinates: {e}")
        return None

Up to this point We get the basic business data - name, address, phone number, website URL, and reviews. But we need more information for effective outreach...

2️⃣ AI-Powered Data Enrichment

The real magic happens in the data enrichment phase. For each business with a website, we:

Scrape the website content using Playwright
Use AI (LLMs) to extract and analyze key contact information mainly email and social media links

Here's how the business information processing works:

async def process_businesses(excel_file):
    # Load the Excel file into a DataFrame
    df, file_path = load_excel_data(excel_file)

    # Process each business
    for index, row in tqdm(df.iterrows(), desc="Processing businesses", unit="business"):
        name = row.get("name", "")
        url = row.get("website", "")
        location = row.get("address", "")

        if not url:
            continue

        # Get business info
        info = await get_business_info(url, name, location)

        # Update the business information
        update_business_data(df, index, info)

For each business, we extract detailed information with our AI analysis function:

async def get_business_info(business_url, business_name, business_location):
    # Scrape the main website
    content, links = await scrape_website(business_url, extract_links=True)
    if not content:
        return {}

    social_links = find_relevant_links(links)
    emails = extract_emails_from_content(content)

    # Analyze the identified links using AI
    links_result = await analyze_business_links(
        social_links, business_name, business_location, business_url
    )

    if emails:
        emails_result = await analyze_business_emails(
            emails, business_name, business_location, business_url
        )

    # If no email found but we have a contact link, try scraping that too
    if (not emails) and social_links.get('contact'):
        contact_url = social_links['contact'][0]
        if contact_url != business_url:
            contact_content, _ = await scrape_website(contact_url, extract_links=False)
            emails = extract_emails_from_content(contact_content)
            if emails:
                # Re-analyze with new emails
                emails_result = await analyze_business_emails(
                    emails, business_name, business_location, business_url
                )

    return {
        'facebook': links_result.get('facebook', ''),
        'twitter': links_result.get('twitter', ''),
        'instagram': links_result.get('instagram', ''),
        'contact': links_result.get('contact', ''),
        'email': " || ".join(emails_result.get('emails', '')),
    }

🚀 What this function does:

Scrape the main website page to extract all relevant links and the page content.
Analyze the identified links using AI to determine the correct social media and contact links.
Extract email addresses from the page content using regex.
Analyze the extracted emails using AI to identify the valid email addresses.
If no email is found and there's a contact page link, scrape that page as well and extract any email addresses from it.
Return a dictionary containing the social media links, contact links, and the email addresses found.

💡 Pro Tip

Instead of providing the full scraped page content to the AI, we first use regex to extract all the page links and email addresses. Then we instruct an AI agent to identify which are the correct social media and contact links, and which are the valid email addresses.

This approach helps save money on LLM calls and provides more accurate results by reducing the chance of hallucinations.

For example, for the email extraction we use the following regex pattern to identify all email addresses within the page content:

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
def extract_emails_from_content(content: str):
    """
    Extracts email addresses from content.
    """
    emails = set(email.lower() for email in EMAIL_PATTERN.findall(content))
    return list(emails)

Then we use AI analysis to intelligently find the correct email addresses that match the business:

async def analyze_business_emails(
    emails: List[str], 
    business_name: str, 
    business_location: str, 
    business_url: str
):
    system_prompt = f"""
Identify all relevant business contact emails. Prioritize general contact addresses (such as info@ or contact@) and emails of key personnel that use the business's domain. Exclude department-specific ones (e.g., press, events) unless no main contact is available.

If no domain-based business emails are found, provide any available emails, including personal or free-domain addresses (e.g., Gmail, Yahoo) as fallback contacts.

## Business Information
- Business Name: {business_name}
- Business Location: {business_location}
- Business Website URL: {business_url}

**If only a single valid email is found, just return it.**
"""

    # Create user message with all the context
    user_message = f"Potential emails: {list(emails)}"

    # Invoke LLM to get structured response
    response = await ainvoke_llm(
        model=os.getenv("LLM_MODEL", "gpt-4.1-mini"),
        system_prompt=system_prompt,
        user_message=user_message,
        response_format=EmailsResponse,
        temperature=0.1
    )

    return response

3️⃣ Data Export & Organization

Finally, all the enhanced business data is saved back to the Excel file in a clean, organized format ready for your outreach campaigns.

The whole flow is orchestrated by the main function:

async def main(location, search_query, num_pages):
    print(f"\n🔍 Starting lead generation for '{search_query}' in '{location}'")

    # Step 1: Get coordinates from location
    coords = get_coordinates(location)
    if not coords:
        print("Could not get coordinates for the location. Exiting.")
        return

    # Step 2: Search for places using Serper Maps API
    places_data = search_places(search_query, coords, num_pages)
    if not places_data:
        print("No places found. Exiting.")
        return

    # Step 3: Save places data to Excel
    excel_filename = f"data_{search_query}_{location}_{get_current_date()}.xlsx"
    save_places_to_excel(places_data, excel_filename)

    # Step 4: Process businesses to get detailed information and update Excel file
    await process_businesses(excel_filename)

📌 Serper Maps API Pagination

When using Serper Maps API, each page of results contains 20 places, so to extract more than 20 places, we need to use the num_pages parameter to specify the number of pages to fetch. For example, if we want to extract 100 places, we can set num_pages=5.

🚀 Features That Set This Tool Apart

Multi-LLM Support - Using OpenRouter for LLM calls gives you the freedom to use any AI model you prefer (OpenAI, Claude, DeepSeek) through a simple interface
Full Control and Customization - By default, the tool scrapes only the landing and contact page for each business, but you can change this behavior to crawl the entire website if you want and even extract other specific business details like products or services

💰 Cost Breakdown

This tool is incredibly cost-efficient:

Service	Cost Per 1000 Leads
Serper Maps API	~$0.15 (with free credits!)
LLM API (OpenAI/Claude mini)	~$0.05
Total Cost	~$0.20

Compare that to Apify's Google Maps scrapers at $5-10 per 1000 results, and you're saving up to 50x on costs!

🛠️ Setting Up Your Own Lead Generator

Prerequisites

Python 3.8+ installed
Serper API key (Get one here)
OpenRouter API key (Get one here) or your preferred LLM API key

💡 Serper API

When you register for a Serper API key, you get 2500 free credits. This will allow to test without any cost.

Step 1: Clone the Repository

git clone https://github.com/kaymen99/google-maps-lead-generator
cd google-maps-lead-generator
pip install -r requirements.txt

Step 2: Set Up Your Environment Variables

Create a .env file in the project root with your API keys:

SERPER_API_KEY="your_serper_api_key"
OPENROUTER_API_KEY="your_openrouter_api_key"

📌 AI Model Provider

You are not forced to use OpenRouter, you can use any LLM provider you want, just change the invoke_llm function in src/utils.py to use your preferred LLM provider.

Step 3: Customize and Run

Edit the parameters in main.py to specify your target location, search query, and number of pages:

if __name__ == "__main__":
    location = "Toronto"  # Change to your target location
    search_query = "Realtors"  # Change to your target business type
    num_pages = 1  # Each page contains 20 results, increase for more leads

    # Run main function
    asyncio.run(main(location, search_query, num_pages))

Then run the script:

python main.py

Step 4: Streamlit UI

To make the tool more accessible for non-technical users, I've created a simple Streamlit interface. This provides a clean, user-friendly way to generate leads without needing to touch any code.

To use the Streamlit app, just run:

streamlit run app.py

This will open a browser window with an interface that includes:

A sidebar where you can set your Serper API key and OpenRouter API key
Options to select which LLM model you want to use for data enrichment
The main panel where you can input your location, business type, and select the number of places to scrape

After configuring your settings and clicking "Start Lead Generation", the tool will collect and enrich the data. When complete, you'll see a preview of the results and have the option to download the Excel file.

🌍 Real-World Use Cases

This tool is perfect for:

🔹 Sales Teams - Generate qualified lead lists for targeted outreach
🔹 Real Estate Agents - Find property managers, contractors, or competing agents
🔹 Marketing Agencies - Build prospect lists for client acquisition
🔹 Recruiters - Identify potential companies to place candidates
🔹 Market Researchers - Analyze business density and distribution in specific areas

🎯 Final Thoughts

Building your own lead generation tool gives you complete control over the process and drastically reduces costs. The combination of Serper Maps API and AI-powered enrichment provides a powerful, flexible solution that can adapt to your specific needs.

I'd love to hear how you use this tool and what enhancements you'd like to see! Feel free to contribute to the project on GitHub or reach out with questions.

💡 Want to learn more? Follow my blog and check out my Github for more AI project & tutorials!

📖 Interested in AI scraping? Check out my other tutorial: Scrape Any Website Fast and Cheap with Crawl4AI to learn how to build your own custom scrapers in minutes!

Happy lead hunting! 🚀

AI Web Scraping Without Limits—Scrape Anything using Crawl4AI

Aimen Kerrour — Tue, 11 Feb 2025 00:46:15 +0000

Web scraping is one of the most in-demand skills in AI and tech today. Companies are actively seeking professionals who can extract valuable data efficiently.

In this guide, I’ll walk you through building an AI-powered web scraper using Crawl4AI, leveraging LLMs to intelligently extract and process structured data from any website—whether you’re scraping leads, gathering research data, or building a custom dataset, this will save you time and effort ⏳.

By the end, you’ll have a fully functional web scraper that can extract leads from YellowPages, process them with AI, and save the results to a CSV file—all with minimal effort.

And the best part? It costs practically nothing to run! 💡

Let’s dive in! 🔥

💡 Access the project in my Github repository now!

🤖 What is Crawl4AI?

Web scraping has come a long way, and Crawl4AI is here to take it to the next level⚡

Crawl4AI is an open-source web crawling and scraping framework designed for speed, scalability, and seamless integration with LLMs (e.g., GPT-4o, Claude). It combines traditional scraping methods with AI-driven data extraction, making it ideal for data pipelines, automation workflows, and AI agents.

🔑 Key Features

✅ LLM-Friendly Output – Generates clean Markdown-formatted data, perfect for retrieval-augmented generation (RAG) and direct ingestion into LLMs.
✅ Smart Data Extraction – Combines AI-powered parsing with traditional methods (CSS, XPath) for maximum versatility.
✅ Advanced Browser Control – Handles JavaScript-heavy websites with proxy support, session management, and stealth scraping.
✅ High Performance – Supports parallel crawling and chunk-based extraction for efficient, scalable data collection.
✅ Cost-Effective & Open-Source – Eliminates the need for costly subscriptions or expensive APIs, offering full customization and scalability without breaking the bank.

Crawl4AI empowers you to extract data intelligently, efficiently, and at scale—unlocking new possibilities for automation and AI-driven workflows. 💡

🛠️ What We Will Build

Businesses and agencies often need access to local business information to find potential clients, analyze competitors, or generate leads. Instead of manually searching directories, an AI scraper can automate this process—saving time and effort.

To showcase the power of AI web scraping with Crawl4AI, I chose to built a scraper that extracts local businesses information from Yellowpages. 🏢📊

This scraper automatically navigates through listings, collecting key details like:

🔹 Business Name

🔹 Address

🔹 Phone Number

🔹 Website & Additional Info

Once extracted, the data is structured and saved into a CSV file, making it easy to use for lead generation, market research, or business analytics.

This project demonstrates how Crawl4AI + AI models can quickly extract and process web data with minimal effort. Let’s break down how it works! 🚀

⚙️ How It Works

Before running our scraper, let's break down how the code works and give you a quick overview of how LLM-powered scraping with Crawl4AI functions.

(Don’t worry—I’ll keep it short and focus on the essential parts! 😉)

1️⃣ Browser Configuration

First, we need to configure the browser settings. Crawl4AI uses Playwright under the hood, so we get full control over how the browser behaves, including:

Headless mode (whether the browser runs in the background)
Proxy settings (to avoid getting blocked)
User agents & timeouts (to mimic real users)

Here’s how we define our browser configuration:

from crawl4ai import BrowserConfig

def get_browser_config() -> BrowserConfig:
    return BrowserConfig(
        browser_type="chromium",  # Simulate a Chromium-based browser
        headless=True,  # Run in headless mode (no UI)
        verbose=True,  # Enable detailed logs for debugging
    )

2️⃣ Defining the LLM Extraction Strategy

Now comes the AI part! 🚀 Crawl4AI allows us to use an LLM extraction strategy to tell the model exactly what to extract from each page.

Here’s how we define our strategy:

llm_strategy = LLMExtractionStrategy(
    provider="gemini/gemini-2.0-flash",  # LLM provider (Gemini, OpenAI, etc.)
    api_token=os.getenv("GEMINI_API_KEY"),  # API key for authentication
    schema=BusinessData.model_json_schema(),  # JSON schema of expected data
    extraction_type="schema",  # Use structured schema extraction
    instruction=(
        "Extract all business information: 'name', 'address', 'website', "
        "'phone number' and a one-sentence 'description' from the content."
    ),
    input_format="markdown",  # Define input format
    verbose=True,  # Enable logging for debugging
)

📌 Structuring the Output

To ensure the extracted data follows a consistent structure, we use a Pydantic model:

from pydantic import BaseModel, Field

class BusinessData(BaseModel):
    name: str = Field(..., description="The business name.")
    address: str = Field(..., description="The business address.")
    phone_number: str = Field(..., description="The business phone number.")
    website: str = Field(..., description="The business website URL.")
    description: str = Field(..., description="A short description of the business.")

💡 Why use Pydantic?

It ensures the LLM returns structured data that we can easily validate and process.

📍 Multiple LLM Choices

You noticed that in the LLM strategy, I chose to use the Gemini 2.0 Flash LLM. However, as Crawl4AI is built on LiteLLM, you can swap it with OpenAI, Claude, DeepSeek, Groq, or any other supported LLM! (See the full list here).

3️⃣ Scraping the web page

Now that we have the browser and LLM strategy set up, we need a function to scrape each page and extract business details:

async def fetch_and_process_page(
    crawler: AsyncWebCrawler,
    page_number: int,
    base_url: str,
    css_selector: str,
    llm_strategy: LLMExtractionStrategy,
    session_id: str,
    seen_names: Set[str],
) -> Tuple[List[dict], bool]:

    url = base_url.format(page_number=page_number)
    print(f"Loading page {page_number}...")

    # Fetch page content with the extraction strategy
    result = await crawler.arun(
        url=url,
        config=CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS,  # No cached data
            extraction_strategy=llm_strategy,  # Define extraction method
            css_selector=css_selector,  # Target specific page elements
            session_id=session_id,  # Unique ID for the session
        ),
    )

    # Parse extracted content
    extracted_data = json.loads(result.extracted_content)

    # Process extracted businesses
    all_businesses = []
    for business in extracted_data:
        if is_duplicated(business["name"], seen_names):
            print(f"Duplicate business '{business['name']}' found. Skipping.")
            continue  # Avoid duplicates

        seen_names.add(business["name"])
        all_businesses.append(business)

    if not all_businesses:
        print(f"No valid businesses found on page {page_number}.")
        return [], False

    print(f"Extracted {len(all_businesses)} businesses from page {page_number}.")
    return all_businesses, False  # Continue crawling

This function:

Includes the necessary LLM strategy and CSS selector in the crawler config.
Loads the webpage by calling the arun method.
Extracts business details using the LLM strategy.
Filters duplicates to prevent redundant data.
Returns a list of all collected local businesses.

💡 Pro Tip

The session_id helps maintain consistent browsing behavior across pagination - crucial for websites that track user sessions!

🔍 Targeting the Right Data with CSS Selectors

To help the LLM focus on the relevant sections of the page, we use CSS selectors. These selectors allow us to pinpoint specific HTML elements that contain the desired data, ensuring a smoother and cleaner extraction for the LLM.

📸 The screenshot below shows the HTML structure of a Yellow Pages listing, where we can use CSS selectors to extract business details precisely.

4️⃣ Putting Everything Together

Scraping just one page isn’t enough—we want to crawl all businesses listings. So let’s tie everything together into a full web crawling workflow! 🎯

async def crawl_yellowpages():
    """
    Main function to scrape business data.
    """
    # Initialize configurations
    browser_config = get_browser_config()
    llm_strategy = get_llm_strategy(
        llm_instructions=SCRAPER_INSTRUCTIONS,  # Extraction instructions
        output_format=BusinessData  # Output schema
    )
    session_id = "crawler_session"

    # Initialize state variables
    page_number = 1
    all_records = []
    seen_names = set()

    # Start the web crawler session
    async with AsyncWebCrawler(config=browser_config) as crawler:
        while True:
            records, no_results_found = await fetch_and_process_page(
                crawler,
                page_number,
                BASE_URL,
                CSS_SELECTOR,
                llm_strategy,
                session_id,
                seen_names,
            )

            if no_results_found:
                print("No more records found. Stopping crawl.")
                break

            if not records:
                print(f"No records extracted from page {page_number}.")
                break  

            all_records.extend(records)
            page_number += 1  # Move to the next page

            # Stop after a maximum number of pages
            if page_number > MAX_PAGES:
                break

            # Pause to prevent rate limits
            await asyncio.sleep(2)

    # Save extracted data
    if all_records:
        save_data_to_csv(records=all_records, data_struct=BusinessData, filename="businesses_data.csv")
    else:
        print("No records found.")

    # Show LLM usage stats
    llm_strategy.show_usage()

🚀 What this function does:

Sets up the browser & LLM strategy.
Invokes fetch_and_process_page to scrape the local businesses data for each page
Runs a while loop and uses pagination to scrape multiple pages.
Saves all extracted businesses data to a CSV file.
Displays LLM usage statistics to track input/output tokens count for cost estimate.

With just a few steps, we've built a powerful AI scraper that can extract local business listings! Now, let’s put it to the test and see it in action. ⚡🔍

🔥 Try It Out!

🛠 Step 1: Clone the Project

To get started, clone the GitHub repository and install the necessary dependencies (preferably in a virtual environment):

# Clone the repository from GitHub  
git clone https://github.com/kaymen99/llm-web-scraper  
cd llm-web-scraper  

# Create a virtual environment to manage dependencies  
python -m venv venv  

# Activate the virtual environment  
source venv/bin/activate # On macOS/Linux
# On Windows: venv\Scripts\activate  

# Install the required dependencies from the requirements.txt
pip install -r requirements.txt 

# Install playwright browsers 
playwright install

🛠 Step 2: Set Up Your Environment Variables

Create a .env file in the root directory with the following content:

GEMINI_API_KEY=your_gemini_api_key_here

💡 Tip: You can use any LLM supported by LiteLLM—just ensure you provide the correct API key!

🛠 Step 3: Customize the Scraper (Optional)

Inside the project directory, you'll find a config.py file where you can modify key settings, such as:

The website URL to scrape.
The LLM provider being used.
The maximum number of pages to crawl.
Scraper instructions.

For example, to scrape different types of businesses, update the BASE_URL:

# - Plumbers in Vancouver: "https://www.yellowpages.ca/search/si/{page_number}/Plumbers/Vancouver+BC"
# - Restaurants in Montreal: "https://www.yellowpages.ca/search/si/{page_number}/Restaurants/Montreal+QC"
BASE_URL = "https://www.yellowpages.ca/search/si/{page_number}/Dentists/Toronto+ON"

To switch to a different LLM provider, update these lines:

LLM_MODEL = "gpt-4o-mini"
API_TOKEN = os.getenv("OPENAI_API_KEY")

🛠 Step 4: Run the Scraper

Start the crawler with:

python main.py

The program will:

Scrape local businesses listings page by page.
Save all extracted data to businesses_data.csv.
Display LLM tokens usage statistics after completion.

📸 See the results:

🚀 Go on, give it a spin and watch it in action!

💰 Cost Breakdown

I chose the Gemini-2.0 Flash LLM from Google for my AI scraper. Let's take a look at the token usage:

📸 Screenshot:

From the usage data, we can see that the scraper processes approximately 13,000 input tokens and 2,000 output tokens per page. Let’s calculate how much it costs to scrape a single Yellow Pages entry using our AI scraper:

Usage	Tokens Used	Pricing (per 1M tokens)	Cost
Input Tokens	13,000	$0.10	$0.0013
Output Tokens	2,000	$0.40	$0.0008
Total Cost	15,000	—	$0.0021 (≈ $0.002)

So, the cost for scraping a single page is only ~$0.002—practically free! 💸

🌍 Use Cases

Our AI web scraper isn’t just a tool—it’s a game-changer for automating web data collection. Here’s how it can be applied:

🎯 Lead Generation – Extract business details like emails, phone numbers, and addresses to build targeted outreach lists effortlessly.
📊 Market Research – Analyze trends, customer behavior, and industry insights by gathering real-time data from various sources.
⚔️ Competitor Analysis – Monitor pricing, services, and customer reviews to stay ahead in your industry.
🤖 AI Data Enrichment – Leverage LLMs to clean, categorize, and enhance scraped data for deeper insights.
📚 Research & Analysis – Extract structured data from directories, reports, and publications to fuel business or academic studies.

Whether you’re a marketer, researcher, or developer, this AI scraper streamlines data extraction—fast, efficient, and automated! 🚀

🎯 Final Thoughts

🎉 Congrats! You've successfully built your own AI-powered scraper using Crawl4AI, giving you the ability to collect as many potential leads as you need for your business or clients.

This scraper is highly adaptable—just plug in the website and specify the data you want to extract, then let it do the rest! 🚀

💡 Got ideas to improve it? Drop them in the comments!

👉 Want to learn more? Follow my blog and check out my GitHub for more AI projects & tutorials.

Happy scraping! 🔥

Build a Local RAG Researcher with DeepSeek R1

Aimen Kerrour — Fri, 07 Feb 2025 00:13:37 +0000

You’ve probably heard of DeepSeek R1, the powerful Chinese reasoning LLM that’s detouring OpenAI’s dominance in the AI world. And let’s be honest—you’ve likely seen tons of tutorials on chatting with PDFs using its local versions. But… that’s where most of them stop.

So, I thought—why not go beyond that? 🚀

What if you had a local RAG researcher that:

Questions your local documents 📄
Performs web searches 🌍
Generates structured reports in your format 📝

💡 See for yourself!

By the end of this guide, you’ll learn:

🔹 Why you need a local RAG researcher

🔹 How to use local DeepSeek R1 models with Ollama

🔹 How to run your own fully local RAG researcher

So… are you ready to level up your AI research game? Let’s dive in! ⏳💡

🔥 Explore the project in my Github repository now!

Do You Really Need a Local RAG Researcher? 🤔

You already know the perks of local LLMs—privacy, zero ongoing costs, offline access, and full control over the models. But why take it a step further with a local RAG researcher?

Let’s look at two simple cases:

📌 Your boss asks for a report on the company’s progress using internal documents.

📌 You want to analyze your personal finances and get a tailored report with insights.

In both cases, sending sensitive data to a cloud AI isn’t an option unless you want your finances leaked 💸 or your boss furious 😡.

Sure, you could chat with your documents using a local model, but you'd still waste time manually compiling a final report.

🔹 That's where your local RAG researcher comes in.

Just tell it what you need and how the report should be structured, and it will:

✅ Generate relevant research queries

✅ Retrieve key insights from your docs

✅ Perform web searches for real-time data (if enabled!)

✅ Summarize everything into a structured report

🚀 And it does all this entirely on your local machine.

Why Use DeepSeek R1? 🧠

When DeepSeek released their top-tier DeepSeek R1 model, they didn't stop at just one breakthrough. Instead, they took it a step further by developing a series of distilled models—smaller, open-source LLMs built on top of existing architectures like LLaMA and Qwen. These models were fine-tuned using reasoning datasets generated by the larger DeepSeek R1 model.

In other words, the big 671b parameters DeepSeek R1 model served as a teacher model, transferring its reasoning abilities to these smaller versions through carefully curated training datasets. This method allowed them to retain strong reasoning capabilities while being much more lightweight and efficient.💡

And the results? 🔥

📊 Just look at the table above—a trained LLaMA 70B model from DeepSeek already outperforms OpenAI’s o1-mini in reasoning tasks. That’s huge!

But what’s even more exciting?

If you don’t have a powerful GPU, these smaller models change everything. You can now run the DeepSeek-R1-14B model on your own machine and still achieve reasoning performance comparable to OpenAI’s O1-mini, while also being much more powerful than the latest GPT-4o and Claude Sonnet-3.5 models.

This means powerful, private, and cost-free reasoning models—all accessible from your local setup. 🚀

💡 The possibilities are endless, and they’re now in your hands.

Researcher Agent Architecture

(If you can't wait to test it out, skip to setup—we won't tell! 😉)

Before setting up the RAG researcher, let’s take a quick look at how it’s built 🔍.

Local ChromaDB Vector Store

As with any RAG application, we need a vector database for search. Once you upload your files (PDFs, TXT, MD, or CSV) from the UI, we follow these standard steps:

1. Load the Documents: We use LangChain loaders to handle different file formats easily.

# Import libraries
def process_uploaded_files(uploaded_files):
    ...

    # Choose the appropriate loader
    if file_extension == "csv":
        loader = CSVLoader(temp_file_path)
    elif file_extension in ["txt", "md"]:
        loader = TextLoader(temp_file_path)
    elif file_extension == "pdf":
        loader = PDFPlumberLoader(temp_file_path)
    else:
        continue
    ...

💡 Tip: Experimenting with different file types or chunking strategies? Just swap in a new loader or splitter logic to suit your needs!

2. Chunk the Documents: We split each document into smaller, semantically meaningful chunks.
3. Create the Vector Database: We use a local HuggingFace embedding model to convert document chunks into vector representations, then store them in the local Chroma database.

def add_documents(documents):
    embeddings = HuggingFaceEmbeddings()

    # Process the new documents
    semantic_text_splitter = SemanticChunker(embeddings)
    documents = semantic_text_splitter.split_documents(documents)

    # Create embeddings and store in vector DB
    vectorstore = Chroma(embedding_function=embeddings)
    vectorstore.add_documents(documents)

    return vectorstore

🔑 Key Point: Semantic chunking splits text based on meaning rather than just character count, improving accuracy when fetching relevant snippets later.

Researcher Agent

(Don’t worry—we’ll keep this quick. But trust me, you’ll want to see how it works. ✨)

Our researcher agent is an adaptive RAG pipeline built using LangGraph. Let’s break down how it transforms your instructions into polished reports.

Core Workflow

The agent functions as a state machine that manages research generation, document retrieval, and synthesis.

Step 1: Generate Search Queries

The agent starts by analyzing the provided instructions and generating precise research queries:

def generate_research_queries(...):  
    query_writer_prompt = RESEARCH_QUERY_WRITER_PROMPT.format(
        max_queries=max_queries,
        date=datetime.datetime.now().strftime("%Y/%m/%d %H:%M")
    )

    # Using a local DeepSeek R1 model with Ollama
    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=query_writer_prompt,
        user_prompt=f"Generate research queries for this user instruction: {user_instructions}",
        output_format=Queries
    )

We provide the reasoning model with the current date and the maximum number of queries, letting it handle the rest:

RESEARCH_QUERY_WRITER_PROMPT = """You are an expert Research Query Writer specializing in designing precise and effective queries to fulfill user research tasks.

Your goal is to generate the necessary queries to complete the user's research goal based on their instructions. Ensure the queries are concise, relevant, and avoid redundancy.

Your output must be a JSON object containing a single key "queries":
{{ "queries": ["Query 1", "Query 2",...] }}

# NOTE:
* You can generate up to {max_queries} queries, but only as many as needed to effectively address the user's research goal.
* **Today is: {date}**
"""

💡 Tip: You can adjust the number of search queries directly from the UI.

Step 2: Process Queries

Once generated, queries are processed in parallel for speed. Each query runs through a mini-research subgraph:

query_search_subgraph.add_edge(START, "retrieve_rag_documents")
query_search_subgraph.add_edge("retrieve_rag_documents", "evaluate_retrieved_documents")
query_search_subgraph.add_conditional_edges("evaluate_retrieved_documents", route_research)
query_search_subgraph.add_edge("web_research", "summarize_query_research")
query_search_subgraph.add_edge("summarize_query_research", END)

Here’s how it works:

Retrieve Documents

Fetches the top 3 most relevant snippets from ChromaDB using similarity search.

def retrieve_rag_documents(...):  
    vectorstore_retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})  
    return {"retrieved_documents": documents}

Relevance Check

A secondary reasoning agent evaluates whether the retrieved documents actually answer the query (relevant or not).

def evaluate_retrieved_documents(...):  
    evaluation_prompt = RELEVANCE_EVALUATOR_PROMPT.format(...)
    evaluation = invoke_ollama(...)  # "Are these docs useful? Yes/No"  
    return {"are_documents_relevant": evaluation.is_relevant}

Adaptive Routing

✅ Relevant? → Move to summarization.
❌ Irrelevant? → Launch web search (if enabled).

😔 No web search? → Skip the query.

def route_research(...):  
    if state["are_documents_relevant"]:  
        return "summarize_query_research"  
    elif enable_web_search:  
        return "web_research"  
    else:
        return "__end__"

⚠️ Note: Web research uses the Tavily API to gather high-quality sources (academic papers, verified blogs).

Summarize Research Findings

A summarizer agent compiles a concise, focused summary based on retrieved documents or web search results for each query.

def summarize_query_research(...):  
     summary_prompt = SUMMARIZER_PROMPT.format(...)

     summary = invoke_ollama(
         model='deepseek-r1:7b',
         system_prompt=summary_prompt,
         user_prompt=f"Generate a research summary for this query: {query}"
     )
     return {"search_summaries": [summary]}

We instruct the summarizer agent to craft an objective summary of the key findings while skipping irrelevant facts:

 SUMMARIZER_PROMPT="""Your goal is to generate a focused, evidence-based research summary from the provided documents.

 KEY OBJECTIVES:
 1. Extract and synthesize critical findings from each source
 2. Present key data points and metrics that support main conclusions
 3. Identify emerging patterns and significant insights
 4. Structure information in a clear, logical flow

 REQUIREMENTS:
 - Begin immediately with key findings - no introductions
 - Focus on verifiable data and empirical evidence
 - Keep the summary brief, avoid repetition and unnecessary details
 - Prioritize information directly relevant to the query

 Query:
 {query}

 Retrieved Documents:
 {docmuents}
 """

Step 3: Generate Final Report

The researcher compiles all findings into a well-structured report following the user-defined format.

def generate_final_answer(...):  
    report_structure = config["configurable"].get("report_structure", "")
    answer_prompt = REPORT_WRITER_PROMPT.format(
        instruction=state["user_instructions"],
        report_structure=report_structure,
        information="\n\n---\n\n".join(state["search_summaries"])
    )
    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=answer_prompt,
        user_prompt=f"Generate a research summary using the provided information."
    )
    return {"final_answer": parse_output(result)["response"]}

This is the prompt that the report writer agent must follow:

REPORT_WRITER_PROMPT = """Your goal is to use the provided information to write a comprehensive and accurate report that answers all the user's questions. 
The report must strictly follow the structure requested by the user.

USER INSTRUCTION:
{instruction}

REPORT STRUCTURE:
{report_structure}

PROVIDED INFORMATION:
{information}

# **CRITICAL GUIDELINES:**
- Adhere strictly to the structure specified in the user's instruction.
- Start IMMEDIATELY with the summary content - no introductions or meta-commentary
- Focus ONLY on factual, objective information
- Avoid redundancy, repetition, or unnecessary commentary.
"""

The crafted report appears in the UI, ready for you to review or copy.

Excited to see it in action? Let’s fire up the agent! 🔥

How to Run It Yourself?

Tech Stack Used

Ollama: Runs the DeepSeek R1 model locally.
LangGraph: Builds AI agents and defines the researcher's workflow.
ChromaDB: Local vector database for RAG-based retrieval.
Streamlit: Provides a UI for interacting with the researcher.

🛠 Step 1: Install Ollama

Ollama is available for macOS, Linux, and Windows. Follow these steps to install it:

1️⃣ Visit the official Ollama download page.

🔗 Download Ollama

2️⃣ Select your operating system (macOS, Linux, or Windows).

3️⃣ Click the Download button.

4️⃣ Follow the system-specific installation instructions.

📸 Screenshot:

🛠 Step 2: Run DeepSeek R1 Locally

Once Ollama is installed, you can run DeepSeek R1 models. There are multiple options depending on your system’s capabilities.

For this researcher agent, I used the 7B model, but you can choose the 1.5B model or even larger ones if your machine can handle them!

🔹 Pull the DeepSeek R1 Model

To download the 1.5B parameter model, run:

ollama pull deepseek-r1:1.5b

🔹 Run DeepSeek R1

Once downloaded, you can interact with the model using:

ollama run deepseek-r1:1.5b

🔑 Key Point: Larger models (e.g., 32B, 70B) provide better reasoning and research quality but require significant RAM.

🛠 Step 3: Set Up the RAG Researcher with Streamlit

1️⃣ Clone the Project

To run the researcher agent, clone the GitHub repository and install the required dependencies (preferably in a virtual environment):

git clone https://github.com/kaymen99/local-rag-researcher-deepseek
cd local-rag-researcher-deepseek
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt

2️⃣ Set Up Web Search API (Optional)

(Skip this if you don't want to use web search.)

The researcher agent uses the Tavily API for web search. You’ll need an API key (they give a lot of free credit—basically FREE). Add it to your .env file:

# Tavily API key for SearchTool
TAVILY_API_KEY="tvly-..."

3️⃣ Launch the Streamlit App

I've already built a Streamlit UI for the researcher agent (check app.py). It allows you to configure the agent, upload documents, and see real-time updates.

To launch the app, run:

streamlit run app.py

📸 Screenshot:

🔹 Visualize in LangGraph Studio

Since the researcher is built with LangGraph, you can visualize its workflow in LangGraph Studio. (You'll need a free LangSmith account for this)

pip install -U "langgraph-cli[inmem]"
langgraph dev

📸 Screenshot:

🚀 Test It Out!

Upload your documents and provide research instructions—the agent will handle the rest!

Real-World Use Cases 🌍

A local RAG researcher with reasoning power isn’t just a cool project—it’s a game-changer for handling private, sensitive data. Here’s how it can be used in real scenarios:

🏥 Healthcare & Medical Research: Analyze patient files, summarize research papers, and generate reports—all while keeping data secure.
🔐 Cybersecurity & Threat Analysis: Process vulnerability reports, security logs, and risk assessments without exposing sensitive information.
💰 Financial Insights & Business Reports: Review personal or company financials, uncover spending patterns, and generate structured reports privately.
🚀 Product Development & Innovation: Summarize internal R&D docs, analyze customer feedback, and generate insights for new product strategies.
🏢 Corporate Management: Retrieve and summarize internal reports, meeting notes, and technical documents for faster decision-making.

And the best part? Everything stays local. No cloud processing, no data leaks—just secure, AI-powered research. 🚀

Customizations 🛠️

🔹 Add Custom Report Structures

By default, the researcher uses a "standard" report structure, but you can customize it.

Add new structures inside the report_structures folder in the project root directory.
Select your preferred structure from the UI.

🔹 Use external LLM Providers

By default, the researcher uses Ollama and the local DeepSeek R1 models, but you can easily switch to external LLM providers like OpenAI, Claude, or even the larger Cloud-based DeepSeek R1 models.

To change providers:

1️⃣ Uncomment the relevant code in assistant/graph.py.

2️⃣ Add the necessary API keys to your .env file.

Final Thoughts

✅ You’ve successfully set up your local RAG researcher with DeepSeek R1 & Ollama!

✅ Your AI researcher is up and running—ready to analyze private data and generate your structured reports!

✅ Built with LangGraph, it’s fully customizable—add a reflection step, connect it to databases, or extend its capabilities to fit your needs!

💡 Want to learn more? Follow my blog and check out my Github for more AI project & tutorials!

📖 Also, don’t miss my guide to reasoning LLMs! Check out Understanding Reasoning LLMs: DeepSeek R1 & OpenAI o1 Demystified to learn how these models really work!

Understanding Reasoning AI: How DeepSeek R1 & OpenAI o3/o1 Work

Aimen Kerrour — Sun, 26 Jan 2025 14:18:35 +0000

The AI landscape is evolving at breakneck speed. We’ve grown accustomed to chat models like GPT-4o and Claude Sonnet 3.5—masters of fast, intuitive responses. They write emails, crack jokes, and explain quantum physics in plain English. But when faced with debugging a 500-line script or optimizing a supply chain, their limits become clear. These models excel at conversational tasks but struggle with problems that demand slow, deliberate reasoning.

With the recent release of DeepSeek’s R1 and OpenAI’s o1/o3 models, we are witnessing the rise of reasoning LLMs—a new class of AI designed not just to chat but to think. These models take a methodical, step-by-step approach to tackle complex math problems, debug code, and analyze medical data with precision. They represent a shift from “autocomplete on steroids” to deliberate problem-solving machines.

Let’s explore how these reasoning models work, why they’re different, and how they’re reshaping AI’s role in solving the toughest challenges.

Understanding Reasoning LLM Models

To grasp why reasoning models like DeepSeek R1 and OpenAI’s o1/o3 are revolutionary, we need to start with the foundation: next-word prediction, the engine powering most chat models today.

The Power (and Limits) of Next-Word Prediction

Models like GPT-4o and Claude Sonnet 3.5 are trained to predict the next token (word or subword) in a sequence. This simple objective forces them to learn grammar, world knowledge, and even basic reasoning—all at once.

This approach scales remarkably well. As models grow larger and datasets expand, they unlock emergent abilities—skills like translation or arithmetic that weren’t explicitly programmed. GPT-3, for instance, couldn’t reliably solve math problems, but GPT-4 improved significantly.

But there’s a catch: next-word prediction is System 1 thinking—fast, intuitive, but shallow. It uses the same computational effort for easy tasks (“What’s 2+2?”) and hard ones (“Prove Fermat’s Last Theorem”). When faced with complex reasoning, it often guesses plausibly rather than calculating methodically.

The Chain-of-Thought Workaround

To address this limitation, researchers introduced the concept of chain-of-thought (CoT) prompting in 2022. This technique leverages prompt engineering to guide models into explicitly revealing their intermediary reasoning steps or internal thoughts, nudging them toward System 2 thinking—slower, more deliberate, and logical reasoning.

By generating intermediate tokens (like algebraic steps), the model “stores” partial solutions, mimicking human problem-solving.

This hack improved performance on math, coding, and logic tasks—but it was still a band-aid. Models weren’t trained to reason this way; they were just prompted to.

The New Paradigm: Reinforcement Learning Meets Chain-of-Thought

Reasoning LLMs like DeepSeek R1 and OpenAI’s o1/o3 go further. Instead of relying on prompting tricks, they’re trained from the ground up using reinforcement learning over chain-of-thought trajectories.

Let's explain how it works!

First, let’s introduce Reinforcement Learning (RL) for those who might be unfamiliar with it. It’s a machine learning paradigm where an agent learns to make decisions through trial and error, receiving rewards for good actions and penalties for bad ones. Think of it like training a dog—you reinforce the behaviors you want to encourage with rewards.

In the world of conversational AI, RL plays a pivotal role. You’ve likely come across terms like RLHF (Reinforcement Learning from Human Feedback) and RLAIF (Reinforcement Learning from AI Feedback). These techniques fine-tune models like GPT-4o or Claude to:

Align outputs with human preferences (e.g., polite, helpful).
Avoid harmful or nonsensical responses.

For example, RLHF might nudge a model toward drafting concise, well-structured email responses while steering it away from rambling or off-topic content. However, it’s important to note that these techniques focus on style and safety, not necessarily on improving reasoning correctness.

How RL Transforms Reasoning Models

With reasoning LLMs, RL is applied differently. Instead of aligning with human preferences, it’s used to train models to generate verified, correct solutions. Here’s how:

Training Data with Ground Truth:
- Datasets contain problems with explicitly correct answers (e.g., solved math equations, bug-free code).
- Example: A Python script that calculates Fibonacci numbers, verified by unit tests.
Generate Multiple Trajectories:
- For each problem, the model produces dozens of potential solutions (trajectories), including intermediate reasoning steps.
- Example: 10 different ways to derive the solution to “Solve 3x + 5 = 20”, some correct, some flawed.
Grader/Verifier:
- A verifier (e.g., code executor, equation solver) checks each trajectory.
- Correct solutions get a reward; incorrect ones get nothing or get a penalty.
Policy Optimization:
- The ultimate goal of RL is to learn the best policy, so the model’s weights are adjusted to favor high-reward trajectories.
- Over time, it internalizes patterns that lead to verified answers which reinforces correct reasoning.

The Process in Action

Let’s break down a training step for a coding problem:

Input: “Write a Python function to calculate Fibonacci numbers.”
Model Generates: 20 potential solutions with different algorithms (recursive, iterative).
Grader Tests: Each solution against unit tests (e.g., fib(5) must return 5).
Reward: Only bug-free code that passes tests gets a reward.
Policy Update: The model learns to prioritize efficient, test-passing code.

This method, described in OpenAI’s Learning to Reason with LLMs blog post and Nathan Lambert’s breakdown, turns the model into a self-improving problem-solver.

A New Scaling Law: Quality Over Quantity

In the past 2 years, LLM progress followed a predictable formula:

bigger models + more data = better performance

But this paradigm is hitting diminishing returns. Training data is becoming scarce (e.g., high-quality text is finite), and doubling model size no longer guarantees proportional gains.

Reasoning models like OpenAI’s o1 and DeepSeek R1 rewrite this playbook. Their performance scales not with raw data volume, but with how compute is allocated to verify reasoning quality. This introduces a new scaling law:

What the Data Shows

As shown from OpenAI's o1 model training, reasoning models improve steadily with more compute allocated to verifying trajectories. Unlike traditional training (where gains plateau), iterative refinement of reasoning paths yields continuous improvements.

Additionally at test time, allowing models like DeepSeek R1 to use more tokens for “thinking” (e.g., generating longer chain-of-thought steps) directly boosts accuracy. This mirrors human problem-solving—more time spent deliberating often leads to better solutions.

Why This Matters

Smaller Models, Bigger Impact: A 7B-parameter reasoning model can outperform a 70B-parameter generalist on niche tasks (e.g., medical coding analysis) by focusing compute on verification, not memorization.
Escape the Data Bottleneck: Instead of scraping the internet for more text, these models extract value from smaller, high-quality datasets (e.g., verified math problems).
Flexible Compute Tradeoffs: Users can “dial up” accuracy by allowing more reasoning steps (tokens) or “dial down” for faster, cheaper outputs.

Future of Reasoning models

Today’s reasoning models are still in their infancy, yet they’re already achieving what took traditional LLMs years to master. Benchmarks like GPQA—a grueling test of complex, “Google-proof” reasoning—are being saturated 10x faster than older models could manage.

And this is just the beginning!

With the recent surge of DeepSeek R1 and a multitude of open-source reasoning models like Llama-3-Reason and Mistral-Math, more researchers and developers are diving into this field. This collective momentum promises rapid advancements—not in years, but potentially within months. The era of deliberate, step-by-step reasoning has just begun, and its potential to reshape problem-solving is boundless.

How to Use Reasoning LLM Models

(Inspired by Latent Space’s “Missing Manual” for OpenAI o1)

Reasoning models like OpenAI’s o1 or DeepSeek R1 demand a fundamentally different prompting than chat-first models like Claude or GPT-4o. Here’s how to use them effectively:

1. Provide Extensive Context

Think of the model as a new hire. It won’t ask for more details, so you need to give it everything upfront. Share as much context as possible—even more than you think is necessary. For example:

Explain what you’ve already tried and why it didn’t work.
Include relevant data, like database schemas or company-specific terms.
Describe the problem space in detail.

The more context you provide, the better the model can deliver accurate results without unnecessary back-and-forth.

2. Focus on the Outcome, Not the Process

Instead of telling the model how to solve a problem, focus on what you want as the final result. Be specific about your goals:

Do you need a complete file, a list of options, or a detailed explanation?
Let the model handle the reasoning and planning—it’s designed to do that for you.

3. Understand Their Strengths and Limitations

✅ What They Can Do

Generate Complete Files: They can one-shot entire files (or multiple files) with minimal errors, following your existing patterns.
Explain Complex Concepts: They excel at breaking down difficult topics with clear examples, almost like writing an article.
Provide Structured Outputs: They can create detailed comparisons, pros/cons lists, or multiple plans for architectural decisions.
Hallucinate Less: They are generally more accurate, especially with niche tasks like bespoke query languages or medical differential diagnoses.

❌ What They Can’t Do (Yet)

Write in Specific Styles: They tend to default to an academic or corporate tone, struggling to adapt to unique voices.
Build Entire Applications: While great at one-shotting features, they require significant iteration to create a full SaaS or complex app.

By following these tips, you can unlock the full potential of reasoning models and achieve faster, more accurate results.

Practical Use Cases

Reasoning models excel at tasks that require deep thinking and structured problem-solving. Here are some of their applications:

Coding

As mentioned earlier, these models are able to generate multiple code files in a single step with minimal errors. For example, Cole Madin demonstrated how DeepSeek R1 was used within Bolt.diy to build a complete chat interface for an AI agent, delivering outstanding results compared to Claude-Sonnet-3.5 on the first attempt.

Planning in Workflows and Agents

Reasoning models are highly effective in upfront planning for agent-based workflows, outlining detailed steps for execution by traditional agents.

Unify used OpenAI's o1 model to create their AI agents for account qualification using LangGraph, the model was used to generate detailed, step-by-step plans and identify potential challenges. It effectively expanded on user queries, even with minimal input, making it ideal for strategic planning and further research tasks.

Unify Launches Agents for Account Qualification using LangGraph and LangSmith

blog.langchain.dev

Deep Research and Reflection

As research assistants, these models excel at analyzing academic papers, summarizing key findings, and generating comprehensive reports. They are also highly effective at reflecting on and analyzing search results or large sets of documents, enabling them to extract critical insights or identify overlooked details.

Langchain has posted an excellent tutorial on how to set up your own local research assistant using DeepSeek-R1, making it easier than ever to leverage these powerful tools for academic and research purposes.

Data Analysis

Advanced reasoning models have proven to be invaluable for analyzing complex datasets, including those in highly specialized fields like medicine and finance. These models uncover patterns, extract actionable insights, and streamline decision-making processes.

Enhancing Financial and Media Analysis Using OpenAI’s o1 Model in AiReportPro: explores how the o1 model transforms financial and media analysis, empowering businesses with real-time intelligence for better strategic decisions.
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?: investigates the potential of OpenAI’s o1 model in medical applications, assessing its capability to support healthcare professionals through diagnostic and predictive analytics.

Newsfeeds Analysis

Reasoning models are powerful tools for analyzing news and social media, identifying trends, and surfacing relevant information. For example, the Firecrawl team developed the o1 Trend Finder, a tool powered by the o1 model. It efficiently filters critical insights from news feeds, providing users with targeted updates that matter most.

// Detect dark theme var iframe = document.getElementById('tweet-1874145916116222198-230'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1874145916116222198&theme=dark" }

Conclusion

We’ve seen how reasoning LLMs like DeepSeek R1 and OpenAI o1/o3 are flipping the script. No longer just "fast talkers" these models slow down, break problems into steps, and verify solutions like meticulous problem-solvers. By combining chain-of-thought logic with reinforcement learning, they’re tackling tasks that once stumped traditional chatbots—debugging code, cracking math proofs, or untangling complex data.

This isn’t just an upgrade—it’s a fundamental shift! Reasoning models prove that AI can do more than mimic human conversation; it can mirror our deliberate, step-by-step thinking. But here’s the kicker:

Does this mean we’re edging closer to AGI?

While they’re still far from human-like general intelligence, their ability to learn verified reasoning patterns hints at a future where AI doesn’t just answer—it understands.

So, where do you think we’re headed with ever smarter LLMs? Share your thoughts below! 🌟🤔

AI Voice Agents in 2025: A Comprehensive Guide

Aimen Kerrour — Thu, 09 Jan 2025 10:20:27 +0000

Did you know that 62% of potential customers are lost before they even hear a response?

Studies reveal that most callers won’t try again if their first attempt goes unanswered. Businesses across industries face this challenge—whether it's handling high call volumes, managing inquiries, or qualifying leads. The result? Missed opportunities, lost revenue, and weakened customer trust.

But what if you could answer every call, every time?

The good news is, you can. In 2025, AI voice agents are revolutionizing customer interactions. These intelligent systems ensure no call goes unanswered, providing instant support, 24/7 availability, and seamless customer experiences.

In this post, we'll explore the world of AI voice agents: what they are, how they work, the benefits they offer, and how you can build and deploy them for your business.

What Are Voice AI Agents?

Voice AI agents are systems that use artificial intelligence (AI) to listen, understand, and respond to people in a natural and conversational way. These agents can do tasks like answering questions, giving information, booking appointments, and triggering actions—all in real-time.

How Do They Work?

To achieve this, voice AI agents rely on one of two main design architectures:

1. Modular Architecture (Traditional Method)

This method breaks down the interaction into separate components, each handling a specific task:

Speech-to-Text (STT): Translates the caller’s voice into written text.
Text Processing by an AI Chat Agent: A text-based AI agent, powered by a large language model (LLM), processes the text input. It uses integrated tools to fetch information, manage tasks like booking appointments, or triggering actions.
Text-to-Speech (TTS): Converts the agent's generated text response back into natural-sounding speech, which is then sent to the caller.

2. Unified Architecture (Direct Method)

Introduced in late 2024 with OpenAI's Realtime API, this method combines everything into one step:

A standalone AI agent handles everything from speech input to speech output.
The agent is backed by an LLM that directly processes audio input, performs the required analysis (e.g., calling tools or retrieving data), and generates an appropriate audio response without intermediate text-based steps.

Which One Is Better?

The modular architecture is cheaper but can be slower because different parts need to work together. On the other hand, using OpenAI's Realtime API offers a faster, more seamless experience, though it comes at a higher cost.

Types of Voice AI Agents

AI voice agents are designed for different purposes, and understanding these differences is key. There are two main types: inbound and outbound.

Inbound Agents

These agents are like your virtual receptionists, handling incoming calls and requests. They're designed to field inquiries, provide information, and assist customers who are reaching out to a business. Here's how they're being used:

Customer Support: Handling common issues like password resets and order tracking, while seamlessly directing complex problems to human agents.
Appointment Scheduling: Allowing customers to easily book, reschedule, or cancel appointments for services like medical consultations or car repairs.
Answering FAQs: Providing instant answers to frequently asked questions about business hours, locations, product details, or return policies.
Order Taking/Processing: Guiding customers through placing new orders or making reservations, simplifying the process.

Outbound Agents

These agents are proactive, reaching out to customers on behalf of a business. They're used for various purposes, from marketing and sales to reminders and follow-ups. Here are some examples:

Lead Qualification: Contacting potential customers to gauge interest, gather information, and schedule appointments for qualified leads with sales reps.
Reminders: Sending automated reminders to customers about upcoming appointments or overdue payments, reducing no-shows and improving cash flow.
Surveys and Feedback: Conducting customer satisfaction surveys to gather feedback and identify areas for improvement.
Reactivating Cold Leads: Reaching out to old leads that have gone cold, rekindling interest and identifying potential opportunities.
Promotional Calls: Informing customers about special offers, discounts, or new product launches.

By understanding the different types of AI voice agents and their capabilities, you can start to see the many ways they can be used to improve efficiency and customer experience, creating new opportunities for both businesses and developers.

Building AI Voice Agents: Tools and Approaches

Creating your own voice AI agent might seem daunting at first, but it has become surprisingly accessible. There are different approaches to suit various skill levels and project needs, and you don't even need deep coding knowledge to build a fully custom and highly reliable voice agent.

Let's explore the options you can use!

No-Code Platforms: The Fastest Way to Get Started

No-code platforms are revolutionizing the way voice AI agents are built, offering the quickest and easiest path to development.

These platforms act as an orchestration layer, seamlessly handling the complex background processes of your agent. This includes connecting to various voice service providers like:

Deepgram for highly accurate speech-to-text (STT).
ElevenLabs for realistic text-to-speech (TTS), generating natural-sounding voices.

They also simplify integration with leading:

Large Language Models (LLMs): such as OpenAI's GPT models, Claude, Google's Gemini, or open source ones like those from Meta LLAMA or Mistral, to power your agent's understanding and responses.
Telephony Providers: Enabling your agent to make and receive phone calls through services like Twilio or Telnyx.

By managing these connections, no-code platforms provide a visual interface where you can focus on what matters most:

Designing your agent's personality and conversation flow.
Defining the specific actions your agent should take.
Adding the tools that your agent will need.

Instead of getting bogged down in technical complexities, you can concentrate on crafting the user experience and ensuring your agent meets your specific goals.

Core Features

No-code voice AI platforms typically offer a set of essential features that make them powerful tools for building voice agents:

Multi-Language Support: Supports multiple languages, making it perfect for businesses that serve diverse customer bases.
Knowledge Base Creation: Allowing you to upload custom files with domain-specific information, enabling the agent to provide more accurate and contextually relevant responses.
Tool Integrations: Offering built-in tools like call transfers and end-call functionality, as well as supporting external integrations through webhooks to connect with custom servers or third-party services like Make.com.
Endpointing: Accurately detecting when a user has finished speaking to manage turn-taking smoothly.
Interruptions (Barge-in): Handling interruptions intelligently, differentiating between genuine interjections and simple acknowledgments.
Background Noise Management: Filtering out background noise to ensure clear communication and focus on the primary speaker.
Backchanneling: Incorporating subtle cues like "uh-huh" and "got it" to create a more natural and engaging conversation flow.
Call Transcriptions & Post-call Analysis: Providing detailed call transcripts and often generating summaries, determining call success, or extracting key information.

Popular No-Code Platforms: Vapi and Retell AI

Several platforms are at the forefront of no-code voice AI development, with Vapi and Retell AI being two of the most prominent.

VAPI

Vapi offers a highly customizable platform for building voice AI agents, where you can select your preferred providers for STT, LLM, and TTS, offering flexibility and control.

In addition to the core features, Vapi boasts:

Wide LLM Support: You assistant can use a vast range of LLMs, including OpenAI models, Claude, Gemini, Groq, and others.

Multiple Voice Providers: has integrations with ElevenLabs, Deepgram, Cartesia, OpenAI voices, etc.

GHL/Make Tools Integration: Vapi allows direct integration with GoHighLevel workflows and Make.com scenarios, enabling you to trigger these automations using voice commands within your agent.

Squads: The ability to create teams of specialized agents that can handle different parts of a complex workflow and easily transfer calls between them.

Conversation Flow—Blocks (Beta): A new feature designed to improve conversation flow by breaking it down into smaller, manageable prompts, reducing errors and hallucinations. This provides more control and reliability, like a "checklist for conversations."

Retell AI

Retell AI prioritizes creating highly responsive voice agents with minimal latency, making them ideal for real-time conversations.

While similar to VAPI in its capabilities, Retell AI's current LLMs model support is limited to OpenAI's GPT-4o and Anthropic's Claude.

OpenAI Realtime API Support

Both VAPI and Retell AI allow you to use the OpenAI Realtime API (speech-to-speech model) directly without having to interact with OpenAI, which simplify further the development of low latency voice agents.

Advantages & Disadvantages of No-Code

Advantages:

Speed: Launch your agent in minutes or hours, not days or weeks.
Ease of Use: Minimal to no coding experience is required, and server management is handled by the platform, making it very user-friendly.
Accessibility: Ideal for entrepreneurs, marketers, and anyone without a strong technical background.

Disadvantages:

Latency & Reliability: Both Vapi.ai and Retell AI rely on external API providers, which can result in delays of 3–4 seconds, affecting call quality. This is especially problematic for enterprise use.
Limited Customization: You may encounter restrictions based on the platform's built-in features.
Platform Cost: Subscription fees or usage-based pricing are common, so factor these costs into your budget.

Code-Based Solutions

For developers seeking full control over their voice agents and requiring highly customized solutions, code-based approaches are ideal. These involve writing code to manage every aspect of the voice agent, from natural language processing (NLP) to handling voice input and output.

There are two primary approaches:

Building your agent from scratch.
Using frameworks like LiveKit to simplify development.

Option 1: Build From Scratch

For building from scratch one would be use programming languages like Python or Node.js. This allows you to create highly customized voice agents tailored to specific needs.

You'll be responsible for handling all aspects of the agent's logic, including:

Connecting to speech-to-text (STT) and text-to-speech (TTS) providers.
Communicating with large language models (LLMs) via APIs and managing conversational state in memory.
Integrating with telephony providers like Twilio or Telnyx for inbound and outbound calls.
Implementing advanced features such as background noise removal, interruption handling, and backchanneling.

This list is not exhaustive—many additional aspects must be considered.

For instance, when interacting with different APIs and providers managing latency is critical since high delays can severely impact user experience. No one wants a voice agent with a 10-second delay!

If you are interested in using the OpenAI Realtime API, multiple tutorials are available for developing low-latency voice agents.

For example, Twilio provides various articles and videos about building inbound and outbound AI callers using the OpenAI Realtime API with Python or Node.js.

If you prefer Python for example, you can explore:

Option 2: Using LiveKit

If you want to avoid managing the complexities of real-time voice communication and provider integrations, consider using LiveKit, a framework for building programmable, multimodal AI agents in Python or Node.js.

LiveKit simplifies development by handling much of the heavy lifting. It operates as a stateful, long-running process, connecting to the LiveKit network via WebRTC for ultra-low-latency, real-time communication.

LiveKit is the equivalent of VAPI or Retell AI in the coding world! It offers capabilities similar to those no-code platforms, including:

Managing STT and TTS conversions.
Connecting with LLMs and handling turn detection and interruptions via Voice Activity Detection (VAD).
Easy integration with telephony providers like Twilio and Telnyx.
Supporting the use of OpenAI Realtime API.

Plus many others features which you can see for yourself in the their extensive Documentation

Advantages & Disadvantages of Code-Based Solutions

Advantages:

Ultimate Flexibility: Design your agent exactly how you want, without limitations.
Full Control: Manage every aspect of functionality and data handling.
Innovative Solutions: Build unique and differentiated voice experiences.

Disadvantages:

Steep Learning Curve: Requires strong programming skills and a deep understanding of AI and APIs.
Time-Consuming: Building from scratch demands significant time and effort.
Maintenance Overhead: You’re responsible for ongoing code maintenance and updates.

4. The Future of AI Voice Agents in 2025

2025 is shaping up to be a pivotal year for AI voice agents, as they become essential tools for businesses of all sizes. Industries such as retail, healthcare, finance, and real estate are leading the way, leveraging voice agents for customer support, lead generation, and operational efficiency.

Surveys suggest that 70% of businesses plan to adopt voice AI technology by the end of 2025. This growth is fueled by rapid advancements in AI models, which promise reduced costs and improved performance, enabling more powerful and accessible AI agents.

The widespread adoption of voice AI is creating a surge in demand for experts who can build, deploy, and optimize these systems. Entrepreneurs and developers who invest in mastering this technology today are well-positioned to tap into a rapidly expanding market.

Beyond offering voice AI as a service, businesses can explore niche opportunities, such as developing industry-specific agents or providing value-added features like analytics, voice personalization, and CRM integrations. The possibilities are vast, and the potential rewards are significant.

5. How to Get Started with AI Voice Agents

Ready to start exploring the power of AI voice agents?

Getting started is easier than you might think. Here's a step-by-step guide to help you launch your first voice agent:

1- Identify a Niche:

Don’t try to boil the ocean. Focus on a specific industry or a clearly defined problem where a voice agent can deliver significant value.

For example, you could start by automating appointment scheduling for a dental practice or creating a lead qualification agent for a real estate agency. Specializing allows you to build expertise and tailor your solution for maximum impact.

Start small, focus on a specific use case, and scale as you gain more experience and confidence.

2- Choose Your Tools:

Select platforms aligned with your technical skills and project requirements.

No-Code Platforms: If you're new to AI or prefer a visual development approach, consider using no-code platforms like Vapi.ai or Retell AI. They offer user-friendly interfaces, pre-built integrations, and handle much of the technical complexity. Start with these to get familiar with the technology and its capabilities.
Code-Based Solutions: For developers who want complete control and customization, building with frameworks like LiveKit offers ultimate flexibility.

3- Design Your Agent Instructions:

As with all AI use cases, you must excel in prompt engineering and providing clear instructions to your agent.

Define the Persona: Give your agent a distinct personality that aligns with your brand and target audience.
Outline the Conversation Flow: Clearly explain the key questions, responses, and decision points the agent should follow.
Specify the Tools: Detail the tools your agent should use, such as CRM integrations, knowledge bases, or external APIs.
Use Examples: Provide sample conversations to demonstrate how the agent should handle different scenarios.

4- Train Your Agent:

Real-World Conversations: Use transcripts of actual customer interactions to make the agent more natural and effective.
FAQs: Train your agent on frequently asked questions to ensure accurate and consistent answers.
Industry-Specific Data: If you're building a niche agent, provide data relevant to that industry to improve its expertise.

5- Deploy and Test:

Integrate: Connect your agent to your existing phone systems, website, or other channels.
Run User Tests: Gather feedback from real users to identify areas for improvement.
Iterate: Continuously refine your agent based on user feedback and performance data. Deploying is not the end; it's an ongoing process of improvement.

By following these steps, you can create a powerful AI voice agent that enhances customer experience, streamlines operations, and drives business growth.

Conclusion

🎉 Congratulations on taking your first step into the exciting world of voice AI agents! Now that you’ve seen their potential, it’s time to take action.

📚 Next Steps:

Check out this YouTube playlist for step-by-step tutorials on building voice AI agents.
Learn how to pitch and sell your AI voice solutions to businesses with this guide.

✨ The future is voice-driven, and you have the tools to shape it. Start building your voice AI projects today and share your journey in the comments below!

What excites you most about voice AI? Let’s discuss! 🔥

Build the Ultimate AI Email Automation with AI Agents, RAG, and LangGraph

Aimen Kerrour — Sun, 05 Jan 2025 16:13:21 +0000

In the age of instant gratification, customer support is no longer a department, it's the entire company.

Did you know that 86% of customers will churn after just one bad experience?

Traditional support teams are overwhelmed, struggling to manage endless email queues.

But the game is changing. Large Language Models (LLMs) and AI Agents are transforming customer service, enabling businesses to meet rising expectations with speed and precision. Studies show that AI can boost customer satisfaction by 15–25% and cut costs by 20–40%.

The era of manual email processing is over. AI automation is here to deliver the fast, high-quality support your customers demand.

In this technical guide, we’ll explore how I built an AI Customer Support Email Automation System using LangGraph, LangChain, and RAG to:

Monitor emails 24/7 and deliver rapid responses to customer inquiries.
Leverage AI agents to intelligently sort emails into categories and craft tailored responses.
Use Retrieval-Augmented Generation (RAG) to provide precise answers to product and service questions.

Let's dive in!

AI Email Automation with LangGraph and RAG

To better understand the AI Email Automation system I built, it's helpful to first introduce the core technologies that power it: LangChain, LangGraph, and Retrieval-Augmented Generation (RAG).

LangChain and LangGraph

LangChain is a framework for developing AI applications using different LLMs (like GPT-4o, Google Gemini, LLAMA3) and LangGraph is built on top of it, extending its capabilities by adding the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.

Think of LangGraph as the workflow orchestrator. It excels at managing complex, multi-step processes by representing them as a graph—a network of interconnected nodes. This allows for flexible and dynamic interactions, enabling the system to adapt to various scenarios and user inputs.

What is RAG?

You’ve likely heard of Retrieval-Augmented Generation (RAG), but let me briefly explain its role. Most LLMs are trained periodically on large public datasets, meaning they often lack recent information or access to private data unavailable during training.

RAG addresses this limitation by connecting LLMs to external data sources. It retrieves relevant information from a knowledge base (such as company documents, websites, or FAQs) and uses that data to generate accurate and contextually appropriate responses. In essence, it’s like equipping the AI with a reference library to consult before answering.

Alright, now that we've covered the basics of LangChain, LangGraph, and RAG, let's see what this system can actually do!

High-Level System Overview

The automation system operates by continuously monitoring the agency's inbox, categorizing emails, generating tailored responses, and ensuring quality before sending them. Below is a visual representation of the process:

A. Email Monitoring and Categorization

Using the Gmail API, the system actively monitors the agency's inbox. New emails are immediately identified and categorized by the AI classification agent into four key classes:

Customer Complaint: Emails detailing dissatisfaction or issues.
Product Inquiry: Emails requesting information about offerings.
Customer Feedback: Emails providing general opinions, suggestions, or praise.
Unrelated: Emails irrelevant to the agency’s operations, which are ignored.

B. Dynamic Response Generation

The response process adapts based on the email's category:

Complaints and Feedback: These emails are routed directly to an AI writer agent, which generates personalized responses acknowledging the customer’s concerns or expressing gratitude for their feedback. RAG (Retrieval-Augmented Generation) is not required for these cases.
Product Inquiries: For these emails, we use RAG to retrieve accurate and up-to-date information from the agency’s knowledge base (e.g., product documentation, FAQs). This information is then used by the AI writer agent to craft a comprehensive and precise response.

C. AI Quality Assurance

Every email undergoes quality checks before being sent. An AI proofreader agent ensures:

Formatting: The email is visually appealing and easy to read.
Relevance: The response directly addresses the customer’s queries or concerns.
Tone: The tone aligns with the agency’s professional and brand-specific standards.

D. Automated Email Dispatch

Once an email passes the quality assurance stage, it is promptly sent to the customer, ensuring swift and efficient communication.

How It Works: Step-by-Step Breakdown

Setting Up the Knowledge Base for RAG

Since our system relies on Retrieval-Augmented Generation (RAG), we need a well-prepared knowledge base containing information about the agency. This will help us provide accurate and relevant answers to customer inquiries about products and services.

For this project, I've already created a sample agency profile using ChatGPT. To use it, we'll follow the standard steps for setting up a RAG knowledge base:

Load the Documents: We'll start by loading the agency documents.
Chunk the Documents: We'll then split the document into smaller, manageable chunks. This is important because large segments can be difficult for LLMs to process effectively.
Generate Embeddings: We'll use an embedding model to convert each chunk into a vector representation.
Create a Vector Database: Finally, we'll create a vector database to store these embeddings.

To keep things simple for this project, I've used Chroma DB, a local vector database, along with Google's embedding model. For production environments, you might consider more scalable solutions like Pinecone or Qdrant.

Here's the code that accomplishes this setup:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Loading Docs
loader = TextLoader("./data/agency.txt")
docs = loader.load()

# Split document into chuncks
doc_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
doc_chunks = doc_splitter.split_documents(docs)

# Creating vector embeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001") # Updated model name
vectorstore = Chroma.from_documents(doc_chunks, embeddings, persist_directory="db")

# Semantic vector search retriever
vectorstore_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

With the knowledge base set up, we can now visualize the core of the system: the automation graph built with LangGraph. Here's how it looks:

# load inbox emails
workflow.set_entry_point("load_emails")

# check if there are new emails to process
workflow.add_edge("load_emails", "is_email_inbox_empty")
workflow.add_conditional_edges(
    "is_email_inbox_empty",
    nodes.check_new_emails,
    {
        "process": "categorize_email",
        "empty": END
    }
)

# route email based on category
workflow.add_conditional_edges(
    "categorize_email",
    nodes.route_email_based_on_category,
    {
        "product related": "construct_rag_queries",
        "not product related": "email_writer", # Feedback or Complaint
        "unrelated": "skip_unrelated_email"
    }
)

# pass constructed queries to RAG chain to get information
workflow.add_edge("construct_rag_queries", "retrieve_from_rag")
# give information to writer agent to write the email
workflow.add_edge("retrieve_from_rag", "email_writer")
# proofread the generated email
workflow.add_edge("email_writer", "email_proofreader")
# check if email is sendable or not, if not rewrite the email
workflow.add_conditional_edges(
    "email_proofreader",
    nodes.must_rewrite,
    {
        "send": "send_email",
        "rewrite": "email_writer"
    }
)

# check if there are still emails to be processed
workflow.add_edge("send_email", "is_email_inbox_empty")
workflow.add_edge("skip_unrelated_email", "is_email_inbox_empty" )

Let's explore each part of this automation separately!

1. Email Monitoring

The process begins with continuous monitoring of the agency's Gmail inbox. This is achieved using the Gmail API, which allows the system to regularly check for new emails.

I built a special class GmailToolsClass to handle everything related to Gmail. You can check out the full code in my Github. For now, let's focus on how it fetches new emails:

class GmailToolsClass:
    def __init__(self):
        self.service = self._get_gmail_service()

    def fetch_unanswered_emails(self, max_results=50):
        try:
            # 1. Get recent emails
            recent_emails = self.fetch_recent_emails(max_results)
            if not recent_emails: return []

            # 2. Get draft replies to filter out already answered threads
            drafts = self.fetch_draft_replies()
            threads_with_drafts = {draft['threadId'] for draft in drafts}

            # 3. Process each new email
            seen_threads = set()
            unanswered_emails = []
            for email in recent_emails:
                thread_id = email['threadId']
                # Skip if thread was seen before or has a draft reply
                if thread_id not in seen_threads and thread_id not in threads_with_drafts:
                    seen_threads.add(thread_id)
                    email_info = self._get_email_info(email['id'])
                    # Skip if email is sent from the agency (not received)
                    if self._should_skip_email(email_info):
                        continue
                    unanswered_emails.append(email_info)
            return unanswered_emails

        except Exception as e:
            print(f"An error occurred: {e}")
            return []

    def fetch_recent_emails(self, max_results=50):
        """Fetches emails from the last 8 hours."""
        try:
            # Query emails from the last 8 hours
            now = datetime.now()
            delay = now - timedelta(hours=8)
            after_timestamp = int(delay.timestamp())
            before_timestamp = int(now.timestamp())
            query = f"after:{after_timestamp} before:{before_timestamp}"

            results = self.service.users().messages().list(
                userId="me", q=query, maxResults=max_results
            ).execute()
            messages = results.get("messages", [])
            return messages

        except Exception as error:
            print(f"An error occurred while fetching emails: {error}")
            return []

    # ... other methods ...

    def _get_email_info(self, msg_id):
        message = self.service.users().messages().get(
            userId="me", id=msg_id, format="full"
        ).execute()

        payload = message.get('payload', {})
        headers = {header["name"].lower(): header["value"] for header in payload.get("headers", [])}

        return {
            "id": msg_id,
            "threadId": message.get("threadId"),
            "messageId": headers.get("message-id"),
            "references": headers.get("references", ""),
            "sender": headers.get("from", "Unknown"),
            "subject": headers.get("subject", "No Subject"),
            "body": self._get_email_body(payload),
        }

Ok what’s going on inside fetch_unanswered_emails ?

We use fetch_recent_emails to retrieve emails from the past 8 hours.
Draft replies are checked to avoid loading already-answered emails.
We loop over all new emails to gather key details (sender, subject, body), making sure to skip any emails sent by the agency itself.

2. Email Categorization

Once new emails are detected, an AI agent categorize them into four predefined categories: "Customer Complaint", "Product Inquiry", "Customer Feedback" or "Unrelated".

Below are the instructions used for the AI categorization agent:

CATEGORIZE_EMAIL_PROMPT = """
# **Role:**

You are a highly skilled customer support specialist working for a SaaS company specializing in AI agent design. Your expertise lies in understanding customer intent and meticulously categorizing emails to ensure they are handled efficiently.

# **Instructions:**

1. Review the provided email content thoroughly.
2. Use the following rules to assign the correct category:
   - **product_enquiry**: When the email seeks information about a product feature, benefit, service, or pricing.
   - **customer_complaint**: When the email communicates dissatisfaction or a complaint.
   - **customer_feedback**: When the email provides feedback or suggestions regarding a product or service.
   - **unrelated**: When the email content does not match any of the above categories.

# **EMAIL CONTENT:**
{email}

# **Notes:**

* Base your categorization strictly on the email content provided; avoid making assumptions or overgeneralizing.
"""

Then we call this agent in the categorize_email node of our graph:

def categorize_email(self, state: GraphState) -> GraphState:
    """Categorizes the current email using the categorize_email agent."""
    # Get the last email
    current_email = state["emails"][-1]
    result = self.agents.categorize_email.invoke({"email": current_email.body})
    print(f"Email category: {result.category.value}")

    return {
        "email_category": result.category.value,
        "current_email": current_email
    }

3. Response Generation

The system generates responses differently depending on the email's category:

a. Complaints and Feedback:

For emails categorized as complaints or feedback, the AI writer agent is directly tasked with crafting a personalized response. The writer agent is instructed to be empathetic, professional, and to ensure its responses align with the agency's brand voice:

EMAIL_WRITER_PROMPT = """
# **Role:**  

You are a professional email writer working as part of the customer support team at a SaaS company specializing in AI agent development. Your role is to draft thoughtful and friendly emails that effectively address customer queries based on the given category and relevant information.  

# **Tasks:**  

1. Use the provided email category, subject, content, and additional information to craft a professional and helpful response.  
2. Ensure the tone matches the email category, showing empathy, professionalism, and clarity.  
3. Write the email in a structured, polite, and engaging manner that addresses the customer’s needs.  

...

Write the email in the following format:  
   Dear [Customer Name],  

   [Email body responding to the query, based on the category and information provided.]  

   Best regards,  
   The Agentia Team    
   - Replace `[Customer Name]` with “Customer” if no name is provided.  
   - Ensure the email is friendly, concise, and matches the tone of the category.  

3. If a feedback is provided, use it to improve the email while ensuring it still aligns with the predefined guidelines.  

# **Notes:**  

* Return only the final email without any additional explanation or preamble.  
* Always maintain a professional and empathetic tone that aligns with the context of the email.  
* If the information provided is insufficient, politely request additional details from the customer.  
* Make sure to follow any feedback provided when crafting the email.  
"""

Here's how the write_draft_email function uses this prompt to create a draft:

def write_draft_email(self, state: GraphState) -> GraphState:
    """Writes a draft email based on the current email and retrieved information."""
    # Get messages history for current email
    writer_messages = state.get('writer_messages', [])

    # Write email
    draft_result = self.agents.email_writer.invoke({
        "email_category": state["email_category"],
        "email_content": state["current_email"].body,
        "retrieved_documents": state["retrieved_documents"], # Empty for feedback or complaint
        "history": writer_messages
    })
    email = draft_result.email
    trials = state.get('trials', 0) + 1

    # Append writer's draft to the message list
    writer_messages.append(f"**Draft {trials}:**\n{email}")

    return {
        "generated_email": email,
        "trials": trials,
        "writer_messages": writer_messages
    }

We keep track of the writer agent's message history (writer_messages) and the number of drafting attempts (trials) for reasons that will become clear in the next section.

b. Product Inquiries:

For emails categorized as product inquiries, the system will use RAG to provide accurate and detailed answers.

1. Query Construction: The first step is to transform the customer's email into specific queries that can be used to search the internal knowledge base. This is done using the below prompt:

GENERATE_RAG_QUERIES_PROMPT = """
# **Role:**

You are an expert at analyzing customer emails to extract their intent and construct the most relevant queries for internal knowledge sources.

# **Context:**

You will be given the text of an email from a customer. This email represents their specific query or concern. Your goal is to interpret their request and generate precise questions that capture the essence of their inquiry.

# **Instructions:**

1. Carefully read and analyze the email content provided.
2. Identify the main intent or problem expressed in the email.
3. Construct up to three concise, relevant questions that best represent the customer’s intent or information needs.
4. Include only relevant questions. Do not exceed three questions.
5. If a single question suffices, provide only that.

# **EMAIL CONTENT:**
{email}

# **Notes:**

* Focus exclusively on the email content to generate the questions; do not include unrelated or speculative information.
* Ensure the questions are specific and actionable for retrieving the most relevant answer.
* Use clear and professional language in your queries.
"""

The node construct_rag_queries calls this AI agent to generate queries:

def construct_rag_queries(self, state: GraphState) -> GraphState:
    """Constructs RAG queries based on the email content."""
    email_content = state["current_email"].body
    query_result = self.agents.design_rag_queries.invoke({"email": email_content})

    return {"rag_queries": query_result.queries}

2. Information Retrieval and Response Generation:

Next, RAG uses these queries to search through the knowledge base. The retrieved information serves as context for a specialized QA AI agent, which synthesizes accurate responses to each query:

GENERATE_RAG_ANSWER_PROMPT = """
# **Role:**

You are a highly knowledgeable and helpful assistant specializing in question-answering tasks.

# **Context:**

You will be provided with pieces of retrieved context relevant to the user's question. This context is your sole source of information for answering.

# **Instructions:**

1. Carefully read the question and the provided context.
2. Analyze the context to identify relevant information that directly addresses the question.
3. Formulate a clear and precise response based only on the context. Do not infer or assume information that is not explicitly stated.
4. If the context does not contain sufficient information to answer the question, respond with: "I don't know."
5. Use simple, professional language that is easy for users to understand.

# **Question:**
{question}

# **Context:**
{context}

# **Notes:**

* Stay within the boundaries of the provided context; avoid introducing external information.
* If multiple pieces of context are relevant, synthesize them into a cohesive and accurate response.
* Prioritize user clarity and ensure your answers directly address the question without unnecessary elaboration.
"""

The retrieve_from_rag node retrieves information for each query and assembles the answers:

def retrieve_from_rag(self, state: GraphState) -> GraphState:
    """Retrieves information from internal knowledge based on RAG questions."""
    final_answer = ""
    for query in state["rag_queries"]:
        rag_result = self.agents.generate_rag_answer.invoke(query)
        final_answer += query + "\n" + rag_result + "\n\n"

    return {"retrieved_documents": final_answer}

All queries with their respective answers are saved in retrieved_documents which is then used by the AI writer agent as context to craft the final email response, ensuring it is both comprehensive and accurate.

4. Quality Assurance: Making Sure Emails are Perfect

Before any email is sent, it undergoes a thorough quality check by an AI proofreader agent. The agent reviews each email for proper formatting, relevance to customer inquiries, and a professional, brand-consistent tone.

The AI proofreader agent is guided by the below prompt that outlines its responsibilities:

EMAIL_PROOFREADER_PROMPT = """
# **Role:**

You are an expert email proofreader working for the customer support team at a SaaS company specializing in AI agent development. Your role is to analyze and assess replies generated by the writer agent to ensure they accurately address the customer's inquiry, adhere to the company's tone and writing standards, and meet professional quality expectations.

# **Context:**

You are provided with the **initial email** content written by the customer and the **generated email** crafted by the our writer agent.

# **Instructions:**

1. Analyze the generated email for:
   - **Accuracy**: Does it appropriately address the customer’s inquiry based on the initial email and information provided?
   - **Tone and Style**: Does it align with the company’s tone, standards, and writing style?
   - **Quality**: Is it clear, concise, and professional?
2. Determine if the email is:
   - **Sendable**: The email meets all criteria and is ready to be sent.
   - **Not Sendable**: The email contains significant issues requiring a rewrite.
3. Only judge the email as "not sendable" (`send: false`) if lacks information or inversely contains irrelevant ones that would negatively impact customer satisfaction or professionalism.
4. Provide actionable and clear feedback for the writer agent if the email is deemed "not sendable."

---

# **INITIAL EMAIL:**
{initial_email}

# **GENERATED REPLY:**
{generated_email}

---

# **Notes:**

* Be objective and fair in your assessment. Only reject the email if necessary.
* Ensure feedback is clear, concise, and actionable.
"""

The verify_generated_email node calls the proofreader agent to check the email:

def verify_generated_email(self, state: GraphState) -> GraphState:
    """Verifies the generated email using the proofreader agent."""
    print(Fore.YELLOW + "Verifying generated email...\n" + Style.RESET_ALL)
    review = self.agents.email_proofreader.invoke({
        "initial_email": state["current_email"].body,
        "generated_email": state["generated_email"],
    })

    # Store feedback if email is not good
    if not review.send:
        writer_messages = state.get('writer_messages', [])
        writer_messages.append(f"**Proofreader Feedback:**\n{review.feedback}")

    return {
        "sendable": review.send,
        "writer_messages": writer_messages
    }

Here's how it works: The function provides both the customer's original email and the drafted response to the AI proofreader agent, which either approves the email (sendable: True) or rejects it (sendable: False) with feedback. If it's rejected, the feedback is saved to writer_messages list so the AI writer agent can use it to improve the response.

To prevent the system from getting stuck in an infinite loop (which can happen if the necessary information to answer the email is not present in the knowledge base), we only give it three tries (trials = 3) to get it right. If it still can't produce a good email after three attempts, we flag it for a human to review. This makes sure that every email sent out is high quality.

5. Email Dispatch

Finally, once the email has passed all quality checks, it's ready to be sent. The system uses the Gmail API to automatically send the finalized reply to the customer.

There are two options available, the first is to create a draft email for human review:

def create_draft_response(self, state: GraphState) -> GraphState:
    """Creates a draft response in Gmail."""
    print(Fore.YELLOW + "Creating draft email...\n" + Style.RESET_ALL)
    self.gmail_tools.create_draft_reply(state["current_email"], state["generated_email"])

    return {"retrieved_documents": "", "trials": 0}

The second is to send the email directly:

def send_email_response(self, state: GraphState) -> GraphState:
    """Sends the email response directly using Gmail."""
    print(Fore.YELLOW + "Sending email...\n" + Style.RESET_ALL)
    self.gmail_tools.send_reply(state["current_email"], state["generated_email"])

    return {"retrieved_documents": "", "trials": 0}

What happens next?

After sending the email or creating a draft, the system loops back to the beginning to check for any new emails. This keeps the process going, ensuring all customer emails are handled quickly and efficiently, until the inbox is empty.

Benefits and Impact

Improved Customer Experience: Provides quick, tailored responses that make customers feel valued.
Efficiency Boost: Dramatically reduces the time spent on email management.
Team Productivity: Frees up support teams to focus on complex tasks and strategic initiatives.
Accuracy Guaranteed: RAG ensures every response is backed by reliable information.

Real-World Applications

This email automation system offers versatile solutions for various industries:

Marketing Agencies: Handles client emails, automates reporting, and answers questions quickly, letting teams focus on creative work.
E-commerce: Answers product questions, helps with order updates, and collects feedback to improve customer experiences.
Tech Companies: Responds to common technical issues, provides customer support, and organizes feedback from beta testers.
Education: Helps with student inquiries, guides them through enrollment, and supports administrative tasks.

The system is flexible and works well for any business that wants to handle emails more effectively and improve communication with customers.

Conclusion

This AI email automation system, built with LangChain, LangGraph, and RAG, helps businesses improve customer support by:

Responding to emails faster.
Improving customer satisfaction.
Making operations more efficient.

🎯 Want to try it out? Check out the full project on my GitHub for the code, setup instructions, and tips to customize it for your own business.

Have questions or ideas? 💬 Let me know in the comments!

AI Agents + LangGraph: The Winning Formula for Sales Outreach Automation

Aimen Kerrour — Sat, 04 Jan 2025 12:55:53 +0000

Sales success starts with great lead research—but let’s face it, the process can feel like an uphill battle. Sifting through endless tabs, chasing down decision-makers, and piecing together insights from company websites can quickly turn into a time-consuming grind. And after all that effort, many outreach attempts still miss the mark, lacking the personal touch needed to stand out and make a real connection.

In a sales world that moves at lightning speed, efficiency isn’t just a nice-to-have—it’s a must-have. By automating the most tedious tasks, sales teams can reclaim their time and focus on what truly matters: building genuine relationships and closing more deals.

In this technical guide, we’ll explore how I build an AI outreach automation for a marketing agency that:

Gathers insights from LinkedIn, websites, and social media,
Automatically evaluates and qualifies leads, and
Generates tailored outreach materials that help you stand out in crowded inboxes.

Let’s get started!

Why Sales Outreach Needs AI Automation?

Sales outreach is crucial for business success, yet many teams face challenges due to outdated methods. Here are key reasons why automation is essential:

Time-consuming research limits sales professionals to manual data gathering, reducing time for actual lead engagement.
Inconsistent qualification occurs when processes rely on incomplete data or subjective decisions.
Lack of personalization makes generic outreach efforts easy to ignore, diminishing client engagement.
Scalability issues arise as manual processes struggle to keep up with growing outreach demands.
Missed opportunities happen when leads slip through the cracks due to delayed follow-ups or incomplete insights.

Simplifying Lead Outreach with AI Automation

To tackle the complexities of lead research, qualification, and outreach for an AI marketing agency (ElevateAI), I created a fully automated system that use multiple AI agents to accelerate and streamline the entire process. Here’s a high-level breakdown:

High-Level Overview

CRM Integration: Connects with platforms like HubSpot, Airtable, and Google Sheets to streamline lead data access and synchronization.
Lead Research: Gathers insights on leads and their company, including:
- LinkedIn data (profile, roles, company size, industry).
- Company website, blogs, and social media activities.
- Recent company news for relevant outreach.
- Identification of challenges and opportunities.
Analysis Reports: Create detailed analysis reports using AI from all gathered information, helping sales teams better understand the lead and make informed decisions.
Lead Qualification: Scores leads based on factors like online presence, industry fit, and growth potential.
Personalized Outreach:
- Custom Outreach Report: Generate a company audit report that outlines the challenges identified, how your services can address them, and highlights relevant case studies.
- Craft a personalized email using all collected insights, including a link to the custom outreach report, to engage the lead effectively.
- Generate custom interview scripts to facilitate productive calls and meetings with the lead.
Data Management: All generated research reports are saved to Google Drive and CRM is updated with the collected data for easy access.

Key Benefits

Automated Lead Research: Gathers insights from LinkedIn, websites, social media, and more, ensuring thorough lead evaluation.
Higher Reply Rates & Conversions: Attach detailed audit reports to outreach emails, increasing the likelihood of positive responses.
Time-Saving & Improved Efficiency: Automates research and report generation, freeing up time and streamlining team efforts.

Technical Deep Dive

To build this automation, I used Langgraph and LangChain. Langgraph facilitates the development of complex workflows, while LangChain enables integration with different LLMs (Google Gemini, GPT-4o, and LLaMA 3). We use the power of LLMs to interpret insights, generate analytical reports, craft personalized outreach materials, and qualify leads.

Below is a glimpse of the automation graph structure developed using Langgraph:

# Entry point of the graph
graph.set_entry_point("get_new_leads")
graph.add_edge("fetch_linkedin_profile_data", "review_company_website")
graph.add_edge("review_company_website", "collect_company_information")
graph.add_edge("collect_company_information", "analyze_blog_content")
graph.add_edge("collect_company_information", "analyze_social_media_content")
graph.add_edge("collect_company_information", "analyze_recent_news")
graph.add_edge("analyze_recent_news", "generate_digital_presence_report")
graph.add_edge("generate_digital_presence_report", "generate_full_lead_research_report")
graph.add_edge("generate_full_lead_research_report", "score_lead")
graph.add_conditional_edges(
    "score_lead",
    nodes.check_if_qualified,
    {
        "qualified": "generate_custom_outreach_report",  # Proceed if lead is qualified
        "not qualified": "save_reports_to_google_docs"  # Save reports and exit if lead is unqualified 
    }
)
# Outreach material creation
graph.add_edge("generate_custom_outreach_report", "create_outreach_materials")
graph.add_edge("create_outreach_materials", "generate_personalized_email")
graph.add_edge("create_outreach_materials", "generate_interview_script")
graph.add_edge("generate_personalized_email", "save_reports_to_google_docs")
# Save reports and update the CRM
graph.add_edge("save_reports_to_google_docs", "update_CRM")

Now, let’s break down each component of this automation in detail:

Fetching Leads from CRM

The process begins with extracting lead information (e.g., name, email, phone number) from CRM platforms such as Airtable, HubSpot, or Google Sheets. The system connects to these platforms using a standardized class structure, ensuring seamless integration and making it easy to extend support for new CRMs without disrupting the automation.

Here’s an example of how it connects to Airtable:

class AirtableLeadLoader(LeadLoaderBase):
    def __init__(self, access_token, base_id, table_name):
        # Use the access_token instead of api_key
        self.table = Table(access_token, base_id, table_name)

    def fetch_records(self, lead_ids=None, status_filter="NEW"):
        """
        Fetches leads from Airtable. If lead IDs are provided, fetch those specific records.
        Otherwise, fetch leads matching the given status.
        """
        if lead_ids:
            leads = []
            for lead_id in lead_ids:
                record = self.table.get(lead_id)
                if record:
                    # Merge id and fields into a single dictionary
                    lead = {"id": record["id"], **record.get("fields", {})}
                    leads.append(lead)
            return leads
        else:
            # Fetch leads by status filter (based on "Status" field)
            # You can choose your own field for filter with different naming
            records = self.table.all(formula=match({"Status": status_filter}))
            return [
                {"id": record["id"], **record.get("fields", {})}
                for record in records
            ]

    def update_record(self, lead_id, updates: dict):
        """Updates a record in Airtable"""
        # Fetch the current record to ensure it exists and get its fields
        record = self.table.get(lead_id)
        if not record:
            raise ValueError(f"Record with ID {lead_id} not found.")

        # Merge current fields with updates, adding any new fields
        current_fields = record.get("fields", {})
        updated_fields = {**current_fields, **updates}

        # Update the record in Airtable
        return self.table.update(lead_id, updated_fields)

In this example, the fetch_records function retrieves new leads based on their status, while the update_record function saves any insights gathered during the outreach process directly into the CRM.

Enriching Lead Profiles via LinkedIn

Once the basic lead information is retrieved, the system enhances their profiles by gathering insights from LinkedIn. This involves finding and scraping LinkedIn profiles for both the lead and their company.

Here’s how the system retrieves company details:

def research_lead_company(linkedin_url):
    # Scrape company LinkedIn profile
    company_page_content = scrape_linkedin(linkedin_url, True)
    if "data" not in company_page_content:
        return "LinkedIn profile not found"

    # Structure collected information about company
    company_profile = company_page_content["data"]
    return {
        "company_name": company_profile.get('company_name', ''),
        "description": company_profile.get('description', ''),
        "year_founded": company_profile.get('year_founded', ''),
        "industries": company_profile.get('industries', []),
        "specialties": company_profile.get('specialties', ''),
        "employee_count": company_profile.get('employee_count', ''),
        "social_metrics": {
            "follower_count": company_profile.get('follower_count', 0)
        },
        "locations": company_profile.get('locations', [])
    }

What’s happening here? The function scrapes the company’s LinkedIn profile to extracts key details such as industry, company size, and description.

A similar approach is used to gather details about the lead’s professional background, including their roles, skills, and experience.

Scraping the Company Website

The automation goes a step further by scraping the company’s website to gather deeper insights. This is done using the following function, which scrapes a URL and converts its HTML content into markdown, making it more readable for LLMs:

def scrape_website_to_markdown(url: str) -> str:
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate"
    }

    # Make the HTTP request
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch the URL. Status code: {response.status_code}")

    # Parse the HTML
    soup = BeautifulSoup(response.text, "html.parser")
    html_content = soup.prettify() 

    # Convert HTML to markdown
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.ignore_images = True
    h.ignore_tables = True
    markdown_content = h.handle(html_content)

    # Clean up excess newlines
    markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)
    markdown_content = markdown_content.strip()

    return markdown_content

We then use AI to summarize the website content, focusing on key information such as the company's mission, products, and services. Additionally, it extracts relevant links, including those to social media accounts and blogs.

WEBSITE_ANALYSIS_PROMPT = """
The provided webpage content is scraped from: {main_url}.

# Tasks

## 1- Summarize webpage content:
Write a 1500-word comprehensive summary in markdown format about the content of the webpage, 
focusing on relevant information related to the company mission, products, and services.

## 2- Extract and categorize the following links:
1. Blog URL: Extract the main blog URL of the company. 
2. Social Media Links: Extract links to the company's YouTube, Twitter, and Facebook profiles.

# IMPORTANT:
* Ensure the summary is organized in markdown format.
* Ensure that only the specified categories of links are included. 
* If a link is not found, its value is an empty string.
* If the link is relative (e.g., "/blog"), prepend it with {main_url} to form an absolute URL.
"""

If the company operates a blog, we also scrape it and use an AI agent to extract relevant insights, such as:

Total Number of Blog Posts: Measures the volume of published content.
Posting Frequency: Identifies patterns (e.g., consistent, irregular, or inactive).
Key Themes and Topics: Highlights recurring subjects and main themes.

Analyzing Social Media Metrics

To better understand the company’s social media presence, we examine its activity on popular platforms such as Facebook, Twitter, YouTube, and TikTok. This involves scraping key metrics—posts, videos, followers, and more—followed by AI analysis to generate detailed reports on content themes, engagement levels, and actionable opportunities for improvement.

Example: Analyzing YouTube Performance

For YouTube, the following function is used to collect key statistics from a company’s channel:

def get_youtube_stats(channel_url):
    """
    Collects statistics and metadata from a company's YouTube channel.
    """
    channel_name = extract_channel_name(channel_url)
    channel_id = get_channel_id_by_name(channel_name)
    result = get_channel_videos_stats(channel_id)

    return {
        "total_videos": result['total_videos'],
        "subscriber_count": result['subscriber_count'],
        "average_views": result['average_views'],
        "recent_videos": [
            {"title": video["title"], "published_at": video["published_at"]}
            for video in result["last_15_videos"]
        ]
    }

This function retrieves key details, including:

Total number of videos
Subscriber count
Average views per video
Details of recent videos

The gathered data is passed to an AI marketing agent instructed to generate a detailed analysis report:

YOUTUBE_ANALYSIS_PROMPT = """
# **Role:**
You are a Professional Marketing Analyst specializing in evaluating YouTube channel 
performance and identifying actionable insights to improve content strategies.

---

# **Task:**
Analyze the provided YouTube channel's content and generate a detailed performance report. 
This report will evaluate the channel's activity, relevance to the company’s services, 
and opportunities for improvement.

# **Specifics:**
Your report will include the following sections:

## **Channel Summary:**
* **Number of Videos:** Count of videos provided for analysis.  
* **Activity:** Describe the frequency of uploads (e.g., consistent, irregular, or inactive).  
* **Engagement:** Summarize key engagement metrics (e.g., average views, likes, and comments 
per video).  
* **Summary of Topics:** Highlight the main themes and subjects covered.  
* **Examples:** List five representative video titles and descriptions.  

## **Scoring:**
Evaluate the channel in these categories: **Volume of Videos**, **Upload Activity**, 
**Engagement Levels**, **Relevance to the Company’s Services**  
Combine these to calculate a total performance score.

## **Opportunities for Improvement:**
* Identify content gaps or underexplored topics.  
* Suggest new themes and innovative content formats (e.g., shorts, live streams, tutorials).  

## **Action Plan:**
Provide actionable recommendations to boost engagement, relevance, and activity.
"""

This analysis evaluates the channel’s activity, audience engagement, and alignment with the company’s goals. It also provides targeted strategies to enhance visibility, improve content relevance, and boost engagement

Recent News Review

In the final step of our research, we use the Google News API to fetch the latest news and announcements related to the company. This step provides valuable insights into the company’s recent initiatives, achievements, and potential challenges:

def get_recent_news(company: str) -> str:
    # Define the payload and headers for the request
    url = "https://google.serper.dev/news"
    payload = json.dumps({"q": company, "num": 20, "tbs": "qdr:y"})
    headers = {
        'X-API-KEY': os.getenv("SERPER_API_KEY"),
        'Content-Type': 'application/json'
    }

    # Make the POST request to the API
    response = requests.post(url, headers=headers, data=payload)

    # Check if the response is successful
    if response.status_code == 200:
        news = response.json().get("news", [])

        # Prepare the string to return
        news_string = ""
        news.reverse()  # Reverse the list to get the most recent news first

        for item in news:
            title = item.get('title')
            snippet = item.get('snippet')
            date = item.get('date')
            link = item.get('link')
            news_string += f"Title: {title}\nSnippet: {snippet}\nDate: {date}\nURL: {link}\n\n"

        return news_string
    else:
        return f"Error fetching news: {response.status_code}"

Reports Generation

Each step of the analysis culminates in a detailed, AI-generated report. These reports serve two primary purposes:

Empowering Sales Teams: By providing rich insights to facilitate more informed and personalized engagements.
Supporting Outreach Preparation: By offering a clear foundation for crafting tailored materials for outreach.

The reports include the following:

Lead Profile Report: Generated from LinkedIn and company website data, detailing the lead’s professional background and company information.
Blog Analysis Report: Insights extracted from blog content to highlight themes, posting frequency, and engagement.
Social Media Analysis Reports: Detailed evaluations for each social media platform, covering activity, engagement, and strategic opportunities.
Recent News Analysis: A summary of the latest news, emphasizing recent achievements and challenges.
Digital Presence Report: Combines insights from blogs, news articles, and social media to provide a holistic view of the company’s online presence.
Global Research Report: Consolidates findings from all reports into a comprehensive document for internal use.

All reports are saved locally and synchronized to Google Docs, ensuring easy access and collaboration among team members.

Lead Qualification

Once the research process is complete, the next step is to qualify the lead. This is also done using an AI agent, which analyzes the comprehensive reports generated previously and determines if the lead is qualified against predefined criteria.

Given that I built this system for an AI marketing agency, the lead qualification requirements are based on that. Here is a glimpse of the AI lead scoring prompt:

SCORE_LEAD_PROMPT = """
# **Role & Task**  
You are an expert lead scorer for **ElevateAI Marketing Solutions**, tasked with evaluating and scoring leads based on their digital presence, social media activity, industry fit, company scale, and marketing strategy.  

# **Scoring Criteria**  
- **Digital Presence:** Blog activity and website quality.  
- **Social Media Activity:** Platform presence, engagement rates, and posting frequency.  
- **Industry Fit:** Relevance to target industries and use of AI/automation.  
- **Company Scale:** Growth signals and alignment with ElevateAI's focus.  
- **Marketing Strategy:** Use of tools and consistency in messaging.  
- **Pain Points & ROI:** Identifiable challenges and potential ROI from ElevateAI solutions.  

# **Output**  
Provide a final score (1–10) based on these criteria, with no additional explanation.
"""

For this implementation I choose to consider that if a lead gets a score 7.0 or above is considered qualified, marking them as high-priority for our outreach efforts.

Preparing Outreach Materials

For qualified leads, the system crafts personalized outreach materials to maximize engagement:

Custom Outreach Report

For every qualified lead, we task an AI writer agent to generate a Custom Outreach Report—a dynamic, data-driven presentation showcasing how ElevateAI can address their unique challenges.

The outreach report includes:

Business Overview: A concise profile of the lead’s company, detailing their industry, core offerings, and competitive position.
Challenges Identified: A breakdown of specific pain points in their current marketing or business strategy, such as limited digital presence, low engagement rates, or operational inefficiencies.
AI-Powered Solutions: Customized recommendations that leverage ElevateAI’s capabilities, highlighting how our solutions align with their business goals.
Tangible Results: Real-world impact examples, drawn from our knowledge base using Retrieval-Augmented Generation (RAG). These examples feature case studies from similar businesses, showcasing measurable outcomes, such as improved ROI, increased customer retention, or enhanced operational efficiency.
Clear Call-to-Action (CTA): A compelling CTA, such as scheduling a free consultation or demo, encouraging leads to take the next step in the sales funnel.

Here’s a sample of a Custom Outreach Report created for Relevance AI for example:

You can also view the Full Report if you want!

Personalized Email & Interview Script

For each qualified lead, we use another AI email writer agent to craft a highly personalized email designed to address their specific needs and challenges. The email highlights the value ElevateAI can deliver, includes a link to the custom outreach report, and emphasizes measurable benefits such as increased ROI, enhanced customer engagement, or operational efficiencies.

To support sales teams, the system also generates an interview script rooted in SPIN selling principles. This script includes strategic, open-ended questions to guide discussions, uncover the lead’s pain points, and position ElevateAI’s solutions as the ideal fit for their business objectives.

Save to CRM

Once the system processes a lead, all generated data is saved back to the CRM, including report links, the lead's score, and their updated status.

Make It Your Own

While this AI outreach automation was specifically designed and configured for the needs of an AI marketing agency, it's fully customizable to fit any business. Here's the flexibility you get:

Your CRM: Integrates with whatever CRM you use.
Custom Reports: Choose which reports to generate (e.g., prioritize social media, skip blog analysis) and modify their structure to focus on the data you need.
Your Ideal Lead: Adjust lead scoring criteria to match your target audience.
Your Brand Voice: Tailor email templates and interview scripts to your own business.
Expand as Needed: Add data sources, AI models, or outreach channels.

It's a powerful, flexible automation ready to be customized for your unique outreach strategy.

Conclusion

Sales outreach doesn’t have to be exhausting. With AI automation and AI agents, you can move from generic emails and manual research to hyper-personalized, data-driven strategies.

You've just seen how to use LangGraph and LangChain to build a sales automation system that can transform your sales outreach by:

Saving time on manual research.
Boosting conversions with hyper-personalized strategies.
Streamlining workflows for smarter, more efficient processes.

🎯 Want to learn more? Check out the full project on my GitHub and take your sales game to the next level! 🚀

All You Need to Know About AI Agents: A Full Guide

Aimen Kerrour — Sat, 04 Jan 2025 12:37:19 +0000

Generative AI agents are a hot topic, with many claiming they'll revolutionize the world. But what exactly are they?

In this post, we'll demystify generative AI agents and explore real-world examples of their application. By the end, you'll understand:

What a generative AI agent is
How AI agents differ from AI automation
Keys to design effective AI agents
Equipping AI agents with tools

What is a Generative AI Agent?

Generative AI agents combine two key components:

Generative AI (Large Language Models or LLMs e.g. chatGPT, LLAMA3).
Traditional AI Agents (AI-driven decision-making).

Generative AI agents use large language models (LLMs) like GPT-4 or Llama3 as their "brain" to decide which actions to take or tools to use to achieve a specific goal.

AI agents are often confused with AI automation, but they're distinct concepts.

To differentiate between them, ask yourself:

Yes → AI agent
No → AI automation

Can the AI system learn from its interactions with the environment to make better decisions?

Let's compare a standard chatbot with an AI sales agent:

Standard Website Chatbot:

Responds to predefined questions
Provides scripted answers
Can't adapt beyond its programming

AI Sales Agent:

Accesses detailed product information
Uses a recommendation tool based on customer preferences
Handles payment processing
Manages meeting schedules

For example if a customer interacts with the AI sales agent looking for a new laptop. The agent can:

Asks about the customer's needs (e.g., for gaming, work, or casual use)
Uses its product database to find matching laptops
Recommends laptops based on the customer's budget and requirements
Answers questions about specs and features
Processes the payment when the customer decides to buy
Schedules a follow-up for setup assistance

Throughout this interaction, the AI agent adapts its responses based on the customer's feedback, making it more effective than a standard chatbot.

Now that we understand what are AI agents, let's see how we can build them.

How to Design Effective AI Agents

Designing effective AI agents involves several key considerations:

Identity
Narrow scope
Memory
Planning
Access to external tools

We’ll dive into each below.

1. Identity:

The role or persona you assign to an AI agent significantly influences the quality of its responses.

Consider these two prompts and their answers:

What is an LLM?
You are a sarcastic teenager explaining AI to your grandparents. What is an LLM?

In the second prompt, the brief identity assignment led ChatGPT to adopt the persona of a sarcastic teenager explaining AI to their grandparents, transforming its response in several ways:

Content: The information presented was more casual and humorous.
Writing Style: The tone became more conversational and relatable.
Humor: The response included sarcasm to entertain and engage.
Comprehension Level: The explanation was simplified and analogies were used.
Technical Detail: The answer was less formal and more accessible.

Understanding and carefully crafting the identity and context for each AI agent is essential.

Regularly experiment to determine the most effective identity for your specific requirements.

For instance, an AI agent designed as a customer service representative should be friendly and empathetic, while a technical support agent should channel their inner tech wizard, complete with geeky charm.

You wouldn’t want your tech support agent cracking jokes about why the computer crossed the road, would you?

2. Narrow Scope:

Research consistently shows that LLMs excel when given clear, specific tasks rather than broad, open-ended ones.

Overloading an agent with excessive information or context can reduce accuracy and increase the risk of generating false responses or hallucinations.

The secret is to maintain a sharply focused scope for each agent.

Give each agent a single, well-defined objective. Avoid creating a “jack of all trades” agent. Instead, aim for a master of one.

Rather than depending on one agent to tackle multiple complex tasks, build a team of specialized agents, each with a distinct area of expertise.

For example, in an AI-driven customer service system, you could organize your team as follows:

One agent handles initial query classification
Another agent retrieves relevant information from your knowledge base
A third agent crafts personalized responses based on the information gathered

This focused approach not only boosts performance of the agents but also increases output quality and decreases hallucinations.

3. Memory:

Memory is key to making AI agents effective in the real world.

Just like human memory, AI memory lets agents remember past actions and results, think about their performance, and use these insights to make better choices in the future.

Memory helps agents get better over time and adapt to new situations.

Short-term memory acts like a blank slate, starting fresh with each new task.

Long-term memory, stored usually in databases, keeps track of past experiences. After finishing a task, the agent reflects on its work and saves useful information for future use.

When faced with a new challenge, agents can use this stored knowledge to make smarter and more effective decisions.

4. Planning:

Not every task can be broken down into simple steps from the beginning. Some goals need a more flexible approach to handle unexpected issues and adapt to changing circumstances.

With strategic planning, agents can handle complex tasks by figuring out the necessary steps as they go.

For example, a travel booking agent planning a multi-city trip won't give up if a flight is full. The agent will look for other flights, consider different routes or airlines, and even check options like changing travel dates or nearby airports to meet the client's needs.

By letting agents assess their goals, review available options, and plan their actions, you improve their ability to manage complex situations and provide effective solutions.

In practice, adding the simple "Think step by step" statement to the agent will hugely improve its response.

5. Tools

LLMs are limited to the data they were trained on, which can restrict their ability to provide real time information, as many of us have seen with ChatGPT.

To address this, agents can be equipped with tools that allow them to interact with the external world.

These tools might include APIs, databases, and other services that help them search the internet, collect data, or perform specific actions.

Just as with defining an agent’s scope, it’s important not to overwhelm your agents with too many tools.

Provide each agent with only the essential tools needed to achieve its goal. Too many tools can be confusing, making it hard for the agent to know which one to use and leading to errors or mixed-up information.

In the next section, we’ll explore how to integrate these tools with your agents in more detail.

In summary, when creating an agent, consider the following:

Identity: The role you assign to an AI agent affects its response quality, so tailor the persona to match the task.
Narrow Scope: Focus each agent on a single task to improve accuracy and reduce errors, avoiding a “jack of all trades” approach.
Memory: Use memory to help agents remember past actions and learn from them, enhancing their future performance.
Planning: Allow agents to dynamically plan and adapt their approach to complex tasks, improving their flexibility and effectiveness.
Tools: Equip agents with essential tools for their tasks, but avoid overwhelming them with too many options to prevent confusion and errors.

How to Equip AI Agents with Tools

AI agents can use tools by leveraging a concept called function calling.

Function calling means giving an LLM access to external functions to interact with the real world.

It works like this:

The user asks the LLM, "What's the current weather in Barcelona?"
The LLM decides it needs weather data, so it calls a weather data tool.
The weather data tool fetches and returns the current weather information for Barcelona.
The LLM uses this weather information to provide an accurate response to the user.

As the name suggests, tools are normal programming functions being called by the LLM. They can be created in several ways:

Coding them from scratch, for example, using a programming language like Python.
Using builtin tools from established frameworks like Langchain or Llama Index. Below are some of the tools available in Langchain.

Alpha Vantage	Google Finance	OpenWeatherMap
ArXiv	Google Jobs	Passio NutritionAI
AWS Lambda	Google Places	PubMed
Azure Container Apps dynamic sessions	Google Scholar	Python REPL
Shell (bash)	Google Search	Reddit Search
Bearly Code Interpreter	Google Serper	Requests
Bing Search	Google Trends	Tavily Search
Dall-E Image Generator	HuggingFace Hub Tools	SearchApi
DataForSEO	SerpAPI	Semantic Scholar API Tool
DuckDuckGo Search	Ionic Shopping Tool	SQL Database
Eleven Labs Text2Speech	Exa Search	Twilio
File System	NVIDIA Riva: ASR and TTS	Wikipedia
Google Cloud Text-to-Speech	Oracle AI Vector Search: Generate Summary	Wolfram Alpha
Google Drive	Passio NutritionAI	Yahoo Finance News
Google Finance	Polygon Stock Market API Tools	You.com Search
Google Imagen	PubMed	YouTube

Using a no-code platform like Relevance AI, which comes with various built-in tools and simplified templates to develop custom ones.

When designing an AI agent, you must provide it with all the necessary tools to achieve its tasks.

For example, an AI agent tasked with sales prospecting should be equipped with:

Research tools: to enrich lead information by accessing LinkedIn or social media profiles.
Scraping tools: to collect additional details from the lead's company websites.
Emailing tools: to send a personalized email to the lead.
CRM tools: to record the lead in the database for future follow-up.

An important thing to keep in mind while building these tools is error handling.

When an agent calls a tool, it expects to get a correct response. But what if the tool throws an error or exception?

Should the agent stop?
Should the agent try again?

You don't want your entire app to crash because of that! So you need to consider the following:

Your tools must have robust error handling.
They must provide helpful error messages to your agent (who may decide on a new plan).
Enable your agent to perform multiple tries if the first call fails.

In summary, tools are crucial for your AI agent:

They allow him to interact with the real world and perform tasks effectively.
They must be well-designed and robust to ensure your agent can handle errors gracefully.
They should be tailored to the agent's specific tasks to maximize efficiency and effectiveness.

What's Next?

In this article, we explored what AI agents are and how to design them effectively. You also discovered the key differences between AI agents and AI automation and learned how to equip your AI agents with the right tools to boost their performance.

🚀 Ready to take your AI skills to the next level?
Check out my GitHub for hands-on projects, builds, and experiments in AI automation and AI agent development!