If you're running CrewAI crews in production, you've probably hit this: your cron job exits with code 0, but the crew didn't actually finish its work. The researcher agent got stuck retrying a rate-limited API, the analyst never received input, and nobody noticed until Friday.
Multi-agent orchestration frameworks like CrewAI fail differently from traditional services. A crew can fail without crashing. Here's how to catch those failures with heartbeat monitoring — in about 3 lines of code.
## Why CrewAI crews need dedicated monitoring
CrewAI orchestrates multiple agents that call LLMs, use tools, and pass context to each other. Each agent is a potential failure point:
- **Agent hangs:** One agent waits indefinitely for an LLM response. The crew stalls, but the process stays alive.
- **Infinite loops:** An agent retries a failed tool call endlessly. Your token meter spins, but no useful output appears.
- **Silent quality degradation:** The LLM returns garbage, the next agent processes it anyway, and the final output is subtly wrong. No error thrown.
- **Cost spikes:** A single crew run normally costs $0.15. One bad run costs $12 because an agent kept rephrasing the same request.
Traditional process monitoring (systemd, Docker health checks) only tells you the process is alive. It tells you nothing about whether the crew is making progress.
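You can close part of that gap even without an external service by checking whether the run produced fresh output, not just whether it exited. Here's a minimal sketch; the output path and staleness threshold are illustrative assumptions, not anything CrewAI or cron provides:

```python
import os
import time

def run_looks_healthy(output_path, max_age_seconds=3600):
    """Exit code 0 isn't enough: also verify the crew wrote fresh output.

    Returns True only if the file exists and was modified recently.
    """
    if not os.path.exists(output_path):
        return False
    age = time.time() - os.path.getmtime(output_path)
    return age <= max_age_seconds
```

A cron wrapper can call this after `kickoff()` and exit non-zero when the check fails, so a stalled crew no longer reports success. It still can't tell you *which* agent stalled, which is where heartbeats come in.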
**Try it now — monitor your CrewAI agent in 2 lines:**

```shell
pip install clevagent
```

```python
import clevagent

clevagent.init(api_key="YOUR_KEY", agent="my-crew")
```

Free for 3 agents. No credit card required. Get your API key →
## Add ClevAgent to your CrewAI crew in 3 lines
ClevAgent monitors your crew at the agent level — heartbeats, loop detection, and per-run cost tracking. Setup takes about 30 seconds.
### Step 1: Install

```shell
pip install clevagent
```
### Step 2: Initialize before kickoff

```python
import os

import clevagent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="my-research-crew",
)
```
That's it. ClevAgent starts sending heartbeats automatically. If your crew hangs or the process dies, you get alerted within 120 seconds.
### Step 3 (optional): Add a step callback for per-agent tracking

CrewAI supports a `step_callback` on each agent. Wire it to ClevAgent to get visibility into each agent's work:

```python
def track_step(step_output):
    clevagent.ping(
        status="step_complete",
        meta={
            "agent": step_output.agent,
            "output_length": len(str(step_output.output)),
        },
    )
```
Pass this callback when defining your agents:
```python
from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Find the latest market data",
    backstory="You are a senior research analyst...",
    llm=llm,  # your configured LLM from earlier in the script
    step_callback=track_step,
)
```
Now every agent step shows up on your dashboard with timing and metadata.
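One caveat: the shape of the `step_callback` payload has varied across CrewAI versions (it may be an agent action, a tool result, or another object), so direct attribute access like `step_output.agent` can raise `AttributeError` mid-run. A defensive extractor, shown here as an illustration rather than either library's API, keeps the callback from crashing the crew:

```python
def safe_step_meta(step_output):
    """Extract ping metadata from a CrewAI step payload, tolerating shape changes."""
    return {
        "agent": str(getattr(step_output, "agent", "unknown")),
        "output_length": len(str(getattr(step_output, "output", ""))),
    }
```

Inside the callback, call `clevagent.ping(status="step_complete", meta=safe_step_meta(step_output))` instead of accessing the attributes directly.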
## Complete example: 2-agent crew with monitoring
Here's a full working example — a research crew with two agents, monitored by ClevAgent:
```python
import os

from crewai import Agent, Task, Crew, Process

import clevagent

# Initialize monitoring
clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="daily-research-crew",
)

def track_step(step_output):
    clevagent.ping(
        status="step_complete",
        meta={
            "agent": step_output.agent,
            "output_length": len(str(step_output.output)),
        },
    )

# Define agents
researcher = Agent(
    role="Research Analyst",
    goal="Find the 3 most important tech news stories today",
    backstory="You are a senior research analyst who reads dozens of sources daily.",
    verbose=True,
    step_callback=track_step,
)

writer = Agent(
    role="Report Writer",
    goal="Write a concise morning briefing from the research",
    backstory="You are a technical writer who distills complex topics into clear summaries.",
    verbose=True,
    step_callback=track_step,
)

# Define tasks
research_task = Task(
    description="Search for today's top 3 tech news stories. Include source URLs.",
    expected_output="A list of 3 news items with title, summary, and source URL.",
    agent=researcher,
)

writing_task = Task(
    description="Write a 200-word morning briefing based on the research.",
    expected_output="A formatted briefing email ready to send.",
    agent=writer,
)

# Assemble and run
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()

# Report completion with output metadata
clevagent.ping(
    status="crew_complete",
    meta={
        "output_length": len(str(result)),
        "agents_used": 2,
    },
)

print(result)
```
The entire monitoring integration is a handful of lines — the `init()` call, the `track_step` callback, and the final `ping()`. Your existing CrewAI code stays exactly the same.
## What ClevAgent catches
Once connected, ClevAgent watches for three categories of problems:
### Crew hangs
If no heartbeat arrives for 120 seconds, ClevAgent sends an alert to Telegram or Slack. This catches the most common CrewAI failure: an agent waiting on an LLM call that never returns. Your cron job sees a running process. ClevAgent sees a silent agent.
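The check behind this kind of alert is conceptually simple. Here is a minimal sketch of the timeout logic — an illustration of the approach, not ClevAgent's actual implementation:

```python
import time

HEARTBEAT_TIMEOUT = 120  # seconds, matching the alert window described above

def is_stalled(last_heartbeat, now=None):
    """Return True if no heartbeat has arrived within the timeout window."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT

# A crew that last pinged 300 seconds ago is flagged as stalled:
print(is_stalled(last_heartbeat=1000.0, now=1300.0))  # True
print(is_stalled(last_heartbeat=1000.0, now=1060.0))  # False
```

The important property is that the check runs *outside* your crew's process, so a hung LLM call can't stop the alert from firing.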
### Agent loops
ClevAgent tracks the frequency and pattern of ping() calls. If an agent sends 50 step completions in 30 seconds with identical metadata, that's a loop. You get a warning before the token bill becomes a problem.
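The heuristic is a sliding window over recent pings that counts how many carry identical metadata. The sketch below is an assumption about the approach, not ClevAgent's code; the thresholds mirror the "50 in 30 seconds" example above:

```python
from collections import deque

class LoopDetector:
    """Flag an agent loop: too many identical-metadata pings in a short window."""

    def __init__(self, max_pings=50, window=30.0):
        self.max_pings = max_pings
        self.window = window  # seconds
        self.events = deque()  # (timestamp, metadata fingerprint) pairs

    def record(self, timestamp, meta):
        """Record one ping; return True if it looks like a loop."""
        key = repr(sorted(meta.items()))
        self.events.append((timestamp, key))
        # Drop events that fell out of the sliding window.
        while self.events and timestamp - self.events[0][0] > self.window:
            self.events.popleft()
        identical = sum(1 for _, k in self.events if k == key)
        return identical >= self.max_pings
```

Feeding it 50 identical `step_complete` pings in ten seconds trips the flag; varied metadata from a healthy crew does not.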
### Token cost spikes
Every ping() with metadata feeds into per-run cost estimates. ClevAgent compares the current run against your historical average. A run that's 5x the normal cost triggers a warning. You can set a hard budget ceiling per agent in the dashboard — if exceeded, ClevAgent sends an immediate alert.
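In pseudocode-level terms, the comparison looks like the sketch below. The function and thresholds are illustrative assumptions, not ClevAgent's API; they mirror the 5x spike rule and hard ceiling described above:

```python
from statistics import mean

def cost_alert(run_cost, history, spike_factor=5.0, budget_ceiling=None):
    """Compare a run's cost (USD) to the historical average.

    Returns "budget_exceeded", "cost_spike", or None.
    """
    if budget_ceiling is not None and run_cost > budget_ceiling:
        return "budget_exceeded"  # hard ceiling: immediate alert
    if history and run_cost > spike_factor * mean(history):
        return "cost_spike"  # soft threshold: warning
    return None

# A $12 run against a ~$0.15 historical average trips the spike warning:
print(cost_alert(12.0, [0.15, 0.14, 0.16]))  # cost_spike
```

Checking the hard ceiling first means a runaway run gets the immediate alert even when there's no history yet to average.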
## Use `clevagent.ping()` for work-progress tracking
Beyond failure detection, ping() is useful for tracking that your crew is actually doing its job:
```python
result = crew.kickoff()

# `today` and `stories` come from your own pipeline code
clevagent.ping(
    status="crew_complete",
    meta={
        "report_date": today,
        "stories_found": len(stories),
        "word_count": len(str(result).split()),
    },
)
```
On the ClevAgent dashboard, this creates a timeline of crew runs. You can see at a glance:
- Did today's 6 AM run actually complete?
- How many stories did it find compared to yesterday?
- Is the output length consistent, or did something degrade?
This is the difference between "the process ran" and "the crew did useful work." Process monitoring gives you the first. Ping metadata gives you the second.
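You can also run the same kind of consistency check locally before sending the final ping, and alert early when the output shrinks suspiciously. The ratio threshold here is an illustrative assumption:

```python
def output_degraded(word_count, recent_counts, min_ratio=0.5):
    """Flag a run whose output is much shorter than the recent runs' average."""
    if not recent_counts:
        return False  # no baseline yet, nothing to compare against
    avg = sum(recent_counts) / len(recent_counts)
    return word_count < min_ratio * avg

print(output_degraded(40, [200, 210, 190]))   # True: briefing shrank to a fifth
print(output_degraded(195, [200, 210, 190]))  # False: consistent with baseline
```

A degraded run can then send a distinct status (for example `status="output_degraded"`) so it stands out on the timeline instead of blending in with healthy completions.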
## Related reading
- Why Your AI Agent Health Check Is Lying to You — The hidden gap between "process alive" and "agent working."
- Three AI Agent Failure Modes That Traditional Monitoring Will Never Catch — Silent exits, zombie agents, and runaway loops with real examples.
- How to Monitor LangChain Agents in Production — LangChain callback handler and LangGraph node monitoring.
- How to Monitor AI Agents in Production — The complete guide to heartbeat-based monitoring for any AI agent framework.
ClevAgent is free for up to 3 agents. No credit card, no config files, no separate infrastructure.