RCS API Rate Limits: The Silent Blocker for AI Agent Deployments
You've built a flawless RCS campaign for your AI agent. Brand assets approved. Agent logic refined. Your conversational AI is ready to handle customer service at scale.
Then your RCS API calls start returning 429s.
This is happening to more teams building agentic customer service than anyone admits publicly. And the documentation? Nowhere to be found.
Why AI Agents Burn Through RCS Limits Faster Than Campaigns
Unlike batch SMS campaigns, AI agents have fundamentally different traffic patterns:
- **Concurrent conversations** -- One agent handling 50 simultaneous users can fire hundreds of API calls in seconds.
- **Rich media compounds the problem** -- Carousel cards, images, and AI-suggested replies each require multiple API calls. One rich card = 3-5 API calls instead of 1.
- **MCP servers stack on top** -- If you're using Model Context Protocol servers (Infobip, Sinch, etc.) for your agent, they make RCS API calls alongside your agent logic. These compound fast.
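The rich-media multiplier is worth doing the math on before you pick a target throughput. A minimal sketch (the 3-5 calls-per-card figure comes from above; the function name is illustrative):

```javascript
// Effective messages/sec when each rich message costs multiple API calls.
// callsPerMessage reflects the 3-5 calls a single rich card can require.
function effectiveThroughput(apiLimitPerSecond, callsPerMessage) {
  return apiLimitPerSecond / callsPerMessage;
}

// A 10 calls/sec limit with 4-call rich cards supports only 2.5 messages/sec.
console.log(effectiveThroughput(10, 4)); // 2.5
```

In other words, a limit that looks comfortable for plain-text traffic can be a quarter of that once carousels and suggested replies are in play.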
The Three Limit Types You're Fighting
| Limit Type | What It Means for Agents |
|---|---|
| Per-second | Burst capacity for instant replies |
| Per-minute | Sustained throughput during peak |
| Per-day | Total daily quota for the brand account |
Critical gotcha: Test agents face lower (often undocumented) limits than verified brand accounts. Many teams discover this only at launch.
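Because you're fighting all three windows at once, it helps to track them together rather than as separate counters. A minimal sliding-window sketch (the limit values are placeholders; real limits vary by carrier and by whether the brand account is verified):

```javascript
// Tracks sends against per-second, per-minute, and per-day windows at once.
// Limit values here are illustrative defaults, not documented carrier limits.
class RateTracker {
  constructor(limits = { perSecond: 10, perMinute: 300, perDay: 100000 }) {
    this.limits = limits;
    this.timestamps = []; // send times in ms
  }

  canSend(now = Date.now()) {
    // Drop anything older than a day; it no longer counts against any window.
    this.timestamps = this.timestamps.filter(t => t > now - 86_400_000);
    const inLast = ms => this.timestamps.filter(t => t > now - ms).length;
    return inLast(1_000) < this.limits.perSecond &&
           inLast(60_000) < this.limits.perMinute &&
           this.timestamps.length < this.limits.perDay;
  }

  record(now = Date.now()) {
    this.timestamps.push(now);
  }
}
```

Check `canSend()` before every send and `record()` after; the tightest window wins automatically.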
The Fix: Design for Rate Limits BEFORE You Launch
Here's the architecture that works:
1. Message Queuing with Priority Lanes
```javascript
// Separate queues by priority - don't treat all agent messages equally
const queues = {
  critical: [], // User queries, order status, account info
  standard: [], // General responses, notifications
  bulk: []      // Marketing, newsletters, non-urgent
};

const RATE_LIMIT_PER_SECOND = 10;

// One tick per send slot: draining at most one message per tick caps
// total throughput at RATE_LIMIT_PER_SECOND, highest priority first.
setInterval(() => {
  const next = queues.critical.shift()
            ?? queues.standard.shift()
            ?? queues.bulk.shift();
  if (next !== undefined) sendRCS(next);
}, 1000 / RATE_LIMIT_PER_SECOND);
```
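Feeding the lanes is then just a classification step on each outgoing message. A sketch, assuming a `priority` field set by your agent logic (the field name and fallback rule are illustrative):

```javascript
// Route a message into its priority lane; unknown or missing priorities
// fall back to the standard lane rather than being dropped.
function enqueue(queues, message) {
  const lane = message.priority ?? 'standard';
  (queues[lane] ?? queues.standard).push(message);
}
```

The useful property is that bursty agent output lands in memory immediately while the interval loop meters actual API calls.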
2. Exponential Backoff with Jitter
Fixed-interval retries synchronize: every throttled client retries at the same instant and hits the limit again. Exponential backoff with jitter spreads them out:
```javascript
async function sendWithBackoff(message, attempt = 0) {
  try {
    await rcsClient.send(message);
  } catch (error) {
    if (error.code === 429 && attempt < 5) {
      const baseDelay = 1000;
      const maxDelay = 30000;
      const jitter = Math.random() * 1000; // Spread retries so they don't bunch up
      const delay = Math.min(baseDelay * Math.pow(2, attempt) + jitter, maxDelay);
      await new Promise(r => setTimeout(r, delay));
      return sendWithBackoff(message, attempt + 1);
    }
    throw error;
  }
}
```
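One refinement: some providers include a `Retry-After` hint on 429 responses, and honoring it beats guessing. A sketch of the delay calculation (how the header surfaces on the error object depends on your RCS client library; `retryAfterSeconds` here is an assumed field, not a documented API):

```javascript
// Prefer a server-provided Retry-After hint when present; otherwise fall
// back to exponential backoff with jitter. retryAfterSeconds is an
// assumption - check how your client exposes the 429 response headers.
function backoffDelay(attempt, retryAfterSeconds = null,
                      baseDelay = 1000, maxDelay = 30000) {
  if (retryAfterSeconds != null) {
    return Math.min(retryAfterSeconds * 1000, maxDelay);
  }
  const jitter = Math.random() * 1000;
  return Math.min(baseDelay * 2 ** attempt + jitter, maxDelay);
}
```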
3. Pre-Launch Load Testing
The teams launching fastest? They're simulating agent conversations before going live:
- Simulate 50 concurrent users, not just average volume
- Test MCP server call volumes alongside RCS API limits
- Validate under realistic peak loads with burst patterns
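The concurrency simulation above doesn't need heavy tooling. A minimal harness sketch (all numbers are illustrative; `send` is your real transport or a mock that rejects like a 429 would):

```javascript
// Simulate N concurrent conversations, each sending several messages,
// and count successes vs. rate-limit-style rejections.
async function simulateLoad({ users = 50, messagesPerUser = 5, send }) {
  let ok = 0, rejected = 0;
  const conversation = async () => {
    for (let i = 0; i < messagesPerUser; i++) {
      try {
        await send({ body: `msg ${i}` });
        ok++;
      } catch (e) {
        rejected++;
      }
    }
  };
  // All conversations run concurrently, like real agent traffic bursts.
  await Promise.all(Array.from({ length: users }, conversation));
  return { ok, rejected };
}
```

Point it at a mock that starts throwing past your suspected quota and you'll see exactly how your queuing and backoff layers behave before a carrier does.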
Carrier-by-Carrier Reality
| Carrier | Agent Deployment Notes |
|---|---|
| Verizon | Strict burst limits -- prefer steady throughput |
| AT&T | Rich media triggers additional checks |
| T-Mobile | More lenient for test agents -- verify before production |
| Regional MVNOs | Often lower limits -- test thoroughly |
What NOT to Do When You're Rate Limited
- ❌ Create new agent accounts (compounds the problem)
- ❌ Retry immediately (extends the block window)
- ❌ Route through proxies (flags all accounts)
- ✅ Wait, monitor, and implement proper queuing first
The Bottom Line
The old approach: "We'll figure out the limits after it works."
The new approach for AI agents: Design for limits, then scale.
As AI agents become the primary interface for customer service, RCS is the secure, branded channel of choice. But agentic reliability requires engineering every layer -- including rate limit strategy.
Testing AI agent RCS deployments without burning carrier limits? RCS X lets you simulate concurrent conversations, validate payloads, and stress-test rate limit handling before you go live.