Davi de Oliveira Vieira

How I scaled a TikTok Ads automation SaaS to 7M requests/day as a solo dev

A friend in the tech group asked me how I handled observability and security. I answered; the conversation drifted to stack choice, then to why not Kubernetes, then to when it's worth breaking up the monolith. He suggested I write about it. So here it goes.

The context

I run a SaaS that automates TikTok Ads campaign management for agencies. Each customer has N Business Centers → N Advertiser Accounts → N Campaigns → N Ad Groups → N Ads.

Today's numbers:

  • R$ 87k in MRR
  • R$ 1M in ARR
  • More than 7 million requests per day against the TikTok API
  • 600+ endpoints mapped
  • Everything in a Laravel 12 monolith

I'm a solo dev.

Rule number 1: less is more

Every choice I'm about to describe comes from a single constraint: I don't have a team. That changes everything. One more tool is one more thing to configure, monitor, update, and debug at 3 a.m. So my criteria are always:

  • Does the tool solve a real problem I have today?
  • Does it remove more pain than the complexity it adds?
  • If it breaks tomorrow, can I fix it on my own?

If the answer to any of those is "no", it stays out.

The challenges of building on top of the TikTok API

Before talking about architecture, I need to explain the environment. Integrating with the TikTok API has quirks that shape every decision that comes after:

1. API updates only ship on Fridays

  • TikTok rolls out contract changes, new endpoints, or fixes typically on Friday
  • If they break something on deploy, the damage happens over the weekend, when their support is at its slowest
  • Friday afternoon is for crossing fingers; Monday morning is for putting out fires
  • I can't plan my own release for Friday — I have to leave room to react

2. Reaching TikTok is hard

  • Official support has an SLA measured in days, not hours
  • Common ticket reply: "try again later" or "contact your account manager"
  • Without a dedicated Partner Manager, you're stuck with the public forum and the generic support channel
  • A bug they confirm sometimes takes weeks to turn into a fix in production
  • More often than not, the solution is finding a workaround before support replies

3. Incomplete and inconsistent documentation

  • The official docs cover the "happy path" — edge cases you discover in practice
  • An endpoint returns a different structure than what's documented, and changes without a changelog
  • Per-endpoint rate limits aren't always published — you find out when you hit a 429
  • Status codes that show up in production don't appear in the docs
  • Field naming varies between endpoints in the same family

4. Sandbox ≠ production

  • Different behavior between sandbox and prod
  • Sandbox accepts, prod rejects — or vice versa
  • A field is required in one, optional in the other
  • You test, approve, deploy, and only then discover the real contract was something else
  • The result: a good chunk of final testing has to run on a real account, with a real customer

5. OAuth without a refresh token

  • The access token is "long-term" (no declared expiration)
  • There's no refresh flow — if the token gets revoked, the customer has to reauthenticate manually
  • Storing it securely becomes your problem (I encrypt it with Laravel Crypt; sketch after this list)
  • Losing a token means losing access to a customer's account until they reconnect
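
For the encryption part, Laravel can do it transparently with an encrypted model cast (it uses the same encrypter Crypt does under the hood). A minimal sketch, where the model and column names are illustrative:

use Illuminate\Database\Eloquent\Model;

class TikTokAccount extends Model
{
    protected function casts(): array
    {
        // 'encrypted' encrypts on write and decrypts on read using APP_KEY,
        // so the token never touches the database in plaintext
        return [
            'access_token' => 'encrypted',
        ];
    }
}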

6. 600+ endpoints mapped

  • Campaign, ad group, ad, creative, pixel, report, appeal, business center, partner — each one with its own set of endpoints
  • Each endpoint has its own quirks (different pagination, different naming, different rate limit bucket)
  • Keeping it all consistent in code requires a strong abstraction layer and internal versioning

7. Rate limit per bucket, not per application

  • TikTok separates rate limits by endpoint family — the "reports" quota is different from "campaigns", which is different from "creatives"
  • Counting total requests isn't enough; I have to count per bucket
  • I implemented this with Redis Lua (sliding window per bucket) because a simple counter doesn't cut it

These constraints shape everything:

  • That's why I have 3 layers of rate limit (pre-flight in middleware, sliding window in Redis, real 429 backpressure)
  • That's why idempotency is mandatory in every job — if a creation fails on retry, I don't want to end up with two campaigns
  • That's why I have detailed request/response logging with the API — when something blows up and I need to open a ticket, I already have evidence
  • That's why I have an error translation layer (TikTokErrorTranslator) — their raw codes aren't fit to show the end customer
  • That's why the OAuth token is encrypted in the database — if it leaks, you lose access to a production customer

Strip these constraints away and a lot of the architecture would look like overkill. With them, it's the bare minimum.
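
One concrete example before moving on: the translation layer is just a mapping from raw provider codes to messages a customer can act on. A minimal sketch; the codes and copy below are made up for illustration, not TikTok's real ones:

class TikTokErrorTranslator
{
    // Raw provider code => message fit to show the end customer.
    // Codes and copy here are illustrative.
    private const MESSAGES = [
        40001 => 'TikTok rejected the request. Check the campaign settings.',
        40100 => 'The TikTok connection expired. Please reconnect the account.',
        42900 => 'TikTok is throttling us right now. We will retry automatically.',
    ];

    public function translate(int $code, string $fallback): string
    {
        return self::MESSAGES[$code] ?? $fallback;
    }
}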

Anatomy of an external integration job

Every operation that talks to the external API runs in a job. Never synchronous in the user's HTTP request. The pattern is generic and applies to any critical integration:

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Foundation\Queue\Queueable;

class SyncExternalResourceJob implements ShouldQueue
{
    use Queueable, ExternalApiRateLimit;

    public int $maxExceptions = 3;
    public int $timeout = 120;
    public array $backoff = [10, 30, 60];

    // The local entity this job syncs to the provider
    public function __construct(public Model $resource) {}

    public function retryUntil(): \DateTime
    {
        return now()->addMinutes(30);
    }

    public function handle(): void
    {
        // 1. Validate local state (entity exists, account active, token valid)
        // 2. Idempotency: if the resource has already been synced, abort
        if ($this->resource->external_id !== null) {
            return;
        }

        // 3. Call the external API
        // 4. Classify the error: transient (retry), permanent (fail), billing (pause)
        // 5. Update local state with the returned external_id
        // 6. Dispatch the next job in the chain, if any
    }

    public function failed(\Throwable $e): void
    {
        // Cascade: mark pending children as Failed, translate the provider's
        // technical message into something the end customer understands
    }
}

Each choice has a purpose:

  • maxExceptions instead of tries → a release due to rate limit shouldn't burn an attempt. Only real failures count
  • retryUntil of 30 min → gives the sliding window room to open up without giving up
  • Exponential backoff → if the provider is hiccupping, I don't keep hammering
  • Rate limit trait → injects middleware that runs pre-flight before handle does
  • external_id check → idempotency. If the retry happens after the creation succeeded but before the local update, the next cycle bails. Never creates a duplicate resource on the provider
  • Cascading failed() → a permanent parent failure marks the pending children, translates the technical error into a friendly message

That's the atom. Every external operation on the platform is composed on top of it.
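
And a sketch of what the cascade in failed() looks like in practice. Status and children() are the same ones from the chaining section below; failure_reason is an illustrative column name:

public function failed(\Throwable $e): void
{
    // The parent is permanently failed; cascade to children that never synced
    $this->resource->update(['status' => Status::Failed]);

    // Translate the raw provider error into something the customer understands
    $friendly = app(TikTokErrorTranslator::class)
        ->translate((int) $e->getCode(), 'Sync failed. Please try again.');

    $this->resource->children()
        ->whereNull('external_id')
        ->where('status', Status::Draft)
        ->update(['status' => Status::Failed, 'failure_reason' => $friendly]);
}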

Job chaining: composing atoms into a workflow

A lot of the platform is a hierarchical workflow — create a parent resource, then N children, then N grandchildren. The customer's form captures everything at once, but execution happens in stages:

  1. The controller validates and saves everything as Draft in the local database — atomically, without talking to the provider
  2. Dispatches the parent job with chainToChild = true
  3. The parent job calls the API, saves external_id, and dispatches the child job for each pending child
  4. The child job calls the API, saves its external_id, and dispatches the grandchild job for each grandchild
  5. The grandchild job closes out the chain

Why this way and not one giant transaction?

  • Each step hits an external API → a transaction holding a lock across 3 HTTP calls is suicide
  • Each step fails independently → if the parent succeeds but the child breaks, the parent stays valid on the provider, the child sits in Failed in the database, and the customer reprocesses only what's missing
  • Retry granularity → hit a rate limit on the third grandchild? Only that one goes back to the queue, the others move on
  • Idempotency at every level → ran twice? The second pass sees external_id already filled in and skips

Example of "chaining with recovery":

// If the parent resource has already been synced but I was asked to chain,
// I just need to dispatch the children that haven't been done yet
if ($this->parent->external_id && $this->chainToChild) {
    $pendingChildren = $this->parent->children()
        ->whereIn('status', [Status::Draft, Status::Failed])
        ->whereNull('external_id')
        ->get();

    foreach ($pendingChildren as $index => $child) {
        SyncChildJob::dispatch($child)
            ->delay(now()->addSeconds($index * 2));
    }
    return;
}

The per-index delay ($index * 2 seconds) spreads the dispatches out over time — it avoids enqueuing 20 jobs at once and burning the rate limit in a burst.

All of this inside a monolith, no Temporal, no Step Functions, no external orchestrator. Queues + per-job idempotency solve a complex workflow.

Rate limit in 3 layers with atomic Lua

This is where the engineering lives. Remember that the external provider limits per endpoint bucket and has three levels (QPS, QPM, QPD)? And that some buckets have a mandatory cooldown of several minutes when you blow past them?

You can't solve that with an off-the-shelf throttle. I built my own rate limiter with 3 layers:

Layer 1 — Pre-flight in the job middleware

Before handle() runs, a middleware calls rateLimiter->check(bucket). If the bucket is full, the job is released back to the queue with a calculated delay — it doesn't count as a consumed attempt.
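
A minimal sketch of that middleware. TikTokRateLimiter and the shape of its check() result are illustrative names, and bucket() is a hypothetical method identifying the job's endpoint family:

class ExternalApiRateLimitMiddleware
{
    public function handle(object $job, callable $next): void
    {
        $result = app(TikTokRateLimiter::class)->check($job->bucket());

        if (! $result->allowed) {
            // Back on the queue with a calculated delay. release() burns an
            // attempt but not an exception, which is exactly why the job
            // relies on maxExceptions instead of tries.
            $job->release($result->retryAfterSeconds);

            return;
        }

        $next($job);
    }
}

The ExternalApiRateLimit trait on the job then just returns this middleware from Laravel's middleware() hook.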

Layer 2 — Atomic check-and-consume in Redis Lua

The heart of the system. A Lua script that runs inside Redis (atomically, no race condition between N workers):

-- Clear expired entries from both windows
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[2])  -- short window (1s)
redis.call('ZREMRANGEBYSCORE', KEYS[2], '-inf', ARGV[3])  -- long window (60s)

-- Count what's in the window
local short_count = redis.call('ZCARD', KEYS[1])
local long_count  = redis.call('ZCARD', KEYS[2])

local max_short = tonumber(ARGV[4])
local max_long  = tonumber(ARGV[5])

-- If it blew the limit, deny
if short_count >= max_short or long_count >= max_long then
    return {0, max_short - short_count, max_long - long_count}
end

-- Consume: add a timestamp to both windows
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[6])
redis.call('ZADD', KEYS[2], ARGV[1], ARGV[6])

redis.call('EXPIRE', KEYS[1], 2)
redis.call('EXPIRE', KEYS[2], 62)

return {1, max_short - short_count - 1, max_long - long_count - 1}

Why Lua in Redis?

  • Atomic by nature → Redis runs the script single-threaded. It's impossible for another worker to grab a slot between my check and my consume
  • Sliding window via Sorted Set → each request becomes an entry with a microsecond timestamp; expired ones are removed before counting. Precise, none of the "window resets to zero" with a spike at the start
  • Two simultaneous windows → short (per second) and long (per minute), checked in the same transaction
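
On the PHP side, the whole thing is a single Redis::eval call. A sketch of the wiring, with the KEYS/ARGV mapping the script above expects (the key names and the LUA_SCRIPT constant are illustrative):

use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Str;

public function checkAndConsume(string $bucket, int $maxShort, int $maxLong): array
{
    $now = microtime(true);

    // Returns {allowed, remaining_short, remaining_long} straight from the script
    return Redis::eval(
        self::LUA_SCRIPT,         // the script above
        2,                        // number of KEYS
        "rl:{$bucket}:short",     // KEYS[1]: 1s window
        "rl:{$bucket}:long",      // KEYS[2]: 60s window
        $now,                     // ARGV[1]: score for the new entry
        $now - 1,                 // ARGV[2]: short-window cutoff
        $now - 60,                // ARGV[3]: long-window cutoff
        $maxShort,                // ARGV[4]
        $maxLong,                 // ARGV[5]
        Str::uuid()->toString(),  // ARGV[6]: unique member per request
    );
}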

Layer 3 — Backpressure on 429

If a request still slips through and the provider returns 429 (another process consumed the slot, or the limit changed without notice), I record backpressure in a Redis key with a TTL. As long as that key is alive, layer 1 denies everything for that bucket.

For buckets with a mandatory cooldown, the key sticks around for the entire cooldown. Nobody gets through. Respect it or get banned.
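
In code, layer 3 is small. A sketch, with an illustrative key name and default TTL:

use Illuminate\Support\Facades\Redis;

// Called when the provider answers 429 despite layers 1 and 2
public function recordBackpressure(string $bucket, ?int $cooldownSeconds = null): void
{
    // While this key is alive, layer 1 denies everything for the bucket
    Redis::setex("rl:backpressure:{$bucket}", $cooldownSeconds ?? 60, 1);
}

// Layer 1 consults this before anything else
public function inBackpressure(string $bucket): bool
{
    return (bool) Redis::exists("rl:backpressure:{$bucket}");
}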

Why 3 layers and not just one?

  • Layer 1 (pre-flight) → cheap. Doesn't burn a slot for nothing
  • Layer 2 (check-and-consume) → correct. Atomic, no race
  • Layer 3 (backpressure) → resilient. If the provider tweaks the limit behind the scenes, the 429 is the final authority

A simple counter wouldn't hold up. This setup is what handles the scale without hitting the ceiling.

Specialized queues, 1 orchestrator

I use queues by domain instead of one giant queue. Each one has a clear purpose:

  • Interactive → real-time user actions (creating something through the wizard, etc.). Can't sit behind a sync backlog
  • Batch creation → batch operations (importing N, creating N at once). Isolated so it doesn't compete with interactive
  • Sync orchestrator → decides what to sync and when. Coordinates the sync detail workers
  • Sync structure → pulls light structure (root entities). Light, frequent
  • Sync detail → pulls heavy detailing (metrics, dependencies). Heavy, less frequent
  • Heavy sync → historical backfill, long operations. Can't block normal sync
  • Automation → periodically evaluates customer rules. Latency-tolerant, doesn't block the others
  • Monitoring → health checks, external event detection. Passive, runs in parallel
  • Upload → file transfer to the provider. Doesn't compete with short API calls
  • Bureaucratic flows → slow operations with the provider (e.g., appeals). Non-urgent, has its own window

Why segregate instead of having one giant queue?

  • Latency isolation → a long sync doesn't hold up an interactive action
  • Natural priority → the interactive queue worker grabs work before the monitoring queue worker does
  • Independent scaling → if the sync falls behind, I spin up more processes just for the sync queue, without touching creation
  • Direct debugging → the dashboard shows the backlog of a specific queue, I know exactly what to investigate

And the queue manager (Horizon, in my case) does this with a single config file. I don't need Kubernetes to orchestrate workers. Redis + supervisor handle it.
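
For reference, that single config file is config/horizon.php. A trimmed sketch of the shape; supervisor names, queue names, and process counts are illustrative:

// config/horizon.php (trimmed)
'environments' => [
    'production' => [
        'interactive' => [
            'connection' => 'redis',
            'queue' => ['interactive', 'batch-creation'],
            'balance' => 'auto',
            'maxProcesses' => 10, // user-facing work gets the most headroom
        ],
        'sync' => [
            'connection' => 'redis',
            'queue' => ['sync-orchestrator', 'sync-structure', 'sync-detail'],
            'balance' => 'auto',
            'maxProcesses' => 6,  // scale this supervisor alone if sync falls behind
        ],
        'background' => [
            'connection' => 'redis',
            'queue' => ['automation', 'monitoring', 'upload', 'bureaucratic'],
            'balance' => 'auto',
            'maxProcesses' => 3,
        ],
    ],
],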

Observability stack: Laravel-first

Mandatory disclaimer: yes, PHP. Yes, Laravel. Yes, PHP has been dead for some 20 years now. Funny that the corpse is the one paying my rent, my condo fee, and the R$ 87k MRR. Moving on.

I saw a lot of people in the group using Prometheus + Grafana. Excellent stack, but I picked a different path:

  • Laravel Pulse → app metrics (slow queries, jobs, cache, CPU, requests)
  • Laravel Horizon → queue dashboard (throughput, failed jobs, retries)
  • Laravel Telescope → request/query debugging in dev and staging
  • PostHog → product analytics (funnel, feature usage)
  • php artisan pail → live log tailing while debugging

Why not Prometheus + Grafana?

Not because they're bad — because Pulse and Horizon use the same Redis I already have running. Zero new infra. Zero new binaries. Zero agents to install. I flip a flag in .env and I have a dashboard.

If one day I need an infra metric Laravel can't see (e.g., network saturation on an external proxy), then I'll add Prometheus. As long as Pulse handles 95% of what I ask the system, there's no reason to duplicate.

Security stack: three layers

I split security into three levels: edge, server, application.

Edge (Cloudflare)

  • WAF up front (blocks generic attacks before they hit the origin)
  • Edge rate limit
  • Bot Fight Mode

Server (bare metal)

  • UFW firewall — only 22/80/443 open; MySQL blocked externally
  • Fail2ban — automatic ban on SSH/nginx brute force
  • Security headers in nginx (HSTS, X-Frame-Options, Referrer-Policy, Permissions-Policy)
  • server_tokens off — nginx doesn't hand out a version to scanners
  • TLS 1.2+ only, Let's Encrypt certs
  • Manual IP banning when a scanner shows up in the logs

Application (Laravel)

  • Own rate limit (Redis sliding window, 3 layers)
  • OAuth tokens encrypted in the database (Laravel Crypt)
  • Multi-tenancy with global scope on every model — impossible to leak data between customers by forgetting a where (see the sketch below)
  • Deploys via baked Docker image (no volume-mounting code in prod)

Each layer filters a different type of attack. Cloudflare catches the noise of the internet. UFW/Fail2ban catches whoever made it past. Laravel catches whoever authenticated but tries to access something that isn't theirs.
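
To make the global-scope point concrete, a minimal sketch of the trait every tenant-owned model uses. Names are illustrative, and a real version would resolve the tenant from a proper context object (queue jobs have no auth()):

use Illuminate\Database\Eloquent\Builder;
use Illuminate\Database\Eloquent\Model;

trait BelongsToTenant
{
    protected static function bootBelongsToTenant(): void
    {
        // Every query on the model is filtered by tenant automatically,
        // so forgetting a where() can't leak another customer's data
        static::addGlobalScope('tenant', function (Builder $query) {
            $query->where('tenant_id', auth()->user()?->tenant_id);
        });

        // New rows get stamped with the owning tenant on create
        static::creating(function (Model $model) {
            $model->tenant_id ??= auth()->user()?->tenant_id;
        });
    }
}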

The scale problem: external API rate limit

When the platform grew, I noticed that my bottleneck wasn't CPU, wasn't the database, wasn't memory. The server handled everything with room to spare. The bottleneck was something I didn't control: the request quota TikTok gives me per developer application.

Every SaaS that integrates with the TikTok API registers an "application" in their portal. That application has a limit of requests per second, per minute, per day. With 7M/day and growing, I was on track to hit the ceiling — and when you do, TikTok returns 429 to everyone, blameless customers included.

The right question: data heavy, compute heavy, or I/O heavy?

Before deciding on the architecture, I had to classify the problem correctly. There are three models:

  • Compute heavy → CPU/GPU bound (ML, encoding, rendering). Solution: more machine, parallelism, GPU.
  • Data heavy → moving/processing TBs of data (ETL, data lake). Solution: partitioning, streaming, tiered storage.
  • I/O heavy / network bound → waiting on a network or external API response. Solution: concurrency, not horsepower.

My case is I/O heavy. Workers spend 90% of their time waiting on TikTok's response. Across all servers combined, I rarely go past 30% CPU. There's a data layer (structure sync, performance metrics, creatives), but MySQL + Redis swallow it just fine.

If I had classified it wrong and thought "I need to scale CPU", the answer would have been Kubernetes. But because the problem is external quota, the answer is something else.

Why not Kubernetes?

In the group chat, someone asked me directly: "why not k8s with pod autoscaling and monolith redundancy?"

The short answer: k8s scales resources in my infra. My problem isn't a resource in my infra.

If I spun up 10 pods of the monolith behind a load balancer, I'd have 10 pods making requests against the same TikTok application. The quota is the same. The 429s would be the same. I'd have spent time installing k8s to solve a problem it doesn't solve.

On top of that: k8s is one more tool to maintain. Solo. No SRE. No platform team. The operational complexity of k8s is greater than the problem it would solve here.

The decision: break up the monolith, but only the right part

What I actually needed was more than one identity on the TikTok API. Two applications registered in their portal = two independent quotas = double the ceiling.

For that, the ideal is to isolate the layer that talks to the TikTok API in its own service, one that knows how to load balance between multiple registered apps.

So the plan is:

  1. Keep the Laravel monolith for UI, queues, multi-tenancy, billing, dashboard, onboarding
  2. Extract the TikTok API client into a dedicated microservice
  3. That microservice routes requests across N apps registered in TikTok for Business, based on available quota
  4. The monolith calls the microservice like any other dependency

Immediate win: I double my quota ceiling. If I need more, I register a third app and the microservice now has 3 backends to route between. Linear scale, no changes to the core.
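
The routing itself doesn't need to be clever. A sketch of the idea, where remaining() is a hypothetical read on the same Redis windows the rate limiter already maintains:

// Route each outgoing call to the registered app with the most
// quota left in this endpoint bucket
public function pickApp(string $bucket): TikTokApp
{
    return collect($this->apps)
        ->sortByDesc(fn (TikTokApp $app) => $this->rateLimiter->remaining($app->id, $bucket))
        ->first();
}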

Why this preserves observability

A point I thought was important in the conversation: I don't want to lose what already works well.

  • Queues stay in Horizon
  • App metrics stay in Pulse
  • Debugging stays in Telescope
  • Multi-tenancy stays with global scope

The microservice handles only one responsibility: being the smart gateway to the TikTok API. It doesn't break my observability, doesn't multiply my operational pain, doesn't force me to learn another stack.

Break it apart bit by bit, not all at once

"Rewriting the monolith as microservices" is one of the worst decisions a startup can make. Almost every case I've seen ended badly.

What I'm doing is different: extract one service at a time, when the trade-off pays for itself. First the TikTok API client, because the external quota problem justifies it. If down the road another specific bottleneck shows up, I'll extract another one. But only when the benefit is obvious and measurable.

I've worked on distributed systems before (during my time at RD), and the most expensive lesson I learned is: a microservice that doesn't pay for itself is just technical debt with a name tag.

Conclusion

A recap of the decisions:

  1. External constraints define the architecture — Friday deploys, slow support, incomplete docs, and per-bucket rate limits force me into idempotency, detailed logging, and 3 layers of protection
  2. Observability with what I already have (Pulse/Horizon/Telescope) because it reuses existing infra
  3. Security in three layers (Cloudflare → server → app) because each one filters a different type of attack
  4. Monolith before microservices because a solo dev doesn't have the bandwidth to operate distributed systems for no reason
  5. Microservices when the gain is external, not internal — in my case, API quota, not infra resources
  6. No k8s because the problem isn't scaling a resource I control

It's not the trendy stack. It's the stack I can sustain on my own while the platform grows. Sometimes the right call is the boring one.

And every Friday, while the rest of Twitter declares once again that PHP is dead, I sit here praying the TikTok API doesn't die alongside it. So far, the corpse is winning.
