
Session zero

42% Failure Rate and No One Complained — How My Last Scraper Was Silently Dying

My 13th Apify actor had a 42% failure rate. I found out three weeks after deploying it.

No user complained. No alert fired. The runs just... failed. Quietly.

Here's what happened, why no one told me, and what I changed.

The Context

I built and deployed 13 Korean data scrapers to the Apify Store over about two weeks. The last one — musinsa-ranking-scraper — went live on March 9th. It scrapes Musinsa, Korea's largest fashion marketplace, for brand rankings and product data.

Pay-per-event pricing went live on March 25th. By then, the actor had already accumulated 40 runs in 30 days.

Of those 40 runs: 23 succeeded, 17 failed. That's a 42.5% failure rate.

Why No One Told Me

Three reasons:

1. The failures looked like user errors, not my bug.
When a run fails, Apify shows the exit code and log. Users see this, shrug, and try again — or switch to a competitor. They don't file a GitHub issue.

2. The failure mode was silent.
The actor didn't crash with a clear error. It started, attempted to initialize the crawler, and hung until timeout or memory limit — then failed. No stack trace pointing at the real culprit.

3. I wasn't monitoring failure rates.
I was watching total runs go up and celebrating. 40 runs! 5 users! Great. I never calculated: what percentage are actually succeeding?

The Root Cause: crawlee v2 Deprecation

The actor was built with apify: "^2.x" — the older SDK. Musinsa is rendered server-side (SSR), so it looked like a simple HTTP scraper. The logic was straightforward.

But as Apify updated their runtime environment, the older SDK behavior changed in subtle ways. The CheerioCrawler from crawlee v2 (bundled with apify 2.x) started failing on certain response handling paths — specifically around how it handled Musinsa's response headers.

The fix was a version bump:

```json
// package.json (before)
{
  "dependencies": {
    "apify": "^2.3.2",
    "crawlee": "^2.0.0"
  }
}

// package.json (after)
{
  "dependencies": {
    "apify": "^3.0.0"
  }
}
```

Apify SDK 3.x includes crawlee 3.x natively. The upgrade resolved the response handling issue.

One test run after the fix: SUCCEEDED.

What the Data Looks Like Now

```
musinsa-ranking-scraper (30-day window, as of 2026-03-30)
  total runs:     40
  succeeded:      23
  failed:         17  <- all pre-fix
  after fix:      3 runs, 3 succeeded
```

The 30-day stats still show the old failures because they happened inside the window. Give it another week and the numbers will look better.

What I Should Have Done

Monitor failure rates, not just run counts.

Total runs going up is a vanity metric if a significant portion are failing. I should have been tracking:

```
success_rate = succeeded / total_runs
```

For a scraper portfolio, 95%+ is a reasonable target. Anything below 90% is a red flag.
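A minimal sketch of that check in Node.js. The thresholds are the ones above; the `SUCCEEDED`/`FAILED` status strings follow Apify's run terminology, and fetching the run list from the Apify API is assumed to happen elsewhere:

```javascript
// Given a list of runs (e.g. pulled from the Apify API),
// compute the success rate and classify it against the targets.
function successRate(runs) {
  if (runs.length === 0) return null; // no data, no verdict
  const succeeded = runs.filter((r) => r.status === 'SUCCEEDED').length;
  return succeeded / runs.length;
}

function classify(rate) {
  if (rate === null) return 'no-data';
  if (rate >= 0.95) return 'healthy';
  if (rate >= 0.9) return 'watch';
  return 'red-flag';
}

// The 40 runs from this post: 23 succeeded, 17 failed.
const runs = [
  ...Array(23).fill({ status: 'SUCCEEDED' }),
  ...Array(17).fill({ status: 'FAILED' }),
];
console.log(successRate(runs), classify(successRate(runs))); // 0.575 'red-flag'
```

Running this against the numbers above would have surfaced the problem on day one instead of week three.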

Set up failure alerts.

Apify has webhooks. I could have configured an alert for runs that fail three times in a row. I didn't. I do now — for the actors that matter most.
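The webhook wiring itself is thin; the part worth getting right is the "three failures in a row" logic. A sketch of that check (the notification call and the webhook endpoint around it are assumptions, not shown):

```javascript
// Track consecutive failures per actor; signal an alert on the Nth in a row.
// Status strings follow Apify's run terminology ('SUCCEEDED' / 'FAILED').
const FAILURE_THRESHOLD = 3;
const streaks = new Map(); // actorId -> current consecutive-failure count

function onRunFinished(actorId, status) {
  if (status === 'SUCCEEDED') {
    streaks.set(actorId, 0);
    return false;
  }
  const streak = (streaks.get(actorId) ?? 0) + 1;
  streaks.set(actorId, streak);
  // Fire exactly once, when the streak first hits the threshold.
  return streak === FAILURE_THRESHOLD;
}
```

Hooked up to Apify webhooks on the run-failed and run-succeeded event types, a `true` return would trigger whatever notification channel you use (email, Slack, etc.).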

Read the release notes when the runtime updates.

Apify occasionally updates their actor runtime. When they do, older SDK versions can break. Subscribe to the changelog. Treat runtime updates like any other dependency update.

The Uncomfortable Part

Five users ran this actor while it had a 42% failure rate. They might have retried. They might have given up. I don't know — Apify doesn't show me user-level retry behavior.

Those 17 failed runs represent real compute, real time, and real frustration I can't trace or recover. The actor is fixed now, but the users who hit the bad runs won't know that unless they come back.

Building in public means sharing the wins and the failures. This was a failure I almost didn't notice.


I'm building Korean data APIs on Apify — scrapers for Naver, Musinsa, Melon, and more. 11,800+ runs across 13 actors. Follow along for the real numbers, good and bad.
