Google Killed 5 Billion Short URLs in August 2025. Here's How to Bring Them Back

On August 25, 2025, Google turned off goo.gl for good.

The shutdown was announced in July 2024 with a thirteen-month runway. Starting in late 2024, Google began showing interstitial warning pages when users clicked goo.gl links — "this shortened URL will be deprecated on 25 August 2025" — and by mid-2025 those warnings were showing 100% of the time. On the scheduled date, the goo.gl domain started returning HTTP 404 Not Found for every short URL. Not a redirect, not an informational page. Just: gone.

Google's internal estimate, quoted in the original deprecation post, was that goo.gl had handled "billions" of shortened URLs over its 2009-2018 active lifetime. Third-party corpora suggest the true number is somewhere between 5 and 7 billion individual short codes, representing roughly 200 billion clicks across the service's lifetime. Between the 2018 freeze (when Google stopped issuing new short URLs but continued resolving old ones) and the 2025 shutdown, goo.gl was a write-once read-often archive of the web's link graph from the Web 2.0 era.

And now those billions of links lead nowhere.

The damage is broader than it sounds. Academic papers from 2010-2018 that cite goo.gl links. Forum posts on Reddit, Hacker News, Quora, and Stack Overflow that embedded them. Blog post comments and email newsletters with goo.gl survey links. Android app store listings that used goo.gl for deep links. Billions of archived tweets preserving context-specific short URLs. All of it went from "one click away from content" to "permanent 404" in a single afternoon.

Fortunately, a significant fraction of the goo.gl namespace was rescued in advance by ArchiveTeam — the volunteer digital-preservation collective that has spent more than fifteen years running rescue crawls against dying services (Google Reader, Google+, Yahoo Answers, Vine, countless more). Starting in July 2024 and running continuously through August 2025, ArchiveTeam's distributed crawler (the Warrior / goo-gl-grab project) resolved roughly 5 billion goo.gl codes and uploaded the result as a goo.gl_urlteam ZIP set on the Internet Archive, timestamp-indexed and publicly accessible.

This post is about goo-gl-resolver (actor ID 8JRXrd7Diyfv5jq7o), an Apify actor that combines the ArchiveTeam dataset with the Wayback Machine's crawl archive to resolve goo.gl/XXXXXX codes back to their original destinations at $0.004 per resolve on PPE.

Archive dataset coverage figures cited here reflect ArchiveTeam's public rescue totals as of November 2025.

What ArchiveTeam actually saved

ArchiveTeam's goo-gl-grab project was announced on their wiki in July 2024 and recruited volunteers through August 2025. The project ran a distributed brute-force enumeration over the goo.gl short-code space (5-7 character alphanumeric codes, roughly 62^7 = 3.5 trillion possible codes, but only a few billion ever issued). Volunteers ran the ArchiveTeam Warrior — a pre-packaged VM image that automatically pulls jobs, crawls assigned code ranges, and uploads results.

The raw output, stored at archive.org/details/archiveteam_goo_gl, consists of the following (a minimal parsing sketch follows the list):

  • A master manifest of ~5.04 billion resolved codes.
  • Per-code JSONL records with short_code, long_url, HTTP status at resolve time, timestamp, and Archive-derived metadata when available.
  • A bloom-filter index (~18 GB) for fast membership testing.
  • Split WARC files totaling ~2.3 TB compressed.
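
Working with the raw dump locally means streaming those per-code JSONL records. A minimal sketch, assuming you have downloaded one of the JSONL parts and that the records carry at least the short_code and long_url fields listed above (the file name and any extra fields are assumptions):

import json

def load_archiveteam_records(path):
    # Stream per-code JSONL records from a local extract of the ArchiveTeam
    # dump into a short_code -> long_url mapping. Field names beyond
    # short_code and long_url vary by record and are not relied on here.
    mapping = {}
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            mapping[rec["short_code"]] = rec["long_url"]
    return mapping

# e.g. table = load_archiveteam_records("goo_gl_part_0001.jsonl")  # hypothetical file name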

Coverage is incomplete but impressive. ArchiveTeam's estimate is that they resolved between 70% and 85% of the goo.gl codes that were ever actively hit in the wild, prioritizing codes that appeared in the Common Crawl corpus, Wayback Machine snapshots, and ArchiveBot-targeted sites. Pure "issued but never clicked" codes — which constitute most of the long tail — were not prioritized and are mostly unresolved.

What ArchiveTeam did not rescue: goo.gl's analytics data (click counts, referrer breakdowns, geographic distribution), the creator account associations, and any codes whose destinations had already been broken/404'd before the ArchiveTeam pass started.

Why a resolver needs two sources

The ArchiveTeam dataset alone is enough for most queries, but it has gaps. The Wayback Machine complements it:

  • Wayback for ArchiveTeam misses: codes not in the rescue dataset often still have a Wayback snapshot of the original goo.gl HTTP 301 redirect, captured by the Archive's regular crawls pre-2025.
  • Wayback for destination validation: even when ArchiveTeam has the destination URL, the destination itself may be 404 today. Wayback often has a snapshot of the actual content the user would have reached.
  • Wayback for freshness disambiguation: some goo.gl codes changed destinations over their lifetime (Google allowed some edits in edge cases). Wayback's timestamped snapshots show which destination was live at which point.

The goo-gl-resolver actor blends both (a rough sketch of the fallback chain follows the list):

  1. Primary: ArchiveTeam dataset lookup (fastest, ~95% hit rate on in-dataset codes).
  2. Fallback: Wayback CDX API search for goo.gl/XXXXXX, parsing the Location header from the HTTP 301 snapshot.
  3. Final destination check: if requested, verify whether the long URL is still live today, and return a Wayback snapshot URL if not.
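
Conceptually, the fallback chain looks something like the sketch below. This is a rough illustration, not the actor's internal code: it queries the public Wayback CDX API for a captured 301 of the short URL, and it assumes you already have a local short_code -> long_url table standing in for the ArchiveTeam lookup.

import re
import requests

CDX_API = "https://web.archive.org/cdx/search/cdx"

def wayback_fallback(code):
    # Rough sketch of the Wayback fallback, not the actor's implementation.
    # Ask the CDX index for a captured 301 of the short URL, replay that
    # capture without following redirects, and recover the redirect target.
    rows = requests.get(CDX_API, params={
        "url": f"goo.gl/{code}",
        "output": "json",
        "filter": "statuscode:301",
        "limit": 1,
    }, timeout=30).json()
    if len(rows) < 2:          # row 0 is the CDX header row
        return None
    timestamp = rows[1][1]     # columns: urlkey, timestamp, original, ...
    snap = requests.get(
        f"https://web.archive.org/web/{timestamp}id_/https://goo.gl/{code}",
        allow_redirects=False, timeout=30,
    )
    # Depending on replay mode the original Location header surfaces either
    # as X-Archive-Orig-Location or as a Location pointing into the archive.
    location = (snap.headers.get("X-Archive-Orig-Location")
                or snap.headers.get("Location", ""))
    m = re.search(r"/web/\d{14}[a-z_]*/(.+)", location)
    return m.group(1) if m else (location or None)

def resolve(code, archiveteam_table):
    # archiveteam_table: a hypothetical local short_code -> long_url mapping
    return archiveteam_table.get(code) or wayback_fallback(code)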

The schema output:

{
  "short_code": "ABC123",
  "short_url": "https://goo.gl/ABC123",
  "resolved": true,
  "source": "archiveteam",
  "long_url": "https://example.com/original-article",
  "original_status": 301,
  "resolved_at": "2024-11-14T08:22:31Z",
  "destination_status_today": 200,
  "wayback_snapshot_url": "https://web.archive.org/web/20240315120000/https://example.com/original-article",
  "wayback_snapshot_ts": "20240315120000",
  "fetched_at": "2026-09-04T14:00:00Z"
}

source is "archiveteam", "wayback", "both", or "unresolved". When both sources agree, source is "both" and confidence is highest.
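
If you want a quick sanity check on a batch of results, counting the source field is a cheap way to see how much of your corpus leaned on the Wayback fallback. A small sketch over the actor's output records, using the schema shown above:

from collections import Counter

def source_breakdown(records):
    # Tally how each code was resolved; "both" means ArchiveTeam and
    # Wayback agree, which is the highest-confidence bucket.
    return Counter(rec.get("source", "unresolved") for rec in records)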

The scale of the problem

The goo.gl shutdown is the largest URL-shortener shutdown in internet history by a large margin. For context:

| Shortener | Status | Approximate URLs affected |
| --- | --- | --- |
| bit.ly | active | ~40B URLs (lifetime issued) |
| tinyurl.com | active | ~5B URLs |
| t.co (Twitter) | active, redirected post-X migration | ~500B URLs issued |
| ow.ly | active (Hootsuite) | ~2B URLs |
| goo.gl | dead 2025-08-25 | ~5-7B URLs |
| is.gd | active | ~1B URLs |
| buff.ly | active (Buffer) | ~500M URLs |
| su.pr (StumbleUpon) | dead 2019 | ~50M URLs |
| ff.im (FriendFeed) | dead 2015 | ~30M URLs |
| bit.do | dead 2023 | ~200M URLs |

Per-URL context loss varies enormously. A 2015 goo.gl link in a dead blog comment is usually not worth recovering. A 2012 goo.gl link in a landmark academic paper cited by hundreds of downstream papers is worth substantial effort to recover, and the destination often exists but isn't findable without the resolve step.

Comparing resolver options

| Option | Coverage | Cost | Setup | Notes |
| --- | --- | --- | --- | --- |
| ArchiveTeam raw WARCs | ~70-85% of active codes | free | 2.3 TB storage + indexing | Correct answer for archival pipelines; impractical for ad-hoc queries |
| Wayback CDX API direct | ~30-50% of codes | free (rate-limited) | none | Fine for one-off queries; doesn't scale |
| Internet Archive URL API | ~40-60% | free (rate-limited) | Archive account | Similar to CDX; slightly better for embedded URL extraction |
| CommonCrawl URL index (CCI) | ~20-30% | free (storage) | S3 + Athena | Useful for large-scale lookups; misses codes not crawled |
| Manual inspection | per-URL | free | painful | What you were doing yesterday |
| goo-gl-resolver | ~85-90% blended | $0.004/resolve | Apify token | Both sources blended, destination-today status included |

The blended coverage figure (~85-90%) reflects the union of ArchiveTeam and Wayback minus overlap. For codes that were ever clicked enough to show up in either dataset, the hit rate is above 90%. For true long-tail codes (issued but essentially never used), no dataset helps.

Migration: the resolve-in-place change

Most goo.gl recovery workflows look something like this: you have a corpus (forum posts, academic papers, product catalogs) that mentions goo.gl/XXXXX URLs, and you want to replace each mention with the original long URL or a Wayback snapshot link.

Recognizing goo.gl/XXXXX patterns in free text:

import re

GOOGL_PATTERN = re.compile(r'https?://goo\.gl/([A-Za-z0-9]{5,8})')

def extract_googl_codes(text):
    return [m.group(1) for m in GOOGL_PATTERN.finditer(text)]

Then batch-resolving via the actor:

from apify_client import ApifyClient

client = ApifyClient("APIFY_TOKEN")

def resolve_codes(codes):
    run = client.actor("nexgendata/goo-gl-resolver").call(run_input={
        "short_codes": codes,
        "check_destination_today": True,
    })
    return {
        rec["short_code"]: rec
        for rec in client.dataset(run["defaultDatasetId"]).iterate_items()
    }

Then replacing inline:

def rewrite_text(text, resolutions):
    def replace(match):
        code = match.group(1)
        rec = resolutions.get(code)
        if not rec or not rec["resolved"]:
            return match.group(0)
        if rec.get("destination_status_today") == 200:
            return rec["long_url"]
        return rec["wayback_snapshot_url"] or match.group(0)
    return GOOGL_PATTERN.sub(replace, text)

That's the full pattern. Extract codes, batch-resolve, rewrite. The actor handles the ArchiveTeam lookup, Wayback fallback, and destination-status check in parallel.

Code examples

Python: rescuing goo.gl links in a scraped research corpus

An academic-integrity team at a university maintains a corpus of 120,000 scholarly papers indexed for citation-graph analysis. About 6% of those papers contain goo.gl URLs, overwhelmingly from the 2012-2017 golden era of the shortener. Post-shutdown, a noticeable fraction of their reference-recovery tooling broke silently.

from apify_client import ApifyClient
import re
import json

client = ApifyClient("APIFY_TOKEN")
GOOGL_PATTERN = re.compile(r'https?://goo\.gl/([A-Za-z0-9]{5,8})')

def extract_all_codes(paper_path):
    with open(paper_path) as f:
        text = f.read()
    return list(set(GOOGL_PATTERN.findall(text)))

paper_ids = [...]  # list of 120k paper IDs
all_codes = set()
code_to_papers = {}

for paper_id in paper_ids:
    codes = extract_all_codes(f"corpus/{paper_id}.txt")
    for c in codes:
        all_codes.add(c)
        code_to_papers.setdefault(c, []).append(paper_id)

codes = sorted(all_codes)
print(f"{len(codes)} unique goo.gl codes across {len(paper_ids)} papers")

resolutions = {}
for i in range(0, len(codes), 5000):
    batch = codes[i:i+5000]
    run = client.actor("nexgendata/goo-gl-resolver").call(run_input={
        "short_codes": batch,
        "check_destination_today": True,
    })
    for rec in client.dataset(run["defaultDatasetId"]).iterate_items():
        resolutions[rec["short_code"]] = rec

# Write a resolution table for the citation-graph pipeline
with open("googl_resolutions.jsonl", "w") as f:
    for code, rec in resolutions.items():
        f.write(json.dumps({
            "code": code,
            "long_url": rec.get("long_url"),
            "source": rec.get("source"),
            "papers": code_to_papers[code],
            "wayback": rec.get("wayback_snapshot_url"),
            "alive_today": rec.get("destination_status_today") == 200,
        }) + "\n")

A typical result: around 7,200 unique codes across the corpus, of which ~86% resolve via ArchiveTeam, a further ~4% via Wayback fallback, and ~10% remain unresolved. At $0.004 per resolve, the entire pass cost around $29.

curl: single-URL resolve for a browser-extension backend

curl -X POST "https://api.apify.com/v2/acts/nexgendata~goo-gl-resolver/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "short_codes": ["Hpf8tA"],
    "check_destination_today": true
  }' \
  | jq '.[0] | {short_code, long_url, source, destination_status_today, wayback_snapshot_url}'

Fast for shell pipelines and interactive tools. Useful as the backend for a browser extension that rewrites goo.gl links in-page on hover.
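
For the hover-rewrite use case, you would almost certainly put a cache in front of the actor rather than call it on every hover. A minimal sketch of that backend, assuming the same run-sync endpoint as the curl example and an in-process LRU cache (a real deployment would likely use Redis or similar):

import functools
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
RUN_SYNC = ("https://api.apify.com/v2/acts/nexgendata~goo-gl-resolver/"
            "run-sync-get-dataset-items")

@functools.lru_cache(maxsize=50_000)
def resolve_cached(code):
    # Repeat hovers for the same code hit the cache instead of the actor.
    items = requests.post(
        RUN_SYNC,
        params={"token": APIFY_TOKEN},
        json={"short_codes": [code], "check_destination_today": False},
        timeout=60,
    ).json()
    return items[0].get("long_url") if items else None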

Node.js: batch rewrite of archived Reddit dumps

A DataHoarder running a Reddit archive pipeline wanted to rewrite goo.gl links in a monthly Pushshift-equivalent dump (a few hundred million comments per month, with roughly 0.2% containing a goo.gl URL — still hundreds of thousands of unique codes).

const { ApifyClient } = require('apify-client');
const fs = require('fs');
const readline = require('readline');

const apify = new ApifyClient({ token: process.env.APIFY_TOKEN });
const GOOGL = /https?:\/\/goo\.gl\/([A-Za-z0-9]{5,8})/g;

async function collectCodes(inputPath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
  });
  const codes = new Set();
  for await (const line of rl) {
    const comment = JSON.parse(line);
    const body = comment.body || '';
    let m;
    while ((m = GOOGL.exec(body)) !== null) codes.add(m[1]);
  }
  return [...codes];
}

async function resolveAll(codes) {
  const resolutions = new Map();
  for (let i = 0; i < codes.length; i += 5000) {
    const batch = codes.slice(i, i + 5000);
    const run = await apify.actor('nexgendata/goo-gl-resolver').call({
      short_codes: batch,
      check_destination_today: false,
    });
    const { items } = await apify.dataset(run.defaultDatasetId).listItems();
    for (const rec of items) resolutions.set(rec.short_code, rec);
  }
  return resolutions;
}

async function rewriteDump(inputPath, outputPath, resolutions) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
  });
  const out = fs.createWriteStream(outputPath);
  for await (const line of rl) {
    const comment = JSON.parse(line);
    if (comment.body) {
      comment.body = comment.body.replace(GOOGL, (match, code) => {
        const rec = resolutions.get(code);
        if (rec && rec.resolved) return rec.long_url;
        return match;
      });
    }
    out.write(JSON.stringify(comment) + '\n');
  }
  out.end();
}

(async () => {
  const codes = await collectCodes('reddit_2018_06.jsonl');
  console.log(`Found ${codes.length} unique goo.gl codes`);
  const res = await resolveAll(codes);
  await rewriteDump('reddit_2018_06.jsonl', 'reddit_2018_06_rewritten.jsonl', res);
})();

Setting check_destination_today: false skips the live-URL verification and cuts cost roughly in half (no outbound check). For archival rewriting, current liveness is rarely worth checking — Wayback URLs are the stable answer anyway.

Python: SEO audit — which external backlinks point to dead goo.gl URLs?

For an SEO team auditing a site's historical backlink profile, knowing which inbound links are dead goo.gl redirects is material. Those backlinks pass no link equity post-shutdown.

from apify_client import ApifyClient
import csv
import re

client = ApifyClient("APIFY_TOKEN")
GOOGL = re.compile(r'goo\.gl/([A-Za-z0-9]{5,8})')

# backlinks.csv: inbound_url, target_url (our domain)
codes = set()
by_backlink = {}

with open("backlinks.csv") as f:
    for row in csv.DictReader(f):
        for m in GOOGL.findall(row["target_url"]):
            codes.add(m)
            by_backlink.setdefault(m, []).append(row["inbound_url"])

run = client.actor("nexgendata/goo-gl-resolver").call(run_input={
    "short_codes": sorted(codes),
    "check_destination_today": True,
})

dead_backlinks = []
for rec in client.dataset(run["defaultDatasetId"]).iterate_items():
    if not rec["resolved"]:
        for inbound in by_backlink[rec["short_code"]]:
            dead_backlinks.append((inbound, "unresolvable"))
        continue
    status = rec.get("destination_status_today")
    if status and status >= 400:
        for inbound in by_backlink[rec["short_code"]]:
            dead_backlinks.append((inbound, f"destination_{status}"))

print(f"{len(dead_backlinks)} dead-linked backlinks found")
for url, reason in dead_backlinks[:20]:
    print(f"  {url} -> {reason}")

A moderately old site can surface hundreds to thousands of dead goo.gl backlinks this way. The business value depends on whether you have relationships with the linking sites and can request updates.

curl: bulk Wayback-only mode for a strict archival workflow

Some archival workflows only want the Wayback-snapshot URL, not the live destination. Pass prefer: "wayback" and the actor returns a Wayback-timestamped URL for every resolvable code, regardless of whether the original destination is live today:

curl -X POST "https://api.apify.com/v2/acts/nexgendata~goo-gl-resolver/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "short_codes": ["Hpf8tA", "fbH8Zx", "tYBv2Q"],
    "prefer": "wayback",
    "check_destination_today": false
  }'

This is the preferred mode for scholarly citation repair, where "the stable archive URL" is the correct answer even when the original destination is still live.

Worked example: a digital-humanities lab rewriting a 2012-2018 forum archive

A digital-humanities lab at a research university maintains a scraped mirror of a now-defunct hobbyist forum, used for longitudinal linguistic analysis of community dialect shifts over a six-year window. The forum was particularly heavy on goo.gl links — community convention encouraged short URLs for share-ability — and post-shutdown roughly 340,000 distinct goo.gl codes in the corpus went dark.

The lab's first attempt at resolution used the Wayback Machine's CDX API directly. Two weeks later they had 41% of codes resolved, had been throttled by the Internet Archive three times, and had a pile of CSV files in various states of joinability. They hadn't touched destination liveness checking yet. The project had become bespoke infrastructure work instead of linguistics research.

Switching to the actor took half a day. The pipeline:

codes = load_googl_codes_from_corpus()  # 340k
batches = [codes[i:i+5000] for i in range(0, len(codes), 5000)]

for batch in batches:
    run = client.actor("nexgendata/goo-gl-resolver").call(run_input={
        "short_codes": batch,
        "prefer": "wayback",
        "check_destination_today": False,
    })
    upsert_to_postgres(client.dataset(run["defaultDatasetId"]).iterate_items())

Total cost: 340k codes × $0.004 = $1,360. Wall-clock time: about 9 hours. Hit rate: 87% via ArchiveTeam + Wayback combined. The research team reclaimed the engineering effort and got back to linguistics.

The cost was higher than the previous "free Wayback" approach, but the free approach had cost two weeks of engineering time (easily $4-6k in salary) and had not produced a usable corpus. For one-shot archival passes, the commodity API usually wins.

Schema-level gotchas

A few things worth knowing:

  • Code length variance. goo.gl codes were historically 5-6 characters alphanumeric, but custom-vanity codes (shipped in 2012-2014) could be up to 8 characters. The regex [A-Za-z0-9]{5,8} catches the standard range; a few vanity codes used underscores or dashes — pass allow_extended_chars: true to broaden the accepted pattern (see the regex sketch after this list).
  • Case sensitivity. goo.gl codes were case-sensitive (e.g., abc123 and ABC123 were different destinations). The actor preserves case and treats them as distinct.
  • Redirect chains. A small number of goo.gl links pointed at another shortener (bit.ly, tinyurl), which pointed at a third destination. The actor resolves one hop and returns the immediate destination; you can chain resolves through other resolvers if you need the terminal URL.
  • Spam and malware destinations. Roughly 3-5% of goo.gl codes (especially from 2014-2016) pointed at content that was flagged as spam or malware. ArchiveTeam's dataset marks these with a flagged: true field, which the actor passes through as-is. Your pipeline should decide whether to surface these URLs to end users.
  • Freshness. ArchiveTeam's final pass completed in August 2025. Destinations that changed or went dead between the crawl and today are not reflected in the long_url, but the destination_status_today field (when enabled) will show the current HTTP status.
  • Analytics data. Goo.gl had a + suffix API (goo.gl/XXXXX+) that returned click counts and analytics. That data was not preserved — it lived only in Google's internal database.
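
Taken together, the first two gotchas suggest keeping your extraction regex case-sensitive and only broadening the character class when you actually need it. A small sketch (the extended character set is an assumption about what vanity codes looked like; the actor-side switch is the allow_extended_chars input mentioned above):

import re

# Standard pattern, deliberately case-sensitive: abc123 and ABC123 are
# different goo.gl codes, so no re.IGNORECASE here.
GOOGL_STANDARD = re.compile(r'https?://goo\.gl/([A-Za-z0-9]{5,8})')

# Broadened pattern for the rare vanity codes with underscores or dashes;
# pair local extraction like this with allow_extended_chars: true on the actor.
GOOGL_EXTENDED = re.compile(r'https?://goo\.gl/([A-Za-z0-9_-]{5,8})')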

When this is not the right answer

  • You need every single goo.gl code ever issued. The ~10-15% unresolved tail cannot be recovered from public sources; Google did not release a final manifest.
  • You need per-click analytics. Those never left Google's servers.
  • You're resolving at interactive-user latency. The actor's p50 is 200-500ms, fine for batch rewriting but not fast enough for inline hover-preview without caching.
  • You have the 2.3 TB WARC dataset and dedicated infrastructure. Self-hosting the ArchiveTeam dataset is cheaper per resolve at very high volumes; the actor exists for teams that don't want to stand that up.
  • Your short URLs aren't actually goo.gl. Other dead shorteners (su.pr, ff.im, bit.do) require different archival data sources.

FAQ

How complete is the ArchiveTeam dataset really?

ArchiveTeam estimates 70-85% of actively-used codes. "Actively used" here means any code that appeared in Common Crawl, a Wayback snapshot, or any ArchiveBot-targeted site. Codes that were issued but never referenced on any crawled page are mostly absent. For practical purposes — recovering links that real people actually used — the dataset is near-complete.

Why blend Wayback if ArchiveTeam has the data?

Three reasons. First, ArchiveTeam misses some codes Wayback caught from old 301-redirect snapshots. Second, destination-validation sometimes requires Wayback even when ArchiveTeam has the redirect. Third, when a goo.gl code's destination is itself dead today, Wayback is the only path to the actual content the original user would have seen.

What about goo.gl/maps, goo.gl/forms, and other Google-product subpaths?

Google Maps short links (goo.gl/maps/XXXX) continue to work — they redirect to maps.google.com internally and Google kept that path alive. Forms short links (forms.gle/XXXXX) were never part of goo.gl proper and are unaffected. The actor focuses on the generic goo.gl/XXXXX namespace.

How current is the destination-liveness check?

Real-time at query time. When check_destination_today: true, the actor issues a HEAD request to the resolved long URL and returns the HTTP status. This adds ~200-400ms per code and roughly doubles the PPE cost. Leave it off for archival rewrites where current liveness doesn't matter.
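
If you resolve with the check disabled and later want liveness for a subset of URLs, a rough client-side equivalent looks like this (not the actor's internal code; just the same HEAD-request idea, with a GET fallback added):

import requests

def destination_status_today(long_url):
    # HEAD request to the resolved destination, falling back to GET for
    # servers that reject HEAD. Returns the HTTP status or None on error.
    try:
        resp = requests.head(long_url, allow_redirects=True, timeout=10)
        if resp.status_code == 405:  # some servers disallow HEAD
            resp = requests.get(long_url, allow_redirects=True, timeout=10, stream=True)
        return resp.status_code
    except requests.RequestException:
        return None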

Can I run this on Apify's free tier?

Yes. Apify's free tier includes $5/month of compute credit, roughly 1,250 resolves on PPE. For larger jobs, pay-as-you-go at $0.004 per resolve. See the actor page for current pricing.

What about other dead URL shorteners?

We maintain related actors for su.pr, ff.im, bit.do, and a few smaller dead shorteners. Coverage varies dramatically — these smaller services had much less ArchiveTeam attention during their shutdowns. See the related-actors section below.

Will you add new codes as ArchiveTeam updates its dataset?

ArchiveTeam's goo-gl-grab project finished in August 2025 and will not be rerun — goo.gl is permanently dead, so there are no new codes to capture. The dataset is static. If additional codes are surfaced (e.g., from a new Common Crawl release), we'll incorporate them in the actor's lookup index.

Is there any way to recover goo.gl analytics?

No. Google did not release click-count or referrer data during the shutdown. The analytics endpoint (goo.gl/XXXXX+) stopped working before the general shutdown, and no archival project captured that data.

Does this work for codes that contain underscores or dashes?

Yes. Pass allow_extended_chars: true in the run input to broaden the accepted code character set. Most codes are plain alphanumeric, but vanity-issued codes from 2012-2014 occasionally used underscores or dashes.

How does this compare to querying ArchiveTeam's raw data directly?

ArchiveTeam's raw dataset is 2.3 TB of compressed WARC files plus an 18 GB bloom-filter index. Downloading, indexing, and querying it is several days of engineering work and ongoing storage cost. The actor wraps that as a hosted service with Wayback fallback built in. For any project where "ship in an afternoon" matters more than "own the infrastructure," the actor is the faster path.

What's next

If you like this actor, a few related ones from the same archival pipeline:

  • supr-resolver — resolves dead StumbleUpon su.pr short URLs against the ArchiveTeam StumbleUpon rescue dataset.
  • bitdo-resolver — resolves bit.do short URLs (service shuttered 2023). Smaller but often-referenced dataset.
  • wayback-snapshot-finder — bulk Wayback Machine snapshot lookups for any URL, returning timestamp-indexed snapshot URLs for citation repair workflows.
  • archiveteam-dataset-browser — unified query interface across ArchiveTeam's rescue datasets for multiple dead services.

Conclusion

goo.gl is dead and it isn't coming back. What's salvageable — the billions of redirect mappings that once connected shortened codes to their destinations — was saved by the ArchiveTeam community in the months leading up to the August 2025 shutdown. Roughly 85-90% of actively-used short codes can be resolved from the combined ArchiveTeam + Wayback Machine record today. The remaining long tail (codes issued but essentially never clicked) is largely unrecoverable.

If you have a corpus of 2012-2018 content — forum dumps, academic papers, blog archives, scraped product listings — with goo.gl references in it, resolution is a one-time pass. At $0.004 per code and coverage in the high-80s percentage range, the goo-gl-resolver actor takes an archival recovery project from "two weeks of Wayback scripting against rate-limited APIs" to "an afternoon and a few hundred dollars." Once the pass is done you have a durable resolution table you can use forever; the actor exists to get you to that table efficiently.

The wider lesson — "if a service you depend on is dying, and ArchiveTeam is running a rescue, that rescue is probably the last copy of your data" — is worth internalizing. Services die faster than most infrastructure teams expect, and the only people reliably catching the drops are the preservation volunteers. If you're in a position to mirror, donate, or spin up a Warrior VM: do it before the next shutdown.
