tkpdx01

Posted on Jun 14

How to Auto-Submit Your Site's URLs to Google and Bing for Faster Indexing Using a Pure-Bash Cron Job on a DigitalOcean Droplet

#api #automation #tutorial #webdev

Introduction

When you publish or update a page, search engines won't necessarily notice right away. They rediscover your content on their own schedule by recrawling your sitemap, which can mean hours or days of delay. Submission APIs let you flip that around: instead of waiting to be crawled, you actively notify the search engine the moment something changes.

In this tutorial, you'll build a small, dependency-light pipeline that pings two indexing protocols from a DigitalOcean Droplet on a schedule. You'll use IndexNow (which Bing, Yandex, Seznam, Naver, and other engines share) as your general-purpose path for any page, and you'll wire up Google's Indexing API for the narrow set of pages Google officially supports. Everything runs in pure Bash with curl, openssl, and jq — no Node.js, Python, or third-party SDKs — so it's easy to audit and cheap to run.

By the end, you'll know how to:

Provision a Droplet and install the three tools the pipeline needs.
Create a Google Cloud service account, sign an OAuth2 JWT by hand with openssl, and exchange it for an access token.
Submit URLs to Google's Indexing API and parse the responses with jq.
Track Google's daily quota across runs with a persistent progress file.
Bulk-submit URLs to IndexNow with a hosted key file.
Schedule both jobs with cron, add random jitter, and log the results.

An honest note before you start. Google officially supports the Indexing API for only two kinds of pages: those with JobPosting structured data, and those with a BroadcastEvent embedded in a VideoObject (livestream video pages). It is not intended for blog posts, product pages, category pages, or marketing pages. The endpoint will technically accept any URL your service account is verified to own in Search Console, but using it for unsupported page types is against Google's documented terms, may be treated as spam, and has historically shown little to no indexing benefit — some sites have even reported de-indexing after abuse. So in this tutorial, IndexNow is your general-purpose tool for ordinary pages, and the Google Indexing API step is scoped to JobPosting/livestream pages. If you point Google's API at general pages anyway, treat it as unsupported and at your own risk.

Prerequisites

Before you begin, you'll need:

One Ubuntu 22.04 (or newer) DigitalOcean Droplet with a non-root user that has sudo privileges. The smallest size is plenty for this workload.
A registered domain, referred to as your_domain throughout, serving your site over HTTPS.
The site verified as a property in Google Search Console. You must be able to add a service account as an Owner of that property.
A Google Cloud project. You can create a free one at the Google Cloud Console.
The ability to upload a small text file to your site's web root (so it's reachable at https://your_domain/<key>.txt).
A plain-text list of the URLs you want to submit. This tutorial assumes a file with one full URL per line.

Step 1 - Provisioning the Droplet and Installing curl, openssl, jq

Start by logging into your Droplet as your sudo user:

ssh your_user@your_droplet_ip

This opens a shell on the server where the cron jobs will eventually run.

Refresh the package index and install the three tools the pipeline depends on:

sudo apt update
sudo apt install -y curl openssl jq

This installs curl (to make HTTPS requests), openssl (to sign the Google JWT and do base64url encoding), and jq (to build and parse JSON). On most Ubuntu images curl and openssl are already present, but running the command guarantees all three exist.

Confirm each tool is available:

curl --version | head -n1; openssl version; jq --version

You should see one version line per tool, similar to:

curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 OpenSSL/3.0.2 ...
OpenSSL 3.0.2 15 Mar 2022
jq-1.6

If any line is missing, re-run the install command. Finally, create a working directory to hold your scripts, keys, and state files, and lock it down so only your user can enter it:

mkdir -p ~/site-index && chmod 700 ~/site-index && cd ~/site-index

The -p flag makes mkdir a no-op if the directory already exists, so this command is safe to re-run. The chmod 700 ensures no other account on the Droplet can traverse into the directory where you'll soon store a private key. You'll keep everything for this project inside ~/site-index.
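If you would rather have your scripts fail early when a tool is missing, a small guard along these lines could be placed at the top of each one. This is a minimal sketch: the function name require_tools is hypothetical and is not used elsewhere in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every missing tool, return non-zero if any is absent.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example guard for the three tools installed in this step:
# require_tools curl openssl jq || exit 1
```

Because command -v only looks the name up on PATH, the check is cheap enough to run on every invocation, including under cron.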

Step 2 - Creating the Google Service Account and Granting It Access

This step is only needed if you have JobPosting or livestream (BroadcastEvent) pages. If you don't, skip ahead to Step 6 and use IndexNow alone.

First, in the Google Cloud Console, select your project and enable the Indexing API. Navigate to APIs & Services > Library, search for "Web Search Indexing API", and click Enable. Without this, every API call will fail with a "has not been used in project" error.

Next, create a service account. Go to APIs & Services > Credentials > Create credentials > Service account, give it a name like indexing-bot, and finish. Its email will look like your-service-account@your-project.iam.gserviceaccount.com.

Then create a key for it. Open the service account, go to the Keys tab, click Add key > Create new key, choose JSON, and download the file. Upload that JSON file to your Droplet's working directory — for example, save it as ~/site-index/key.json. Protect it immediately:

chmod 600 ~/site-index/key.json

This restricts the key file to your user only, so other accounts on the Droplet can't read your private credentials.

The JWT signing in the next step needs the private key in PEM form. Extract it from the JSON with jq:

jq -r '.private_key' ~/site-index/key.json > ~/site-index/key.pem
chmod 600 ~/site-index/key.pem

The -r flag tells jq to emit the raw string (without surrounding quotes and with the \n sequences in the JSON expanded into real newlines), producing a valid PEM file. You can confirm the result starts with the expected header:

head -n1 ~/site-index/key.pem

-----BEGIN PRIVATE KEY-----
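Checking the header line only proves the file starts correctly. As a slightly stronger check, you could ask openssl to parse the key. The sketch below assumes a hypothetical helper named pem_is_valid; it relies on openssl pkey exiting with a non-zero status when the file cannot be loaded as a private key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if openssl can parse the file as a private key.
# -noout suppresses the re-encoded key, so nothing sensitive is printed.
pem_is_valid() {
  openssl pkey -in "$1" -noout >/dev/null 2>&1
}

# Example:
# pem_is_valid "$HOME/site-index/key.pem" || echo "key.pem did not parse" >&2
```

A failure here usually points at a truncated download or at an extraction that left literal \n sequences in the file.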

Finally, grant the service account access to your site. In Google Search Console, open Settings > Users and permissions > Add user, paste the service account's email address, and set its permission to Owner. The Indexing API only accepts URLs for properties where the calling service account is a verified owner — a lower permission level will not work.

Step 3 - Signing a Google OAuth2 JWT in Pure Bash

Google's service-account flow uses the RFC 7523 "JWT Bearer" grant: you build a signed JSON Web Token and exchange it for a short-lived access token, with no interactive user-consent step. A JWT is three base64url-encoded segments joined by dots — header.payload.signature. You'll build each segment with openssl.

Create a script named google-token.sh:

nano ~/site-index/google-token.sh

Paste in the following:

#!/usr/bin/env bash
set -euo pipefail

KEY_JSON="$HOME/site-index/key.json"
KEY_PEM="$HOME/site-index/key.pem"
SA_EMAIL="$(jq -r '.client_email' "$KEY_JSON")"

# base64url: standard base64, made URL-safe, with '=' padding stripped
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

now=$(date +%s)
exp=$((now + 3600))

header=$(printf '{"alg":"RS256","typ":"JWT"}' | b64url)
payload=$(printf '{"iss":"%s","scope":"https://www.googleapis.com/auth/indexing","aud":"https://oauth2.googleapis.com/token","iat":%d,"exp":%d}' \
  "$SA_EMAIL" "$now" "$exp" | b64url)

# RS256 = RSASSA-PKCS1-v1_5 over SHA-256. Sign "header.payload", then base64url the raw bytes.
sig=$(printf '%s.%s' "$header" "$payload" | openssl dgst -sha256 -sign "$KEY_PEM" -binary | b64url)
jwt="${header}.${payload}.${sig}"

# Exchange the signed JWT for an access token (RFC 7523 grant).
curl -s -X POST https://oauth2.googleapis.com/token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer' \
  --data-urlencode "assertion=${jwt}" \
  | jq -r '.access_token'

Save and exit (in nano, press Ctrl+O, Enter, then Ctrl+X).

A few things worth understanding here. The b64url helper takes standard base64 output and makes it URL-safe: tr '+/' '-_' swaps the two non-alphanumeric base64 characters for their URL-safe equivalents, and tr -d '=' removes the padding that base64url forbids. The openssl base64 -A flag keeps the output on a single line instead of wrapping it at 64 columns. The claims are required by Google: iss is the service-account email, scope is the indexing scope, aud must be exactly the token endpoint (https://oauth2.googleapis.com/token), iat is the current Unix time, and exp is at most one hour later. The signature is produced with openssl dgst -sha256 -sign, which performs SHA256withRSA — exactly what RS256 means — over the header.payload string. Finally, --data-urlencode percent-encodes the colons in the grant_type value automatically (so urn:ietf:... is sent as urn%3Aietf%3A... on the wire), which means you don't have to escape them by hand.
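When a token exchange fails, it can help to decode the JWT you built and read the claims back. The following is a debugging sketch, not part of google-token.sh: b64url is copied from the script, while b64url_decode and jwt_claims are hypothetical helpers that reverse the encoding by restoring the standard alphabet and the stripped padding.

```shell
#!/usr/bin/env bash
# Same encoder as in google-token.sh.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# Hypothetical inverse, for debugging: re-add '=' padding, restore '+' and '/', decode.
b64url_decode() {
  local s
  s=$(cat)
  case $(( ${#s} % 4 )) in
    2) s="${s}==" ;;
    3) s="${s}=" ;;
  esac
  printf '%s' "$s" | tr '_-' '/+' | openssl base64 -d -A
}

# Print the decoded claims (the second dot-separated segment) of the JWT in $1.
jwt_claims() {
  printf '%s' "$1" | cut -d. -f2 | b64url_decode
}
```

For example, temporarily adding jwt_claims "$jwt" to the script before the curl call would print the iss, scope, aud, iat, and exp values exactly as Google receives them.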

Make the script executable and run it:

chmod +x ~/site-index/google-token.sh
~/site-index/google-token.sh

If everything is wired up correctly, you'll get a long access token printed to your terminal:

ya29.c.b0Aaekm1K...very-long-string...

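This token is a Bearer credential valid for one hour. If you see null instead, jump to the Troubleshooting section: it usually means the private key is malformed or has been deleted, a claim in the JWT is wrong, or the Droplet's clock has drifted. A disabled API or a missing Owner role does not block the token exchange; those problems surface later as a 403 when you call the Indexing API in Step 4.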
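Printing null is easy to miss in a log. One option is to parse the token response in a helper that fails loudly. The sketch below assumes a hypothetical function named extract_token and relies on the standard OAuth 2.0 error fields (error and error_description) that the token endpoint returns on failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the access token from a token-endpoint response,
# or print the OAuth error fields to stderr and return 1.
extract_token() {
  local resp="$1" token
  token=$(printf '%s' "$resp" | jq -r '.access_token // empty')
  if [ -z "$token" ]; then
    printf '%s' "$resp" \
      | jq -r '"Token exchange failed: \(.error // "unknown") - \(.error_description // "no description")"' >&2
    return 1
  fi
  printf '%s\n' "$token"
}
```

To use it, you could capture the curl output in a variable instead of piping it straight into jq, then call extract_token "$resp" as the script's last command so that a failed exchange produces a non-zero exit status.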

Step 4 - Submitting URLs to the Indexing API

With a working token, you can call the publish endpoint. The Indexing API expects a small JSON body — a url and a type, where type is URL_UPDATED (added or changed) or URL_DELETED (removed). Each successful publish for a single URL consumes one unit of your daily quota. A successful call returns HTTP 200 with a small JSON metadata body describing the notification.

Create google-submit.sh:

nano ~/site-index/google-submit.sh

Paste in:

#!/usr/bin/env bash
set -euo pipefail

DIR="$HOME/site-index"
URLS_FILE="$DIR/urls.txt"
ENDPOINT="https://indexing.googleapis.com/v3/urlNotifications:publish"

token="$("$DIR/google-token.sh")"
if [ -z "$token" ] || [ "$token" = "null" ]; then
  echo "Failed to obtain access token" >&2
  exit 1
fi

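# The "|| [ -n ... ]" clause also processes a final line that lacks a trailing newline.
while IFS= read -r url || [ -n "$url" ]; do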
  [ -z "$url" ] && continue
  body=$(jq -n --arg u "$url" '{url: $u, type: "URL_UPDATED"}')
  resp=$(curl -s -w '\n%{http_code}' -X POST "$ENDPOINT" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d "$body")
  code=$(printf '%s' "$resp" | tail -n1)
  echo "$code  $url"
done < "$URLS_FILE"

Save and exit.

This script reads ~/site-index/urls.txt one line at a time. The jq -n --arg u "$url" invocation builds the JSON body safely — using jq rather than string interpolation means special characters in the URL are escaped correctly. The curl flag -w '\n%{http_code}' appends the HTTP status code on its own line, and tail -n1 extracts it, so each line of output pairs a status code with its URL.

Create a small test list first — for a JobPosting page, use a real URL on your domain:

printf 'https://your_domain/jobs/senior-engineer\n' > ~/site-index/urls.txt

Now run the submitter:

~/site-index/google-submit.sh

A successful submission returns HTTP 200:

200  https://your_domain/jobs/senior-engineer

A 200 means Google accepted the notification. A 403 usually means the service account isn't an Owner of the property; a 429 means you've hit the daily quota. If you see any other code, temporarily add echo "$resp" inside the loop to print the full response: the body precedes the status line, and Google returns a descriptive JSON error object that names the exact problem. Both 403 and 429 are covered in Troubleshooting.
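Since curl -w '\n%{http_code}' puts the status on the last line, the same captured response can be split into its two parts. This is a small sketch with hypothetical helper names (response_code, response_body, google_hint); the hints only restate the status codes discussed above.

```shell
#!/usr/bin/env bash
# Split a response captured with: curl -s -w '\n%{http_code}' ...
response_code() { printf '%s\n' "$1" | tail -n 1; }   # last line: HTTP status
response_body() { printf '%s\n' "$1" | sed '$d'; }    # everything before it

# Hypothetical helper: one-line hint for the status codes covered in this step.
google_hint() {
  case "$1" in
    200) echo "accepted" ;;
    403) echo "check Search Console ownership and that the API is enabled" ;;
    429) echo "daily quota exhausted" ;;
    *)   echo "unexpected; inspect the response body" ;;
  esac
}
```

Inside the loop, something like echo "$code $url ($(google_hint "$code"))" would make the log easier to scan, and response_body "$resp" gives you the JSON error object when you need it.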

Step 5 - Tracking the Daily Quota Across Runs

The Indexing API allows 200 publish requests per day per Google Cloud project by default, counted at the URL level: batching URLs into a single HTTP request doesn't save quota, since each URL still consumes one unit. The quota resets daily at midnight Pacific Time, and going over returns HTTP 429. If your job runs on a schedule and your URL list is long, you need to remember how many URLs you've already submitted today so you stop before hitting the wall. You can request more quota from Google via a form (it's free, though it may require enabling a billing account), but the safest approach is to never exceed what you have.

You'll track progress in a JSON file that records the date and the count submitted so far. Update google-submit.sh to read and write it:

nano ~/site-index/google-submit.sh

Replace the contents with this quota-aware version:

#!/usr/bin/env bash
set -euo pipefail

DIR="$HOME/site-index"
URLS_FILE="$DIR/urls.txt"
PROGRESS="$DIR/progress.json"
ENDPOINT="https://indexing.googleapis.com/v3/urlNotifications:publish"
DAILY_LIMIT=200

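# The quota resets at midnight Pacific Time, so key the counter to that zone
# rather than to the Droplet's own time zone (usually UTC).
today=$(TZ=America/Los_Angeles date +%F)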

# Pick up today's count if the stored date is today; otherwise reset for the new day.
if [ -f "$PROGRESS" ] && [ "$(jq -r '.date' "$PROGRESS")" = "$today" ]; then
  used=$(jq -r '.used' "$PROGRESS")
else
  used=0
fi

token="$("$DIR/google-token.sh")"
if [ -z "$token" ] || [ "$token" = "null" ]; then
  echo "Failed to obtain access token" >&2
  exit 1
fi

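# The "|| [ -n ... ]" clause also processes a final line that lacks a trailing newline.
while IFS= read -r url || [ -n "$url" ]; do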
  [ -z "$url" ] && continue
  if [ "$used" -ge "$DAILY_LIMIT" ]; then
    echo "Daily limit of $DAILY_LIMIT reached; stopping."
    break
  fi
  body=$(jq -n --arg u "$url" '{url: $u, type: "URL_UPDATED"}')
  resp=$(curl -s -w '\n%{http_code}' -X POST "$ENDPOINT" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d "$body")
  code=$(printf '%s' "$resp" | tail -n1)
  echo "$code  $url"
  if [ "$code" = "200" ]; then
    used=$((used + 1))
  elif [ "$code" = "429" ]; then
    echo "Received 429 (quota exhausted); stopping."
    break
  fi
  jq -n --arg d "$today" --argjson u "$used" '{date: $d, used: $u}' > "$PROGRESS"
done < "$URLS_FILE"

echo "Submitted $used/$DAILY_LIMIT URLs today."

Save and exit.

This version reads progress.json at startup. If the file's stored date matches today, it picks up the previous count; otherwise it resets used to 0 for the new day. Before each submission it checks whether the limit is reached, only increments used on a 200, and stops immediately if Google returns a 429. The counter is rewritten after every URL, so even if the run is interrupted, the next run won't submit beyond the quota. Note that each run still starts from the top of urls.txt, so URLs sent on an earlier run are sent again unless you remove them from the file first.
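One way to avoid spending quota on URLs that were already accepted is to keep a second file of submitted URLs and filter against it. The sketch below is an optional extension under stated assumptions: the function pending_urls and the file name submitted.txt are hypothetical and do not appear in the scripts above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of $1 that do not appear, as whole lines, in $2.
# -F treats each URL as a fixed string, -x requires a full-line match, -v inverts.
pending_urls() {
  local urls_file="$1" done_file="$2"
  if [ -s "$done_file" ]; then
    # grep exits 1 when nothing is left to print; that is not an error here.
    grep -Fxv -f "$done_file" "$urls_file" || true
  else
    cat "$urls_file"
  fi
}
```

In google-submit.sh you could then append "$url" to "$DIR/submitted.txt" after each 200 and feed the loop from pending_urls instead of from urls.txt directly. When a page changes again, remove its line from submitted.txt so it is picked up on the next run.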

Run it once to seed the progress file:

~/site-index/google-submit.sh && cat ~/site-index/progress.json

You'll see the run output followed by the saved state (the date will be whatever day you run it):

200  https://your_domain/jobs/senior-engineer
Submitted 1/200 URLs today.
{
  "date": "2026-06-13",
  "used": 1
}

Step 6 - Bulk-Submitting URLs to Bing/Yandex via IndexNow

IndexNow is the general-purpose path for any page. It's a shared protocol: submitting to one participating endpoint propagates to all of them (Bing, Yandex, Seznam, Naver, and more), so you normally POST to a single endpoint. There's no daily quota, submission is near-instant, and you can send up to 10,000 URLs in one request.

First, generate an API key. IndexNow accepts keys of 8 to 128 characters drawn from a-z, A-Z, 0-9, and -, so a random hex string qualifies:

openssl rand -hex 16

This prints a 32-character key, for example a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6. Copy it; you'll reference it as <key> below.

IndexNow proves you own the domain by requiring the key to be hosted as a UTF-8 text file whose only content is the key itself. Create that file and place it at your site's web root so it's reachable at https://your_domain/<key>.txt:

echo -n 'a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6' > a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6.txt

The -n flag prevents echo from adding a trailing newline, keeping the file's content exactly equal to the key. Upload this file to your web server's document root (the exact location depends on your stack — for example, /var/www/your_domain/ for many setups). Verify it's reachable over HTTPS:

curl -s https://your_domain/a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6.txt

The response should be the key and nothing else. If the file is at the root and named <key>.txt, the keyLocation field is technically optional, but it's good practice to send it explicitly.
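Before uploading, you might want to confirm locally that the key is well formed and that the file holds exactly the key. The following sketch uses two hypothetical helpers, key_is_wellformed and key_file_matches; the byte-count comparison is what catches a stray trailing newline, which a plain string comparison of command output would hide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true if the key is 8-128 characters from a-z, A-Z, 0-9, or '-'.
key_is_wellformed() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9-]{8,128}$'
}

# Hypothetical helper: true if the file's bytes are exactly the key.
# wc -c counts a trailing newline; $(cat ...) alone would silently strip it.
key_file_matches() {
  local key="$1" file="$2"
  [ "$(wc -c < "$file")" -eq "${#key}" ] && [ "$(cat "$file")" = "$key" ]
}
```

For example, key_file_matches "$KEY" "${KEY}.txt" || echo "key file differs from key" >&2 could run just before you copy the file to the web root.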

Now create the submission script, indexnow-submit.sh:

nano ~/site-index/indexnow-submit.sh

Paste in:

#!/usr/bin/env bash
set -euo pipefail

DIR="$HOME/site-index"
URLS_FILE="$DIR/urls.txt"
HOST="your_domain"
KEY="a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"
KEY_LOCATION="https://${HOST}/${KEY}.txt"
ENDPOINT="https://api.indexnow.org/indexnow"

# Build a JSON array of all URLs in the file (skipping blank lines).
url_array=$(jq -R -s -c 'split("\n") | map(select(length > 0))' "$URLS_FILE")

body=$(jq -n \
  --arg host "$HOST" \
  --arg key "$KEY" \
  --arg loc "$KEY_LOCATION" \
  --argjson urls "$url_array" \
  '{host: $host, key: $key, keyLocation: $loc, urlList: $urls}')

resp=$(curl -s -w '\n%{http_code}' -X POST "$ENDPOINT" \
  -H 'Content-Type: application/json; charset=utf-8' \
  -d "$body")

code=$(printf '%s' "$resp" | tail -n1)
echo "IndexNow responded with HTTP $code"

Save and exit.

The jq -R -s -c pipeline reads the whole file as a raw string (-R), slurps it into one value (-s), splits it on newlines, and drops empty entries — turning your URL list into a JSON array in one pass. The second jq call assembles the final payload with host (the bare domain, no scheme), key, keyLocation (the full HTTPS URL to the key file), and urlList. The request uses Content-Type: application/json; charset=utf-8 as the protocol requires.
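The script above sends the whole file in one request, which is fine until the list grows past the 10,000-URL limit. A minimal sketch of one way to stay under it is to split the list into chunk files first. The function name chunk_urls and the chunk. file prefix are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split $1 into files of at most $2 non-blank lines under directory $3.
# The output files are named chunk.aa, chunk.ab, and so on (split's default suffixes).
chunk_urls() {
  local urls_file="$1" max="$2" out_dir="$3"
  mkdir -p "$out_dir"
  grep -v '^[[:space:]]*$' "$urls_file" | split -l "$max" - "$out_dir/chunk."
}

# Example:
# chunk_urls "$HOME/site-index/urls.txt" 10000 "$HOME/site-index/chunks"
```

To submit each chunk you would need indexnow-submit.sh to accept the list path from outside, for instance by changing its assignment to URLS_FILE="${URLS_FILE:-$DIR/urls.txt}" and running it once per chunk file.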

Make it executable and run it:

chmod +x ~/site-index/indexnow-submit.sh
~/site-index/indexnow-submit.sh

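An accepted submission returns HTTP 200 (IndexNow can also answer 202 when the URLs were received but the key file has not been validated yet):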

IndexNow responded with HTTP 200

For a single, quick manual test you can also use the GET form, which carries the key and one URL as query parameters:

curl -s -o /dev/null -w '%{http_code}\n' \
  'https://api.indexnow.org/indexnow?url=https://your_domain/some-page&key=a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6'

The other response codes are worth knowing: 400 means the JSON was malformed, 403 means the key wasn't found or didn't match the hosted file, 422 means a URL doesn't belong to host (or the key mismatches), and 429 means you're submitting too often. As a best practice, only submit URLs that were genuinely added, updated, or removed — spamming the same URLs repeatedly can trip the 429 response.
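As written, indexnow-submit.sh exits with status 0 whatever IndexNow answers, so a failed submission looks like a success to anything that checks the exit code. A sketch of a status check, using a hypothetical function named indexnow_check that restates the codes listed above, might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 for accepted submissions, otherwise explain and return 1.
indexnow_check() {
  case "$1" in
    200|202) return 0 ;;
    400) echo "400: invalid request format" >&2 ;;
    403) echo "403: key not found or does not match the hosted file" >&2 ;;
    422) echo "422: a URL does not belong to the host, or the key does not match" >&2 ;;
    429) echo "429: too many requests" >&2 ;;
    *)   echo "$1: unexpected status" >&2 ;;
  esac
  return 1
}
```

Ending the script with indexnow_check "$code" would make a rejected submission produce a non-zero exit status along with a readable reason.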

Step 7 - Scheduling Both Jobs with Cron, Jitter, and Logging

With both submitters working, you can automate them. You'll wrap each in a thin runner that adds a small random delay (jitter) and appends output to a log. Jitter spreads load out so that, if many people schedule the same minute, requests don't all land at the same instant.

Create a wrapper for the IndexNow job, run-indexnow.sh:

nano ~/site-index/run-indexnow.sh

Paste in:

#!/usr/bin/env bash
set -euo pipefail
DIR="$HOME/site-index"
LOG="$DIR/indexnow.log"

# Sleep a random 0-300 seconds unless --no-jitter is passed.
if [ "${1:-}" != "--no-jitter" ]; then
  sleep $((RANDOM % 301))
fi

echo "=== $(date -u +%FT%TZ) IndexNow run ===" >> "$LOG"
"$DIR/indexnow-submit.sh" >> "$LOG" 2>&1

Save and exit. Create the equivalent wrapper for Google, run-google.sh:

nano ~/site-index/run-google.sh

Paste in:

#!/usr/bin/env bash
set -euo pipefail
DIR="$HOME/site-index"
LOG="$DIR/google.log"

if [ "${1:-}" != "--no-jitter" ]; then
  sleep $((RANDOM % 301))
fi

echo "=== $(date -u +%FT%TZ) Google run ===" >> "$LOG"
"$DIR/google-submit.sh" >> "$LOG" 2>&1

Save and exit, then make both runners executable:

chmod +x ~/site-index/run-indexnow.sh ~/site-index/run-google.sh

The ${1:-} syntax safely reads the first argument even when none is passed (avoiding an "unbound variable" error under set -u), so you can run a wrapper with --no-jitter for an immediate manual test. RANDOM % 301 yields a value from 0 to 300, giving each run up to a five-minute random delay. The >> "$LOG" 2>&1 redirect appends both standard output and standard error to the log file.
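If you want the jitter window to be adjustable, the modulo expression can be wrapped in a small function. This is a sketch with a hypothetical name, jitter_seconds; note that Bash's RANDOM only yields values from 0 to 32767, so the maximum should stay below that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random delay between 0 and $1 seconds inclusive (default 300).
# RANDOM ranges from 0 to 32767, so keep the maximum below 32767.
jitter_seconds() {
  local max="${1:-300}"
  echo $(( RANDOM % (max + 1) ))
}

# Example: sleep "$(jitter_seconds 600)"   # up to ten minutes of jitter
```

Adding 1 before taking the remainder is what makes the upper bound inclusive, mirroring the RANDOM % 301 expression in the wrappers.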

Test a wrapper without waiting for the jitter:

~/site-index/run-indexnow.sh --no-jitter && tail -n3 ~/site-index/indexnow.log

You should see a timestamped header followed by the HTTP response line in the log. Now open your crontab to schedule both jobs:

crontab -e

Add these two lines, then save and exit:

17 3 * * * /home/your_user/site-index/run-indexnow.sh
3 5 * * *  /home/your_user/site-index/run-google.sh

These run IndexNow daily at 03:17 and Google daily at 05:03 (server time), with each wrapper adding its own random delay on top. cron runs jobs in a minimal environment with a non-login shell and a bare PATH, so always use absolute paths in the crontab line itself — replace your_user with your actual username. (Your scripts can still reference $HOME safely, because cron does set HOME from /etc/passwd; it's the command path in the crontab entry that won't be resolved against your interactive shell's PATH.) Confirm the entries were saved:

crontab -l

This prints your active crontab so you can verify both lines are present. Going forward, regenerate ~/site-index/urls.txt whenever your content changes — ideally limiting it to URLs that were actually added or updated — and the cron jobs will submit them automatically.
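One way to limit urls.txt to newly added URLs is to compare today's sitemap against a saved copy of the previous list. The sketch below makes several assumptions: the helper names sitemap_urls and new_urls are hypothetical, the sitemap is a simple one with no CDATA sections or XML entities inside its loc elements, and only additions are detected (an updated page whose URL is unchanged would need a lastmod comparison instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each <loc> value from a sitemap file, one per line.
# Assumes no CDATA sections, entities, or whitespace inside the <loc> elements.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Hypothetical helper: print URLs present in the new list ($2) but not in the old list ($1).
# comm needs sorted input; -13 hides lines unique to the old list and lines common to both.
new_urls() {
  local old_sorted new_sorted
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u "$1" > "$old_sorted"
  sort -u "$2" > "$new_sorted"
  comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

A typical flow would be to write sitemap_urls output to a dated file, run new_urls against the previous file, and save the result as urls.txt before the cron jobs fire.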

Troubleshooting

Google returns HTTP 429. You've exhausted the 200-requests-per-day project quota. The counter in progress.json should normally prevent this, but if you run multiple scripts against the same Google Cloud project, they share one quota pool. Wait for the reset at midnight Pacific Time, or request additional quota through Google's form. If you genuinely need more headroom, prioritize your most important URLs at the top of urls.txt so they're submitted before the limit is reached.

Google returns HTTP 403. The service account isn't recognized as an owner of the property, or the Indexing API isn't enabled. Re-check that you added the service account email as an Owner (not a lower role) in Search Console, and that the "Web Search Indexing API" is enabled in your Google Cloud project. A 403 can also appear if you submit a URL on a domain the account doesn't own. When you hit any unexpected code, temporarily add echo "$resp" inside the loop in google-submit.sh to print the raw response; Google's JSON error.message field usually states the precise cause.

google-token.sh prints null. The token exchange failed. Remove the trailing | jq -r '.access_token' from the script temporarily and re-run it to see the full error JSON. Common causes are a malformed PEM key (re-run the jq -r '.private_key' extraction from Step 2), a key that has been deleted or disabled in the Google Cloud Console, or clock skew on the Droplet (run timedatectl and ensure time sync is active, since an iat/exp outside the allowed window is rejected). A disabled API or a missing Owner role does not cause this; those show up as a 403 from the Indexing API instead.

You can't fetch your own sitemap to build urls.txt, getting 403. Some hosting providers and CDNs apply a firewall or bot-mitigation rule that blocks requests from datacenter IP ranges — which includes your Droplet. If curl https://your_domain/sitemap.xml returns a 403 from the server (look for a mitigation header in curl -I), the block is on the host side, not your script. Work around it by maintaining urls.txt directly on the Droplet (push it from wherever you build your site) rather than fetching the live sitemap, or by allowlisting your Droplet's IP in your provider's firewall.

IndexNow returns 403 or 422. A 403 means the key wasn't found or didn't match the hosted file — re-run the curl https://your_domain/<key>.txt check and confirm the file contains exactly the key with no trailing newline (the -n flag in Step 6 matters). A 422 means a URL in your urlList doesn't belong to the host you specified (or the key doesn't match); make sure every URL uses the same host as the key file, all over HTTPS.

Conclusion

You've built a self-contained, pure-Bash indexing pipeline on a DigitalOcean Droplet. You installed curl, openssl, and jq; created a Google service account and signed an OAuth2 JWT by hand to call the Indexing API for JobPosting and livestream pages; tracked Google's daily quota across runs with a persistent progress file; bulk-submitted ordinary pages to Bing, Yandex, and other engines through IndexNow; and scheduled both jobs with cron, jitter, and logging.

Keep the trade-offs in mind: IndexNow is the safe, general-purpose path with no quota and broad engine support, while Google's Indexing API should stay scoped to the page types Google documents. Neither one replaces a healthy sitemap and good internal linking — they accelerate discovery, they don't guarantee indexing or ranking.

From here, you could extend the pipeline in a few directions: diff your sitemap between runs so you only submit genuinely changed URLs, add a URL_DELETED path to notify Google when pages are removed, ship the logs to a monitoring service or a simple alert when a run sees repeated non-200 responses, or rotate your IndexNow key periodically by hosting a new key file. With the building blocks in place, each of these is a small addition to the scripts you already have.
