<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kaugesaar</title>
    <description>The latest articles on DEV Community by kaugesaar (@kaugesaar).</description>
    <link>https://dev.to/kaugesaar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3796818%2F2ad785ca-f296-4615-b767-8fbe7e9b8526.png</url>
      <title>DEV Community: kaugesaar</title>
      <link>https://dev.to/kaugesaar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaugesaar"/>
    <language>en</language>
    <item>
      <title>Running Screaming Frog on GCP with Cloud Run Jobs</title>
      <dc:creator>kaugesaar</dc:creator>
      <pubDate>Fri, 27 Feb 2026 17:36:45 +0000</pubDate>
      <link>https://dev.to/kaugesaar/running-screaming-frog-on-gcp-with-cloud-run-jobs-59i7</link>
      <guid>https://dev.to/kaugesaar/running-screaming-frog-on-gcp-with-cloud-run-jobs-59i7</guid>
      <description>&lt;p&gt;Running Screaming Frog on a VM means paying for idle time between crawls. Cloud Run Jobs let you spin up a container, run the crawl, and shut down - you only pay for actual compute time.&lt;/p&gt;

&lt;p&gt;At &lt;a href="http://www.precis.com" rel="noopener noreferrer"&gt;Precis&lt;/a&gt;, we built an internal service that runs &lt;a href="https://www.screamingfrog.co.uk/seo-spider/" rel="noopener noreferrer"&gt;Screaming Frog&lt;/a&gt; crawls at scale using GCP. This post walks through the core setup - a simplified version you can deploy in about 30 minutes.&lt;/p&gt;

&lt;p&gt;For this proof of concept, we'll use Cloud Run Jobs, Cloud Storage, and a simple Dockerfile paired with a bash script that serves as its entrypoint.&lt;/p&gt;

&lt;h2&gt;What we're building&lt;/h2&gt;

&lt;p&gt;The setup is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dockerfile that installs Screaming Frog&lt;/li&gt;
&lt;li&gt;Entrypoint script that runs the crawl&lt;/li&gt;
&lt;li&gt;Cloud Run Job that executes the container&lt;/li&gt;
&lt;li&gt;GCS bucket to store exports&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Cloud Project with billing enabled&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gcloud&lt;/code&gt; CLI installed and configured&lt;/li&gt;
&lt;li&gt;Screaming Frog license&lt;/li&gt;
&lt;li&gt;Basic Docker knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Setting up GCP resources&lt;/h2&gt;

&lt;p&gt;These next steps assume you have the &lt;a href="https://docs.cloud.google.com/sdk/docs/install-sdk" rel="noopener noreferrer"&gt;gcloud SDK&lt;/a&gt; installed and that you're somewhat familiar with GCP.&lt;/p&gt;

&lt;p&gt;Note that many of the steps below assume you have these two environment variables set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-gcp-project"&lt;/span&gt;
&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-preferred-region"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable the required APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;artifactregistry.googleapis.com run.googleapis.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a storage bucket for the CSV exports that Screaming Frog will generate. We'll later mount this bucket to the Cloud Run Job instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gsutil mb &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; gs://&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-crawl-output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a service account for the job and grant it permission to read, write, and delete objects in the bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud iam service-accounts create screaming-frog-runner &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ScreamingFrog Runner"&lt;/span&gt;

gcloud projects add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:screaming-frog-runner@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.objectAdmin"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Store your Screaming Frog license and a persistent machine ID in Secret Manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Secret Manager API&lt;/span&gt;
gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;secretmanager.googleapis.com

&lt;span class="c"&gt;# Create license secret&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"YOUR-LICENSE-KEY"&lt;/span&gt; | gcloud secrets create screaming-frog-license &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Create a persistent machine ID&lt;/span&gt;
uuidgen | gcloud secrets create screaming-frog-machine-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Grant access to service account&lt;/span&gt;
gcloud secrets add-iam-policy-binding screaming-frog-license &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:screaming-frog-runner@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/secretmanager.secretAccessor"&lt;/span&gt;

gcloud secrets add-iam-policy-binding screaming-frog-machine-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:screaming-frog-runner@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/secretmanager.secretAccessor"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Building the Docker image&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04&lt;/span&gt;

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    openjdk-21-jre &lt;span class="se"&gt;\
&lt;/span&gt;    xvfb &lt;span class="se"&gt;\
&lt;/span&gt;    wget &lt;span class="se"&gt;\
&lt;/span&gt;    ca-certificates &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Download and install Screaming Frog&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;wget &lt;span class="nt"&gt;-O&lt;/span&gt; /tmp/screamingfrog.deb &lt;span class="se"&gt;\
&lt;/span&gt;    https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_23.2_all.deb &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; /tmp/screamingfrog.deb &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;rm&lt;/span&gt; /tmp/screamingfrog.deb

&lt;span class="c"&gt;# Copy entrypoint script&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; entrypoint.sh /entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /entrypoint.sh

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/entrypoint.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ubuntu 22.04&lt;/strong&gt;: Screaming Frog is distributed as a .deb package, so any Debian-based image should work fine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;xvfb&lt;/strong&gt;: Screaming Frog requires a display even in CLI mode; xvfb provides a virtual one so the crawler can run fully headless&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now create &lt;code&gt;entrypoint.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nv"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CRAWL_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;OUTPUT_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT_DIR&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="p"&gt;/mnt/crawl-output&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Create a per-run subfolder: domain/YYYY-MM-DD_HHMMSS&lt;/span&gt;
&lt;span class="nv"&gt;DOMAIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'s|https?://([^/]+).*|\1|'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +&lt;span class="s2"&gt;"%Y-%m-%d_%H%M%S"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;OUTPUT_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT_BASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DOMAIN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OUTPUT_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Setup license, machine ID, and EULA&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /root/.ScreamingFrogSEOSpider
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SF_LICENSE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /root/.ScreamingFrogSEOSpider/licence.txt
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SF_MACHINE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /root/.ScreamingFrogSEOSpider/machine-id.txt
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /root/.ScreamingFrogSEOSpider/spider.config &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
eula.accepted=15
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Run the crawl&lt;/span&gt;
xvfb-run screamingfrogseospider &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--headless&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--crawl&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--output-folder&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OUTPUT_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--export-tabs&lt;/span&gt; &lt;span class="s2"&gt;"Internal:All,External:All,Response Codes:All,Page Titles:All,Meta Description:All,H1:All,Images:All"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--save-crawl&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Crawl completed successfully"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script does four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;License setup&lt;/strong&gt;: Writes the license key we stored in Secret Manager to the path Screaming Frog expects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine ID&lt;/strong&gt;: Writes the persistent UUID so each container run identifies as the same machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EULA acceptance&lt;/strong&gt;: Required for Screaming Frog to run non-interactively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs the crawl&lt;/strong&gt;: &lt;code&gt;xvfb-run&lt;/code&gt; provides the virtual display, and we export a handful of selected tabs - adjust the list to fit your needs.&lt;/li&gt;
&lt;/ol&gt;
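
&lt;p&gt;Because of the per-run subfolder logic, each crawl's exports land at a path like &lt;code&gt;example.com/2026-02-27_173645&lt;/code&gt;. If you want to verify the derivation, it can be run in isolation - a sketch using the same &lt;code&gt;sed&lt;/code&gt; expression as the entrypoint:&lt;/p&gt;

```shell
# Mirror of the entrypoint's path derivation, runnable on its own
URL="https://example.com/some/page"
DOMAIN=$(echo "$URL" | sed -E 's|https?://([^/]+).*|\1|')
TIMESTAMP=$(date -u +"%Y-%m-%d_%H%M%S")
echo "${DOMAIN}/${TIMESTAMP}"   # e.g. example.com/2026-02-27_173645
```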

&lt;p&gt;Your project directory should now contain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── Dockerfile
└── entrypoint.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
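
&lt;p&gt;With both files in place, you can optionally smoke-test the image before deploying. A sketch, assuming Docker is installed locally - the license values below are placeholders, since in the cloud they come from Secret Manager:&lt;/p&gt;

```shell
# Build the image locally
docker build -t screaming-frog-local .

# Run a small test crawl; SF_LICENSE and SF_MACHINE_ID are placeholder values
docker run --rm \
    -e CRAWL_URL=https://example.com \
    -e OUTPUT_DIR=/tmp/crawl-output \
    -e SF_LICENSE=YOUR-LICENSE-KEY \
    -e SF_MACHINE_ID=00000000-0000-0000-0000-000000000000 \
    screaming-frog-local
```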



&lt;h2&gt;Deploying the Cloud Run Job&lt;/h2&gt;

&lt;p&gt;Deploy the job directly from source (this builds the image using Cloud Build and deploys in one command):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run &lt;span class="nb"&gt;jobs &lt;/span&gt;deploy screaming-frog-crawler &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--service-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;screaming-frog-runner@&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;16Gi &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--max-retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3600 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/mnt/crawl-output &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--set-secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;SF_LICENSE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;screaming-frog-license:latest,SF_MACHINE_ID&lt;span class="o"&gt;=&lt;/span&gt;screaming-frog-machine-id:latest &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--add-volume&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;crawl-storage,type&lt;span class="o"&gt;=&lt;/span&gt;cloud-storage,bucket&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-crawl-output&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--add-volume-mount&lt;/span&gt; &lt;span class="nv"&gt;volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;crawl-storage,mount-path&lt;span class="o"&gt;=&lt;/span&gt;/mnt/crawl-output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build your Docker image using Cloud Build&lt;/li&gt;
&lt;li&gt;Push it to Artifact Registry automatically&lt;/li&gt;
&lt;li&gt;Create (or update) the Cloud Run Job&lt;/li&gt;
&lt;li&gt;Mount the bucket we created as a volume&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Some notes on configuration&lt;/h3&gt;

&lt;p&gt;I found &lt;strong&gt;4 CPU / 16GB RAM&lt;/strong&gt; to be a good starting point for most crawls. Scale up to 8 CPU / 32GB for large sites (100K+ URLs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Screaming Frog periodically checks available disk space and stops the crawl if it detects 5GB or less remaining. On Cloud Run, the writable filesystem is in-memory, so available memory doubles as disk space - there's no separate disk allocation, even with the GCS mount. So while 2 CPU / 8GB technically works, you'll be cutting it close against Screaming Frog's 5GB threshold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: With 4 CPU / 16GB, expect roughly $0.15-0.20 per hour of crawl time. A typical 10K URL crawl takes 15-30 minutes, so around $0.05-0.10 per crawl. Storage costs are negligible for CSV exports.&lt;/p&gt;
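
&lt;p&gt;If a crawl runs out of headroom, you can resize the job without touching the image. A sketch using &lt;code&gt;gcloud run jobs update&lt;/code&gt;:&lt;/p&gt;

```shell
# Bump the job to 8 CPU / 32GiB for large sites (100K+ URLs)
gcloud run jobs update screaming-frog-crawler \
    --region=${REGION} \
    --cpu=8 \
    --memory=32Gi
```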

&lt;h2&gt;Running crawls&lt;/h2&gt;

&lt;p&gt;Manual execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run &lt;span class="nb"&gt;jobs &lt;/span&gt;execute screaming-frog-crawler &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;CRAWL_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
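
&lt;p&gt;For recurring crawls, you can trigger the job with Cloud Scheduler instead of running it by hand. A sketch following Google's documented pattern for scheduling Cloud Run Jobs - the schedule, scheduler job name, and target URI below are assumptions to adapt, and the service account also needs &lt;code&gt;roles/run.invoker&lt;/code&gt;:&lt;/p&gt;

```shell
# Crawl example.com every Monday at 06:00 UTC
gcloud scheduler jobs create http weekly-example-crawl \
    --location=${REGION} \
    --schedule="0 6 * * 1" \
    --http-method=POST \
    --uri="https://${REGION}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/${PROJECT_ID}/jobs/screaming-frog-crawler:run" \
    --oauth-service-account-email=screaming-frog-runner@${PROJECT_ID}.iam.gserviceaccount.com
```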



&lt;p&gt;Check execution status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run &lt;span class="nb"&gt;jobs &lt;/span&gt;executions list &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--job&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;screaming-frog-crawler &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;View logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud logging &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"resource.type=cloud_run_job AND resource.labels.job_name=screaming-frog-crawler"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Accessing crawl results&lt;/h2&gt;

&lt;p&gt;You can browse and download files directly from the &lt;a href="https://console.cloud.google.com/storage/browser" rel="noopener noreferrer"&gt;GCP Console&lt;/a&gt; by navigating to your bucket. Or use the CLI:&lt;/p&gt;

&lt;p&gt;First, create a local folder to store the downloaded files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ./crawl-output/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;List crawl outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gsutil &lt;span class="nb"&gt;ls &lt;/span&gt;gs://&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-crawl-output&lt;/span&gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download the latest crawl for a domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LATEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gsutil &lt;span class="nb"&gt;ls &lt;/span&gt;gs://&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-crawl-output&lt;/span&gt;/domain.tld/ | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
gsutil &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LATEST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;*"&lt;/span&gt; ./crawl-output/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or download all crawls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gsutil &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; gs://&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-crawl-output&lt;/span&gt;/&lt;span class="k"&gt;*&lt;/span&gt; ./crawl-output/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output directory includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;internal_all.csv&lt;/code&gt; - All internal URLs discovered&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;external_all.csv&lt;/code&gt; - External links&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;response_codes_all.csv&lt;/code&gt; - HTTP status codes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;page_titles_all.csv&lt;/code&gt; - Page titles&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta_description_all.csv&lt;/code&gt; - Meta descriptions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;h1_all.csv&lt;/code&gt; - H1 tags&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;images_all.csv&lt;/code&gt; - Image inventory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;crawl.seospider&lt;/code&gt; - Full crawl file (open in Screaming Frog GUI)&lt;/li&gt;
&lt;/ul&gt;
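
&lt;p&gt;Once downloaded, you can sanity-check the exports straight from the shell. Column layouts vary between Screaming Frog versions, so inspect the header row before scripting against specific columns:&lt;/p&gt;

```shell
# Show which columns this Screaming Frog version exports
head -1 ./crawl-output/internal_all.csv

# Count crawled URLs (all rows minus the header)
tail -n +2 ./crawl-output/internal_all.csv | wc -l
```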

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkaugesaar.se%2Fimages%2Fscreaming-frog-result.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkaugesaar.se%2Fimages%2Fscreaming-frog-result.png" title="Example of output" alt="Screaming Frog output example" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;This gives you a working serverless setup for running Screaming Frog crawls. From here, you could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add configuration files&lt;/strong&gt;: Use Screaming Frog's config files to standardize crawl settings across executions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement progress tracking&lt;/strong&gt;: Parse log output to report crawl progress in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build a web UI&lt;/strong&gt;: Create a simple interface for managing crawls and viewing results (hint: this is what we built)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add notifications&lt;/strong&gt;: Send alerts when crawls complete or fail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track changes&lt;/strong&gt;: Compare crawls over time to detect new issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once deployed, crawls run unattended and you only pay for what you use.&lt;/p&gt;

</description>
      <category>gcp</category>
      <category>tutorial</category>
      <category>automation</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
