Downloading a website for offline browsing can be useful for documentation, tutorials, or any web content you want to reference without an internet connection.
wget is a command-line tool that makes this process easy and efficient. Using its mirroring feature, you can create a local copy of a website that is fully navigable.
In this guide, we will explore the concept of mirroring and go through the most important wget options that make it work.
What is Wget Mirroring?
Mirroring is the process of downloading a website recursively while keeping its structure intact. Wget follows links, downloads all pages and necessary resources like images, scripts, and stylesheets, and rewrites links so that the website works offline. This way, you can open the local copy in a browser and navigate it as if you were online.
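In its simplest form, a mirror can be produced with just a few options; here is a minimal sketch, using a placeholder URL:
wget --mirror --convert-links --page-requisites https://example.com/
The sections below look at each of these options individually, along with a few more that keep the download polite and focused.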
--mirror
This option is the backbone of website mirroring. It turns on recursion and time-stamping, telling wget to follow links on the site and download all reachable pages. When you run it again later, only the updated pages are fetched, which saves bandwidth and keeps your local copy current.
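On its own, a minimal invocation looks like this (the URL is a placeholder); in current versions of wget, --mirror is shorthand for -r -N -l inf --no-remove-listing:
wget --mirror https://example.com/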
--convert-links
After downloading, web pages often still point to the original site. This option rewrites those links to point to your local files. Clicking around your offline copy works exactly as it would online, without sending requests to the internet.
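A sketch with plain recursion and a placeholder URL; -k is the short form, and the rewriting only happens after all downloads have finished, once wget knows which files exist locally:
wget --recursive --level=2 --convert-links https://example.com/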
--adjust-extension
Web servers sometimes serve pages without a .html extension. This can prevent browsers from opening them correctly. --adjust-extension ensures all pages have the proper .html extension, keeping your offline site functional.
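For instance (placeholder URL), with -E as the short form; wget appends .html to downloaded HTML pages whose names do not already end in it:
wget --mirror --convert-links --adjust-extension https://example.com/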
--page-requisites
A web page is rarely just HTML. Images, CSS, JavaScript, and fonts are required for proper rendering. --page-requisites downloads all these supporting files so your local copy looks and works like the original.
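This is also useful on its own for saving a single page together with everything needed to render it; a sketch with a placeholder URL:
wget --page-requisites --convert-links https://example.com/some-article.html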
--no-parent
This option limits wget to the target directory. For example, if you are mirroring example.com/docs, it prevents wget from ascending to parent directories or the root of the domain. It keeps the download focused and prevents unnecessary files from being fetched.
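A sketch of mirroring only a subsection (the URL and path are placeholders); the trailing slash matters, because --no-parent works relative to the starting directory:
wget --mirror --no-parent https://example.com/docs/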
--wait=2
Downloading an entire website can generate many requests in a short time. --wait=2 adds a pause between requests, here two seconds, to reduce the load on the server and behave politely.
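For example (placeholder URL); adding --random-wait varies the pause around the given value, which looks less mechanical to the server:
wget --mirror --wait=2 --random-wait https://example.com/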
--limit-rate=200k
Sometimes you want to control the speed of the download to avoid using all your bandwidth. --limit-rate=200k caps the download speed at 200 kilobytes per second, allowing you to continue using your internet connection while mirroring.
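For example (placeholder URL); the value accepts a k suffix for kilobytes and an m suffix for megabytes per second:
wget --mirror --limit-rate=200k https://example.com/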
--domains=samplesite.org
Websites often link to external domains. This option restricts wget to the specified domain only. It prevents unrelated content from being downloaded and keeps your mirror clean.
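A sketch with placeholder hostnames; --domains takes a comma-separated list and is most useful together with --span-hosts (-H), which allows wget to follow links onto the listed hosts:
wget --mirror --span-hosts --domains=samplesite.org,cdn.samplesite.org https://samplesite.org/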
--reject-regex
Sometimes you want to skip certain URLs, like ads, tracking scripts, or dynamic pages. --reject-regex allows you to define patterns for URLs that should not be downloaded. This keeps your offline copy relevant and smaller in size.
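A sketch with a placeholder pattern and URL; the regular expression is matched against the complete URL, so it can filter query strings as well as paths:
wget --mirror --reject-regex=".*(ads|analytics|share).*" https://example.com/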
-P
-P sets the folder where wget stores the downloaded files. It keeps the mirrored site organized in one location instead of scattering files in the current working directory.
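For example (the folder name is arbitrary); -P is the short form of --directory-prefix, and wget still creates a subfolder named after the host inside it, which -nH suppresses if you prefer a flatter layout:
wget --mirror -P ./offline_copy https://example.com/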
Example: Mirroring a Website with Wget
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--limit-rate=200k \
--domains=samplesite.org \
--reject-regex=".*(ads|social).*" \
-P ./sample_site \
https://samplesite.org
Here is how the options are used:
- --mirror starts recursive downloading
- --convert-links makes offline browsing seamless
- --adjust-extension ensures files open correctly
- --page-requisites grabs images, CSS, and scripts
- --no-parent keeps the crawl from climbing above the starting directory
- --wait=2 and --limit-rate=200k make the download polite
- --domains keeps wget within samplesite.org
- --reject-regex avoids unnecessary links like ads or social sharing buttons
- -P ./sample_site saves everything in a dedicated folder
After running this command, you will have an offline copy of the website inside the sample_site folder, ready to browse without a connection.
Wrapping up
With these options, wget can create a fully functional offline copy of a website. Mirroring is not just about saving pages. It preserves the structure, links, and resources, making the site usable without an internet connection. By combining recursion, link conversion, and careful control over request rates and domains, you can build clean and reliable mirrors.
If you’ve ever struggled with repetitive tasks, obscure commands, or debugging headaches, this platform is here to make your life easier. It’s free, open-source, and built with developers in mind.
👉 Explore the tools: FreeDevTools
👉 Star the repo: freedevtools