Downloading a website for offline browsing can be useful for documentation, tutorials, or any web content you want to reference without an internet connection.
wget is a command-line tool that makes this process easy and efficient. Using its mirroring feature, you can create a local copy of a website that is fully navigable.
In this guide, we will explore the concept of mirroring and go through the most important wget options that make it work.
What is Wget Mirroring?
Mirroring is the process of downloading a website recursively while keeping its structure intact. Wget follows links, downloads all pages and necessary resources like images, scripts, and stylesheets, and rewrites links so that the website works offline. This way, you can open the local copy in a browser and navigate it as if you were online.
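In its simplest form, a mirror can be produced with just a few options; here is a minimal sketch, using a placeholder URL:
wget --mirror --convert-links --page-requisites https://example.com/
The sections below look at each of these options individually, along with a few more that keep the download polite and focused.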
--mirror
This option is the backbone of website mirroring. It turns on recursion and time-stamping, telling wget to follow links on the site and download all reachable pages. When you run it again later, only the updated pages are fetched, which saves bandwidth and keeps your local copy current.
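On its own, a minimal invocation looks like this (the URL is a placeholder); in current versions of wget, --mirror is shorthand for -r -N -l inf --no-remove-listing:
wget --mirror https://example.com/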
--convert-links
After downloading, web pages often still point to the original site. This option rewrites those links to point to your local files. Clicking around your offline copy works exactly as it would online, without sending requests to the internet.
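A sketch with plain recursion and a placeholder URL; -k is the short form, and the rewriting only happens after all downloads have finished, once wget knows which files exist locally:
wget --recursive --level=2 --convert-links https://example.com/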
--adjust-extension
Web servers sometimes serve pages without a .html extension. This can prevent browsers from opening them correctly. --adjust-extension ensures all pages have the proper .html extension, keeping your offline site functional.
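For instance (placeholder URL), with -E as the short form; wget appends .html to downloaded HTML pages whose names do not already end in it:
wget --mirror --convert-links --adjust-extension https://example.com/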
--page-requisites
A web page is rarely just HTML. Images, CSS, JavaScript, and fonts are required for proper rendering. --page-requisites downloads all these supporting files so your local copy looks and works like the original.
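This is also useful on its own for saving a single page together with everything needed to render it; a sketch with a placeholder URL:
wget --page-requisites --convert-links https://example.com/some-article.html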
--no-parent
This option limits wget to the target directory. For example, if you are mirroring example.com/docs, it prevents wget from ascending to parent directories or the root of the domain. It keeps the download focused and prevents unnecessary files from being fetched.
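A sketch of mirroring only a subsection (the URL and path are placeholders); the trailing slash matters, because --no-parent works relative to the starting directory:
wget --mirror --no-parent https://example.com/docs/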
--wait=2
Downloading an entire website can generate many requests in a short time. --wait=2 adds a pause between requests, here two seconds, to reduce the load on the server and behave politely.
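For example (placeholder URL); adding --random-wait varies the pause around the given value, which looks less mechanical to the server:
wget --mirror --wait=2 --random-wait https://example.com/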
--limit-rate=200k
Sometimes you want to control the speed of the download to avoid using all your bandwidth. --limit-rate=200k caps the download speed at 200 kilobytes per second, allowing you to continue using your internet connection while mirroring.
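For example (placeholder URL); the value accepts a k suffix for kilobytes and an m suffix for megabytes per second:
wget --mirror --limit-rate=200k https://example.com/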
--domains=samplesite.org
Websites often link to external domains. This option restricts wget to the specified domain only. It prevents unrelated content from being downloaded and keeps your mirror clean.
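A sketch with placeholder hostnames; --domains takes a comma-separated list and is most useful together with --span-hosts (-H), which allows wget to follow links onto the listed hosts:
wget --mirror --span-hosts --domains=samplesite.org,cdn.samplesite.org https://samplesite.org/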
--reject-regex
Sometimes you want to skip certain URLs, like ads, tracking scripts, or dynamic pages. --reject-regex allows you to define patterns for URLs that should not be downloaded. This keeps your offline copy relevant and smaller in size.
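A sketch with a placeholder pattern and URL; the regular expression is matched against the complete URL, so it can filter query strings as well as paths:
wget --mirror --reject-regex=".*(ads|analytics|share).*" https://example.com/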
-P
-P sets the folder where wget stores the downloaded files. It keeps the mirrored site organized in one location instead of scattering files in the current working directory.
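For example (the folder name is arbitrary); -P is the short form of --directory-prefix, and wget still creates a subfolder named after the host inside it, which -nH suppresses if you prefer a flatter layout:
wget --mirror -P ./offline_copy https://example.com/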
Example: Mirroring a Website with Wget
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--limit-rate=200k \
--domains=samplesite.org \
--reject-regex=".*(ads|social).*" \
-P ./sample_site \
https://samplesite.org
Here is how the options are used:
- --mirror starts recursive downloading
- --convert-links makes offline browsing seamless
- --adjust-extension ensures files open correctly
- --page-requisites grabs images, CSS, and scripts
- --no-parent keeps the crawl from climbing above the starting directory
- --wait=2 and --limit-rate=200k make the download polite
- --domains keeps wget within samplesite.org
- --reject-regex avoids unnecessary links like ads or social sharing buttons
- -P ./sample_site saves everything in a dedicated folder
After running this command, you will have an offline copy of the website inside the sample_site folder, ready to browse without a connection.
Wrapping up
With these options, wget can create a fully functional offline copy of a website. Mirroring is not just about saving pages. It preserves the structure, links, and resources, making the site usable without an internet connection. By combining recursion, link conversion, and careful control over request rates and domains, you can build clean and reliable mirrors.
If you’ve ever struggled with repetitive tasks, obscure commands, or debugging headaches, this platform is here to make your life easier. It’s free, open-source, and built with developers in mind.
👉 Explore the tools: FreeDevTools
👉 Star the repo: freedevtools