DEV Community

Ko Takagi



Download website files entirely using wget

If you want to create a local mirror of a website, you can download the entire site with wget.

Usage

wget -P /path/to/download -E -k -m -nH -np -p -c https://example.com
Option Overview
-P Set the directory where downloaded files are saved.
-E Append a suitable suffix, such as .html, to the local filename when the URL lacks one.
-k After the download is complete, convert the links in the documents so they work for local viewing.
-m Turn on options suitable for mirroring: recursion with infinite depth and time-stamping.
-nH Do not create a directory named after the host.
-np Never ascend to the parent directory when retrieving recursively.
-p Download all the files needed to display a given HTML page properly, such as images and stylesheets.
-c Continue getting a partially downloaded file.
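
The short flags are compact but hard to read back months later. As a minimal sketch, the same command can be written with wget's long option names. The mirror_site function and the DRY_RUN variable are hypothetical names used only for this illustration; they are not part of wget.

```shell
#!/bin/sh
# Sketch: the mirror command from above, spelled with long option names.
# mirror_site and DRY_RUN are illustrative names, not part of wget.
#
# Short-to-long mapping:
#   -P  --directory-prefix      -nH --no-host-directories
#   -E  --adjust-extension      -np --no-parent
#   -k  --convert-links         -p  --page-requisites
#   -m  --mirror                -c  --continue

mirror_site() {
  dest=$1   # directory to save files into
  url=$2    # site to mirror
  # Build the command in the positional parameters so arguments with
  # spaces stay intact.
  set -- wget \
    --directory-prefix="$dest" \
    --adjust-extension \
    --convert-links \
    --mirror \
    --no-host-directories \
    --no-parent \
    --page-requisites \
    --continue \
    "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}

# Preview the command without touching the network.
DRY_RUN=1 mirror_site /path/to/download https://example.com
```

Running the function without DRY_RUN=1 starts the actual download, so it is worth previewing the command first.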

With basic authentication

wget -P /path/to/download -E -k -m -nH -np -p -c --http-user=username --http-password=password https://example.com
Option Overview
--http-user Set the username for HTTP authentication.
--http-password Set the password for HTTP authentication.
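
A password passed on the command line can end up in shell history and in the process list. One possible alternative, sketched below, is to let wget prompt for it. This assumes your wget version supports --ask-password and that --user is applied to HTTP requests, as the wget manual describes. The mirror_with_auth function and the DRY_RUN variable are hypothetical names for this example only.

```shell
#!/bin/sh
# Sketch: mirror a site behind basic authentication without writing the
# password on the command line.
# mirror_with_auth and DRY_RUN are illustrative names, not part of wget.
# Assumption: the installed wget supports --user and --ask-password.

mirror_with_auth() {
  dest=$1   # directory to save files into
  user=$2   # username for authentication
  url=$3    # site to mirror
  set -- wget -P "$dest" -E -k -m -nH -np -p -c \
    --user="$user" --ask-password "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"                 # wget asks for the password interactively
  fi
}

# Preview the command without touching the network.
DRY_RUN=1 mirror_with_auth /path/to/download username https://example.com
```

Because the prompt is interactive, this variant does not suit unattended jobs such as cron.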

Top comments (4)

kwabenasapong

How would you download the website if it requires authentication with a username, a password, and an authenticity token? I tried the following, but I get stuck on the sign-in page:

#!/usr/bin/env bash

username=username
password=password
# Extract the hidden "lt" token from the sign-in form.
code=$(wget -qO- https://urlname/sign_in service=https://urlname.io | cat | grep 'name="lt"' | cut -d"_" -f2)
hidden_code=_$code
# Double quotes let the shell expand the variables in the POST body.
wget --save-cookies cookies.txt \
--keep-session-cookies \
--post-data "username=$username&password=$password&lt=$hidden_code&_eventId=submit" \
--auth-no-challenge \
--delete-after \
urlname/sign_in?service=https://ur...

wget --load-cookies cookies.txt \
urlname.io

Aurel • Edited

One important option is -c, which continues the download of a partially downloaded file.

Ko Takagi

Thanks! I added the -c option.

alex24409331

Awesome article, thank you. I also found another site scraper service that may help someone: e-scraper.com/useful-articles/down...
