If you're anything like me, you like to download things. And sometimes, it's too cumbersome to right-click > Save As... each item on a webpage. The solution to your problem sits in your terminal: the wget utility. Add a few options, and wget becomes a beast of a website downloader, capable of pulling an entire site for offline viewing, including all of the linked files.
All you have to do is copy & paste your desired URL into the following terminal command:
$ wget -mkEpnp WEBPAGE-URL
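If you mirror sites often, a small wrapper saves retyping the command. The sketch below is one way to do it (mirror.sh and output_dir are hypothetical names): it takes the URL as its first argument and, after the download, reports the top-level folder. For recursive downloads wget creates a directory named after the host by default (the -nH option turns that off). The host parsing assumes a plain URL such as https://example.com/docs/ with no user:password@ and no port.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper script; run it as: ./mirror.sh URL

# Print the directory wget -m creates for a URL: the host part.
# Assumes a plain URL like https://example.com/docs/ (no user:password@, no port).
output_dir() {
  local rest=${1#*://}          # drop the scheme, e.g. "https://"
  printf '%s\n' "${rest%%/*}"   # keep everything before the first "/"
}

# Only download when a URL is passed as the first argument.
if [[ -n ${1:-} ]]; then
  # Quoting "$1" prevents word splitting and globbing of characters like ? and *.
  wget -mkEpnp "$1"
  echo "Offline copy should be under ./$(output_dir "$1")/"
fi
```

If the URL contains & or ?, quote it on the command line as well, e.g. ./mirror.sh 'https://example.com/docs/?lang=en'.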
The options -mkEpnp are explained below (pulled from the man page):
-m (aka --mirror): Turns on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-k (aka --convert-links): After the download finishes, converts the links in downloaded documents so they work for offline viewing.
-E (aka --adjust-extension): Appends a proper filename extension (such as .html or .css) to downloaded files whose URLs lack one.
-p (aka --page-requisites): Downloads images, sounds, stylesheets, and other files required to render the site properly offline.
-np (aka --no-parent): Never ascends to the parent directory when retrieving recursively. This guarantees that only files below a certain hierarchy will be downloaded.
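Spelled out with long option names, the same download is easier to read when saved in a script. This is a minimal sketch: mirror_opts is a hypothetical variable name, and the script only calls wget when you pass a URL as its first argument.

```shell
#!/usr/bin/env bash
# Long-form equivalent of `wget -mkEpnp URL`.
mirror_opts=(
  --mirror            # -m: same as -r -N -l inf --no-remove-listing
  --convert-links     # -k: rewrite links for offline viewing
  --adjust-extension  # -E: add .html/.css extensions where missing
  --page-requisites   # -p: also fetch images, stylesheets, etc.
  --no-parent         # -np: never ascend above the starting directory
)

# Download only when a URL is passed as the first argument.
if [[ -n ${1:-} ]]; then
  wget "${mirror_opts[@]}" "$1"
fi
```

Keeping the flags in a bash array, expanded as "${mirror_opts[@]}", passes each option to wget as a separate argument without any quoting surprises.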
More fun wget options:
--execute robots=off # ignore robots.txt
--wait=30 # be gentle: wait 30 seconds between fetch requests
--random-wait # randomize the wait between requests (0.5 to 1.5 times the --wait value)
--user-agent=Mozilla # send a mock user agent with each request
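Put together with the mirror flags, a slower and more polite run might look like the sketch below. The 30-second wait is just the example value from the list, so tune it to the site; extra_opts is a hypothetical name, and wget only runs when a URL is passed as the first argument.

```shell
#!/usr/bin/env bash
# The extra options from the list above, kept in one array.
extra_opts=(
  --execute robots=off   # ignore robots.txt
  --wait=30              # pause 30 seconds between requests
  --random-wait          # vary each pause between 0.5x and 1.5x of --wait
  --user-agent=Mozilla   # send "Mozilla" as the User-Agent header
)

# Mirror the URL given as the first argument, if any.
if [[ -n ${1:-} ]]; then
  wget -mkEpnp "${extra_opts[@]}" "$1"
fi
```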
Happy downloading! Oh and... I can't be held responsible if you suddenly find yourself investing in a home server setup, NAS drives, or the like.
Comments
How would you download the website if it requires authentication with a username, password, and authenticity token? I tried the following, but I get stuck on the sign-in page, since that's the only page it downloads:
#!/usr/bin/env bash
username=username
password=password
# Scrape the hidden "lt" token from the sign-in page into $code
code=$(wget -qO- "https://urlname/sign_in?service=https://urlname.io" | grep 'name="lt"' | cut -d"_" -f2)
hidden_code=_$code
# Double quotes (not single) so $username, $password and $hidden_code are expanded
wget --save-cookies cookies.txt \
--keep-session-cookies \
--post-data "username=$username&password=$password&lt=$hidden_code&_eventId=submit" \
--auth-no-challenge \
--delete-after \
urlname/sign_in?service=https://ur...
wget --load-cookies cookies.txt \
urlname.io
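Scraping the token is the fragile step: cut -d"_" -f2 only works if the value happens to split cleanly on underscores. One possible replacement, assuming the sign-in page puts name="lt" before value="..." in the same input tag on a single line, reads the value attribute with sed (extract_lt is a hypothetical helper name).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads HTML on stdin and prints the value="..."
# of the input named "lt".
# Assumes name="lt" appears before value="..." inside the same tag, on one line.
extract_lt() {
  sed -n 's/.*name="lt"[^>]*value="\([^"]*\)".*/\1/p'
}
```

Its output could go straight into the lt field, with no underscore re-added, e.g. code=$(wget -qO- "https://urlname/sign_in?service=https://urlname.io" | extract_lt). If the site also ties the token to the session cookie set on that first request, the first wget would need --save-cookies cookies.txt --keep-session-cookies and the login POST would need --load-cookies cookies.txt as well.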