Here's an example that I've used to get all the pages from Paul Graham's website:
```
$ wget --recursive --level=inf --no-remove-listing --wait=6 --random-wait \
    --adjust-extension --no-clobber --continue -e robots=off \
    --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" \
    --domains=paulgraham.com https://paulgraham.com
```
| Parameter | Description | 
|---|---|
| --recursive | Enables recursive downloading (following links) |
| --level=inf | Sets the recursion level to infinite |
| --no-remove-listing | Keeps the ".listing" files created to track directory listings |
| --wait=6 | Waits the given number of seconds between requests |
| --random-wait | Multiplies --wait by a random factor between 0.5 and 1.5 for each request (see the note below the table) |
| --adjust-extension | Ensures the ".html" extension is appended to downloaded HTML files |
| --no-clobber | Does not redownload a file if it already exists locally |
| --continue | Resumes downloading of partially downloaded files |
| -e robots=off | Ignores robots.txt instructions |
| --user-agent | Sends the given "User-Agent" header to the server |
| --domains | Comma-separated list of domains to be followed |
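Note how --wait=6 and --random-wait work together: each request is delayed by a random interval between 0.5x and 1.5x the base wait, so roughly 3 to 9 seconds here, which keeps the crawl polite without being perfectly predictable.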
Other useful parameters (a combined example follows the table):
| Parameter | Description | 
|---|---|
| --page-requisites | Downloads page requisites such as inlined images, sounds, and referenced stylesheets |
| --span-hosts | Allows downloading files from links that point to different hosts |
| --convert-links | Converts links to local links (allowing local viewing) |
| --no-check-certificate | Bypasses SSL certificate verification |
| --directory-prefix=/my/directory | Sets the destination directory |
| --include-directories=posts | Comma-separated list of allowed directories to be followed when crawling |
| --reject "*?*" | Rejects URLs that contain query strings |
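Here's a rough sketch of how several of these could be combined with the core options above to build a locally browsable mirror. The target site, the /my/directory prefix, and the "blog" directory below are placeholders, not part of the original command:

```
# Illustrative sketch: the site, destination directory, and "blog" path are placeholders.
# Fetches pages plus their images/CSS, rewrites links for offline viewing,
# stays within the "blog" directory, and skips URLs with query strings.
$ wget --recursive --level=inf --wait=6 --random-wait --adjust-extension \
    --no-clobber --continue -e robots=off \
    --page-requisites --convert-links \
    --directory-prefix=/my/directory \
    --include-directories=blog \
    --reject "*?*" \
    --domains=example.com https://example.com/blog/
```

With --convert-links, the mirrored pages under /my/directory can then be opened directly in a browser.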