wget commands to mirror entire sites effectively

To mirror virtually any site neatly and completely, use one of the following wget commands on any *nix system, depending on the situation:

Full-speed crawl, for when the host doesn't mind or is powerful enough to handle it (most cases)

wget --mirror --convert-links --adjust-extension --page-requisites -e robots=off -U mozilla [URL WITH HTTP/HTTPS AND WWW]
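
For example, a full-speed mirror of a hypothetical site (example.com stands in for whatever host you are archiving):

wget --mirror --convert-links --adjust-extension --page-requisites -e robots=off -U mozilla https://www.example.com/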

Randomly-timed crawl, for when the host might blacklist or rate-limit you

wget --mirror --wait=1 --random-wait --convert-links --adjust-extension --page-requisites -e robots=off -U mozilla [URL WITH HTTP/HTTPS AND WWW]
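
Note that --random-wait does nothing on its own: it varies the pause between requests between 0.5x and 1.5x of the interval set by --wait, which defaults to zero, hence the --wait=1 added above. For an even gentler crawl, you can raise the base wait and cap the bandwidth with --limit-rate (again, example.com and the values shown are just illustrative):

wget --mirror --wait=2 --random-wait --limit-rate=200k --convert-links --adjust-extension --page-requisites -e robots=off -U mozilla https://www.example.com/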

Explanations and Sources

Basic command structure is from: here.

"-r -p -e robots=off -U mozilla" parameters are from: here.

Both links contain explanations.
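
In case those links ever disappear (fitting, given the section below), here is a quick breakdown of each flag, per the wget manual:

--mirror                shorthand for -r -N -l inf --no-remove-listing: recurse infinitely with time-stamping
--convert-links         rewrite links in the saved pages so they work when browsed locally
--adjust-extension      save files with appropriate extensions (e.g. .html for HTML content)
--page-requisites       also fetch the images, stylesheets, and scripts needed to render each page
--wait, --random-wait   pause between requests; --random-wait varies the pause between 0.5x and 1.5x of --wait
-e robots=off           execute the .wgetrc command "robots = off", i.e. ignore robots.txt exclusions
-U mozilla              send "mozilla" as the User-Agent string instead of identifying as wget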


why mirror websites?

You might wonder why anyone would want to mirror a website, let alone an entire website with all of its files. We've got Wi-Fi, right? Can't we just visit the website when we need it?

While websites can contain amazingly useful information, they are inherently volatile. They're run by servers that need to stay turned on 24/7, creating the illusion of it being available on-demand. This, of course, means that they might be here one day and gone forever the next day, hence they cannot ever completely be relied on to be available at every given moment. The reason you would want to mirror a website, then, especially if it contains important or obscure information, is to preserve its information and put the availability of that information into your own control. This inevitably applies to all of the media on the Internet.