GNU wget is a free utility for non-interactive download of files from the Web, and it is available on most Linux installs. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Wget is non-interactive, meaning that it can work in the background while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting wget finish the work. By contrast, most Web browsers require the user’s constant presence, which can be a great hindrance when transferring a lot of data.
One of the powerful features of wget is its ability to retrieve a complete mirror of a website to your local hard drive. It is also intelligent enough to download only the pages and files that belong to the website, without following links to external sites. Though it would be a generous public service, you probably wouldn’t want to mirror the entire Internet!
Here’s how you do it.
From a Linux shell:
$ wget -mk -w 10 http://www.google.com/
- -m instructs wget to enter mirroring mode
- -k instructs wget to convert links in the webpages downloaded to local links
- -w 10 instructs wget to wait 10 seconds between requests. This is optional, but it is proper net etiquette because it keeps the load on the server down.
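If the short flags are hard to remember, the same command can be spelled with wget’s long options: --mirror for -m, --convert-links for -k, and --wait for -w. Here is a minimal sketch that wraps it in a small shell function. The function name mirror_site, the DRY_RUN variable, and the example.com URL are my own placeholders for illustration, not anything wget provides.

```shell
#!/bin/sh
# Long-option equivalent of: wget -mk -w 10 URL
# mirror_site and DRY_RUN are hypothetical names used only for this sketch.
mirror_site() {
    url="$1"
    wait_seconds="${2:-10}"   # delay between requests; defaults to 10 seconds

    # Build the command as the positional parameters so quoting stays intact.
    set -- wget --mirror --convert-links --wait="$wait_seconds" "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# Preview the command without touching the network:
#   DRY_RUN=1 mirror_site http://www.example.com/
# Run the mirror with a 5 second delay instead of 10:
#   mirror_site http://www.example.com/ 5
```

Setting DRY_RUN=1 lets you check the exact command line before you start a long download.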
Not running Linux, or otherwise have no access to wget? Don’t fret: wget has been ported to Windows. Wget is a powerful utility, and I’ve barely scratched the surface here. You can find out more by reading the wget man page.