Downloading an Entire Web Site with wget

GNU wget is a free utility for non-interactive download of files from the Web, and it is available on most Linux installs. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Because wget is non-interactive, it can work in the background while the user is not logged on. This lets you start a retrieval and disconnect from the system, leaving wget to finish the work. By contrast, most Web browsers require the user's constant presence, which can be a great hindrance when transferring a lot of data.

One of the powerful features of wget is its ability to retrieve a complete mirror of a website onto your local hard drive. By default it only follows links that stay on the site you point it at and does not traverse links to external hosts. Though it would be a generous public service, you probably wouldn’t want to mirror the entire Internet!

Here’s how you do it.

From a Linux shell:

$ wget -mk -w 10 http://www.google.com/

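  • -m instructs wget to enter mirroring mode, which turns on recursive retrieval with timestamp checking and no limit on recursion depth
  • -k instructs wget to convert links in the downloaded pages so that they point to the local copies
  • -w 10 instructs wget to wait 10 seconds between requests. This is not required; it is just good net etiquette, since it keeps the load on the remote server low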
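The short options above also have long forms (--mirror, --convert-links, and --wait), which can be easier to read in a script. Here is a minimal sketch that prints the equivalent command instead of running it, so you can inspect it first. The function name mirror_cmd and its optional second argument are my own invention for illustration; they are not part of wget.

```shell
#!/bin/sh
# mirror_cmd is a hypothetical helper, not part of wget.
# It prints the mirroring command using wget's long option names:
#   --mirror        same as -m
#   --convert-links same as -k
#   --wait=SECONDS  same as -w SECONDS
mirror_cmd() {
    url=$1
    wait_seconds=${2:-10}   # defaults to the 10 seconds used above
    printf 'wget --mirror --convert-links --wait=%s %s\n' "$wait_seconds" "$url"
}

# Print the command for the example site, without touching the network.
mirror_cmd http://www.google.com/
```

To actually start the mirror, run the printed line in your shell. If the URL contains characters that are special to the shell, such as & or ?, put it in quotes when you do.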

Not running Linux, or otherwise have no access to wget? Don’t fret, wget has been ported to Windows. Wget is a powerful utility, and I’ve barely scratched the surface here. You can find out more by reading the wget man pages.
