

Re: [LUG] Trying to download multiple pdf files using wget and wildcards

 

On Mon, Oct 13, 2008 at 11:56 AM, Henry Bremridge wrote:
> As I suspect that I will be trying to download multiple pdfs again: can anyone 
> point me in the right direction of how I can do this?

wget -r -nd -A '*.pdf' http://www.hutchison-whampoa.com/eng/investor/annual/annual.htm

works for me, but it takes a while, because it downloads every link from
that page and then deletes the ones that aren't PDF files. I'm sure this
could be made faster or cleverer, but I'm not much of a wget expert.
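
One way to avoid downloading everything might be to fetch annual.htm
once, pull out just the .pdf links, and feed those to wget. A rough
sketch (it assumes the PDF hrefs on that page are absolute URLs; the
grep/sed would need adjusting if they turn out to be relative):

  # fetch the index page, keep only href="...pdf" attributes,
  # strip the href="..." wrapper, and save the bare URLs
  wget -qO- http://www.hutchison-whampoa.com/eng/investor/annual/annual.htm \
    | grep -oE 'href="[^"]*\.pdf"' \
    | sed 's/^href="//; s/"$//' > pdf-urls.txt

  # download the listed PDFs, without recreating directory trees
  wget -nd -i pdf-urls.txt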

Using the directory URL
http://202.66.146.82/listco/hk/hutchison/annual/2007 in wget won't
work because the web server at 202.66.146.82 is configured not to show
the contents of that directory, so you can only guess (from the links
on the annual.htm page) which files reside there.
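
If you do work out the filenames from those links, you can also fetch
them directly, with no recursion at all. A minimal sketch; the
filenames below are invented purely for illustration:

  # base directory on the document server, plus guessed/known filenames
  base=http://202.66.146.82/listco/hk/hutchison/annual/2007
  for f in report_part1.pdf report_part2.pdf; do
      wget "$base/$f"
  done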

But, much as I like command-line options, the easiest thing to do is
to install the DownThemAll plugin for Firefox, which can download many
links from a web page at once (and can be configured to download only,
say, PDF files).

Martijn.
