D&C GLug - Home Page


Re: [LUG] Trainable web site picker...

 

On Sun, 16 Nov 2008 09:17:30 +0000, Tom Potts
<tompotts@xxxxxxxxxxxxxxxxxxxx> wrote:

>I've just been playing with Audiveris, which is a well cool (showing my age here) 
>Java app that takes a sheet music image and converts it to MIDI or MusicXML, 
>so someone like me who can't seem to learn to read sheet music can play 
>scores.
>There are quite a few archives out there with out-of-copyright material 
>available, and I'd like to try converting a lot of it to MusicXML.
>I'd like to automate the downloading of the images but get rid of the 
>detritus.
>I want a trainable spider that I can show the 'root' page of the collection, 
>click on a table or ddl and set that as the repeat action, then go down 
>another level to (say) composer level, make a local directory, then 
>click through to a song, make a local directory, drill down and get the 
>associated image(s), return to the composer and get the next song, then go 
>back to the root and get the next part of the collection...
>It occurred to me that something like this might also be useful for pulling prices 
>from supermarket web sites for a comparison site, as they seem to change their 
>arrangements to try and make this difficult - 'Competition? We love it, we just 
>do everything we can to stop it...'
>
>Tom te tom te tom

I don't know if it will do your job, but I found a link to snatch
on the Distributed Proofreaders forum. Quite a few Content
Providers use it to grab images of book pages for OCR. Here is a
direct link to the snatch homepage, to save you having to register
with DP to see the forum:

http://caw.homelinux.net/dp/snatch/

(original link from -

http://www.pgdp.net/phpBB2/viewtopic.php?t=4089&postdays=0&postorder=asc&start=105
)
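
If snatch doesn't fit, the drill-down Tom describes (root page, then
composer, then song, then images, with a local directory per level) could
be roughed out in Python along these lines. This is only a sketch and has
not been tried against any real archive: plan_downloads, LinkCollector,
safe_name, the keep filter and the fetch callable are all names made up
for illustration, and it assumes the collection is plain HTML with one
level of links per directory level. It only builds a list of
(local directory, image URL) pairs; the downloading itself is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> links (with their text) and <img src> URLs."""

    def __init__(self):
        super().__init__()
        self.links = []    # [(href, link text)]
        self.images = []   # [src]
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def safe_name(text):
    """Turn link text into something usable as a directory name."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("_") or "unnamed"


def plan_downloads(root_url, fetch, levels, keep=lambda depth, href, text: True):
    """Follow `levels` levels of links below root_url.

    fetch(url) must return the page's HTML as a string (hypothetical
    callable supplied by the caller). keep(depth, href, text) decides
    which links to follow, i.e. how to skip the detritus.
    Returns [(local_dir, image_url)] for images found on the last level.
    """
    plan = []

    def walk(url, depth, path):
        page = LinkCollector()
        page.feed(fetch(url))
        if depth == levels:
            # Bottom level: record every image against the current directory.
            for src in page.images:
                plan.append(("/".join(path), urljoin(url, src)))
            return
        for href, text in page.links:
            if keep(depth, href, text):
                walk(urljoin(url, href), depth + 1, path + (safe_name(text),))

    walk(root_url, 0, ())
    return plan
```

With levels=2 this would treat the root's links as composers and each
composer page's links as songs. For fetch, something like
lambda url: urllib.request.urlopen(url).read().decode() might do for a
first try, and keep is where the "training" would have to go, for example
by only accepting links whose href matches a pattern you have picked out
by hand for that site.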

best regards
Dave
-- 
http://www.morgad.co.uk/index.html    gpg:0x64B5E037 
Distributed Proofreaders: http://www.pgdp.net
The NTP server pool http://www.pool.ntp.org
The L&B is being rebuilt! http://www.lynton-rail.co.uk

-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html