D&C GLug - Home Page


Re: [LUG] Trainable web site picker...

 

On Sunday 16 November 2008 18:38, Dave Berkeley wrote:
> I've done a bit of website "screen scraping". It can be difficult,
> depending on cookies, javascript etc. But for simple sites you can parse
> and traverse them quite quickly.
>
> I developed a set of tools for fetching train ticket prices to allow you to
> break a journey down into single stages. You can often save 40% of the
> normal price using this technique.
>
> The tools we used were Python, urllib and urllib2; the HTML parser was
> BeautifulSoup, which is really easy to use.
>
> http://www.crummy.com/software/BeautifulSoup/
>
> Combined with the elementtree XML library, it gives ElementSoup:
>
> http://effbot.org/zone/element-soup.htm
>
> For simple sites, these tools, and perhaps a bit of regex handling, will
> give you everything you want. But you will have to code it instead of
> training it.
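
A minimal sketch of the parse-then-regex approach Dave describes. To keep it
runnable without installing anything, it swaps in Python 3's standard-library
html.parser for BeautifulSoup and urllib.request for urllib/urllib2. The page
layout, the journey names and the fares in SAMPLE_PAGE are invented for
illustration; a real site will need its own parsing rules.

```python
from html.parser import HTMLParser
import re
import urllib.request

POUND = "\u00a3"  # the pound sign, written as an escape to keep the file ASCII


class PriceTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def fetch(url):
    """Download a page as text (not called below, so the sketch runs offline)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_prices(html):
    """Return {name: price in pounds} for rows shaped like: name | price."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    prices = {}
    for row in parser.rows:
        if len(row) < 2:
            continue
        # The "bit of regex handling": pull the number that follows a pound sign.
        match = re.search(POUND + r"\s*(\d+(?:\.\d{1,2})?)", row[1])
        if match:
            prices[row[0]] = float(match.group(1))
    return prices


# Invented sample page; a real one would come from fetch(url).
SAMPLE_PAGE = """
<html><body><table>
<tr><th>Journey</th><th>Fare</th></tr>
<tr><td>Plymouth to Exeter</td><td>&pound;8.50</td></tr>
<tr><td>Exeter to Bristol</td><td> &pound;12.00 single</td></tr>
<tr><td>Bristol to London</td><td>sold out</td></tr>
</table></body></html>
"""

if __name__ == "__main__":
    print(extract_prices(SAMPLE_PAGE))
```

Rows without a recognisable price are skipped rather than guessed at.
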
I think for getting the music or similar, coding is not too bad for me, but
supermarkets (hi Chronoppolis + welcome! Afghanistan?) seem to change things
regularly to prevent this sort of thing, so it's either lots of coding or going
that one step further to the trainer... How I'd love to be able to stick a
shopping list into my PC, click a button and zip around two or three
supermarkets for the cheapest deals! But what will I do with 200 cases of
undrinkable cheap lager?
Tom te tom te tom
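
Once the prices have been scraped, the "shopping list in, cheapest deals out"
step is the easy part. A rough sketch, assuming each store's prices are already
in a dictionary; the store names, items and prices (in pence) are made up:

```python
def cheapest_basket(shopping_list, store_prices):
    """For each item, pick the store with the lowest price.

    Items that no store lists are left out of the result.
    """
    basket = {}
    for item in shopping_list:
        offers = [
            (prices[item], store)
            for store, prices in store_prices.items()
            if item in prices
        ]
        if offers:
            price, store = min(offers)  # lowest price; ties go to the first name alphabetically
            basket[item] = (store, price)
    return basket


# Hypothetical data: prices in pence, as a scraper might have produced them.
STORE_PRICES = {
    "Store A": {"milk": 89, "bread": 120},
    "Store B": {"milk": 95, "bread": 99, "lager": 450},
}

if __name__ == "__main__":
    print(cheapest_basket(["milk", "bread", "lager", "tea"], STORE_PRICES))
```
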

>
> D
>
> On Sunday 16 November 2008 17:21:03 Chronoppolis wrote:
> > Hello,
> >
> > This may be my very first post (I don't remember - but yay for me). I have
> > not got my Linux projects to the point where I can ask concise questions
> > and so have just enjoyed the emails as a source of great interest. At some
> > point I will post the various projects I am pursuing and the issues I face,
> > but not today.
> >
> > This last post of yours, Tom, particularly caught my eye as I have a very
> > complicated project, and a spider that would hunt through various
> > supermarkets' websites for me would be unbelievably helpful. I would
> > certainly be very interested in any further information you have about
> > this or how one would go about it.
> >
> > I am a newbie programmer, teaching myself with a couple of
> > friends as mentors, so this will be a very newbie question. What are the
> > components necessary to create a spider program? Is it something that has
> > to be made for each site individually? If the website in question updates,
> > will this stop the spider from working? I have other questions, but those
> > are the basic ones.
> >
> > Dan
> >
> > On Sun, Nov 16, 2008 at 9:17 AM, Tom Potts <tompotts@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > I've just been playing with Audiveris, which is a well cool (showing my
> > > age here) Java app that takes a sheet music image and converts it to MIDI
> > > or MusicXML, so someone like me who can't seem to learn to read sheet
> > > music can play scores.
> > > There are quite a few archives out there with out-of-copyright material
> > > available, and I'd like to try converting a lot to MusicXML.
> > > I'd like to automate the downloading of the images but get rid of the
> > > detritus.
> > > I want a trainable spider that I can show the 'root' page of the
> > > collection, click on a table or ddl and set that as the repeat action,
> > > then go down another level to (say) composer level, make a local
> > > directory, then click through to a song, make a local directory, drill
> > > down and get the associated image(s), return to composer and get the next
> > > song, then back to root to get the next part of the collection...
> > > It occurred to me something like this might also be useful for pulling
> > > prices from supermarket web sites for a comparison site, as they seem to
> > > change their arrangements to try and make this difficult - 'Competition?
> > > We love it, we just do everything we can to stop it...'
> > >
> > > Tom te tom te tom
> > >
> > >
> > > --
> > > The Mailing List for the Devon & Cornwall LUG
> > > http://mailman.dclug.org.uk/listinfo/list
> > > FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html
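
The drill-down Tom describes (root page, then composer, then song, then the
images, with a local directory per level) can be approximated without any
training by writing the "repeat action" for each level down as a regular
expression. This is only a sketch under assumptions: the recipe format, the
function names and the three-page FAKE_SITE are all invented, and the fetch
function is passed in so the example runs offline.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(url, recipe, fetch, path=()):
    """Follow one href pattern per level.

    Returns (directory parts, image URL) pairs. Depth is bounded by the
    length of the recipe, so links back to the root cannot cause a loop.
    """
    collector = LinkCollector()
    collector.feed(fetch(url))
    collector.close()
    pattern = re.compile(recipe[0])
    found = []
    for href, text in collector.links:
        if not pattern.search(href):
            continue  # the "detritus": navigation, adverts, about pages
        target = urljoin(url, href)
        if len(recipe) == 1:
            found.append((path, target))  # last level: this is an image
        else:
            # The link text (composer or song name) becomes a directory name.
            found.extend(crawl(target, recipe[1:], fetch, path + (text,)))
    return found


def local_path(root, parts, image_url):
    """Where an image would be saved: root/composer/song/filename."""
    return os.path.join(root, *parts, image_url.rsplit("/", 1)[-1])


# Invented three-level site standing in for a real archive.
FAKE_SITE = {
    "http://example.org/":
        '<a href="about.html">About</a><a href="composer/bach.html">Bach</a>',
    "http://example.org/composer/bach.html":
        '<a href="../song/air.html">Air</a><a href="/">Home</a>',
    "http://example.org/song/air.html":
        '<a href="../img/air-1.png">Page 1</a><a href="../img/air-2.png">Page 2</a>',
}

# One pattern per level: composer links, song links, image links.
RECIPE = [r"^composer/", r"song/", r"\.png$"]

if __name__ == "__main__":
    for parts, image_url in crawl("http://example.org/", RECIPE, FAKE_SITE.__getitem__):
        print(local_path("scores", parts, image_url), "<-", image_url)
```

A recipe like this is tied to one site's layout, so each site would need its
own, and a redesign would probably mean rewriting the patterns.
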

