Spiders
Revision as of 13:16, 16 February 2015
Spiders, in this context, are programs that crawl the web and index what they find, so you might also call them indexers.
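To make "index the web" concrete, here is a minimal sketch of a spider. It is not my old spider or Portia's approach, just an illustration under simple assumptions. It crawls breadth-first from a start URL, follows `<a href>` links, and builds an inverted index mapping each word to the pages that contain it. The names `crawl`, `LinkAndTextParser`, and the `fetch` callback are hypothetical, and `fetch` is passed in so the crawler can be exercised without a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url, returning an inverted index.

    fetch(url) is an assumed callback that returns the page's HTML as a
    string, or None if the page can't be retrieved. The returned index maps
    each lowercased word to the set of URLs whose text contains it.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it, don't count it
        pages += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        # Index every word of the page's text under this URL.
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index.setdefault(word, set()).add(url)
        # Queue unseen links, resolved relative to the current page.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

For example, with a two-page fake web such as `fake_web = {"http://example.com/": '<p>Hello spiders</p><a href="/about">About</a>', "http://example.com/about": "<p>Spiders index the web</p>"}`, calling `crawl("http://example.com/", fake_web.get)` indexes "spiders" under both URLs and "web" only under the about page. A real spider would pass a `fetch` that wraps something like `urllib.request.urlopen`, and should check robots.txt (for instance with `urllib.robotparser`) before fetching.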
A long time ago I wrote a spider. If I ever get around to digging up that old code, this is where I'll put it. Lots of other people are building interesting spiders that you can use.
Portia is one example, from the folks at Scrapinghub in Cork, Ireland. I won't needlessly repeat their documentation here, but I will note my experiences with their tools. I wanted to scrape questions and answers from OKCupid, but so far Portia can't handle the site's JavaScript login. I need to take that login apart further to find out what the solution might be.