#1
SindiceFetcher/0.1
Is the code behind SindiceFetcher available? I noticed in one of my log files that you appear to have extended Nutch to crawl the linked-data web:
140.203.154.194 - - [21/Aug/2008:06:47:56 -0400] "GET /robots.txt HTTP/1.0" 404 9 "-" "SindiceBot/Nutch-1.0-dev (http://sindice.com/developers/bot)" "application/rdf+xml;q=1, application/xml;q=0.6, text/xml;q=0.6, application/xhtml+xml;q=0.75, text/html;q=0.7"
//Ed
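For readers unfamiliar with the last field of that log entry: it is the request's Accept header, and its q-values rank the media types the bot prefers. The following is only a minimal sketch of how such a header can be ranked; `parse_accept` is a hypothetical helper written for illustration, not part of Nutch or Sindice, and it ignores wildcards and any parameters other than `q`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        q = 1.0  # HTTP default when no q parameter is present
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Accept header copied from the log line above
ACCEPT = ("application/rdf+xml;q=1, application/xml;q=0.6, text/xml;q=0.6, "
          "application/xhtml+xml;q=0.75, text/html;q=0.7")

ranked = parse_accept(ACCEPT)
for media_type, q in ranked:
    print(media_type, q)
```

Ranked this way, `application/rdf+xml` comes first, which is consistent with a crawler aimed at linked data.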
#2
Ed, you're right, we are using a customized version of Nutch as our crawler. We will likely contribute some of our work back to the Nutch project at some point. Right now it's too early, because we are still making a lot of changes to the code.
Tags: bots, crawling, harvesting, http