  #1  
08-22-2008, 06:15 PM
edsu
Junior Member
 
Join Date: Aug 2008
Posts: 1
SindiceFetcher/0.1

Is the code behind SindiceFetcher available? I noticed in one of my log files that it looks like you've extended Nutch to crawl the linked-data web. Is that right?

140.203.154.194 - - [21/Aug/2008:06:47:56 -0400] "GET /robots.txt HTTP/1.0" 404 9 "-" "SindiceBot/Nutch-1.0-dev (http://sindice.com/developers/bot)" "application/rdf+xml;q=1, application/xml;q=0.6, text/xml;q=0.6, application/xhtml+xml;q=0.75, text/html;q=0.7"

//Ed
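The last quoted field in the log line in post #1 appears to be the crawler's HTTP Accept header. It ranks RDF/XML first (q=1), ahead of XHTML (q=0.75) and HTML (q=0.7). The Python below is a minimal sketch that pulls out the user agent and lists the requested media types in the crawler's order of preference. It assumes the server logs Apache's "combined" format plus one extra quoted field for the Accept header; the thread does not state the log layout. The names LOG_RE and parse_accept are hypothetical, and the parser deliberately ignores wildcards and any media-type parameters other than q.

```python
import re

# The log line from post #1, verbatim.
LOG_LINE = ('140.203.154.194 - - [21/Aug/2008:06:47:56 -0400] "GET /robots.txt HTTP/1.0" 404 9 '
            '"-" "SindiceBot/Nutch-1.0-dev (http://sindice.com/developers/bot)" '
            '"application/rdf+xml;q=1, application/xml;q=0.6, text/xml;q=0.6, '
            'application/xhtml+xml;q=0.75, text/html;q=0.7"')

# Assumption: Apache "combined" format followed by one extra quoted field (the Accept header).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'"(?P<accept>[^"]*)"'
)


def parse_accept(header):
    """Return (media_type, q) pairs ordered by descending q-value.

    A missing q parameter defaults to 1.0. Wildcards and other parameters
    are not handled in this sketch.
    """
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda mq: mq[1], reverse=True)


m = LOG_RE.match(LOG_LINE)
print(m.group("agent"))
for media_type, q in parse_accept(m.group("accept")):
    print(f"{q:<5} {media_type}")
```

Run against the line above, this prints the SindiceBot user agent followed by application/rdf+xml, application/xhtml+xml, text/html, application/xml and text/xml, in that order. In other words, a server doing content negotiation would be expected to hand this crawler RDF/XML whenever it has it.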
  #2  
08-25-2008, 10:07 AM
Richard Cyganiak
Administrator
 
Join Date: May 2008
Posts: 13

Ed, you're right: we are using a customized version of Nutch as our crawler. We will likely contribute some of our work back to the Nutch project at some point, but right now it's too early, because we are still making a lot of changes to the code.

Tags
bots, crawling, harvesting, http

All times are GMT.