Sindice Search API

The Sindice API provides programmatic access to its search capabilities. Please refer for support questions.

Search API (v3)

The third version of the search API supports the following parameters:

  • The q parameter specifies the keyword query.
  • The nq parameter specifies the ntriple query.
  • The fq parameter specifies the filter query. It can be specified multiple times, in which case the result set is the intersection of all fq filters. In the example below, only documents that are in RDFa format and use the foaf:Person class are part of the result set.
    http://api.sindice.com/v3/search?fq=format:RDFA&fq=class:foaf:person&format=json
  • The page parameter (optional) specifies the result page. Pages are 1-indexed, so the first page is 1, the second is 2, and so on. The default value is 1.
  • The sortbydate parameter (optional) is a boolean flag that specifies whether the results are sorted by date; otherwise they are sorted by relevance.
  • The field parameter specifies the fields of the record to retrieve. Various fields can be retrieved at query time, which can help to gain more insight into the information contained in the document. Below is the list of currently supported fields:
    • link: the url of the document. This field is always returned with the result.
    • title: the labels of the document. This field is always returned with the result.
    • ontology: the ontology closure of the document.
    • class: the list of classes used in the document.
    • predicate: the list of predicates used in the document.
    • domain: the third-level domain of the document.
    • explicit_content_size: the number of (explicit) triples in the document.
    • explicit_content_length: the number of bytes of the document.
    • updated: the date when the document was last indexed
    • formats: the list of source document formats the metadata was extracted from
    By default, the link, title, updated, formats, explicit_content_size and explicit_content_length fields are retrieved. The field parameter can be specified multiple times. In the example below, only the fields link, title and class are retrieved for each record.
    http://api.sindice.com/v3/search?&field=title&field=class&format=json
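
The parameters above can be combined programmatically. The following is a minimal Python sketch, not an official client: the helper name build_search_url and its keyword arguments are hypothetical, while the parameter names and the base address come from this page. It shows how repeated fq and field parameters can be emitted and percent-encoded.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.sindice.com/v3/search"  # base address given on this page


def build_search_url(q=None, nq=None, fq=(), fields=(), page=None,
                     sortbydate=False, fmt="json"):
    """Assemble a v3 search URL (hypothetical helper, not part of the API)."""
    params = []
    if q is not None:
        params.append(("q", q))        # keyword query
    if nq is not None:
        params.append(("nq", nq))      # ntriple query
    # fq and field may each appear several times in the query string.
    params.extend(("fq", value) for value in fq)
    params.extend(("field", value) for value in fields)
    if page is not None:
        params.append(("page", str(page)))   # pages are 1-indexed
    if sortbydate:
        params.append(("sortbydate", "1"))   # flag value as used in the examples
    params.append(("format", fmt))
    # urlencode percent-encodes reserved characters such as ":" and spaces.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(q="Rome", fq=["class:city"], fields=["updated"])
print(url)
```

Passing a list of pairs to urlencode, rather than a dictionary, is what keeps the repeated fq and field keys in the query string.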

Examples:

http://api.sindice.com/v3/search?q=Rome&fq=class:city&format=json
http://api.sindice.com/v3/search?q=Rome&fq=class:city&field=updated&format=json
http://api.sindice.com/v3/search?nq=* <label> 'Rome'&fq=class:city&page=2&sortbydate=1&format=json

Query Limits

Sindice currently limits each query to 100 result pages. For special needs you can refer .
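
A client can take the page limit into account before paging through results. The sketch below is illustrative only: the names MAX_PAGES and reachable_pages are hypothetical, and the default of 10 items per page is an assumption taken from the sample responses on this page, not a documented guarantee.

```python
import math

MAX_PAGES = 100  # page limit stated in this section


def reachable_pages(total_results, items_per_page=10):
    """Number of result pages a client can actually request.

    items_per_page defaults to 10, an assumption based on the
    itemsPerPage value in the sample responses.
    """
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return min(MAX_PAGES, math.ceil(total_results / items_per_page))


print(reachable_pages(211))
print(reachable_pages(16637))
```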

Result formats

You can negotiate the content and retrieve three different formats:

  • json: curl -H "Accept: application/json" "http://api.sindice.com/v3/search?q=gabriele&qt=term&page=1"
  • rdf: curl -H "Accept: application/rdf+xml" "http://api.sindice.com/v3/search?q=gabriele&qt=term&page=1"
  • atom: curl -H "Accept: application/atom+xml" "http://api.sindice.com/v3/search?q=gabriele&qt=term&page=1"
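
The same content negotiation can be sketched in Python with the standard library. The names ACCEPT_TYPES and make_request are hypothetical; the media types are the ones listed above. The request is only built here, not sent, so the sketch runs offline.

```python
from urllib.request import Request

# Media types for the three formats listed above.
ACCEPT_TYPES = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "atom": "application/atom+xml",
}


def make_request(url, fmt="json"):
    """Build (but do not send) a request that negotiates the given format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = make_request(
    "http://api.sindice.com/v3/search?q=gabriele&qt=term&page=1", "atom")
print(req.get_header("Accept"))
```

Passing the resulting object to urllib.request.urlopen would perform the actual call.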

The result format is composed of three sections:

  • metadata
  • query
  • result entries

Metadata

The metadata section contains information about the search results and links to navigate through them.

  • totalResults: number of total results
  • updated: generation time of this search
  • base: base url, without the specific page
  • author: author field, Sindice.com
  • itemsPerPage: number of items per page
  • startIndex: starting index in this page
  • link: url of this result page
  • previous, self, next, first, last: URLs of the previous, current, next, first and last pages of results
  • alternate: link to the alternate HTML representation of this page, i.e., the Sindice frontend page
  • cache_batch: link to the Sindice cache API, which allows fetching the cached version of all documents at once (JSON format only)

Query

The query section is an object containing information about the request that produced the current search results. It is composed of the parameters of the search API, i.e., q, nq, fq, page, sortbydate, field.

Result Entries

The last section is the list of search result entries. Each entry is composed of the fields to be retrieved, as defined by the field parameter of the search API. In addition, each entry contains an extra field, cache, which provides a link to fetch the content of the result from the Sindice cache.

Response Format Example

JSON
{
 "updated": "2008/06/03 18:27:29 +0100",
 "base": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term",
 "totalResults": 211,
 "search": "http://www.sindice.com/opensearch.xml",
 "self": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=1",
 "previous": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=",
 "title": "Sindice search: gabriele",
 "last": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=22",
 "alternate": "http://sindice.com/v3/search?q=gabriele\u0026qt=term",
 "author": "Sindice.com",
 "first": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=1",
 "itemsPerPage": 10,
 "startIndex": 1,
 "next": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=2",
 "link": "http://api.sindice.com/v3/search?q=gabriele\u0026qt=term\u0026page=1",
 "batch_cache": "http://api.sindice.com/v3/cache?field=explicit_content&output=json&url=URL1&url=URL2....&url=URL10",
 "query":
  {
   "role": "request",
   "startPage": 1,
   "searchTerms": "gabriele"
  },
 "entries":
  [
   {
    "title": ["Gabriele Albertini"],
    "formats": ["RDF"],
    "content": "183 triples in 32484 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Albertini",
    "updated": "2008/05/23"
   },
   {
    "title": ["Gabriele Paonessa"],
    "formats": ["RDF"],
    "content": "111 triples in 16153 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Paonessa",
    "updated": "2008/05/23"
   },
  ...
  ]
}

The format closely matches the OpenSearch format, so refer to that for further details. The only two differences are the title field in each entry, which is a list (a document can have several labels), and the formats field, which is a list of the formats found in one page (for example, RDFa and microformats).
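
As a rough illustration of consuming this format, the Python sketch below reads a trimmed copy of the sample response above. The function name summarize is hypothetical, and the code assumes only the fields shown in the sample; note how title is handled as a list.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
 "totalResults": 211,
 "itemsPerPage": 10,
 "startIndex": 1,
 "next": "http://api.sindice.com/v3/search?q=gabriele&qt=term&page=2",
 "entries": [
   {"title": ["Gabriele Albertini"], "formats": ["RDF"],
    "link": "http://dbpedia.org/resource/Gabriele_Albertini",
    "updated": "2008/05/23"},
   {"title": ["Gabriele Paonessa"], "formats": ["RDF"],
    "link": "http://dbpedia.org/resource/Gabriele_Paonessa",
    "updated": "2008/05/23"}
 ]
}
"""


def summarize(response_text):
    """Return (total, next_url, [(first_title, link), ...])."""
    data = json.loads(response_text)
    rows = []
    for entry in data.get("entries", []):
        titles = entry.get("title") or [""]  # title is a list of labels
        rows.append((titles[0], entry["link"]))
    return data["totalResults"], data.get("next"), rows


total, next_url, rows = summarize(SAMPLE)
for title, link in rows:
    print(title, link)
```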

ATOM
<?xml version="1.0" encoding="iso-8859-1"?>
<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <title>Sindice search: gabriele</title>
  <link href="http://api.sindice.com/v3/search?page=1&amp;q=gabriele&amp;qt=term"/>
  <updated>2008-06-03T19:50:39+01:00</updated>
  <author>
    <name>Sindice.com</name>
  </author>
  <id>http://api.sindice.com/v3/search?page=1&amp;q=gabriele&amp;qt=term</id>
  <opensearch:totalResults>211</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <opensearch:Query role="request" startPage="1" searchTerms="gabriele"/>
  <link href="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="alternate" type="text/html"/>
  <link href="http://api.sindice.com/v3/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="first" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term"
        rel="previous" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v3/search?page=2&amp;q=gabriele&amp;qt=term"
        rel="next" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v3/search?page=22&amp;q=gabriele&amp;qt=term"
        rel="last" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v3/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="self" type="application/atom+xml"/>
  <link href="http://www.sindice.com/opensearch-term.xml"
        rel="search" type="application/opensearchdescription+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <id>http://dbpedia.org/resource/Gabriele_Albertini</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
  <entry>
    <title>Gabriele Paonessa</title>
    <link href="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <id>http://dbpedia.org/resource/Gabriele_Paonessa</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>111 triples in 16153 bytes</content>
  </entry>
</feed>

It is a simple ATOM file that adds the OpenSearch schema and a single additional tag carrying information about the document format. You should be able to parse it easily with any XML parser.
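
For instance, a feed of this shape could be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch over a trimmed copy of the sample above; the function name parse_feed is hypothetical, and the namespace URIs are copied from the sample feed.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample feed above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
    "sindice": "http://sindice.com/vocab/fields#",
}

SAMPLE = """<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <opensearch:totalResults>211</opensearch:totalResults>
  <link href="http://api.sindice.com/v3/search?page=2&amp;q=gabriele&amp;qt=term"
        rel="next" type="application/atom+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <sindice:format>RDF</sindice:format>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return (total_results, next_page_url_or_None, entries)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("opensearch:totalResults", namespaces=NS))
    next_link = root.find("atom:link[@rel='next']", NS)
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "link": entry.find("atom:link", NS).get("href"),
            # One sindice:format element per format found in the document.
            "formats": [f.text for f in entry.findall("sindice:format", NS)],
        })
    next_url = next_link.get("href") if next_link is not None else None
    return total, next_url, entries


total, next_url, entries = parse_feed(SAMPLE)
print(total, next_url, entries)
```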

RDFXML

The RDF representation defines the base search URI as a search:Query object, which has links to search:Page objects, each one having many search:Result objects. The other fields mirror those of the other formats.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://sindice.com/vocab/search#">
  <Query rdf:about="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term">
    <dc:title>Sindice search: gabriele</dc:title>
    <dc:date>2010-09-26T20:57:34+01:00</dc:date>
    <dc:creator>Sindice.com</dc:creator>
    <totalResults>16637</totalResults>
    <itemsPerPage>10</itemsPerPage>
    <searchTerms>gabriele</searchTerms>
    <first rdf:resource="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term&amp;page=1"/>
    <last rdf:resource="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term&amp;page=100"/>
    <page rdf:resource="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term&amp;page=1"/>
    <opensearchDescription rdf:resource="http://www.sindice.com/opensearch.xml"/>
    <result rdf:resource="#result1"/>
    <result rdf:resource="#result2"/>
    ...
  </Query>
  <Page rdf:about="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term&amp;page=1">
    <dc:title>Sindice search: gabriele [page 1]</dc:title>
    <startIndex>1</startIndex>
    <previous rdf:resource="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term"/>
    <next rdf:resource="http://api.sindice.com/v3/search?q=gabriele&amp;qt=term&amp;page=2"/>
    <foaf:page rdf:resource="http://api.sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"/>
  </Page>
  </Page>
  <Result rdf:about="#result1">
    <dc:title>Gabriele Tarquini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Tarquini"/>
    <dc:created>2009-09-19T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>59 triples in 9131 bytes</content>
    <rank>1</rank>
  </Result>
  <Result rdf:about="#result2">
    <dc:title>Gabriele</dc:title>
    <link rdf:resource="http://www.toprural.co.uk/Rural-hotel/Gabriele_49460_f.html"/>
    <dc:created>2010-09-21T00:00:00+01:00</dc:created>
    <fields:format>MICROFORMAT</fields:format>
    <fields:format>ADR</fields:format>
    <fields:format>RDFA</fields:format>
    <fields:format>HCARD</fields:format>
    <content>29 triples in 3374 bytes</content>
    <rank>2</rank>
  </Result>
  ...
</rdf:RDF>
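
As a rough sketch, the Result elements of such a response could be read with a plain XML parser in Python. This handles only the exact serialization shown above; RDF/XML allows other equivalent serializations, so a dedicated RDF parser would be more robust. The function name parse_results is hypothetical, and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample response above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "fields": "http://sindice.com/vocab/fields#",
    "search": "http://sindice.com/vocab/search#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]  # qualified name of rdf:resource

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://sindice.com/vocab/search#">
  <Result rdf:about="#result2">
    <dc:title>Gabriele</dc:title>
    <link rdf:resource="http://www.toprural.co.uk/Rural-hotel/Gabriele_49460_f.html"/>
    <fields:format>MICROFORMAT</fields:format>
    <fields:format>RDFA</fields:format>
    <rank>2</rank>
  </Result>
  <Result rdf:about="#result1">
    <dc:title>Gabriele Tarquini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Tarquini"/>
    <fields:format>RDF</fields:format>
    <rank>1</rank>
  </Result>
</rdf:RDF>"""


def parse_results(xml_text):
    """Return the Result entries as dictionaries, ordered by rank."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.findall("search:Result", NS):
        results.append({
            "rank": int(node.findtext("search:rank", namespaces=NS)),
            "title": node.findtext("dc:title", namespaces=NS),
            "link": node.find("search:link", NS).get(RDF_RESOURCE),
            "formats": [f.text for f in node.findall("fields:format", NS)],
        })
    return sorted(results, key=lambda r: r["rank"])


results = parse_results(SAMPLE)
for r in results:
    print(r["rank"], r["title"], r["formats"])
```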

Integrating JSON in your script

If you want, you can add an additional argument to the request called callback, which causes the JSON response to be wrapped in a call to a function with the name you choose.
This allows clean integration of the Sindice results in your web page, for example:

<script type="text/javascript"
        src="http://api.sindice.com/v3/search?q=mike&format=json&callback=showSindiceResults"></script>

Notice that, to force the rendering of JSON output, we added an additional parameter, format. It also accepts the values atom and rdfxml.
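
Outside the browser, a response requested with a callback can be unwrapped before parsing. The Python sketch below assumes the body has the form callbackName(<json>), optionally followed by a semicolon; this page does not spell out the exact shape, so treat that as an assumption. The function name unwrap_jsonp is hypothetical.

```python
import json
import re

# Assumed wrapper shape: name( ...json... ) with an optional trailing ";".
_JSONP = re.compile(r"^\s*([A-Za-z_$][\w$]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(body):
    """Return (callback_name, parsed_json) for a callback-wrapped body."""
    match = _JSONP.match(body)
    if match is None:
        raise ValueError("not a callback-wrapped JSON response")
    return match.group(1), json.loads(match.group(2))


name, data = unwrap_jsonp('showSindiceResults({"totalResults": 211});')
print(name, data["totalResults"])
```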

Other API versions

The current API version is 3, with base address http://api.sindice.com/v3/search
As new API versions are released, the old ones will be kept at their existing locations.

The API v2 is still accessible at http://api.sindice.com/v2/search.
The documentation of API v2 is accessible here.