Sindice Search API v2

Sindice API

The Sindice API provides programmatic access to its search capabilities. Please refer for support questions.

Query services (v2)

There are two main types of search in the new API, term search and advanced search, plus a combined search that mixes the two.
In general, these APIs are based on the OpenSearch 1.1 specification.

  • the q parameter specifies the query.
  • the page parameter (mandatory) specifies the result page. Pages are 1-indexed, so the first page is 1, the second is 2 and so on.
  • the qt parameter selects the search type: "term" for Term Search or "advanced" for Advanced Search ("combined" selects the Combined Search described below).
  • the sortbydate parameter is a boolean flag (for example sortbydate=1) that sorts the results by date; otherwise they are sorted by relevance.

Examples:

http://api.sindice.com/v2/search?q=Rome&qt=term&page=1

http://api.sindice.com/v2/search?q=Rome&qt=term&page=1&sortbydate=1
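The parameters above can be assembled with Python's standard library. This is a minimal sketch that assumes only the parameters documented here (q, qt, page, sortbydate); `build_search_url` is a hypothetical helper, not part of any Sindice client library.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.sindice.com/v2/search"

def build_search_url(q, qt="term", page=1, sortbydate=False):
    """Build a v2 search URL from the documented query parameters (hypothetical helper)."""
    if page < 1:
        raise ValueError("pages are 1-indexed; the first page is 1")
    params = {"q": q, "qt": qt, "page": page}
    if sortbydate:
        # The examples pass the boolean flag as sortbydate=1
        params["sortbydate"] = 1
    # urlencode percent-encodes values and turns spaces into "+"
    return BASE_URL + "?" + urlencode(params)

print(build_search_url("Rome"))
# http://api.sindice.com/v2/search?q=Rome&qt=term&page=1
print(build_search_url("Rome", sortbydate=True))
# http://api.sindice.com/v2/search?q=Rome&qt=term&page=1&sortbydate=1
```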

Term Search

Term Search allows you to retrieve documents that are related to keywords and/or URIs.
To activate Term Search, use qt=term in the query parameters. Example:

http://api.sindice.com/v2/search?q=Rome&qt=term

Currently, term search has better ranking and is generally more suitable for searching user-provided strings.
Term search automatically recognizes URIs in the query and matches them against URIs inside the RDF. Example:

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+http%3A%2F%2Frichard.cyganiak.de%2Ffoaf.rdf%23cygri&qt=term&page=1
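When the query contains a URI, its reserved characters (":", "/", "#") must be percent-encoded. Python's `urllib.parse.urlencode` does this and turns spaces into "+", which yields exactly the URL of the example above. A minimal sketch:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Keywords and a URI can be mixed freely in a term query
query = "Giovanni Tummarello http://richard.cyganiak.de/foaf.rdf#cygri"
url = "http://api.sindice.com/v2/search?" + urlencode({"q": query, "qt": "term", "page": 1})
print(url)

# "#" is encoded as %23, so it is not mistaken for a URL fragment;
# decoding the query string recovers the original text, URI included
assert parse_qs(urlsplit(url).query)["q"] == [query]
```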

In term search, one of three date strings can be combined with the keywords to restrict results:

  • date:today - shows results only from today
  • date:last_week - shows results from the last 7 days
  • date:last_month - shows results from the last 31 days

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+date:today&qt=term&page=1

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+date:last_week&qt=term&page=1

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+date:last_month&qt=term&page=1

This can be combined with the sortbydate parameter. Example:

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+date:last_week&qt=term&page=1&sortbydate=1
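A client can append one of the three date strings to the keywords before encoding the URL. The sketch below assumes, as in the examples, that the filter is simply an extra space-separated token in q; `with_date_filter` is a hypothetical helper.

```python
from urllib.parse import urlencode

DATE_FILTERS = {"today", "last_week", "last_month"}

def with_date_filter(keywords, period):
    """Append a date:<period> token to a term query (hypothetical helper)."""
    if period not in DATE_FILTERS:
        raise ValueError(f"unknown date filter: {period}")
    return f"{keywords} date:{period}"

q = with_date_filter("Giovanni Tummarello", "last_week")
# safe=":" leaves the colon unescaped so the URL matches the example above
url = "http://api.sindice.com/v2/search?" + urlencode(
    {"q": q, "qt": "term", "page": 1, "sortbydate": 1}, safe=":")
print(url)
```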

For the complete documentation of the query languages, see http://sindice.com/developers/api#QueryLanguages.

Advanced Search

Advanced Search allows triple-level expressions in the query (use qt=advanced). Example:

http://api.sindice.com/v2/search?q=*+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E+%22Renaud+Delbru%22&qt=advanced&page=1

will locate RDF documents containing resources whose "foaf:name" is "Renaud Delbru".
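The query in this example is a single triple pattern: a subject ("*", which here matches any resource), a predicate URI in angle brackets, and a quoted literal. The sketch below formats such a pattern and encodes it; it covers only this single-pattern form, not the full query language, and `triple_pattern` is a hypothetical helper.

```python
from urllib.parse import urlencode

def triple_pattern(subject, predicate, obj):
    """Format one triple pattern like the example (hypothetical helper).

    subject: "*" or a URI; predicate: a URI; obj: a literal string.
    """
    def node(value):
        # URIs go in angle brackets; "*" is passed through unchanged
        return value if value == "*" else f"<{value}>"
    return f'{node(subject)} {node(predicate)} "{obj}"'

q = triple_pattern("*", "http://xmlns.com/foaf/0.1/name", "Renaud Delbru")
print(q)  # * <http://xmlns.com/foaf/0.1/name> "Renaud Delbru"

# safe="*" keeps the asterisk unescaped so the output matches the example URL
url = "http://api.sindice.com/v2/search?" + urlencode(
    {"q": q, "qt": "advanced", "page": 1}, safe="*")
print(url)
```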

For the complete documentation of the Advanced Search query language see http://sindice.com/developers/api#QueryLanguages.

Combined Search

Combined Search works like Advanced Search, but uses qt=combined and an additional parameter, qv, that specifies a term query. This term query is combined with the advanced query using an AND operator. For example:

http://api.sindice.com/v2/search?q=*+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E+%22Renaud+Delbru%22&qt=combined&page=1&qv=michele

will locate resources whose "foaf:name" is "Renaud Delbru" and that contain the word "michele".
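A combined query is an advanced query plus qt=combined and the qv term query. A minimal sketch that reproduces the example URL; `combined_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode

def combined_search_url(advanced_query, term_query, page=1):
    """Advanced query goes in q, term query in qv (hypothetical helper)."""
    params = {"q": advanced_query, "qt": "combined", "page": page, "qv": term_query}
    # Keep "*" unescaped so the output matches the documented example
    return "http://api.sindice.com/v2/search?" + urlencode(params, safe="*")

url = combined_search_url('* <http://xmlns.com/foaf/0.1/name> "Renaud Delbru"', "michele")
print(url)
```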

Result formats

Using HTTP content negotiation (the Accept header), you can retrieve results in three different formats:

  • json: curl -H "Accept: application/json" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
  • rdf: curl -H "Accept: application/rdf+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
  • atom: curl -H "Accept: application/atom+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
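The same negotiation can be done from Python with `urllib.request`. This sketch only builds the request; actually sending it requires network access to the service, which is assumed here. The `FORMATS` mapping restates the three media types above, and `search_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types accepted by the API, as listed above
FORMATS = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "atom": "application/atom+xml",
}

def search_request(q, fmt="json", page=1):
    """Build a term-search request whose Accept header selects the result format."""
    url = "http://api.sindice.com/v2/search?" + urlencode({"q": q, "qt": "term", "page": page})
    return Request(url, headers={"Accept": FORMATS[fmt]})

req = search_request("gabriele", "atom")
print(req.full_url, req.get_header("Accept"))

# To send it (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       body = resp.read()  # raw bytes; the charset depends on the format
```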

The basic format has three "groups" of fields:

  • generation time of this search
  • base URL, without the specific page
  • total number of results
  • URL of this result page
  • URLs of the previous, next, first and last pages of results
  • link to the HTML alternate representation of this page, on the regular Sindice website
  • author field, Sindice.com
  • number of items per page
  • starting index in this page
  • a Query object with fields that allow replaying this query (search terms, start page, role)

Then there is a list of entries, each of which has:

  • title: a list of the document labels in JSON and RDF, or a single field of comma-separated strings in Atom (we can't change the spec)
  • formats: a list, for example RDFa and Microformat
  • content: a simple string such as "13 triples in 1000 bytes"
  • link: the document URI
  • updated: the document modification date

Specifically, a JSON-encoded response looks like this:

{
 "updated": "2008/06/03 18:27:29 +0100",
 "base": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term",
 "totalResults": 211,
 "search": "http://www.sindice.com/opensearch.xml",
 "self": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "previous": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=",
 "title": "Sindice search: gabriele",
 "last": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=22",
 "alternate": "http://sindice.com/v2/search?q=gabriele\u0026qt=term",
 "author": "Sindice.com",
 "first": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "itemsPerPage": 10,
 "startIndex": 1,
 "next": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=2",
 "query":
  {
   "role": "request",
   "startPage": 1,
   "searchTerms": "gabriele"
  },
 "link": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "entries":
  [
   {
    "title": ["Gabriele Albertini"],
    "formats": ["RDF"],
    "content": "183 triples in 32484 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Albertini",
    "updated": "2008/05/23"
   },
   {
    "title": ["Gabriele Paonessa"],
    "formats": ["RDF"],
    "content": "111 triples in 16153 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Paonessa",
    "updated": "2008/05/23"
   },
  ...
  ]
}

The format closely matches the OpenSearch format, so refer to that specification for further details. The only two differences are the title field of each entry, which is a list (a document can have several labels), and the formats field, which is a list of the formats found in the document (for example, RDFa and microformats).
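Since the response is plain JSON, reading the results takes only the standard `json` module. The sketch below runs on a trimmed copy of the sample above (two entries and a few top-level fields); note that title and formats are lists.

```python
import json

# Trimmed copy of the sample response above
response_text = """
{
  "totalResults": 211,
  "itemsPerPage": 10,
  "startIndex": 1,
  "entries": [
    {"title": ["Gabriele Albertini"], "formats": ["RDF"],
     "content": "183 triples in 32484 bytes",
     "link": "http://dbpedia.org/resource/Gabriele_Albertini", "updated": "2008/05/23"},
    {"title": ["Gabriele Paonessa"], "formats": ["RDF"],
     "content": "111 triples in 16153 bytes",
     "link": "http://dbpedia.org/resource/Gabriele_Paonessa", "updated": "2008/05/23"}
  ]
}
"""

data = json.loads(response_text)
for entry in data["entries"]:
    # title is a list because a document can carry several labels
    labels = ", ".join(entry["title"])
    formats = ", ".join(entry["formats"])
    print(f'{labels} <{entry["link"]}> [{formats}] {entry["content"]}')
print("total:", data["totalResults"])
```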

Example Atom format:

<?xml version="1.0" encoding="iso-8859-1"?>
<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <title>Sindice search: gabriele</title>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
  <updated>2008-06-03T19:50:39+01:00</updated>
  <author>
    <name>Sindice.com</name>
  </author>
  <id>http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term</id>
  <opensearch:totalResults>211</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <opensearch:Query role="request" startPage="1" searchTerms="gabriele"/>
  <link href="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="alternate" type="text/html"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="first" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term"
        rel="previous" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term"
        rel="next" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=22&amp;q=gabriele&amp;qt=term"
        rel="last" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="self" type="application/atom+xml"/>
  <link href="http://www.sindice.com/opensearch-term.xml"
        rel="search" type="application/opensearchdescription+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <id>http://dbpedia.org/resource/Gabriele_Albertini</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
  <entry>
    <title>Gabriele Paonessa</title>
    <link href="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <id>http://dbpedia.org/resource/Gabriele_Paonessa</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>111 triples in 16153 bytes</content>
  </entry>
</feed>

It is a plain Atom feed, extended with the OpenSearch elements and a single additional element (sindice:format) that carries information about the document format. You should be able to parse it easily with any XML parser.
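For example, with Python's `xml.etree.ElementTree` the OpenSearch and sindice:format elements are reached through their namespace URIs. The sketch uses a trimmed copy of the sample feed (one entry); in well-formed XML the "&" characters inside URLs appear as "&amp;", and the parser turns them back into "&".

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
    "sindice": "http://sindice.com/vocab/fields#",
}

# Trimmed copy of the sample feed
feed_xml = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <title>Sindice search: gabriele</title>
  <opensearch:totalResults>211</opensearch:totalResults>
  <link href="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term"
        rel="next" type="application/atom+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
</feed>"""

root = ET.fromstring(feed_xml)
total = int(root.find("opensearch:totalResults", NS).text)
next_page = root.find("atom:link[@rel='next']", NS).get("href")
entries = [
    {
        "title": e.find("atom:title", NS).text,
        "link": e.find("atom:link", NS).get("href"),
        "formats": [f.text for f in e.findall("sindice:format", NS)],
    }
    for e in root.findall("atom:entry", NS)
]
print(total, next_page)
print(entries)
```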

The RDF representation describes the base search URI as a search:Query object, which links to search:Page objects, each one having many search:Result objects. The other fields should be self-explanatory and mirror those of the other formats.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://sindice.com/vocab/search#">
  <Query rdf:about="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term">
    <dc:title>Sindice search: gabriele</dc:title>
    <dc:date>2010-09-26T20:57:34+01:00</dc:date>
    <dc:creator>Sindice.com</dc:creator>
    <totalResults>16637</totalResults>
    <itemsPerPage>10</itemsPerPage>
    <searchTerms>gabriele</searchTerms>
    <first rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term&amp;page=1"/>
    <last rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term&amp;page=100"/>
    <page rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term&amp;page=1"/>
    <opensearchDescription rdf:resource="http://www.sindice.com/opensearch.xml"/>
    <result rdf:resource="#result1"/>
    <result rdf:resource="#result2"/>
    ...
  </Query>
  <Page rdf:about="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term&amp;page=1">
    <dc:title>Sindice search: gabriele [page 1]</dc:title>
    <startIndex>1</startIndex>
    <previous rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term"/>
    <next rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term&amp;page=2"/>
    <foaf:page rdf:resource="http://api.sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"/>
  </Page>
  <Result rdf:about="#result1">
    <dc:title>Gabriele Tarquini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Tarquini"/>
    <dc:created>2009-09-19T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>59 triples in 9131 bytes</content>
    <rank>1</rank>
  </Result>
  <Result rdf:about="#result2">
    <dc:title>Gabriele</dc:title>
    <link rdf:resource="http://www.toprural.co.uk/Rural-hotel/Gabriele_49460_f.html"/>
    <dc:created>2010-09-21T00:00:00+01:00</dc:created>
    <fields:format>MICROFORMAT</fields:format>
    <fields:format>ADR</fields:format>
    <fields:format>RDFA</fields:format>
    <fields:format>HCARD</fields:format>
    <content>29 triples in 3374 bytes</content>
    <rank>2</rank>
  </Result>
  ...
</rdf:RDF>
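A full RDF toolchain (for example the rdflib library) is the robust way to consume this format, since RDF/XML can serialize the same graph in several ways. If you only need the results, the sketch below reads them with plain `xml.etree.ElementTree`, assuming the fixed element layout of the sample (a trimmed copy with two results); it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "fields": "http://sindice.com/vocab/fields#",
    "s": "http://sindice.com/vocab/search#",  # default namespace in the sample
}

# Trimmed copy of the sample RDF/XML
rdf_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns="http://sindice.com/vocab/search#">
  <Query rdf:about="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term">
    <totalResults>16637</totalResults>
    <result rdf:resource="#result1"/>
    <result rdf:resource="#result2"/>
  </Query>
  <Result rdf:about="#result1">
    <dc:title>Gabriele Tarquini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Tarquini"/>
    <fields:format>RDF</fields:format>
    <rank>1</rank>
  </Result>
  <Result rdf:about="#result2">
    <dc:title>Gabriele</dc:title>
    <link rdf:resource="http://www.toprural.co.uk/Rural-hotel/Gabriele_49460_f.html"/>
    <fields:format>MICROFORMAT</fields:format>
    <fields:format>RDFA</fields:format>
    <rank>2</rank>
  </Result>
</rdf:RDF>"""

root = ET.fromstring(rdf_xml)
total = int(root.find("s:Query/s:totalResults", NS).text)
results = sorted(
    (
        {
            "rank": int(r.find("s:rank", NS).text),
            "title": r.find("dc:title", NS).text,
            # rdf:resource is a namespaced attribute: "{namespace}resource"
            "link": r.find("s:link", NS).get(f"{{{RDF}}}resource"),
            "formats": [f.text for f in r.findall("fields:format", NS)],
        }
        for r in root.findall("s:Result", NS)
    ),
    key=lambda result: result["rank"],
)
for result in results:
    print(result["rank"], result["title"], result["link"], result["formats"])
```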

Integrating JSON in your script

You can add an optional parameter called callback to the request; the JSON output is then wrapped in a call to a function with the name you choose (the usual JSONP technique).
This allows clean integration of the Sindice results in your webpage, for example (showSindiceResults must be a function defined earlier in your page):

<script type="text/javascript"
        src="http://api.sindice.com/v2/search?q=mike&qt=term&format=json&callback=showSindiceResults"></script>

Note that, to force JSON output, we added the format parameter. It can also take the values atom and rdfxml.
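Outside a browser, the callback wrapper has to be stripped before the payload can be parsed. A minimal sketch, assuming the response has the shape `name(<json>)`, optionally followed by a semicolon; the exact wrapper Sindice emits is not specified here, and `unwrap_jsonp` is a hypothetical helper.

```python
import json
import re

def unwrap_jsonp(body, callback):
    """Strip a `callback(...)` wrapper and parse the JSON inside (hypothetical helper)."""
    pattern = rf"\s*{re.escape(callback)}\s*\((.*)\)\s*;?\s*"
    # DOTALL lets the payload span several lines; the greedy (.*) stops at the last ")"
    match = re.fullmatch(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))

body = 'showSindiceResults({"totalResults": 211, "entries": []});'
print(unwrap_jsonp(body, "showSindiceResults"))
```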

Query Limits

Sindice currently limits the number of result pages to 100 for each query. For special needs, you can contact us.
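Because of this limit, a client paging through results should stop at whichever comes first: the last page implied by totalResults and itemsPerPage, or page 100. The sketch below computes that bound; it agrees with the samples above, where 211 results at 10 per page give 22 pages and 16637 results stop at page 100. `MAX_PAGES`, `last_page` and `page_urls` are hypothetical names.

```python
import math
from urllib.parse import urlencode

MAX_PAGES = 100  # current Sindice limit on result pages per query

def last_page(total_results, items_per_page):
    """Last page a client can request, given the 100-page limit."""
    return min(math.ceil(total_results / items_per_page), MAX_PAGES)

def page_urls(q, total_results, items_per_page=10, qt="term"):
    """Yield one URL per available result page (pages are 1-indexed)."""
    for page in range(1, last_page(total_results, items_per_page) + 1):
        yield "http://api.sindice.com/v2/search?" + urlencode({"q": q, "qt": qt, "page": page})

print(last_page(211, 10))    # 22
print(last_page(16637, 10))  # 100
```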