Welcome!

SIREn: Efficient semi-structured Information Retrieval for Lucene

Efficient, large-scale handling of semi-structured data (including RDF) is an increasingly important issue for many web and enterprise information reuse scenarios.

Sindice - The Semantic Web Index

Querying graph-structured data (RDF) is commonly done with dedicated solutions called triplestores, typically built on DBMS backends. For Sindice, however, we needed something much more scalable than a DBMS, with the desirable features of typical web search engines: top-k query processing, real-time updates, full-text search, distributed indexes over shards, etc.

While Lucene has long offered these capabilities, it was not designed for large semi-structured document collections (or documents with very different schemas). For this reason we developed SIREn - Semantic Information Retrieval Engine - a Lucene plugin that overcomes these shortcomings and efficiently indexes and queries RDF, as well as any textual document with an arbitrary number of metadata fields.

Given its general applicability, we are delighted to release SIREn as open source under the GNU Affero General Public License, version 3. We hope businesses will find SIREn useful for implementing solutions on the Web of Data.

You can start by looking at the features, reviewing the performance benchmarks, and reading the documentation to learn more, then download and try SIREn yourself.

Latest News


Read our SIREn case study in Lucene in Action, 2nd Edition

Acknowledgement

This project is based upon work supported by the European FP7 projects LOD2 - Creating Knowledge out of Interlinked Data (Grant Agreement No. 257943) and Okkam - Enabling a Web of Entities (contract no. ICT-215032), and by Science Foundation Ireland under Grant No. SFI/02/CE1/I131.