riese home

Where it all started

The riese project (RDFizing and Interlinking the EuroStat Data Set Effort) started in mid-2007 as an initiative within the W3C SWEO Linking Open Data community project.

Linking Open Data logo

Several other LOD datasets exist. The most similar ones are the US Census dataset and an earlier, smaller effort to provide Eurostat data (countries and regions).

riese in 30 sec

The riese system architecture is shown below. At its core are an Apache 2 server, SWI-Prolog, and a set of PHP scripts. Data from Eurostat is converted into RDF/XML and dumped into the file system. The rendering engine uses templates to produce XHTML+RDFa, which can be consumed by both humans and machines (Semantic Web agents).
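To make the conversion step more concrete, here is a minimal sketch in Python of how tab-separated values might be turned into RDF/XML. It is not the riese implementation (which is built on SWI-Prolog and PHP). The table layout, the `ex:` namespace, the property names (`rowKey`, `column`, `value`) and the URI pattern are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/riese-sketch#"  # hypothetical namespace, not the riese schema


def tsv_to_rdfxml(tsv_text, base_uri):
    """Turn every data cell of a simple TSV table into one rdf:Description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")

    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # first cell names the row key, the rest name columns
    for row in reader:
        row_key = row[0]
        for column, cell in zip(header[1:], row[1:]):
            desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")
            # Assumed URI pattern: <base>/<row key>/<column name>
            desc.set(f"{{{RDF_NS}}}about", f"{base_uri}/{row_key}/{column}")
            ET.SubElement(desc, f"{{{EX_NS}}}rowKey").text = row_key
            ET.SubElement(desc, f"{{{EX_NS}}}column").text = column
            ET.SubElement(desc, f"{{{EX_NS}}}value").text = cell
    return ET.tostring(root, encoding="unicode")


# Made-up sample table: two regions, two years.
sample = "geo\t2006\t2007\nAT\t10\t11\nBE\t20\t21\n"
xml_out = tsv_to_rdfxml(sample, "http://example.org/data/demo")
print(xml_out)
```

In this sketch each cell becomes its own resource, so the resulting string could be written to the file system and picked up later by a template-based renderer.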

A diagram of the riese system architecture

Under the hood

The riese core schema is shown below. It is currently RDF-S based and comprises three main classes: riese:Dataset, riese:Item and riese:Dimension. A dataset is the logical container of either further sub-datasets (related via skos:narrower) or data items. An item represents a single data value (like 497,198,740 for the population of the European Union) together with all accompanying metadata about the containing dataset and the dimensions used. A dimension semantically describes the value of a data item in terms of, e.g., time, location, or unit.
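The relationship between the three classes can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual schema definition: the namespace URI and the property names `dataset`, `value` and `dimension` are hypothetical, as are all example URIs. Only the class names and the use of skos:narrower come from the description above.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"
RIESE = "http://example.org/riese-schema#"  # hypothetical namespace URI


@dataclass
class Dimension:
    uri: str
    label: str


@dataclass
class Item:
    uri: str
    value: str
    dimensions: list = field(default_factory=list)


@dataclass
class Dataset:
    uri: str
    sub_datasets: list = field(default_factory=list)
    items: list = field(default_factory=list)


def triples(dataset):
    """Yield (subject, predicate, object) tuples for a dataset tree."""
    yield (dataset.uri, RDF_TYPE, RIESE + "Dataset")
    for sub in dataset.sub_datasets:
        # Sub-datasets hang off their parent via skos:narrower.
        yield (dataset.uri, SKOS_NARROWER, sub.uri)
        yield from triples(sub)
    for item in dataset.items:
        yield (item.uri, RDF_TYPE, RIESE + "Item")
        # The three property names below are assumptions for illustration.
        yield (item.uri, RIESE + "dataset", dataset.uri)
        yield (item.uri, RIESE + "value", item.value)
        for dim in item.dimensions:
            yield (item.uri, RIESE + "dimension", dim.uri)


# Example: one root dataset, one sub-dataset, one item with one dimension.
eu = Dimension("http://example.org/dim/geo/eu", "European Union")
population_item = Item("http://example.org/item/pop-eu", "497198740", [eu])
population = Dataset("http://example.org/ds/population", items=[population_item])
root = Dataset("http://example.org/ds/root", sub_datasets=[population])

result = list(triples(root))
for triple in result:
    print(triple)
```

Here the container role of a dataset shows up as two separate lists, one for sub-datasets and one for items, while each item carries its own value and dimension references.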

A diagram of the riese core schema

In riese we strove for high standards conformance and applied current best practices. The following common schemas and vocabularies have been used: Dublin Core, SKOS, DOAP (Description of a Project), geonames, and event.

Interlinks to other datasets, such as geonames, are generated automatically on the server side. Following the WikiWiki approach, users are also invited to contribute their own interlinks to resources containing more information. This feature is called User Contributed Interlinking (UCI) and is available on every data page via the 'I know more' button:

Launching the UCI module

After you hit 'I know more', the UCI module launches and you can add or remove links as you wish. Note that links to HTML content are fine; however, machines prefer RDF ;)

Editing links with the UCI module

More statistics

The data currently presented on riese is a snapshot of the publicly available Eurostat data dump taken on 2008-01-09. The statistical data itself is contained in 4,130 TSV (tab-separated values) files with a total size of 5.18 GB. In addition, dictionary files for translating the codes used in the tables, as well as the table of contents, are needed (488 files).

In total, riese will provide approx. 3,000,000,000 (3 billion) triples in the final version. The current alpha version provides only a small subset of approx. 5 million triples, which will be extended continuously.
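As a rough back-of-the-envelope check, the share of the planned triples covered by the alpha version can be computed from the two approximate figures above. The variable names are only illustrative, and the result is no more precise than the rounded inputs.

```python
# Approximate figures quoted in the text.
planned_triples = 3_000_000_000  # final version, approx.
alpha_triples = 5_000_000        # current alpha version, approx.

# Fraction of the planned triples already available in the alpha version.
alpha_share = alpha_triples / planned_triples
alpha_percent = round(alpha_share * 100, 2)

print(f"The alpha version covers roughly {alpha_percent}% of the planned triples")
```

In other words, the alpha version holds well under one percent of the intended final data volume.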

See also (external links)

For additional information see also the following:

Acknowledgements

The riese team (Wolfgang Halb, Yves Raimond, and Michael Hausenblas) would like to express their gratitude to the people who made the following awesome technologies available: SWI-Prolog, Apache, PHP, RAP - Rdf API and YUI.

The statistical data published on riese was originally published by Eurostat. Provided that the source is acknowledged, Eurostat data may be reproduced under the conditions specified in the general copyright notice.

Creative Commons License
This work is licenced under a Creative Commons Licence.