Contents
- Where it all started
- riese in 30 sec
- Under the hood
- More statistics
- See also (external links)
- Acknowledgements
Where it all started
The riese project (RDFizing and Interlinking the EuroStat Data Set Effort) started in mid-2007 as an initiative within the W3C SWEO Linking Open Data community project.
Several other LOD datasets exist. The most similar are the US Census dataset and an earlier, smaller effort to provide Eurostat data (countries and regions).
riese in 30 sec
The riese system architecture is shown below. At its core are an Apache 2 server, SWI-Prolog, and a set of PHP scripts. Data from Eurostat is converted into RDF/XML and dumped into the file system. The rendering engine uses templates to produce XHTML+RDFa, which can be consumed by both humans and machines (Semantic Web agents).
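To illustrate the template-based rendering step, here is a minimal sketch in Python (riese itself uses PHP scripts, so this is not the project's code). The namespace URI, the riese:value property, and the function name render_item are assumptions made for illustration only. The idea is that one template yields markup that a browser displays and an RDFa-aware agent can read.

```python
from html import escape
from string import Template

# Hypothetical namespace; the real riese schema URI is not stated in this text.
RIESE_NS = "http://example.org/riese/schema#"

# One template serves both audiences: the element text is for humans,
# the RDFa attributes (about, typeof, property, content) are for machines.
ITEM_TEMPLATE = Template(
    '<div xmlns:riese="$ns" about="$uri" typeof="riese:Item">\n'
    '  <span property="riese:value" content="$raw">$display</span>\n'
    '</div>'
)


def render_item(uri, raw_value):
    """Render one data item as an XHTML+RDFa fragment (illustrative only)."""
    display = f"{int(raw_value):,}"  # human-friendly thousands separators
    return ITEM_TEMPLATE.substitute(
        ns=escape(RIESE_NS, quote=True),
        uri=escape(uri, quote=True),
        raw=escape(raw_value, quote=True),
        display=display,
    )


fragment = render_item("http://example.org/riese/item/1", "497198740")
print(fragment)
```

The machine-readable value stays unformatted in the content attribute, while the visible text carries the formatted number.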
Under the hood
The riese core schema is shown below. It is currently RDF-S based and comprises three main classes: riese:Dataset, riese:Item and riese:Dimension.
A dataset is the logical container of either further sub-datasets (related via skos:narrower) or data items.
An item represents one single data value (like 497,198,740 for the population of the European Union) with all accompanying metadata about the containing dataset and the dimensions used.
A dimension semantically describes the value of a data item in terms of, e.g. time, location, unit, etc.
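The relationship between the three classes can be sketched as a small Python data model that emits triples. This is only an illustration under stated assumptions: the namespace URI, the resource URIs, and the property names riese:dataset, riese:value and riese:dimension are hypothetical, while the class names and skos:narrower come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical namespace; the real riese schema URI is not stated in this text.
RIESE = "http://example.org/riese/schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Dimension:
    uri: str
    kind: str   # e.g. "time", "location", "unit"
    label: str


@dataclass
class Item:
    uri: str
    value: str
    dimensions: list = field(default_factory=list)


@dataclass
class Dataset:
    uri: str
    title: str
    sub_datasets: list = field(default_factory=list)
    items: list = field(default_factory=list)


def triples(dataset):
    """Yield (subject, predicate, object) tuples for a dataset tree."""
    yield (dataset.uri, RDF_TYPE, RIESE + "Dataset")
    for sub in dataset.sub_datasets:
        yield (dataset.uri, SKOS + "narrower", sub.uri)
        yield from triples(sub)
    for item in dataset.items:
        yield (item.uri, RDF_TYPE, RIESE + "Item")
        yield (item.uri, RIESE + "dataset", dataset.uri)  # hypothetical property
        yield (item.uri, RIESE + "value", item.value)     # hypothetical property
        for dim in item.dimensions:
            yield (item.uri, RIESE + "dimension", dim.uri)  # hypothetical property
            yield (dim.uri, RDF_TYPE, RIESE + "Dimension")


# Illustrative instance using the population figure quoted above.
eu = Dimension("http://example.org/riese/dim/geo/eu", "location", "European Union")
item = Item("http://example.org/riese/item/1", "497198740", [eu])
total = Dataset("http://example.org/riese/data/population/total", "Total population", items=[item])
root = Dataset("http://example.org/riese/data/population", "Population", sub_datasets=[total])

result = list(triples(root))
for s, p, o in result:
    print(s, p, o)
```

A dataset here holds either sub-datasets or items, mirroring the container role described above; nothing in this sketch enforces that constraint.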
In riese we strove for high standards conformance and applied current best practices. The following common schemas and vocabularies have been used: Dublin Core, SKOS, DOAP (Description of a Project), geonames, and event.
Interlinks to other datasets such as geonames are generated automatically on the server side. Following the WikiWiki approach, users are also invited to contribute their own interlinks to resources containing more information. This feature is called User Contributed Interlinking (UCI) and is available on every data page via the 'I know more' button.
Once you hit 'I know more', the UCI module launches and you can add or remove links as you wish. Note that links to HTML content are fine; however, machines prefer RDF ;)
More statistics
The current data presented on riese is a snapshot of the publicly available Eurostat data dump taken on 2008-01-09. The statistical data itself is contained in 4,130 TSV (tab-separated values) files with a total size of 5.18 GB. In addition, dictionary files for translating the codes used in the tables, as well as the table of contents, are needed (488 files).
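As a rough illustration of how a table and a dictionary file work together, the following Python sketch reads a tiny tab-separated table and translates its codes. The file layout (one code column followed by one column per period, with ":" marking a missing value), the sample values, and the name read_observations are all assumptions for this example; they do not describe the actual riese converter.

```python
import csv
import io

# Made-up sample in an assumed layout; the numbers are placeholders, not Eurostat data.
SAMPLE_TSV = "geo\t2006\t2007\nAT\t10\t11\nBE\t:\t21\n"

# Stand-in for a dictionary file that maps codes to readable labels.
GEO_LABELS = {"AT": "Austria", "BE": "Belgium"}


def read_observations(tsv_text, labels):
    """Yield one record per non-missing cell of the table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    periods = [p.strip() for p in header[1:]]
    for row in reader:
        code = row[0].strip()
        for period, cell in zip(periods, row[1:]):
            cell = cell.strip()
            if not cell or cell == ":":  # assumed marker for "not available"
                continue
            yield {
                "location": labels.get(code, code),  # fall back to the raw code
                "time": period,
                "value": cell,
            }


observations = list(read_observations(SAMPLE_TSV, GEO_LABELS))
for obs in observations:
    print(obs)
```

Each record corresponds to one data item with a location and a time dimension; unknown codes are passed through unchanged rather than dropped.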
In total, riese will provide approx. 3,000,000,000 (3 billion) triples in the final version. The current alpha version provides only a small subset of approx. 5 million triples, which will be extended continuously.
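To put these figures in proportion, a short back-of-the-envelope calculation in Python follows. It uses only the approximate numbers quoted above and assumes decimal units (1 GB = 1000 MB), so the results are rough estimates.

```python
# Approximate figures quoted in the text.
total_triples = 3_000_000_000   # planned final version
alpha_triples = 5_000_000       # current alpha version
tsv_files = 4_130
tsv_size_gb = 5.18

# Share of the planned triples already available in the alpha version.
alpha_share = alpha_triples / total_triples

# Average TSV file size, assuming 1 GB = 1000 MB.
avg_file_mb = tsv_size_gb * 1000 / tsv_files

print(f"alpha coverage: {alpha_share:.2%}")
print(f"average TSV file size: {avg_file_mb:.2f} MB")
```

The alpha version therefore covers well under one percent of the planned triples, and the average source file is a little over one megabyte.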
See also (external links)
For additional information see also the following:
- W3C SWEO Linking Open Data
- W3C XHTML+RDFa primer
- riese's source code (via Google)
- Semantic Web Crawling sitemap extension
Acknowledgements
The riese team (Wolfgang Halb, Yves Raimond, and Michael Hausenblas) would like to express their gratitude to the people who made available the following awesome technologies: SWI-Prolog, Apache, PHP, RAP - Rdf API and YUI.
The statistical data published on riese was originally published by Eurostat. Provided that the source is acknowledged, Eurostat data may be reproduced under the conditions specified in the general copyright notice.