DCMI Global Meetings & Conferences, DC-2011, The Hague

Performing Statistical Methods on Linked Data
Benjamin Zapilko, Brigitte Mathiak

Last modified: 2011-08-27

Abstract


In recent years, many government agencies have published statistical information as linked open data (e.g. Eurostat, data.gov.uk). Yet, while a number of visualization tools exist, researchers need to perform proper statistical analyses to answer their research questions. Currently, they have to download the statistical data in a table-based format in order to use it in their statistics software, losing the benefits that linked data provides, such as interlinking with other datasets. In this paper, we present an approach specifically designed to help researchers perform statistical analysis on linked open data. By combining distributed sources with SPARQL, we are able to apply simple statistical calculations, such as linear regression, and present the results to the user. Testing these calculations with heterogeneous data sources exposes a wide range of typical data integration issues that researchers have to be aware of when working with heterogeneous statistical data.
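
The kind of calculation mentioned above can be illustrated with a minimal sketch. It is not the implementation from the paper. It assumes that two sources have already been queried and that each returned a result set in the SPARQL JSON results layout; the variable names (`year`, `x`, `y`), the helper functions, and the toy values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def bindings_to_series(results, key_var, value_var):
    """Turn a SPARQL-JSON-style result set into {key: float(value)}."""
    series = {}
    for row in results["results"]["bindings"]:
        series[row[key_var]["value"]] = float(row[value_var]["value"])
    return series


def join_on_key(xs, ys):
    """Keep only keys present in both series and pair up their values."""
    shared = sorted(set(xs) & set(ys))
    return [(xs[k], ys[k]) for k in shared]


def linear_regression(pairs):
    """Ordinary least squares fit y = slope * x + intercept."""
    if len(pairs) < 2:
        raise ValueError("need at least two observations")
    x_mean = mean(x for x, _ in pairs)
    y_mean = mean(y for _, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def toy_results(var, rows):
    """Build a hypothetical result set; the values are invented toy data."""
    return {"results": {"bindings": [
        {"year": {"type": "literal", "value": year},
         var: {"type": "literal", "value": str(value)}}
        for year, value in rows
    ]}}


# Two hypothetical sources that only partly cover the same years.
source_a = toy_results("x", [("2005", 1.0), ("2006", 2.0), ("2007", 3.0), ("2008", 4.0)])
source_b = toy_results("y", [("2006", 5.0), ("2007", 7.0), ("2008", 9.0), ("2009", 11.0)])

pairs = join_on_key(
    bindings_to_series(source_a, "year", "x"),
    bindings_to_series(source_b, "year", "y"),
)
slope, intercept = linear_regression(pairs)
print(f"observations={len(pairs)} slope={slope} intercept={intercept}")
```

Even in this toy case the join silently drops the years that only one source covers, which is one small instance of the integration issues that arise when heterogeneous statistical sources are combined.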

Full Text: PDF (Paper)