This wiki service has now been shut down and archived

Provenance and Linked Open Data

From ESIWiki

Synopsis

The mini-theme aims to bring together stakeholders from academia, government and industry to discuss the provenance challenges associated with linked open data, and to develop a roadmap highlighting future directions for research.

Meetings & Workshops

18 February, 2011 - Seminar on the Findings of the W3C Provenance Incubator Group, Dr Paul Groth, VU University of Amsterdam

30-31 March, 2011 - Workshop: Understanding Provenance and Linked Open Data

27 June 2011 - Developing a Set of Provenance Principles for Linked Open Data

Background

Underpinning the scientific process is the transfer of ideas, knowledge and resources, and in recent years the Web has drastically altered both the nature and speed of this exchange. Recently the concept of the Web of Linked Data has emerged as a means to expose, share and connect information on the Web, identified by URIs and using RDF as a data model. Examples include the data.gov.uk initiative, which aims to expose UK public data, and bio2rdf.org, which provides an atlas of post-genomic data. However, the Web of Linked Data still suffers from many of the same problems as the Web of documents in terms of information quality, trust, attribution, privacy, etc.

This is illustrated by the following quote from the chairman of the UK Audit Commission, Michael O'Higgins, on the day that government spending data was released: "And that's where I think the critical issue is - that what is being released is not in fact information, it is data. And data needs context to become information, and it is provision of that context that will be important."

We argue that provenance plays a vital role in enriching the context surrounding open data, and can help support assessment of attributes such as trustworthiness and quality. The W3C Provenance Incubator Group has been working to develop a roadmap in the area of provenance for Semantic Web technologies. However, we argue that there is still considerable scope to bring interested parties together to discuss how many of these issues will be tackled in the Web of Linked Data. Edwards et al. have recently introduced the concept of a provenance fabric [1] to refer to the descriptions of physical artefacts, digital artefacts, people, services, service providers and online communications that are woven together through the use of linked data principles. The proposed mini-theme will bring together stakeholders from academia, government and industry to discuss the provenance challenges associated with linked open data, and to develop a roadmap highlighting future directions for research. We argue that this is timely for a number of reasons: the rapid growth in available linked data resources; the boom in applications using such resources; and the conclusion of the W3C Provenance Incubator Group (meaning that there is scope to involve many of its members in a follow-on activity).

Key Research Challenges

We have identified the following key research challenges in the context of Provenance and Linked Open Data:

  • Provenance Representation – A number of models exist for describing provenance, such as OPM, the Provenance Vocabulary, the Provenir ontology, etc. A key research challenge is to evaluate such models with respect to the provenance of linked open data.
  • Quality – An increasing variety of data is available as Linked Open Data, published by a range of different sources. A key research challenge here is to establish a set of indicators which can be used to determine the quality of linked data published on the Web.
  • Trust – One major difficulty is that, by its very nature, the Web of Linked Open Data is a large open ecosystem to which anyone may contribute. This raises the question of how much credence to give each resource. A key research challenge here is establishing methods to determine the trustworthiness of both linked data and its providers.
  • Privacy – The Web of Linked Data allows the integration of data from distinct sources, which creates the opportunity to violate the privacy of the individuals and organisations described by the data. The key research challenge here is how to protect the privacy of individuals and organisations in the Web of Linked Data context. A particularly challenging aspect of this is raising users' awareness of what data exists about them and what data is safe to provide in which context.
  • Attribution – An important aspect of linked open data is the attribution of resources to the entities (e.g. people, organisations) that contributed to the resource's creation. The key research challenge here is to identify which entity is responsible for the resource.

Issues Related to Provenance and Linked Open Data

During the Workshop: Understanding Provenance and Linked Open Data the participants identified the following top issues associated with provenance and Linked Open Data:

Other issues identified by the group:

Use-cases

If you wish to contribute a case study, please register on the wiki and email Edoardo Pignotti (e.pignotti@abdn.ac.uk) to have editing permissions enabled.

Use Cases Template

Provenance Principles (DRAFT)

Recently the concept of the Web of Linked Data has emerged as a means to expose, share, and connect information on the Web. While this approach provides better context for supporting intelligent reasoning, publishing the provenance of open data is essential in order to enable better assessment of important attributes such as trustworthiness and quality. We therefore introduce a set of provenance principles for Linked (Open) Data:

  • Record and publish provenance of resources on the Web (in whatever format).
  • For every published web resource, publish its provenance as Linked Data.
  • Publish links or other data that allow navigation from a resource to its provenance, and vice versa.
  • Publish links from provenance to the provenance of the resources from which it is derived.

Similar to the Linked Open Data rules introduced by Berners-Lee, these provenance principles are "expectations of behaviour": breaking them does not destroy anything, but misses an opportunity to make data more transparent.

The first principle is to record and publish provenance information (about data published as Linked Data) on the Web. The provenance of a resource is a record that describes the entities and processes involved in producing, delivering or otherwise influencing that resource.

The second principle is to link back to your sources in a way which allows you to know that they are sources.

The third principle is to link back to the provenance of your sources in a way which allows you to know that this is the provenance of your sources.

The fourth principle is to link your provenance to other people's provenance in such a way that the link you are making is itself provenance (composable provenance).
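
The principles above can be sketched in code. The following is a minimal illustration and not part of the draft itself. It assumes the W3C PROV namespace (the terms prov:wasDerivedFrom, prov:wasAttributedTo and prov:has_provenance, which were standardised after this page was written), uses rdfs:seeAlso as a placeholder for linking one provenance record to another, and invents all example.org URIs and helper function names purely for the example.

```python
# Illustrative sketch only. The example.org URIs and the helper names are
# hypothetical; the choice of W3C PROV terms is an assumption, since the
# draft principles do not prescribe a vocabulary.
PROV = "http://www.w3.org/ns/prov#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

REPORT = "http://example.org/data/report"
RAW = "http://example.org/data/raw"
SURVEY = "http://example.org/data/survey"
REPORT_PROV = "http://example.org/prov/report"
RAW_PROV = "http://example.org/prov/raw"

# Each statement is a (subject, predicate, object) triple of URIs.
TRIPLES = [
    # Navigable link from a resource to its provenance record.
    (REPORT, PROV + "has_provenance", REPORT_PROV),
    (RAW, PROV + "has_provenance", RAW_PROV),
    # Links back to sources, typed so that they are recognisable as sources.
    (REPORT, PROV + "wasDerivedFrom", RAW),
    (RAW, PROV + "wasDerivedFrom", SURVEY),
    # Attribution of a resource to a responsible entity.
    (REPORT, PROV + "wasAttributedTo", "http://example.org/people/alice"),
    # Link from one provenance record to the provenance of its source
    # (rdfs:seeAlso is a placeholder predicate chosen for this sketch).
    (REPORT_PROV, RDFS + "seeAlso", RAW_PROV),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects(triples, subject, predicate):
    """Return all objects of statements matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def source_chain(triples, resource):
    """Follow wasDerivedFrom links breadth-first, returning each source once."""
    chain, seen, frontier = [], {resource}, [resource]
    while frontier:
        current = frontier.pop(0)
        for source in objects(triples, current, PROV + "wasDerivedFrom"):
            if source not in seen:
                seen.add(source)
                chain.append(source)
                frontier.append(source)
    return chain


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(source_chain(TRIPLES, REPORT))
```

With links of this kind published, a consumer holding only the URI of the report could look up its provenance record, walk back through its sources, and move from that record to the provenance of each source.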


Roadmap for Provenance and Linked Open Data

Useful Links

W3C Provenance Incubator Group Wiki - (http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki)

Provenance Interchange Working Group Charter - (http://www.w3.org/2011/01/prov-wg-charter.html)

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.