This wiki service has now been shut down and archived
Provenance and Linked Open Data
Synopsis
The mini-theme aims to bring together stakeholders from academia, government and industry to discuss the provenance challenges associated with linked open data, and to develop a roadmap highlighting future directions for research.
Meetings & Workshops
18 February, 2011 - Seminar on the Findings of the W3C Provenance Incubator Group, Dr Paul Groth, VU University of Amsterdam
30-31 March, 2011 - Workshop: Understanding Provenance and Linked Open Data
27 June 2011 - Developing a Set of Provenance Principles for Linked Open Data
Background
Underpinning the scientific process is the transfer of ideas, knowledge and resources, and in recent years the Web has drastically altered both the nature and speed of this exchange. Recently the concept of the Web of Linked Data has emerged as a means to expose, share and connect information on the Web identified by URIs using RDF as a data model. Examples include the data.gov.uk initiative, which aims to expose UK public data, and bio2rdf.org, which provides an atlas of post-genomic data. However, the Web of Linked Data still suffers from many of the same problems as the Web of documents in terms of information quality, trust, attribution, privacy, etc.
An illustration of this is the following quote from the chairman of the UK Audit Commission, Michael O'Higgins, on the day that government spending data was released: "And that's where I think the critical issue is - that what is being released is not in fact information, it is data. And data needs context to become information, and it is provision of that context that will be important."
We argue that provenance plays a vital role in enriching the context surrounding open data, and can help support assessment of attributes such as trustworthiness and quality. The W3C Provenance Incubator Group has been working to develop a roadmap in the area of provenance for Semantic Web technologies. However, we argue that there is still considerable scope to bring interested parties together to discuss how many of these issues will be tackled in the Web of Linked Data. Edwards et al. have recently introduced the concept of a provenance fabric [1] to refer to the descriptions of physical artefacts, digital artefacts, people, services, service providers and online communications that are woven together through the use of linked data principles. The proposed mini-theme will bring together stakeholders from academia, government and industry to discuss the provenance challenges associated with linked open data, and to develop a roadmap highlighting future directions for research. We argue that this is timely for a number of reasons: the rapid growth in available linked data resources; the boom in applications using such resources; and the conclusion of the W3C Provenance Incubator Group (meaning that there is scope to involve many of its members in a follow-on activity).
Key Research Challenges
We have identified the following key research challenges in the context of Provenance and Linked Open Data:
- Provenance Representation – A number of models exist for describing provenance, such as OPM, the Provenance Vocabulary and the Provenir ontology. A key research challenge is to evaluate such models with respect to the provenance of linked open data.
- Quality – There is an increasing variety of data available as Linked Open Data being published by a range of different sources. A key research challenge here is to establish a set of indicators which can be used in order to determine the quality of linked data published on the Web.
- Trust – One major difficulty is that, by its very nature, the Web of Linked Open Data is a large open ecosystem to which anyone may contribute. This raises the question of how much credence to give each resource. A key research challenge here is establishing methods to determine the trustworthiness of both linked data and its providers.
- Privacy – The Web of Linked Data allows the integration of data from distinct sources, which creates the opportunity to violate the privacy of the individuals and organisations described by the data. The key research challenge here is how to protect the privacy of individuals and organisations in the Web of Linked Data context. A particularly challenging aspect of this is raising users’ awareness of what data exists about them and what data is safe to provide in which context.
- Attribution – An important aspect of linked open data is the attribution of resources to the entities (e.g. people, organisations) that contributed to the resource’s creation. The key research challenge here is to identify which entity is responsible for the resource.
Issues Related to Provenance and Linked Open Data
During the Workshop: Understanding Provenance and Linked Open Data the participants identified the following top issues associated with provenance and Linked Open Data:
- Identity
- Provenance of links between data ‘islands’
- Summarisation to aid usability and scalability
- Reasoning about provenance
- Complex Objects
- 80/20 principle for provenance
- What general provenance model to use to enable interoperability?
- Provenance for validation of facts
- Is reasoning even possible over provenance?
- Interaction of triple stores & data integration/transformation
- Semantics VS data capture
- Access level of provenance (public VS private)
Other issues identified by the group:
- Provenance of Provenance
- Granularity
- Richness VS ease of use
- Agreeing on the scale of what you are asserting about
- Distinguishing between provenance of primary and secondary datasets
- Process mining from provenance
- Provenance on quality/trust assertions
- Machine-centered processes vs human processes
- Provenance as supporting context/evidence for societal open data
- Provenance of data conversion procedures
- Provenance of power structure behind decision processes
- Provenance for validations of facts
Use-cases
If you wish to contribute a case study, please register on the wiki and email Edoardo Pignotti (e.pignotti@abdn.ac.uk) to have editing permissions enabled. Use Cases Template
Provenance Principles (DRAFT)
Recently the concept of the Web of Linked Data has emerged as a means to expose, share, and connect information on the Web. While this approach provides better context for supporting intelligent reasoning, publishing the provenance of open data is essential in order to enable better assessment of important attributes such as trustworthiness and quality. We therefore introduce a set of provenance principles for Linked (Open) Data:
- Record and publish provenance of resources on the web (in whatever format).
- For every published web resource, publish its provenance as Linked Data.
- Publish links or other data that allow navigation from a resource to its provenance, and vice versa.
- Publish links from provenance to the provenance of the resources from which it is derived.
Similar to the Linked Open Data rules introduced by Berners-Lee, these provenance principles are an “expectation of behaviour”: breaking them does not destroy anything, but misses an opportunity to make data more transparent.
The first principle is to record and publish provenance information (about data published as Linked Data) on the Web. Provenance of a resource is a record that describes the entities and processes involved in producing and delivering or otherwise influencing that resource.
The second principle is to link back to your sources in a way that makes clear that they are your sources.
The third principle is to link back to the provenance of your sources in a way that makes clear that it is the provenance of your sources.
The fourth principle is to link your provenance to other people’s provenance in such a way that the link you are making is itself provenance (composable provenance).
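As a rough illustration of how these principles could look in practice, the sketch below builds a handful of RDF statements for one resource and serialises them as N-Triples. It rests on assumptions that are not part of this page: the predicates are borrowed from the later W3C PROV vocabulary (prov:has_provenance, prov:wasDerivedFrom, prov:wasAttributedTo) rather than from any model the mini-theme selected, every http://example.org/ URI is hypothetical, and the choice to relate a provenance record to the provenance of its sources with prov:wasDerivedFrom is only one possible reading of the draft.

```python
# Sketch only: PROV terms are an assumption (the page does not prescribe a
# vocabulary) and every http://example.org/ URI is hypothetical.
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"


def provenance_triples(resource, provenance, agent, sources):
    """Build (subject, predicate, object) URI triples for one resource.

    `sources` is a list of (source_uri, source_provenance_uri) pairs.
    """
    triples = [
        # Link from the resource to its own provenance record.
        (resource, PROV + "has_provenance", provenance),
        # Attribute the resource to the agent responsible for it.
        (resource, PROV + "wasAttributedTo", agent),
    ]
    for source, source_provenance in sources:
        # Link back to each source in a way that marks it as a source.
        triples.append((resource, PROV + "wasDerivedFrom", source))
        # Link this provenance record to the provenance of the source
        # (assumed reading of "composable provenance").
        triples.append((provenance, PROV + "wasDerivedFrom", source_provenance))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects(triples, subject, predicate):
    """Navigate forwards: everything `subject` points to via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Navigate backwards: everything pointing to `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


triples = provenance_triples(
    resource=EX + "spending-2011",
    provenance=EX + "spending-2011/provenance",
    agent=EX + "agency",
    sources=[(EX + "raw-ledger", EX + "raw-ledger/provenance")],
)
print(to_ntriples(triples))
```

The `objects` and `subjects` helpers stand in for navigation from a resource to its provenance and back again; in a real deployment that navigation would more likely be done by dereferencing URIs or querying a triple store.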
Roadmap for Provenance and Linked Open Data.
Useful Links
W3C Provenance Incubator Group Wiki - (http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki)
Provenance Interchange Working Group Charter - (http://www.w3.org/2011/01/prov-wg-charter.html)