This wiki service has now been shut down and archived

Provenance and Crowdsourcing

From ESIWiki

Name: Provenance and Crowdsourcing

Owner: David Corsar

Linked Open Data Source:

Background

We are investigating the development of a passenger information system for rural public transport users that integrates open data from government agencies and operators with crowd-sourced transport and journey experience reports from small numbers of passengers. Crowd-sourcing may introduce imperfect data (e.g. incomplete, erroneous, or fraudulent reports), which could adversely affect system outputs and reduce user trust; provenance is thus critical in this domain to support information quality assessments. We cannot, however, expect provenance information to be supplied directly by the user. We therefore have to rely on indirect sources such as details of the mobile device used, past user reports (including feedback from others), membership of social networks, etc.

Use case Scenario

Bob gets on a bus shortly after it starts its route, takes out his smartphone, and loads the passenger contribution app. Bob is first required to provide simple login information (username, password) and the bus operator/route details. Bob turns on “location monitoring” so the app periodically uploads his location to the server (along with any other information it can determine, such as whether Wi-Fi is working and whether the phone supports Wi-Fi). Five minutes into the journey, Bob enters some experience information, stating that the bus is cold and crowded, and uploads it to the server. Along with each location/experience report, the app also uploads the user’s id and details about the smartphone: brand, operating system, observation source (the phone’s hardware or the user), and, where appropriate, sensor settings. Bob is the only passenger at that point on that particular journey contributing reports to the system.
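The contents of one of Bob's uploads could be sketched as the following data structure. This is only an illustration: the class and field names (`Report`, `Device`, `ObservationSource`, and so on) and the JSON encoding are our assumptions, and every value in the usage example is made up.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ObservationSource(Enum):
    HARDWARE = "hardware"  # value measured by the phone, e.g. a location fix
    USER = "user"          # value entered by the passenger


@dataclass
class Device:
    brand: str
    operating_system: str
    # Only filled in "where appropriate", so it defaults to empty.
    sensor_settings: dict = field(default_factory=dict)


@dataclass
class Report:
    user_id: str
    operator: str
    route: str
    device: Device
    source: ObservationSource
    created_at: datetime
    latitude: Optional[float] = None   # set for location reports
    longitude: Optional[float] = None
    experience: Optional[str] = None   # set for experience reports

    def to_json(self) -> str:
        """Serialise the report for upload (hypothetical wire format)."""
        payload = asdict(self)
        payload["source"] = self.source.value
        payload["created_at"] = self.created_at.isoformat()
        return json.dumps(payload)


# Illustrative values only; none of these come from a real deployment.
phone = Device(brand="ExampleBrand", operating_system="ExampleOS")
experience_report = Report(
    user_id="bob",
    operator="Example Buses",
    route="X1",
    device=phone,
    source=ObservationSource.USER,
    created_at=datetime(2012, 1, 1, 9, 5, tzinfo=timezone.utc),
    experience="cold and crowded",
)
uploaded = json.loads(experience_report.to_json())
```

Keeping the observation source on every report lets the server distinguish values measured by the phone from values typed in by the passenger.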

When the server receives these reports, they are integrated (using Linked Open Data principles) with details about the operator, the route (including longitude/latitude details of the bus stops and road sections), and other relevant sources (e.g. reports of road works in the areas around roads travelled by the bus). Alice is planning to get on the same bus later along its route, and has registered to be informed of updates about the occupancy level of the bus because, if it is crowded, she will wait an hour for the next bus.

How does the system determine if Bob’s report is reliable/trustworthy/high enough quality to send Alice a message which will change her behavior for the day?


Problems and Limitations

  • Additional provenance about the contributions cannot be acquired from the provider (so the system cannot, for example, check with Bob whether he really meant that the bus is crowded).
  • The system is completely open, so anyone can send updates, not just people actually on the bus.
  • The system must respond in real time (i.e. quickly enough to ensure its outputs are still relevant).
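Two of these limitations suggest cheap plausibility checks that could run before any fuller quality assessment. The sketch below is an assumption on our part, not a description of the system: the function names, the 10-minute freshness window, and the 500 m distance threshold are all hypothetical, and comparing against bus stops only (rather than road sections) is a simplification.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_fresh(created_at, now, max_age=timedelta(minutes=10)):
    """True if the report is neither from the future nor older than max_age."""
    return timedelta(0) <= now - created_at <= max_age


def is_near_route(lat, lon, stops, max_distance_m=500):
    """True if the reported position lies within max_distance_m of any stop."""
    return any(
        haversine_m(lat, lon, stop_lat, stop_lon) <= max_distance_m
        for stop_lat, stop_lon in stops
    )


# Illustrative values only.
stops = [(57.0, -2.0)]
now = datetime(2012, 1, 1, 9, 10, tzinfo=timezone.utc)
reported_at = datetime(2012, 1, 1, 9, 5, tzinfo=timezone.utc)
plausible = is_fresh(reported_at, now) and is_near_route(57.001, -2.0, stops)
```

Passing both checks does not show that the contributor is on the bus, since a reported location can be falsified; the checks only filter out reports that are clearly stale or clearly off the route.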


Requirements for Provenance

  • Describing the users that contribute reports
  • Describing the reports, and how and where they were created
  • Describing the sources/provenance of other pieces of data reports are integrated with
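One possible way to meet these three requirements is to record provenance as subject/predicate/object statements. The sketch below uses properties from the W3C PROV-O vocabulary; choosing PROV-O is our assumption, as are the `http://example.org/` identifiers, the function names, and the decision to model the phone as an agent associated with the reporting activity.

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for illustration


def report_provenance(report_id, user_id, device_id, created_at, location_id):
    """Statements describing who created a report, how, when, and where."""
    report = f"{EX}report/{report_id}"
    activity = f"{EX}reporting/{report_id}"
    return [
        # Requirement 1: the user who contributed the report.
        (report, PROV + "wasAttributedTo", f"{EX}user/{user_id}"),
        # Requirement 2: how, when, and where the report was created.
        (report, PROV + "wasGeneratedBy", activity),
        (activity, PROV + "wasAssociatedWith", f"{EX}device/{device_id}"),
        (report, PROV + "generatedAtTime", created_at),  # literal, not an IRI
        (report, PROV + "atLocation", f"{EX}location/{location_id}"),
    ]


def integration_provenance(result_id, source_iris):
    """Requirement 3: the sources an integrated piece of data came from."""
    result = f"{EX}integrated/{result_id}"
    return [(result, PROV + "wasDerivedFrom", src) for src in source_iris]


# Illustrative identifiers only.
statements = report_provenance(
    "1", "bob", "phone-1", "2012-01-01T09:05:00Z", "stop-3"
)
statements += integration_provenance(
    "occupancy-1", [f"{EX}report/1", f"{EX}timetable/X1"]
)
```

Plain tuples keep the sketch free of dependencies; an RDF library would be needed to publish the statements as Linked Open Data.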


Related Work
