Provenance in Software Systems

Provenance, briefly, is information "about" a computation, ranging from simple metadata such as time/date/ownership records in file systems to fine-grained explanations showing how the data in the result of a database query or complex distributed computation was derived. Other simple examples include source location information used in error reporting or debugging and traceability information in software engineering. Recently there has been increased interest in developing advanced forms of provenance for scientific databases and scientific computing.

This symposium is the third in a series of similar events that we are organizing to bring leading researchers together with researchers in scientific disciplines that have emerging needs involving data provenance, integrity, accountability, and audit, and to provide time and space for in-depth collaboration.

Although there has not been much research on provenance per se in programming languages and software engineering, we believe that there are interesting relationships that ought to be explored between provenance and topics such as:

  • bidirectional, adaptive, and self-adjusting computation
  • program analysis/information flow security techniques
  • trust modeling
  • traceability
  • source code management/version control/configuration management
  • model-driven design and analysis

Moreover, the fact that these topics appear related to provenance suggests that there may also be commonalities among them that could be explored.

Thus, the goals of the workshop are:

  • to familiarize researchers working on the above topics with possible applications in provenance, scientific data management and other areas
  • to encourage new work on such applications
  • and to seek common foundations underlying the above topics and provenance research in other areas, perhaps yielding a general definition or theory of "provenance".


Registration is free.


Program (tentative)

Monday, March 30: Public workshop day with lectures/panel discussion. Meeting in Cramond Room, National eScience Centre, 15 South College Street. Tentative program as follows:


8:45-9:00 Opening remarks
Session 1
9:00-9:45 Christian Skalka (University of Vermont): Data Provenance in Automated Remote Environmental Monitoring (slides)
9:45-10:30 Perdita Stevens (University of Edinburgh): Traceability in (bidirectional) model transformations (slides)
10:30-11:00 Coffee
Session 2
11:00-11:45 Stijn Vansummeren (University of Hasselt): On the expressive power of provenance in database queries (slides)
11:45-12:30 Nate Foster (University of Pennsylvania): Bidirectional Programming Languages (slides)
12:30-2:00 Lunch
Session 3
2:00-2:45 Steve Chong (Harvard University): Semantics for Provenance Security (slides)
2:45-3:30 Jeffrey Vaughan (University of Pennsylvania): Evidence-based Audit (slides)
3:30-4:00 Coffee
Session 4
4:00-5:00 Closing discussion
March 31-April 3: Free time for collaboration.
