Principles of Provenance
- Start Date: 1 April 2008
- End Date: 15 May 2009
This Theme is being led by:
Applying to be a Visitor to this Theme
Please fill in the application, including the name of the theme (Provenance) in the proposal.
Recent research in a variety of settings (databases and data warehouses, geographic information systems, scientific workflows, grid computing, and the Semantic Web) has addressed the problem of keeping track of metadata about creation and modification history, influences, ownership, and other provenance or lineage information. Such metadata is essential for making informed judgments about data quality, integrity, and authenticity. In addition, ideas about provenance are now being used in several areas of computer science such as probabilistic databases, operating systems, file synchronization, and annotation propagation. Other topics, such as version control and archiving, may also benefit from better understanding of provenance. We believe the time is ripe to develop the foundations of the topic and address questions such as:
- What does it mean for information to be "provenance"? What is and what isn't provenance?
- What kinds of problems does provenance address, and how does one characterize correct solutions?
- How does one compare different models of provenance?
- Why is provenance so hard to get right, even though it seems rather obvious?
- Where should research efforts be focused to make the best progress?
Schedule of Events
- Opening lecture, April 15, 2008. Principles of Provenance, James Cheney.
- Visiting speaker, May 6, 2008. MIT data quality research activities including Recent Work on Propagation of "Believability" using provenance information. Stuart Madnick, Massachusetts Institute of Technology.
- Visiting speaker, Friday, May 30, 2008, 11am, JCMB 2511. Self-Adjusting Computation. Umut A. Acar, Toyota Technological Institute, Chicago.
- Visiting speaker, Tuesday, May 12, 2009. A Theory of Typed Coercions and its Applications, Michael Hicks, University of Maryland.
- Closing lecture, May 15, 2009. The Future of Provenance, James Cheney.
Workshop on Principles of Provenance (PrOPr), Edinburgh, Scotland, November 19-20, 2007.
First Workshop on Theory and Practice of Provenance (TaPP), San Francisco, CA, February 23, 2009. The proceedings are available online, along with some of the presentations.
Workshop on Use Cases for Provenance, Edinburgh, Scotland, April 20, 2009.
Second Workshop on Theory and Practice of Provenance, San Jose, CA, February 22, 2010.
Other recent provenance workshops and events
Data Provenance/Derivation Workshop, Chicago, 2002
Data Provenance and Annotation, eScience Institute, Edinburgh, December 1, 2003
Workshop on Provenance Aware Storage Systems, October 2005
- IPAW 2006, May 3-5, 2006, Chicago, IL.
- IPAW 2008, June 17-18, 2008, Salt Lake City, UT.
- IPAW 2010, June 15-16, 2010, Troy, NY.
- First Provenance Challenge, workshop held September 13-14, 2006
- Second Provenance Challenge, workshop held June 26, 2007
- Third Provenance Challenge, workshop to be held June 11-12, 2009
BELIEF-II/CASPAR Brainstorming Workshop on Provenance, April 6-7, 2009
Workshop on Data and Process Provenance (WDPP 2009) - April 20, 2009
Semantic Web Provenance Management (SWPM 2009) - proposed workshop to be co-located with ISWC 2009, October 25-29.
Provenance in Practice Workshop 2009 (PPW '09), co-located with NbiS 2009, Indianapolis, USA, 19-21 August 2009
This is a (non-exhaustive) list of recent papers on provenance by Theme participants, some of which have been facilitated by the Theme.
- Manish Kumar Anand, Shawn Bowers, Timothy M. McPhillips, Bertram Ludäscher: Efficient provenance storage over nested data collections. EDBT 2009
- D. Archer, L. Delcambre, D. Maier, A Framework for Fine-grained Data Integration and Curation, with Provenance, in a Dataspace, TaPP 2009
- Shawn Bowers, Timothy M. McPhillips, Sean Riddle, Manish Kumar Anand, Bertram Ludäscher: Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life. IPAW 2008: 70-77
- Uri Braun, Avraham Shinnar, Margo I. Seltzer: Securing Provenance. HotSec 2008
- Peter Buneman, James Cheney, Wang Chiew Tan, Stijn Vansummeren: Curated databases. Symposium on Principles of Database Systems 2008:1-12
- Peter Buneman, James Cheney, Stijn Vansummeren: On the Expressiveness of Implicit Provenance in Query and Update Languages. ACM Transactions on Database Systems, Volume 33, Issue 4 (November 2008)
- Peter Buneman, Wang Chiew Tan: Provenance in databases. ACM International Conference on Management of Data (SIGMOD) 2007:1171-1173
- Adriane Chapman and H. V. Jagadish: Why Not? SIGMOD 2009
- James Cheney, Provenance, XML and the Scientific Web, invited talk, PLAN-X 2009
- James Cheney, Peter Buneman, Bertram Ludäscher: Report on the Principles of Provenance Workshop. ACM Special Interest Group on Management of Data (SIGMOD) Record 37(1):62-65 (2008)
- James Cheney, Laura Chiticariu, and Wang-Chiew Tan: Why, how, and where-provenance. Invited to Foundations and Trends in Databases (accepted).
- S. Chong, Towards Semantics for Provenance Security, TaPP 2009
- Andrew Cirillo, Radha Jagadeesan, Corin Pitcher, James Riely: Tapido: Trust and Authorization Via Provenance and Integrity in Distributed Objects (Extended Abstract). ESOP 2008:208-223
- Susan B. Davidson, Juliana Freire: Provenance and scientific workflows: challenges and opportunities. ACM International Conference on Management of Data (SIGMOD) 2008:1345-1350
- V. Deolalikar, H. Laffitte, Provenance as data mining, TaPP 2009
- M. Factor, E. Henis, D. Naor, S. Rabinovici-Cohen, P. Reshef, S. Ronen, G. Michetti, M. Guercio, Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage, TaPP 2009
- J. Nathan Foster, Todd J. Green, Val Tannen: Annotated XML: queries and provenance. Symposium on Principles of Database Systems 2008: 271-280
- Juliana Freire, David Koop, Luc Moreau: Provenance and Annotation of Data and Processes, Second International Provenance and Annotation Workshop, IPAW 2008, Salt Lake City, UT, USA, June 17-18, 2008. Revised Selected Papers. IPAW 2008
- Floris Geerts and Antonella Poggi, On Database Query Languages for K-relations, Logic in Databases workshop (LID), 2008.
- A. Gehani, M. Kim, J. Zhang, Steps Toward Managing Lineage Metadata in Grid Clusters, TaPP 2009
- T. Gibson, K. Schuchardt, E. Stephan, Application of Named Graphs Towards Custom Provenance Views, TaPP 2009
- Todd J. Green. Containment of Conjunctive Queries on Annotated Relations. 2009 International Conference on Database Theory
- Ragib Hasan, Radu Sion, and Marianne Winslett, The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance, FAST 2009
- Ragib Hasan, Radu Sion, and Marianne Winslett, "Introducing Secure Provenance: Problems and Challenges", ACM StorageSS 2007.
- Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, Jan Van den Bussche: DFL: A dataflow language based on Petri nets and nested relational calculus. Inf. Syst. (IS) 33(3):261-284 (2008)
- Natalia Kwasnikowska, Jan Van den Bussche: Mapping the NRC Dataflow Model to the Open Provenance Model. IPAW 2008: 3-16
- D. Margo, M. Seltzer, The Case for Browser Provenance, TaPP 2009
- Paolo Missier, Suzanne M. Embury, Richard Stapenhurst: Exploiting Provenance to Make Sense of Automated Decisions in Scientific Workflows. IPAW 2008:174-185
- Luc Moreau et al. Special Issue: The First Provenance Challenge. Concurrency and Computation: Practice and Experience 20(5):409-418 (2008)
- Luc Moreau, Juliana Freire, Joe Futrelle, Robert E. McGrath, Jim Myers, Patrick Paulson: The Open Provenance Model: An Overview. IPAW 2008:323-326
- Luc Moreau, Paul T. Groth, Simon Miles, Javier Vázquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer F. Rana, Andreas Schreiber, Victor Tan, László Zsolt Varga: The provenance of electronic data. Commun. ACM (CACM) 51(4):52-58 (2008)
- Kiran-Kumar Muniswamy-Reddy and David A. Holland, Causality-Based Versioning, FAST 2009
- K. Muniswamy-Reddy, P. Macko, M. Seltzer, Making a Cloud Provenance-Aware, TaPP 2009
- P. Pediaditis, G. Flouris, I. Fundulaki, V. Christophides, On Explicit Provenance Management in RDF/S Graphs, TaPP 2009
- C. Reilly, J. Naughton, Transparently Gathering Provenance with Provenance Aware Condor, TaPP 2009
- A. Rosenthal, L. Seligman, A. Chapman, B. Blaustein, Scalable Access Controls for Lineage, TaPP 2009
- Yogesh L. Simmhan, Beth Plale, Dennis Gannon: Karma2: Provenance Management for Data-Driven Workflows. Int. J. Web Service Res. (JWSR) 5(2):1-22 (2008)
- Yogesh L. Simmhan, Beth Plale, Dennis Gannon: Query capabilities of the Karma provenance framework. Concurrency and Computation: Practice and Experience (CONCURRENCY) 20(5):441-451 (2008)
- Richard P. Spillane, Sachin Gaikwad, Manjunath Chinni, Erez Zadok, and Charles P. Wright, Enabling Transactional File Access via Lightweight Kernel Extensions, FAST 2009
- R. Spillane, R. Sears, C. Yalamanchili, S. Gaikwad, M. Chinni, E. Zadok, Story Book: An Efficient Extensible Provenance Framework, TaPP 2009
- I. Souilah, A. Francalanza, V. Sassone, Provenance in Distributed Systems, TaPP 2009
- Nikhil Swamy, Brian J. Corcoran, Michael Hicks: Fable: A Language for Enforcing User-defined Security Policies. IEEE Symposium on Security and Privacy 2008:369-383
- Nikhil Swamy, Michael Hicks: Verified enforcement of stateful information release policies. PLAS 2008: 21-32
- Simone Stumpf, Erin Sullivan, Erin Fitzhenry, Ian Oberst, Weng-Keen Wong, Margaret M. Burnett: Integrating rich user feedback into intelligent user interfaces. IUI 2008:50-59
- Val Tannen: Provenance for Database Transformations. IPAW 2008:1
- Curt Tilmes, Albert J. Fleig: Provenance Tracking in an Earth Science Data Processing System. IPAW 2008:221-228