Genomic and Environmental Science Data Flows
This mini-theme will track the flow of next generation sequencing (NGS) and embedded networked sensing (ENS) data across platforms, repositories, visualizations, applications and publications. The focus will be to articulate what lies between the domain experts (genomic scientists, environmental scientists and other scientific experts) and the technical experts (computer scientists, software and electronic engineers). Via two workshops, we will map the interfaces where the application domains and technical domains come together. We will explore problems of replication, durability, and metrology of the NGS and ENS data flows and identify emerging devices and collaborative practices to enable e-science work in these areas.
18-19 May 2011: Data Flows in Environmental Networked Sensors
The aim of the theme is to re-describe how data moves in NGS (Next Generation Sequencing) genomic sciences and in ENS (Embedded Networked Sensing) environmental sciences. In the life sciences, NGS and ENS epitomise very different data ‘topographies’. NGS and ENS not only designate different sources of data (sequencers, sensors), but also send very different flows of data across platforms, instruments, repositories, centres, applications and publications. NGS and ENS also delineate the interface between application domains and technical domains differently. We think the contrast between them could be instructive in many respects, and is worth exploring in detail.
The mini-theme will develop maps of trajectories of data from collection (NGS instruments, embedded network sensors) through analysis, storage, visualizations, models, and publications. We have identified two related points of analytical focus: replication and durability. In the NGS setting, replication is a key problem. At the recent BBSRC workshop ‘Challenges of Visualizing Biological Data’ (Bristol, November 2010), Reinhard Schneider (EMBL-Heidelberg) identified the very low rate of replicability of genomic data (~2%). In the ENS setting, durability is a key problem. Recent work at CENS (UCLA) and elsewhere indicates that obtaining high-volume environmental data over time requires careful planning, so that environmental and computer scientists both have an ongoing stake in keeping the sensor network running.
One key reference point for e-science is what we are calling ‘data metrology.’ Across both NGS and ENS, there are many general measurements of data size and quantity. In practice, these are a poor guide to the problems of doing e-science. There are good reasons to come up with better metrologies of data and with metrics that indicate where the value in data comes from. However, realistic metrics might be hard to establish precisely because the data topographies are still evolving. Given this problem, the objectives of this theme are to:
- develop an awareness of the problems, obstacles, friction points or gaps that hinder transformations or reshaping of data flows to do better e-science;
- identify practices and devices in the conduct of e-science that sustain collaborative development;
- develop an awareness of some alternative ways of thinking about data flows in genomics and environmental sciences;
- develop alternative socio-technical models that open up new avenues for interdisciplinary collaboration on devices and practices for research with high throughput data flows.
Key Research Challenges
- Develop a richly described sociology of data (as identified by A. Szalay, e-SI workshop Edinburgh, March 2010) that takes into account problems of replicability, durability and metrology, and offers an empirically grounded typology of ‘the data deluge’;
- Identify best practices for collaborative work along the varying interfaces between domain and technical expertise, such that all participants can be involved in experimental e-science challenges;
- Implement practices that can sustainably increase the use-value of data flows.
Next Generation Sequencing
Google map of NextGen Sequencers
Bioinformatics for Next Generation Sequencing virtual issue
What is next generation sequencing? from wired.com
CLC Next generation sequencing blog
Pathogens: Genes and Genomics blog
Embedded Network Sensing