This wiki service has now been shut down and archived

DIR Research Village

From ESIWiki


Return to Workshop wiki Main Page

Data-Intensive Research Workshop: Monday Research Village: Soliciting comments

We would welcome your comments on the research village. These can concern specific stands and the things you saw, things you would like to have seen shown, or more general matters. Please put your comments under the headings below and sign them. You may want to add more headings (two = signs) or subheadings (three = signs).

--MalcolmAtkinson 16:47, 5 March 2010 (UTC)

The research village is an opportunity to meet and discuss data-intensive research with the added stimulation of demonstrations and posters. It will be held on the first floor in and around the Cramond and Swanston room. The research villagers have set up stalls in this area and we will organise a cycle of synchronised 15-minute presentations. Please choose a stand for the first presentation based on your interests and on distributing the crowd reasonably evenly around the stands. There will be a 5-minute interval between each cycle for the presenters to prepare for the next cycle and for other participants to choose a new stand.

This should provide a good opportunity for meeting fellow researchers, finding potentially useful ideas, methods and software and for starting new working relationships. Use the coffee breaks and the end-of-day reception to follow up leads. For those who are staying, it may start conversations that last all week and beyond.

During the research village the following groups who have been making progress with data-intensive research will display and talk about their wares:


NCSA – Meandre, Cloud, and More

Leaders: Xavier Llorà and Bernard A’cs


The general focus of these short sessions will be to introduce Meandre further through examples, illustrating how it has been and can be used. A short video presentation highlights many applications of Meandre. We also plan to demonstrate Meandre interactively to highlight some of the concepts and application areas featured during the talk.

  • 5 minutes: Prepared Video
  • 5 minutes: Live Demonstration Material
  • 5 minutes: Interactive Questions and Discussion

Meandre applications

Taverna, myExperiment and Biocatalogue

Leaders: Carole Goble and David De Roure; Presenters: Peter Li and Katy Wolstencroft

The Taverna Workbench is an open-source software tool for designing and executing workflows. Developed under the e-Science programme through myGrid and OMII-UK, Taverna enjoys very wide adoption and is currently used by over 350 academic and commercial organisations throughout the world. Workflows capture explicit methodologies for the systematic analysis of scientific data and facilitate communication of experimental methods across multiple research disciplines - in addition to a substantial user base in bioinformatics, Taverna is used by data-intensive researchers from chemistry to computational musicology. The workbench is available from the Taverna website.

We will demonstrate Taverna in conjunction with the myExperiment social web site, which lets users discover, publish and share workflows. With the largest public collection of scientific workflows (nearly 1000) and a membership of over 3000 researchers, myExperiment enables individual researchers to re-use and re-purpose workflows to suit their needs, reducing workflow re-invention whilst facilitating scientific collaborations and sharing of research expertise. Taverna tightly integrates with myExperiment to bring the full experience of workflow browsing into the Workbench.

Our third element is BioCatalogue, our registry of biological Web Services for use in workflows. Freely accessible to the world, BioCatalogue provides an open platform for the registration, annotation and monitoring of biological Web Services. BioCatalogue builds on the Web 2.0 model of myExperiment to provide a unique resource with community curation.

Katy Wolstencroft and Peter Li (real researchers!) will be demonstrating the capabilities of these tools as they use them in their own research. Videos of Taverna and myExperiment are also available.


Taverna Applications



The OMII-UK and Software Sustainability Institute

Director: Neil Chue Hong

Presenters: Ally Hume (OGSA-DAI), George Beckett / James Perry (DiGS), Terry Sloan (SPRINT)

OMII-UK has cultivated many of the leading e-Science software tools, supporting their development through initiatives such as the Commissioned Software Programme and ENGAGE programme. We will be highlighting three pieces of software which are of particular relevance to data-intensive research: OGSA-DAI, an extensible framework for data integration from heterogeneous sources; DiGS, a distributed data management system featuring replication, validation and consistency checking; and DataMINX, which allows managed data transfers between all major grid data storage systems.

The Software Sustainability Institute has been created to work in partnership with research communities to identify key software that needs to be sustained, and to make software not just available but useful for researchers by improving usability, quality and maintenance. One example, which we will demonstrate, is the SPRINT parallel R framework, created by the Department of Pathway Medicine and EPCC at the University of Edinburgh, which allows statistical analyses written in the R programming language to be run on high-performance computing systems without specialist knowledge. We will also give more information about the services that the SSI will be providing to the research community.

From Discovery Net and InforSense(IDBS) to the Discovery Cloud

Leader: Yike Guo, Presenters: Katie McMurray and Anthony Rowe

The Discovery Net project and the associated spin-out company InforSense were among the early pioneers in using workflow technologies as a high-level programming framework for data-intensive scientific applications, as highlighted by winning the award for Most Innovative Data-Intensive Application at the 2002 Supercomputing conference. Now that the platform is part of IDBS, ongoing research and commercial success in the pharmaceutical sector, especially in the area of translational research, have shown that workflow is only one vital component of the data-intensive application stack. One significant trend in translational and biomarker research is the formation of large consortia of academic, biotechnology and pharmaceutical partners with a single research aim. Structurally, these projects are data-intensive virtual organisations, which require research software systems not just to process data for a single organisation, but also to store, analyse and visualise data in an architecture that enables these distributed consortia to work effectively on the same project data. The Discovery Cloud system, shown for the first time at this conference, is an ongoing study of how a highly virtualised and elastic cloud-based architecture enables the rapid construction of a virtual data-intensive application stack to support these styles of project.

Discovery Net


Discovery Cloud

Data-Intensive Research, Edinburgh

Leader: Jano van Hemert

Our group does interdisciplinary research with the aim of progressing methods in computer science and tackling data-intensive challenges in science and business. The remit of our group comprises the following.

  • Effective algorithms for data analysis, data mining and combinatorial optimisation.
  • Distributed and data-intensive systems for efficient orchestration of data and computation.
  • Reusable components and new conceptual models for systems that can be deployed across disciplines.
  • Intuitive interfaces and collaboration environments to enable domain-specific researchers to make use of the above systems.

We will show demos of applications of this expertise in several areas, including seismology, brain imaging, computational chemistry, breast cancer and developmental biology. Further demos, papers, software, presentations and news on our activities are available on our website.


MonetDB: Scientific Data Management - Why not let the databases in?

Leader: Martin Kersten (CWI); Presenter: Milena Ivanova

The data intensity of modern sciences poses challenging data management problems to the research community in terms of scalability, functionality, and performance. The usage pattern of scientific data warehouses is characterized by long periods of (ad-hoc) analysis of large data volumes, intermixed with regular bulk loads of new data. Analytical applications are often disk-bound, since extensive computations and aggregations may span the entire data set or large portions of it.

Research on column-store databases has already indicated their potential to provide a number of advantages over traditional row-store systems in the settings described. Vertical organisation provides a more efficient data access pattern for disk-bound queries and flexibility in the presence of changing workloads, and may also reduce data redundancy and storage needs.
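The access-pattern argument can be illustrated with a toy sketch in plain Python (this is illustrative only, not MonetDB internals):

```python
# Toy illustration: the same table stored row-wise and column-wise,
# scanned by an analytical query that aggregates a single attribute.
rows = [(i, f"name{i}", i * 0.5) for i in range(1000)]      # row store
columns = {                                                  # column store
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "value": [r[2] for r in rows],
}

# The row store must touch every field of every tuple to reach 'value';
# the column store scans only the 'value' array.
total = sum(columns["value"])
fields_touched_row = len(rows) * 3           # 3000 fields read
fields_touched_col = len(columns["value"])   # 1000 fields read
assert total == sum(r[2] for r in rows)
```

For a disk-bound query this factor-of-three difference in data touched translates directly into fewer pages read, which is the core of the column-store advantage for analytical workloads.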

MonetDB is an open-source column-store database management system, developed at CWI for over a decade. In the competitive world of commercial and open-source databases, MonetDB distinguishes itself by several fundamental characteristics that together provide high performance for analytical workloads. In this presentation we highlight the advantages MonetDB can offer for analytical scientific applications.

Besides the benefits stemming from the vertical organisation, the execution paradigm of MonetDB is based on full materialization of intermediate results. This opens an opportunity to speed up query sequences with overlapping computations by careful preservation and re-use of common intermediates. Commonalities are often observed in logs of scientific activities, where collaborating or competing teams may perform similar, but slightly different analyses.
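The idea of preserving common intermediates can be sketched as a cache keyed by an operation and its inputs. This is a hypothetical Python sketch, far simpler than MonetDB's actual execution engine:

```python
# Hypothetical sketch of reusing materialized intermediates across a
# query sequence; the operator name `select_gt` is invented for illustration.
cache = {}
evaluations = 0

def select_gt(column, threshold):
    """Materialize (and cache) the values of `column` above `threshold`."""
    global evaluations
    key = ("select_gt", id(column), threshold)
    if key not in cache:
        evaluations += 1
        cache[key] = [v for v in column if v > threshold]
    return cache[key]

data = list(range(100))
a = select_gt(data, 50)   # computed and materialized
b = select_gt(data, 50)   # identical sub-query: served from the cache
assert a is b and evaluations == 1
```

When two teams run similar analyses over the same warehouse, overlapping sub-queries hit the cache instead of rescanning the data, which is the speed-up the paragraph above describes.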


Centre for Advanced Spatial Analysis, UCL

Leader: Michael Batty; Presenter: Steven Gray

With the recent popularity of web-based mapping systems, the visualisation and analysis of spatial data is becoming increasingly important. In our MapTube website we allow users to upload and compare thematic data on top of the regular Google Maps or OpenStreetMap layers. Extending this idea to data collection as well as visualisation, we demonstrate the concept of a “Mood Map”, where members of the public answer a question which we ask via an online form. An example might be, “What single factor is affecting you most about the credit crunch?” with a single answer chosen from one of, “Mortgage or Rent”, “Petrol”, “Food Prices”, “Job Security”, “Utility Bills” or “Not Affected”. The first part of their postcode is also entered, and it is this that is used to build a map of the responses by postcode district.
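The aggregation of answers by postcode district can be sketched as follows (the response data and field layout here are hypothetical; the real MapTube pipeline is not described in detail above):

```python
from collections import Counter, defaultdict

# Hypothetical survey responses: (postcode, chosen answer).
responses = [
    ("EH8 9AB", "Food Prices"),
    ("EH8 1TF", "Petrol"),
    ("SW1A 2AA", "Food Prices"),
    ("EH8 9YL", "Food Prices"),
]

# The outward code (first part of the postcode) identifies the district.
by_district = defaultdict(Counter)
for postcode, answer in responses:
    district = postcode.split()[0]
    by_district[district][answer] += 1

# Each district maps to its most common answer, ready for thematic mapping.
mood = {d: c.most_common(1)[0][0] for d, c in by_district.items()}
print(mood)   # {'EH8': 'Food Prices', 'SW1A': 'Food Prices'}
```

The per-district counts are then joined to district boundary polygons to colour the thematic layer.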

The first mood maps were limited in that only we could set them up and that they only updated every half hour. With the ASK project, which is funded by the National e-Infrastructure for Social Simulation (NeISS), we aim to build a more flexible architecture for large-scale crowd sourcing of spatial data. This project will allow ordinary users to set up their own mood map surveys and see the results in real-time.

In addition to the online survey idea behind the mood maps, we have also looked at using other popular social networking sites to extract spatially tagged data. We will demonstrate the “Tweetometer”, which counts tweets on Twitter containing the keyword “London”, and show how these counts can be graphed in real time to show activity. Our system was used to crowd-source a map of snowfall in the UK during December and January, and it was also used by Carling during the Carling Cup Final, illustrating the wider use of e-Science research.

Blackford Analysis – Instant 3D Registration

Leaders: Alan Heavens and Ben Panter (Institute for Astronomy, The University of Edinburgh).

Blackford Analysis is a spinout from the Institute for Astronomy at the University of Edinburgh, applying its MOPED technology to problems involving large datasets. MOPED was originally developed to interpret galaxy spectra, although it is a general technique appropriate in many situations involving parametric modeling. The Blackford Analysis team will demonstrate an application that provides real-time registration capability for medical imaging.

Modern methods for diagnostic imaging result in large datasets. Data is acquired in 3D and typically comprises hundreds of slices, each at 512x512 resolution. The simple-sounding task of calculating the affine transform between two scans involves fitting 12 parameters over more than 32 million voxels. Although high-performance computers can tackle the problem in a reasonable time (5-10 minutes) on today’s datasets, as resolution increases the problem rapidly becomes intractable.
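The parameter and voxel counts quoted above work out as follows (the 128-slice volume is a hypothetical example of a "hundreds of slices" scan):

```python
# A 3D affine transform: a 3x3 linear part (rotation/scale/shear) plus a
# 3-vector translation -- 12 free parameters to fit.
affine_params = 3 * 3 + 3
assert affine_params == 12

# A hypothetical scan of 128 slices at 512x512 already exceeds 32M voxels.
slices, rows, cols = 128, 512, 512
voxels = slices * rows * cols
print(f"{voxels:,} voxels")   # 33,554,432 voxels

# Applying the transform to one voxel coordinate p: p' = A @ p + t.
def apply_affine(A, t, p):
    return [sum(A[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert apply_affine(identity, [0, 0, 0], [5, 6, 7]) == [5, 6, 7]
```

A brute-force fit must evaluate a similarity measure over all of those voxels at every step of the 12-dimensional optimisation, which is why compressing the data first (as MOPED does) changes the problem so dramatically.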

Blackford Analysis uses the MOPED approach to compress the data into a form that can be tackled quickly and efficiently: two medical scans can be aligned in less than a second on a standard laptop. This step change in performance allows real-time registration of images, with many applications that increase patient throughput. Medical imaging is only one of the potential areas where MOPED is useful, and the Blackford team is very keen to discuss any problems of interest to the data-intensive research community.


e-Research South - Lab blog book systems

Leaders: Anne Trefethen and Jeremy Frey; Presenter: David De Roure

The e-Research South Consortium (Oxford, Reading, Southampton and STFC) is building a vibrant regional activity, driven by specific application areas and building on existing e-Infrastructure technologies: to enhance ease of use, uptake and accessibility; to develop know-how and tools for dealing with research data; to provide development of, and access to, advanced visualisation; and to provide opportunities for public engagement with science.

"Southampton Smart Labs Systems" is one of these activities and provides a solution for data management for experimental and computational researchers for Publication@Source and Data on Demand. We will demonstrate the concept of Blog style systems as Laboratory Notebooks, using our current (Blog2) systems and the new semantically rich system (due for release in April) which is based on the semantic & web 2.0 Blog3 together with the OREChem Ontology of experiments.

The blog book systems are linked to the flow of data from laboratories and experiments, and we will demonstrate fully traceable experimental data flowing from our laser surface experiments. We will show how the self-describing data created by the blog and laboratory systems enables the use of semantic tools (such as the MIT SIMILE software) to display and use the data.

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.