Track C
Reporter: Dave Berry
Command e-Science
AIM: To write the e-Science BlueBook
QUESTIONS
- What are e-Science’s grand challenges?
FM: Is this to invent cloud computing? Is it to get more people to use technology? DA: Biologists are sold on technology already, viz. the Human Genome Project; the question is how to spread this to other parts of the discipline. FM: But everyone is inventing their own systems.
- GRAND CHALLENGE 1
What framework is appropriate to encourage researchers to share data, to be able to do so easily, to be able to cite it, to control access, to maintain it, ...?
ST: In the climate science community, they produced a publication that described each major dataset. DA: Some time ago we put a data set online, it's been used in 150 papers, none of which has cited it.
- GRAND CHALLENGE 2
How do you compose the work of multiple modelling groups or multiple data curation groups, even within a single discipline?
FM: Is this providing core, generic technology or a set of methods you can teach people? MA: The challenge isn't stated at the level of technology but can also include sociology etc. ST: The problem is particularly with distributed communities.
- GRAND CHALLENGE 3
How can we find more champions in, and disseminate skills across, more disciplines?
AV: Is the question about comparing how people work in different disciplines and training people to do this?
- GRAND CHALLENGE 4
Correctly identifying when to use top-down and when to use bottom-up approaches.
FM: If the leader of a project is not the business owner, the chances of success are low. AL: You can't produce a complete Stalinist economy but e-science centres can identify products that people will want to use.
- GRAND CHALLENGE 5
How do you use 30 years of satellite climate data to evaluate and refine climate models?
- GRAND CHALLENGE 6
A researcher should feel that all the world's data sets are in their laptop. (Analogous to the WWW).
- GRAND CHALLENGE 7
Modelling the fly brain is an example of having to integrate data and computational models across multiple levels, from molecular events to how an organism behaves and interacts with its environment.
- GRAND CHALLENGE 8
To get academic recognition for all the work required to make e-science happen.
ST: Perhaps an analogy is with statistics, which is one of the few academic disciplines that focus on working with others.
- What are the steps to address those challenges?
MA: We need to solve particular cases before we attempt to do it more broadly.
AL: Identify some problems that can be solved centrally and then be rolled out. [E.g. a national file system]
FM: Areas of success are where leaders have come from the discipline area.
MA: We need a journal of e-Science
We need funding models to establish common formats or vocabularies in particular areas.
AV: How can the e-science community distill the experience gained by the leading-edge projects and make it more widely known? MA: Perhaps we need people whose job it is to do this. FM: It would be good if the articles in the NeSC newsletter could have more information about this. DA: The overhead in communicating is a problem for smaller projects - larger projects often have dedicated staff and resources for this. AV: In JISC terminology, it's the Service Usage Model. A key point is understanding how your problem is similar and how it differs. ST: This is e-Science as Knowledge Transfer. MA: We need to write this down more - which things work and which don't. One of the problems is that we've had to pretend that everything works. DB: Do we know how to evaluate e-science methods?
- How do we need to change to meet these challenges?
- What is the road map to get e-Science ready for 21st Century challenges?
PEOPLE
- Freddie Moran (chair)
- Doug Armstrong
- Andy Lawrence
- Simon Tett
- Alex Voss
- Malcolm Atkinson
- Dave Berry (reporter)
INTRODUCTORY SALVOS
AL: Central command hasn't worked cross-discipline. Astronomers have used this approach within a discipline. AL has sometimes felt out of place at previous e-science meetings precisely because the astronomers were doing things differently. Astronomers around the world have agreed on standard formats/protocols for publishing data. But if you get too ambitious it becomes difficult to involve everybody - it's hard enough to do this for astronomy. Approx 30,000 astronomers worldwide. Approx 30 major data centres.
AV: Social Sciences have been struggling from the beginning to agree on many things, e.g. what is "society", what is the unit of analysis? Nevertheless, they do draw on e.g. census data. [MA: They also define their own spatial data standards and temporal data standards.] AV: Different types of social scientists will make different use of e-science, e.g. econometricians vs. ethnomethodologists. Some make notes on paper only; at the other extreme people are modelling societies on the NGS. AV: Command e-science is one model of innovation. If we look at models of innovation, top-down/bottom-up is one dimension. Where do the people who make the decisions get their authority? Are they just hoping that the standards they define will get adopted before the next wave of technology? They need to be well aware of what else is going on.
ST: ST has been working in the Met Office, which seen from outside is a command-and-control organisation. In the mid-90s, began work on a vocabulary to store climate models. At that time, there were only 10-20 groups worldwide capable of running climate models. Bringing data together, particularly satellite data and comparing it with models, is a major challenge. Other thoughts: no individual or even a single institution can build all the components of a climate model (e.g. ocean modelling, ice modelling, atmosphere modelling, ...). How do we persuade the academic community to adopt a software engineering process and framework to link these together, when there is no academic incentive to do so? In geography there are many researchers who aren't as numerate as physicists; how can we bring them together to analyse data and answer shared questions (policy, models, etc.)?
FM: Is the e-science technology being pushed or pulled? At what levels of a technology stack are the disciplines working - shared computational resources, shared data, analytical applications?
ST: ST's prejudice (his word) is that computer scientists aren't much help. Computational scientists within the discipline drive things along.
DA: For Biology, some of the same problems have remained since the early 90s. One in particular: if someone submits data to a central repository, how can someone else cite that data set? There have also been some examples of big success stories, e.g. sequence-based databases and applications.
DISCUSSION
TUTTI: There was a general discussion about the role of computer scientists and the content of computer science education. All agreed that computer scientists should not drive the process but should advise it. Computer science skills can usefully contribute.
FM: Computer scientists advance the fundamental technology, e.g. databases.
MA: The GEON project had tremendous success using the UK-developed ontology for geology but had a major problem trying to get a consensus on how to describe time, from geological time to modern time. The e-scientists who are moving between projects can bring this knowledge to new groups.
DA: Very large scale projects like the Allen Brain Atlas are able to impose standards if they wish.
MA: Are there things we can fix across disciplines (e.g. OGC)?
AV: Would this knowledge be spread by information scientists rather than computer scientists? A lot of computer science is very decontextualised rather than applied to particular problems.
DB: Is there a problem publishing "applied" computer science, particularly e-science, in CS journals?
MA: We need a good journal for e-science.
DA: Many of the "pure" computer scientists at Edinburgh do not actually use computers much. DA publishes in biology when possible because impact and citations are higher (and often it's more relevant).
ST: Is there a disconnect in CS between undergraduate teaching (what students want to learn) and postgraduate research?
FM (to AL): What is your view of e-science - is it models, data, ...?
AL: At one stage AL was worried whether the development work in astronomy would actually bear fruit, but eventually realised there was no need to worry. The first rule of e-science is that you never get the credit; people just use the technology. There is no point asking the CS people to solve the domain-specific processes. What did make a difference was learning about web services, XML, UML: these gave ideas and discipline that enabled the astronomers to take their ideas forward. Progress on standards at the IAU is painfully slow. If command means top-down from that "official" level, it won't happen. Rather, round the world there were a bunch of projects of people being paid to solve the problems. Only then does it have to be ratified - and there is a hypothetical risk that the IAU won't adopt it.
FM: The demand came from the user community. AL: The need came from the user community, but visionary people had to identify the problem. ST: How do you find the resources? MA: This is where a command economy might help, because no single project can do it alone. AL: The community need to get together; each partner has to bring its own source of funding [so presumably the funding agencies have to agree to some extent? - DB] ST: In the climate modelling world, much could be gained by having standard formats for modelling data, but this hasn't happened as quickly as we would have liked.
MA: Can we have a national file system with agreed standards for naming, secure access, etc.? ST: Would this empower the user? MA: Do people want to manage files on their own machines or do they want a pervasive system? ST: This might end up in the same state as Grid, which is too complicated to use. MA: I'm asking if there are simple things that can be provided across disciplines.
AL: When the astronomers started, they looked at the technology available but it wasn't ready or didn't quite meet their needs; hence they developed their own. A common file system would have been ideal. This would be the one thing AL would like e-science to provide.
AGREEING LIMITS
MA: What are your principles for deciding what to agree on and what to leave to the wider ecosystem?
AL: Partly a scale problem - how many people to speak to, how many different technologies involved.
AV: There is already a community of people doing standardisation research.
ST: A command solution has to be extensible, so that groups can adapt it to their particular needs.
PLENARY DISCUSSION
There was mention of standards activity for naming and defining data, originating from the NIH on instruction from the US government. Is this an example of Command e-Science?
There was a little discussion about whether the discipline-specific challenges are examples of command e-science or extreme e-science, or both.
