This wiki service has now been shut down and archived

Cloud Computing

From ESIWiki


Discussion commenced by Freddie Moran (Transferred from an email discussion with permission.)

Links to Relevant and Associated Articles, Blogs ... etc

I do not know if it's an eSI or NeSC responsibility, but having read this [Business Week Article], I think the "cloud" is a big part of the future for e-Science. Tony says as much in the article. Also, it's not in competition with Grid; in fact, grid technology is a key part of the infrastructure / underpinning. The more I think about it, the more I believe the eSI needs to get on board or it is going to miss the next wave of e-Science. There is also a pragmatic role for NeSC in taking the initiative in running workshops. Also, where are the first "cloud" universities outside the US going to be - China, India, or the UK? And who is going to drive a joined-up approach to a bid for the UK e-Science community? Freddie Moran (29 Dec 2007)

I agree with you that the "cloud" is not a major perturbing influence from the technical point of view and that grids and large-scale data are key to delivering "cloud" services. I think there are two senses in which the "cloud" cannot be ignored.

  1. Prevalent and widely used services that people think of as part of the "cloud" demonstrate what stability, usability and responsiveness can be achieved - this raises expectations and provides re-usable models.
  2. The "cloud" appears to offer economic models and "solutions", which research funders may misinterpret - the business case which sustains the cloud has to be carefully examined so that we can show that the same business case could not underpin the management of scientific data - e.g. the acquisition and interpretation of 10 years of NERC buoys in the Atlantic, or the synoptic sky survey.

I think the businesses that have built the "cloud" and that will emerge in the "cloud" will also provoke technological and business innovation that will change costs and hence change technical and architectural choices for e-Infrastructure provision. Malcolm Atkinson (03 Jan 2008)

Agree with your comments, and I expect you will reference or include them when positioning the eSI strategy - good. Freddie Moran (03 Jan 2008)

Clouds are very much part of the picture. One of the reasons I don't use the word more is that I'm more focused on the use and the users (and keeping my story simple!) so I'm promoting "new e-science" - for me, clouds are part of the new e-science infrastructure! And cloud computing is being pushed by Savas and Tony and I think they're doing a good job.

There is an interesting implication for the middleware story. Cloud computing doesn't advocate middleware in the way middleware has become established for Grid, i.e. as a universal layer (the national grid all running at 50 or 60 Hz). Rather, clouds can be built using whatever cloud-specific middleware solutions they wish and then expose simple core services. Hence clouds raise an important architectural debate. Dave de Roure (04 Jan 2008)

I thought that this would be your view, and I think it is a helpful point to make. However, core services can only be used by others if they are exposed in some standard ways. As the set of alternatives increases, and as the scope of these services increases, so does the requirement for consistently adopted standards that enable providers of tools and user-oriented facilities to use and compose those services easily.

In my view, the grid story was a first (or nth) attempt to understand what those standards should be - not an attempt to railroad every provider into using the same technology / middleware to implement those standards. It is probable that relatively few of the rival standards around today will survive the ecological competition of the market place. But it is important that such standards emerge - they will be as important for making a globally usable information infrastructure as the extensive set of standards that underpin today's global telecommunications and the services they support.

There is also another important aspect to grids and e-Science. The scientific data management, interpretation and analysis services do not have a business case that supports their construction in the same way as the services within the "cloud" are supported. They cannot be constructed on a pay-per-use plan or depend on advertising revenue. The e-Science movement is an attempt to understand how to muster the resources and develop the "business" case for deploying them to build the information systems needed by science. Grids were a rallying cry to pool effort into building commonly required components for those information systems that would not otherwise have been available on the required time scale.

In my view, whatever the rallying cry, global collaboration is still needed to design and build (or commission) those information systems that will underpin and enable research in the majority of disciplines for years to come. I believe that there are many shared information facilities that are needed for today's major challenges (e.g. the four selected as cross-council priorities by the UK government) that cannot be achieved without a huge international effort. That effort will involve a great deal of computing science research, a great deal of engineering, much investment in systems infrastructures and a global commitment to sustain and operate the resulting information systems.

My deep worry is that the "we each do it our own way" ethos of the "cloud" is antithetical to the required collaboration. Malcolm Atkinson (04 Jan 2008)

I think we both argue for ease of use of the glue that holds all this together. You argue for standards while I argue for simplicity and the power of the people - Grid (e.g. OGSA) is one approach, but my suggestion is that the glue will occur through self-organisation around a set of easy-to-use technologies. It may well be the case that if we can solve the grid challenges we can provide a much better engineered solution for the problem you describe (e.g. handling the multiple alternatives), but most scientists don't know they have this problem yet, and as they discover it other solutions may emerge in the ecosystem. It could be a VHS vs. Betamax story. This is the "better not perfect" angle on adoption.

So perhaps grid, as in a set of glue standards, is not being adopted because it is solving a problem that only some users know they have at this stage, and perhaps it never will per se because other solutions for coupling our systems will emerge. Users won't wait for a solution to a problem they don't know they have when there are solutions already available for the problems they do know they have! Clouds and Web 2.0 are there.

While I may argue against the viability of such a coordinated Grid at this time, I do see the value in coordinating e-research and e-infrastructure activities - it lies in the pieces of the picture that benefit from that level of coordination, collaboration and sharing. So it can help a lot with the glue, and with making things as easy as possible, and managing the complexity. This is really important. I think it should therefore embrace the broad set of technologies in the ecosystem and optimise their use for e-research. Some of the grid work is in that ecosystem. You describe Grid as components in delivering information systems, and under that definition a lot more of the pieces could be called Grid if you like.

As for telecommunications, I would love a debate there (recall I used to work for BT...!) TCP/IP was the easy-to-use inter-network solution with a simple API that overcame the over-engineered 7-layer model and the UK Coloured Book solutions. And it was banned by some universities even as products that relied on it were being deployed (e.g. Sun workstations). Highly engineered ATM, great in backbones, yielded to the simplicity and non-determinism of Ethernet. IPv6 is better engineered than IPv4, but the ecosystem has flourished dealing with the problems of IPv4, so v6 hasn't enjoyed rapid adoption. Indeed, the heterogeneity of the core telecommunications infrastructure is now a given, and the challenges are in managing the complexity. As you sometimes tell us, history repeats itself :) Dave de Roure (04 Jan 2008)

And needless to say, I agree with Dave. I think there is a profound difference in philosophy between clouds and grids.

To echo Savas' mantra, we should build infrastructure using well-established existing Web standards. That is not to say there should be no experimentation with other approaches, but I believe users will ultimately decide. Tony Hey (04 Jan 2008)

I think I both agree with you and disagree with you (Dave and Tony) depending how much I drill down in detail to consider each point.

I plan to explore the detail a "little" in order to better expose the issues - I think the purpose of this discussion is to try to identify challenges for e-Science, which are key to addressing the major (research) challenges facing today's global community. These "major challenges" are typified by the UK government's recent cross-council priorities: Energy, Living With Environmental Change (LWEC), Global Threats to Security and Ageing: Life-Long Health and Wellbeing. The emerging e-Science challenges are those which should determine the e-Science roadmap that eSI will produce.

To drill down I have been considering a matrix of science activities. I use the term "science" to cover any endeavour where people are engaged in developing better knowledge and understanding of any aspect of reality: from fundamental particles, universes, stars and planets, through climates, earth systems, ecologies, communities, organisms, organs, cells, biochemistry, chemistry and materials, to medicine, engineering, transport, law, behaviour, speech, language, art, performance and literature. These "fields" of scientific application form the columns of my imagined matrix.

The other dimension of this matrix is the activities scientists undertake to develop knowledge and understanding. Here again, one could illustrate the endless variety of activities, but I choose to characterise them into five archetypical species of activity that will form the rows of my imagined matrix. They do not have a logical order as they feed on each other. (Malcolm Atkinson 09 Jan 2008)

For the very long article by Malcolm that develops this discussion go to Malcolm's Matrix

You all seem to be having a long discussion about extremely ill-defined concepts. The Business Week article was almost content-free - I think The Register's comment debunked it rather well. At most, the Business Week article described (i) a cluster provided by IBM and (ii) a utility computing system - both concepts which have been described by the word "grid" in the past.

Engineering is about the application of fundamental principles in a measurable system. If there is a clear definition of "cloud computing", could one of you point us at it? Until I see such a definition, I shall assume that this is just another round of marketing hype from the computing industry. (Dave Berry, 10th Jan 2008)

I had tried to divert the discussion from the ethereal issue of whether X is a good thing or a bad thing to a more constructive discussion: What is e-Science trying to do? What are the elements of those tasks? In what way can we draw on X & Y to help us address those elements? What else needs to be done in order to achieve the tasks? I had hoped that this would lead to a deeper understanding of the important milestones and feasible routes on our e-Science roadmap.

So I hadn't worried too much about what X was (the "Cloud") any more than I had worried about what Y was (say the "Grid" or "Security" or "Data Mining"), as all X & Y have a certain fuzziness at this level of discussion. I had assumed that the "Cloud" was congruent with "Web 2.0", which was described and defined by Dave De Roure at last year's summer school as being based on a set of principles. Dave, are the "Cloud" and "Web 2.0" the same thing?

Malcolm 14 January 2008

I thought these were rather interesting -

-- Anna 16:46, 14 January 2008 (GMT)

Thanks Anna. Taking a look at the Wikipedia article, the same description, aspirations and architecture could have been (and probably were) written by the ANSA project in 1984 (Andrew Herbert & Joe Sventek). That led to CORBA, Microsoft distributed computing, grids, and so on, all with the same general principles and goals. We get progressively better, in the sense that the cloud of resources becomes more accessible, more robust and less complex to manage. But there is no magic. Someone else is not going to do it for you unless you pay.

How you pay may be:

  • by contributing data that is sufficiently interesting it attracts advertising or businesses data mining customer behaviour - not an option in the majority of e-Science,
  • by paying a service charge against a service-level agreement - to get a 10-year commitment to hosting petabytes of data/year would have a very high charge from any commercial vendor (I suspect) compared with what scientists are prepared to pay (explicitly) for ICT resources - what they actually pay when all staff time and buildings, etc. are considered may be different!
  • by having your own IT services set this up for you, which is possible for large-scale operations such as EBI and NERC data grid,
  • by using a university facility that is paid for by various forms of top-slicing and rental - this works if the funding stream has sufficient persistence for the data and resource preservation you need,
  • national top-slicing (a la HECToR - congratulations on the launch yesterday btw) from research councils and funding councils - this works if enough people believe the top slicing is worthwhile and if they can keep convincing the successive keepers of comprehensive spending reviews that the facility should be maintained - there are vulnerabilities, see the AHDC saga.

In a sense this is revisiting old territory - do I get my computing from a resource organised by someone else, or do I organise it locally? In fact this is a false dichotomy - we don't want to organise our own global IP network or our own DNS servers. We do want to choose large enough characters for our eyes in the panels on our screen. But the middle ground is really interesting. And something has changed.

It used to be the case that if someone else organised our computing resources they made a great many decisions into which we had to fit and they had access to all of our IPR. Now neither is true. Low-cost virtualisation means that:

  1. We can run code in a virtual machine of our choosing, getting the compute environment we want for our code wherever we run - so we can do what we like, the provider is protected from our code, and we can go to another provider and use our virtual environment unchanged. There are limits as to how complex this virtual environment can be (so far) but it is by no means restrictive for most people.
  2. The virtual environment mechanism protects the provider's facilities and also protects users from one another - it could develop to allow parts of the computation and environment to have controlled isolation, e.g. middleware protected from applications, applications protected from middleware and the host computers protected from both (see eSI theme 8).
  3. Storage can be encrypted and secured at relatively low cost (see eSI theme 8) and hence the IPR of a user on a hosted environment can be protected from most forms of internal attack.

I think that this economic use of trust platforms and virtualisation, which is a common feature of current services in the cloud, is bound to have a large impact on the way we organise and deliver computational support for e-Science. This will have significant socio-economic as well as technical effects that need to be understood. It is possible that the "cloud" encapsulates understanding of those effects. If so, can someone point to a good description of them, please?

What I do not expect is that the cloud will offer us a free lunch.

Malcolm 15th January 2008

Malcolm, your discussion of payment options seems to assume a scenario where a research community has to maintain a large amount of data over a long period of time. What about scenarios where a single research group wants some computing power for a short period of time? They can just pay for the service they use - rather than requiring someone (their dept/Uni/NGS) to install and operate a dedicated facility.

Also, there are at least two ways we can consider the relevance of cloud computing: (i) as users/customers, (ii) as a technology that we can use in creating our own services. (The same split applies to Web 2.0 - we can use commercial services and/or we can use the technology to implement our own services).

Dave Berry, 16th January 2008

Malcolm said "Storage can be encrypted and secured at relatively low cost (see eSI theme 8) and hence the IPR of a user on a hosted environment can be protected from most forms of internal attack." I guess I should bite at the mention of my theme (Trust and Security in Virtual Communities).

This is true - indeed, it goes further than just storage. We just about have technologies which offer very strong isolation guarantees among virtual machines, and these promise to enable much more adventurous distributed storage and processing applications than have hitherto been conceivable. However, to make that tractable, a good amount of additional infrastructure is needed.

Dave De Roure anticipates "self-organisation around a set of easy to use technologies." That's been the long-standing pattern of the internet. But the abject failure to date has been around security, not least because it is an afterthought in most architecture considerations. The WWW is a total mess when it comes to security: the proliferation of username/password, site-based licensing for journals, occasional use of personal certificates, and so on, is tying everyone in knots. The outcome is neither usable nor, in practice, terribly secure, even if good quality building blocks are available.

The world of WS-* offers some potential standards, but take-up seems to remain patchy.

I know that special pleading for security never goes down very well. We can do a huge amount with "good enough" security - indeed, we can achieve a huge amount with essentially no security at all: email is a case in point. On the other hand, you may notice that email is creaking under the strain of spam. And our failure to achieve widespread consensus on stronger security for email is the reason that for "important" documents I still have to sign with ink and consign them to the tender care of the Royal Mail.

Sorry if this seems a diversion from the discussion of cloud computing: I have little doubt that we must indeed choose the best (and simplest!) available components for our infrastructures, and invent new ones only as a last resort. But if our infrastructure choices fail to achieve suitably low levels of security risk and high levels of usability, we may not only live to regret it ourselves, but we will assuredly build solutions which do not meet the ethical and commercial concerns of many potential beneficiaries (or, indeed, providers of services and data).

Andrew Martin, 31st January 2008

Found this by chance:

Belfast e-Science Service Hosting Cloud (December 12 2007)

Anna 11:30, 4 February 2008 (GMT)

Found this also by chance - my presentation at the opening of the NeSC almost 6 years ago!


Read 5th Utility to be Cloud Computing today, and how key Grid has been as an enabler.

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.