This wiki service has now been shut down and archived
Data-Intensive Research Workshop: Monday Introduction: Soliciting comments
Please add any comment you would like to make about Monday's DIR Workshop programme. It can be specifically related to a talk or be general. Please separate entries with headings and add your signature. There is a separate page for comments on the Research Village.
Purpose of Monday's Open Programme
If the content or organisation for today doesn't work for you, it is probably my fault. What I hope will happen is that everyone will feel welcome and find at least some of the day interesting and informative. I hope useful ideas will emerge and that those who can only manage today go home with new knowledge relevant to their work.
I hope you'll all encounter new people as well as meet people you know, and that conversations about DIR will start building that will go on all week and beyond. I have put headings below to group comments by the sessions they refer to. Please enjoy the day. --MalcolmAtkinson 15:10, 5 March 2010 (UTC)
Registration and Opening (Dave Robertson)
Welcome and Setting the Agenda for the DIR Workshop (Malcolm Atkinson)
Strategies for exploiting large data (Alex Szalay)
An interesting point Alex made is that total data volume is growing across all scales of data set: large data sets are getting larger, but the number of small data sets is also increasing fast.
Galaxy Zoo is a good example of scaling interaction: by involving 10,000-20,000 volunteers, a much larger amount of data can be analysed. It also allowed validation that, in this case, classifications by the public were no worse than those by experts.
Alex speaks of a paradigm of enabling interaction with a turbulence simulation, where people can essentially experiment with a simulation in a box, inserting their probes and setting parameters for the system. Such interaction facilities are necessary because almost nobody can deal with the volume of data and amount of computation required themselves.
"Software is becoming a new kind of experiment." So, "how do we build a scalable architecture?" This links to Amdahl's law, with I/O as the main component. Current supercomputers fail to cope with data sets over 20 terabytes because of their architecture. --Jvhemert 11:29, 15 March 2010 (UTC)
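The balance argument behind this can be sketched numerically. Amdahl's rule of thumb for a balanced system says it should deliver roughly one bit of sequential I/O per second for every instruction per second; the machine figures below are purely hypothetical, chosen only to illustrate why a compute-heavy architecture starves on large data sets:

```python
# Amdahl's balance rule of thumb: a balanced machine has an "Amdahl
# I/O number" (bits of I/O per second, per instruction per second)
# close to 1. The example machine figures below are hypothetical.

def amdahl_io_number(io_bytes_per_sec: float, instr_per_sec: float) -> float:
    """Bits of sequential I/O per second per instruction per second."""
    return (io_bytes_per_sec * 8) / instr_per_sec

# Hypothetical compute-oriented supercomputer node: huge instruction
# rate, comparatively modest I/O bandwidth.
hpc = amdahl_io_number(io_bytes_per_sec=2e9, instr_per_sec=1e12)

# Hypothetical data-oriented commodity node: slower CPU, but I/O
# bandwidth sized to match it.
data_node = amdahl_io_number(io_bytes_per_sec=1e9, instr_per_sec=1e10)

print(f"HPC node Amdahl number:  {hpc:.3f}")   # far below 1: I/O-starved
print(f"Data node Amdahl number: {data_node:.3f}")  # near 1: balanced
```

On these illustrative numbers the supercomputer node scores about 0.016 against the data-oriented node's 0.8, which is one way of seeing why architecture, not raw compute, is the limit for multi-terabyte data sets.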
Learning from Data in Online Advertising and Games (Thore Graepel)
Please add your comments on this other page.