This wiki service has now been shut down and archived

Thursday DIR Breakouts



Break-out Sessions on Thursday

Introduction to break-out sessions

After the morning talks there will be small groups meeting in "break-out sessions" in order to give everyone an opportunity to think about the issues and develop ideas. These breakouts will be on a mixture of topics that originate from the day's domain of interest, the morning's talks and the cross-cutting themes. There will normally be between four and eight concurrent themes. They may start at or after lunch and will finish at 15:45 or at a time specified by the day organiser during the morning. Refreshments will be available in the Chapterhouse during the last 30 minutes of a breakout period.

Each break-out group should identify someone to chair their session, someone to record their session and someone to report back to the plenary session at the end of the afternoon.

The reporting back should include:

  • What was the focus of the group?
  • What aspects of DIR were already working well, or which of their methods or technologies were already serving DIR well?
  • What are the current mission-critical challenges?
  • What strategy would the group use to address them?

(As reporting-back time will be short, the reporter should condense this into no more than five slides.)

We ask that one or more people record what happened during their group's session by adding an entry in the record below.

Interacting via visualisations of large and complex data

Notes uploaded by --Alastair Droop 23:48, 18 March 2010 (UTC)


  • Alastair Droop
  • Mario Caccamo
  • Dave Liewald
  • Jerome Avondo
  • Simon Wong

Visualisation for investigation vs presentation

We draw a distinction between visualisation for the purpose of information discovery and visualisation for the purpose of disseminating information. The latter, although difficult and an art form in its own right, was not discussed further; the rest of the discussion focused on visualisation for data discovery.

When we perform visualisation for knowledge discovery, we:

  • Lose information
  • Encapsulate many assumptions within the visualisation method
  • Need to consider both the creators and consumers of the visualisation
  • Need to remember that not all data are created equal

Mapping vs summarisation

We make a further distinction between visualisation techniques which attempt to keep as much of the original data as possible (mapping) and visualisations that attempt to summarise the given data. The required level of detail is an important consideration in deciding which approach to use. If a "lossy" approach is used, we need to consider which data we should remove.

This facilitates:

  • Summarisation
  • Information transfer
  • Data transformation
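The mapping/summarisation distinction above can be illustrated with a minimal sketch; the data values, function name and bin size below are hypothetical, chosen only for illustration.

```python
# A minimal sketch contrasting a "mapping" view, which keeps every original
# value, with a lossy "summarisation" view, which bins the data and keeps
# only a per-bin mean.

def summarise(values, bin_size):
    """Collapse consecutive runs of `bin_size` values into their mean."""
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return [sum(b) / len(b) for b in bins]

readings = [1.0, 3.0, 2.0, 8.0, 9.0, 7.0]   # hypothetical measurements

mapped = readings                            # mapping: all six points survive
summary = summarise(readings, bin_size=3)    # summarisation: two points remain

print(summary)  # [2.0, 8.0] -- the structure within each bin is lost
```

Here the mapping view retains every reading, while the summary reduces six values to two: exactly the "which data should we remove" decision discussed above.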

Questions and Challenges

Several questions were raised:

  • Is visualisation useful; and if so in which circumstances?
  • Is our choice of visualisation method dictated by human traits that make some methods easier to understand?
  • Is scale important?
  • How do we understand the complexity across fields?

We then defined four challenges:

  1. How can we facilitate interactivity?
  2. How can we best use multiple dimensions?
  3. How can we use scale?
  4. How can we maximise data to brain flow?

We discussed the first two challenges in more detail.

How can we facilitate interactivity?

  • Computational power is unlikely to be an issue, as CPU and GPU power is increasing very rapidly
  • Good algorithms exist, but we must re-use what is already available and push tools from development into the user space
  • Navigation of complicated visualisations can be standardised, and should use well-known frames of reference and orientation (“breadcrumbs”)
  • Multi-user visualisation is likely to be beneficial (we can make breadcrumbs permanent across multiple users of a visualisation)

How can we best use multiple dimensions?

  • We can use new hardware (3D glasses, haptic devices, etc.)
  • We can use tools and techniques from the computer games industry (for example, games engines and physics engines)
  • We can consider using senses other than sight alone (sound, touch, smell)
  • Game engines provide a fascinating set of possibilities:
    • Easy 3D integration
    • Intuitive interfaces
    • We must work on the APIs, as these are tailored to the requirements of game designers, not scientists
    • We must explore these tools, and develop technologies that can utilise them

Interacting with people via collaborative systems and collaborative data collection/curation

Example topics include:

  • scientific gateways
  • data and tool sharing
  • data repositories
  • ontologies and semantics
  • trust
  • reproducibility of results
  • peer-reviewing

Automatic capture and generation of metadata

Examples include:

  • avoiding repetitious input
  • mining metadata information from data
  • instruments and software reporting relevant information
  • automated metadata cleaning and consistency checking
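As a concrete illustration of mining metadata from the data itself, here is a minimal sketch using only the Python standard library; the function name, column name and values are hypothetical, and a real system would infer far richer metadata.

```python
# A minimal sketch of metadata mining: infer a simple type and range for a
# column of string values, rather than asking the user to re-enter them.

def infer_column_metadata(name, values):
    """Guess a type and summary statistics for one column of string values."""
    try:
        numbers = [float(v) for v in values]          # numeric if all parse
        return {"column": name, "type": "numeric",
                "min": min(numbers), "max": max(numbers)}
    except ValueError:                                 # otherwise treat as text
        return {"column": name, "type": "text",
                "distinct": len(set(values))}

meta = infer_column_metadata("temperature", ["12.5", "13.0", "11.8"])
print(meta)  # {'column': 'temperature', 'type': 'numeric', 'min': 11.8, 'max': 13.0}
```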

Interacting with the path from data to knowledge

Example topics include:

  • learning ramps to get people progressively using more advanced features
  • interfaces to analysis: web portals, service-based provisions of analysis, traditional and application-bound interfaces
  • analysis paradigms, such as choosing the appropriate analysis, dealing with pre-processing of data before and post-processing of results after analysis
  • programming paradigms, such as workflow systems, MapReduce, declarative languages
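Of the programming paradigms listed above, MapReduce is easily sketched in plain Python; the three phases below (map, shuffle, reduce) form a toy word-count example, with all names and data hypothetical rather than taken from any particular framework.

```python
# A minimal sketch of the MapReduce paradigm: map each record to (key, value)
# pairs, shuffle (group) by key, then reduce each key's values.
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in a line of text.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Group all values by key, as the framework would do between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word.
    return key, sum(values)

records = ["data to knowledge", "data to wisdom"]
pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'data': 2, 'to': 2, 'knowledge': 1, 'wisdom': 1}
```

The appeal of the paradigm is that each phase is independent, so a framework can parallelise the map and reduce steps across many machines without changing the user's code.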

Interacting with Text (Text Mining)

Sensors everywhere!



a. Trust. Some of the applications – e.g. domestic applications for the elderly – are being tailored to benefit the individual, with their consent. But looking to the future and their implementation: who will provide these services? Who will help the consumer to decide which providers are trustworthy? What systems will be in place to protect the consumer?

b. Big Brother and privacy. Data about individuals can be collected using sensors for reasons that are not necessarily for their personal benefit, and can be collected without their explicit consent. Collection may be by commercial enterprises, and the data are also collected by, and of interest to, other agencies such as employers, the police and the military. This raises issues of consent, privacy, confidentiality and control. How much does the average citizen need to know about sensor data collection and its use? Whose responsibility is it to raise awareness of its ubiquity? Is lack of control something that just has to be accepted as a part of modern life, something that we learn to live with? What is the balance of personal versus collective responsibility for the collection and use of personal data?

c. Data security and anonymity. Integration of heterogeneous data sets increases their power: the sum is greater than the parts. This is already happening in ways which undermine precautions taken to anonymise the data in individual datasets. (How) can anonymity be protected?

d. What’s the problem anyway?

e. Is there anything new? In the past people lived in small communities where very little was private. How different is living with surveillance mediated through digital data?

f. Do the individual and/or collective benefits of optimising interventions to prevent or minimise harm outweigh the risks? Some people think the benefits – personal, or social and environmental in terms of improving transport or reducing environmental damage – are worth the risks.

g. Some people know the risks but they just don’t mind. People volunteer information about themselves through Web 2.0. Is concern over data privacy a thing of the past?

h. Scenario research. Use various scenarios to explore social attitudes towards the issues raised by the gathering and use of data from sensors and software services, and feed the results into policy decisions.


a. Opportunities and need for citizen participation in data gathering through sensors, for example in ecological monitoring projects. Co-development of the social aspects is needed in parallel with the technological aspects of sensors and their systems of implementation. Examples of good practice come from the SysMO Consortium and DataONE.

b. How does it work in practice? Getting beyond the hype: the data are likely to be messy; the models are likely to be imperfect; the systems are likely to behave in unanticipated ways and have side-effects. We need studies of the realities of systems based on data from sensors and software services.

c. What counts as ‘social’ data? Given what can be inferred about individuals from sensor data and social transaction data captured through software services (combined with the added ‘value’ of integrating heterogeneous data), which data are social? For example, what can be learnt from the data provided through domestic smart metering?

d. The data never speak for themselves. Microsoft speaks of the ‘data-driven revolution’. On the other hand we know the data never do speak for themselves: data cannot be used without metadata. What metadata is being used? What are the underlying (or implicit) assumptions that are being used in order to compute with this data? What kinds of worlds – realities – are imagined and enacted through the development of these technologies?

e. What is the epistemology of what Microsoft calls the ‘new machine intelligence’? Microsoft is already predicting individual taste and purchasing behaviour, e.g. for particular movies, based on a large database of past purchases by many people. They have developed a model that predicts personal preferences and individual behaviours, yet it is not clear that it incorporates psychological theory. What kind of science or knowledge is this? What is its discipline?

f. What is this knowledge doing in the world? For example, it is geared towards optimisation, towards making recommendations. It is designed to influence behaviour, and to be the basis for various kinds of interventions. Is it trivial to personalise the options that are presented to you by a search engine? What is at stake? What options are not being given? What is being excluded in these optimisations?

g. Trust in (or fear of) numbers. Lessons from forensic DNA profiling. Who has – or can acquire – the confidence and power to challenge this knowledge? In what ways is it resisted, or subverted?

h. Future opportunities and challenges of the ‘data-driven revolution’ for social science research methods.

  • How can they be harnessed as part of a social science research agenda to enrich our understanding of the social?
  • What new research training do they require?
  • Will it be possible for the individual to master all of the skills needed to undertake research in the data-intensive social worlds of the future?
  • Or will there need to be multidisciplinary teams including socio-informaticians?

Dynamic, Distributed, Data-Intensive Applications Break-out

See Dynamic Distributed Data-Intensive Applications for background information.

This session will include a mix of talks and discussions. The talks will cover several areas where dynamic distributed data plays a critical role, ranging from real-time traffic simulations to weather modelling. The discussion session will take stock of existing programmatic approaches and try to identify successes, common approaches and existing gaps.

Please sign your entries, like this, by pressing the "sign" button above. --MalcolmAtkinson 18:16, 8 March 2010 (UTC) --Jvhemert 19:08, 8 March 2010 (UTC)

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.