Technology Workshop 1
Topic: Data streaming strategies
Much of today’s data is available as a continuous stream, from sensors or as a by-product of effectively continuous activity, such as the conduct of electronic business or the global interaction with games and Web 2.0 services. Often, the rate of data emerging from these sources is such that it is infeasible to keep it all and apply batch analyses. Frequently, scientists want to act on the information dynamically: for example, on recognising a particular temporal variation in astronomical data, they may commission spectroscopic data capture by another instrument.
In the data streaming model, input data are usually not available for random access, and the system has no control over the order in which data arrive to be processed. Moreover, the data may be unbounded in size. Finally, once an element has been processed, it may not be recoverable unless it is explicitly stored.
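As a minimal sketch of what these constraints mean in practice, the Python fragment below summarises a stream in a single pass: it keeps a running mean and a fixed-size uniform random sample (reservoir sampling), so memory use stays bounded however long the stream runs. The names `sensor_stream` and `summarise` are hypothetical and chosen only for illustration; the simulated sensor feed is an assumption, not a data source discussed at the workshop.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def sensor_stream(n: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical stand-in for a sensor feed: yields n readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(10.0, 2.0)


def summarise(stream: Iterable[float], k: int, seed: int = 0) -> Tuple[int, float, List[float]]:
    """Consume the stream once; return (count, running mean, sample of at most k items)."""
    rng = random.Random(seed)
    count = 0
    mean = 0.0
    reservoir: List[float] = []
    for x in stream:
        # Each element is seen exactly once and in arrival order.
        count += 1
        # Incremental mean: no need to store earlier elements.
        mean += (x - mean) / count
        if len(reservoir) < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(x)
        else:
            # Keep the new element with probability k / count,
            # replacing a uniformly chosen stored element.
            j = rng.randrange(count)
            if j < k:
                reservoir[j] = x
    return count, mean, reservoir


if __name__ == "__main__":
    count, mean, sample = summarise(sensor_stream(1000), k=5)
    print(count, round(mean, 2), len(sample))
```

Only the counter, the mean and the `k` sampled values are retained; every other element is gone once it has been processed, which is the behaviour the streaming model assumes.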
The preceding DIR workshop encountered many examples of data-streaming applications, especially in the scientific field, and identified the gains obtained and challenges to be overcome. By bringing together data-streaming experts, this workshop will develop that understanding further and propose research strategies. The workshop will analyse the experiences obtained in implementing real applications, and in building the existing technologies. In many cases, the experts will contribute their current research. The workshop will identify the factors that hamper the applicability of data-streaming architectures, and propose strategies by which they may be overcome.
Overview of alternative solutions: streaming-query notations, streaming data-selection and derivation algorithms, evaluating streaming-queries and the optimisation of data-streaming systems.
The workshop is by invitation only, because we seek to keep to fewer than 30 participants, so that we can invest a great deal of the time in creative and detailed small-group discussions. If you are working on data-streaming techniques or if your research has demanding data-streaming requirements, please email Malcolm (firstname.lastname@example.org) or Paolo (email@example.com) with a note about your reason for wishing to attend and we will almost certainly invite you, unless we have reached our maximum number plus a few reserves.
The meeting will be held in the Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB. More information can be found on the Informatics Forum's location page.
Provisional outline programme
- Tuesday 15th February - Domain Applications for data streaming
- Wednesday 16th February - Technological solutions
- Thursday 17th February - Domain Applications for data streaming and technological solutions
Biographies of speakers
More will be added as they become available.
Yanif Ahmad is an Assistant Professor in the Computer Science Department at the Johns Hopkins University as of Fall 2010. He previously held a position as a postdoctoral associate in the Database Group at Cornell University, and received his Ph.D. with the Database Group at Brown University in 2009.
Yanif's research interests span stream processing engines, declarative languages and large-scale database systems. His recent research focuses on the foundations of incremental query processing for data streams, novel approaches to query compilation for holistic optimization, declarative query optimization and model-based processing on mathematical representations of streams. His work brings techniques from databases, core systems and programming languages to bear on the principles of database system design, to meet the diverse functionality and performance requirements of handling large, dynamic datasets. Yanif is the recipient of several awards, including an IBM Ph.D. Fellowship, Best Paper Award at the ICDE 2008 Conference, and Best Demonstration at SIGMOD 2005.
Malcolm Atkinson is Director of the e-Science Institute and, as the UK e-Science Envoy, represents the UK at the e-Infrastructure Reflection Group (e-IRG) and the European Grid Initiative (EGI) Design Study, and is a member of the Joint Information Systems Committee Board and JISC Support of Research Committee. He plays a leading role in the Open Middleware Infrastructure Institute UK, and is on the advisory boards of the National Grid Service and Baltic Grid. He led the International Summer School on Grid Computing from 2006 to 2008.
He began his career in computing in 1966. He has worked at seven universities: Glasgow, Pennsylvania, Edinburgh, UEA, Cambridge, Rangoon and Lancaster; and for two companies: Sun Microsystems (at SunLabs in California) and O2 (an Object-Oriented DB company in its early years in Versailles). He led the development of the Department of Computing Science in Glasgow and is now Professor of e-Science in the School of Informatics, University of Edinburgh. He has more than 150 publications. His current research is concerned with data integration and its exploitation. He is currently the lead architect on an EU Framework Programme 7 project called Advanced Data Mining and Integration Research for Europe (ADMIRE).
Roger Barga is currently an architect in the Cloud Computing Futures (CCF) group in Microsoft Research (MSR), where he leads a technical engagements team that works with researchers interested in carrying out large scale computing, scientific research, and data analytics on the Windows Azure cloud. Prior to joining CCF, Roger led the Advanced Research Tools and Services (ARTS) team in MSR which built innovative services and tools for data intensive research. Roger joined Microsoft in 1997 as a Researcher in the Database Group of Microsoft Research, where he participated in both systems research and product development efforts in database, workflow and stream processing systems. Contact him at firstname.lastname@example.org.
Eoin Brazil leads the Irish Centre for High End Computing (ICHEC) Technology Transfer activities. This includes working directly with a range of companies and industries, bringing problem-solving solutions, customised software development skills and high performance computing resources to address their business problems. This builds on his past experience in technology transfer of research and his research work at the Interaction Design Centre in the University of Limerick, where he held various project and technical roles. The focus of ICHEC's current activities with companies includes projects in cloud computing, analytics and high performance computing. He is also working on various aspects of the European PRACE HPC project within ICHEC.
Jean-Paul Calbimonte is a researcher and PhD student in the Department of Artificial Intelligence at the Faculty of Computer Science at Universidad Politécnica de Madrid, and has been a member of the Ontology Engineering Group since 2009.
He worked previously as a researcher at Universidad Católica Boliviana and as a software architect at Piramide Informatik SRL. He holds an MSc in Computer Science from École Polytechnique Fédérale de Lausanne (EPFL), Switzerland (2007) and graduated as an Engineer in Computer Systems at Universidad Católica Boliviana (2004).
His research interests are mainly focused on the Semantic Web and Data Integration. In the context of these research lines, he participates in the SemSorGrid4Env European project (Semantic Sensor Grids for Rapid Application Development for Environmental Management, FP7-ICT-223913). Under the scope of this project, he works on approaches to the integration of heterogeneous data sources, including sensor networks, using semantic technology and ontology-based query answering. The proposed solution uses mappings from streaming schemas to ontological views, and query rewriting techniques to transform queries over ontological entities, taking into account temporal and spatial information.
Ally Hume is a software architect at EPCC, The University of Edinburgh. He has worked on data access, integration and analysis in the OGSA-DAI and ADMIRE projects. He is also the scientific and technical manager of the EU-funded BonFIRE project, which is building a Future Internet federated cloud testbed to support Internet of Services research.
Elke A. Rundensteiner
Elke A. Rundensteiner is a Full Professor in the Computer Science Department of Worcester Polytechnic Institute (WPI), and the director of the database systems research laboratory (DSRG) at WPI. Elke received her B.S. degree (Vordiplom) from the Johann Wolfgang Goethe University, Frankfurt, West Germany, in 1984, a Master's degree from the Florida State University, Tallahassee, in 1987, and a Ph.D. degree from the University of California, Irvine, in 1992; all in Computer Science.
Prof. Rundensteiner is an internationally recognized expert in databases and information systems, having spent 20 years of her career focussing on the development of scalable data management technology in support of advanced applications including business, engineering, and sciences. Her current research interests include scalable data stream processing, query optimization, complex event analytics, information integration and visual exploration, and data warehousing for distributed systems. She has over 300 publications in these and related areas. Her publications on view technology, database integration, and data evolution are widely cited, and her research software prototypes released to the public domain have been used by academic and non-profit groups around the world. Her research has been funded by government agencies including NSF, NIH and DOE, and by industry and government labs including IBM, Verizon Labs, GTE, HP, NEC, Mitre Corporation, and others.
She has been the recipient of numerous honors, including NSF Young Investigator, Sigma Xi Outstanding Senior Faculty Researcher, and the WPI Board of Trustees' Outstanding Research and Creative Scholarship award. In 2010, Prof. Rundensteiner was awarded the Chairman's Exemplary Faculty Prize, an award which recognizes faculty members who, "as true exemplars of the university's highest aspirations and most important qualities, excel in all relevant areas of faculty performance". She is on the program committees of prestigious conferences in the database field, has been an editor of several journals, including Associate Editor of the IEEE Transactions on Knowledge and Data Engineering and of the VLDB Journal, and has been PC chair of several conferences, most recently EDBT'2012.
Prof. Rundensteiner runs the DSRG group.