Heterogeneity

Topic leader: Dr Tom Elliott, University of North Carolina at Chapel Hill
Rapporteur: Dr David Wheatley, University of Southampton

Mashup or messup? The choice is (not) up to us

Tom Elliott: Director, Pleiades Project, Ancient World Mapping Center, University of North Carolina at Chapel Hill, U.S.A.

I have occasionally described the Barrington Atlas and its companion Map-by-Map Directory as "a giant spatial index into the scholarly literature for Greek and Roman places." It was the editorial policy of the Classical Atlas Project (1988-2000), which produced the atlas, to require its scholarly compilers to furnish one or more relevant citations of secondary scholarly literature before a feature could be mapped in the atlas and listed in the directory. Where published secondary work was lacking, primary source citations and (rarely) the authority of named scholars working at the site or in the region were admitted. This practice has obvious benefits for both the editors and the users of the atlas. It also hints at the vast and varied universe of information that was searched, sifted, collected, interpreted, synthesized, argued and adjusted in creating the 99 maps and 1,500 pages of supporting data in tabular format. In fact, these tables and maps can be viewed as a regularized, well-structured user interface to an inherently heterogeneous dataset, itself compiled from multiple, differently structured sources that had been created and published for the widest imaginable range of purposes.

Scholars (humanists and scientists alike) work at the interface between heterogeneity and homogenization. We regularly wrestle with complex, chaotic and often contradictory or ambiguous sources, empirical data and prior conclusions in an effort to produce new interpretations that advance knowledge and inform future investigation. When we get it right, new understandings emerge. When we get it wrong, we get pablum.

The traditional scholarly article, monograph or reference work in print almost always constitutes a remix of other data and scholarship, or presents findings based on analysis of such a remix. Even empirical datasets created through laboratory or field work arise through interpretative and classificatory processes that produce new order from observed chaos. Standard bibliographic citation, recognized conventions for textual apparatus, explicit invocation of theoretical interpretative positions and exhaustive descriptions of methodology are all effective techniques for signaling -- to human readers -- underlying heterogeneity and potential discontinuity. Confidence intervals and other statistical measures provide additional qualification and context for some numeric data.

Recent advances in computing, digital culture and scholarly practice are opening up new possibilities and potential pitfalls. In particular, the rising popularity of virtual globe software, neogeographical computing practices and mashups is lowering barriers to 3D visualization as a tool for teaching, research and recreation. The corresponding upsurge in interest is driving the development of easier and more powerful mechanisms for harvesting and aggregating spatially referenced data. Many of these mechanisms are quick and dirty; they bypass the elaborate schemas, protocols and metadata content standards developed by the geospatial computing industry and science funding bodies. Instead, they favor simplicity, economy of expression and lowest-common-denominator web patterns.

Yet -- despite a proliferation of data models and encoding formats for feature services, gazetteers, earth browsers and geographical tags -- it has proved impossible for us to encode all aspects of our project's legacy dataset (the Barrington Atlas itself) in any single standard schema. At present, we are using (internally) a "frankenformat" in which the simplest and most useful pieces of various schemas are ganged together to provide the needed data transport. The most obvious shortcoming of this approach is its idiosyncrasy. No one else has existing code that can parse this format, so publishing our data in it would have limited value. For data interchange we have so far relied on various more standard serializations (KML, Atom + GeoRSS), but we have done so at the cost of "dumbing down" our data.
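The cost of that "dumbing down" can be illustrated with a minimal sketch. The record below is hypothetical (its field names and values are invented for illustration and are not drawn from the Barrington Atlas data or from the project's internal format), and the entry it produces is deliberately incomplete (a valid Atom entry also needs an updated element). It serializes a place as an Atom entry with a GeoRSS-Simple point and reports which fields have no obvious home in that simple form.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("", ATOM)
ET.register_namespace("georss", GEORSS)

# Hypothetical rich record; every field name and value here is an assumption.
place = {
    "id": "urn:example:place:1",
    "title": "Example place",
    "lat": 37.0,
    "lon": 23.5,
    "location_certainty": "approximate",
    "attested": "Roman",
    "citations": [{"role": "attestation", "ref": "Example 1990, 12"}],
}

# Fields the simple Atom + GeoRSS entry below carries directly.
CARRIED = {"id", "title", "lat", "lon"}


def to_atom_entry(rec):
    """Build a bare Atom entry with a georss:point; return it with the dropped field names."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = rec["id"]
    ET.SubElement(entry, f"{{{ATOM}}}title").text = rec["title"]
    # GeoRSS-Simple encodes a point as "latitude longitude", separated by a space.
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{rec['lat']} {rec['lon']}"
    dropped = sorted(set(rec) - CARRIED)
    return entry, dropped


entry, dropped = to_atom_entry(place)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
print("not carried:", dropped)
```

In this sketch the certainty, period and citation fields are simply left out. They could be pushed into free text or ad hoc extension elements, but a generic consumer would then have no agreed way to act on them.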

It seems to me that one of the more urgent tasks facing archaeologists, historians and other humanists interested in establishing production-level spatial services and methods (or publishing work compatible with them) is resolving this data encoding and interchange problem. I am not arguing for the creation of yet another schema or protocol. Rather, I think we must renew efforts to engage with the existing format-and-tools communities to advocate for our needs. We should especially push for the adoption of solutions that can be used, unchanged, across multiple spatial data formats and that, preferably, already have wide use or an active development community. Among the present gaps are:

  • Robust methods for communicating uncertainty, accuracy, precision and similar factors (both qualitative and quantitative) in computationally actionable ways
  • Well known and widely implemented conventions for the (carto)graphical interpretation of such indicators in data
  • Similar mechanisms for transmitting and surfacing novel representations of scholarly process or data provenance now becoming possible for born-digital works
  • Flexible and precise ways to associate events (including durations) and subjects (tags, categories) with places and names and communicate these associations
  • Non-idiosyncratic citation formats for primary and secondary sources (in both print and digital form) that communicate roles (attestation, provenance, argumentation or additional information) and that can be easily mashed up with 3rd-party bibliographic and document-delivery services
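To make the gaps listed above concrete, here is a rough sketch of the kinds of information a place record would need to carry. It is not a proposed schema: all class and field names are hypothetical, and the only terms taken from the list are the four citation roles.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The citation roles named in the list above.
ROLES = {"attestation", "provenance", "argumentation", "additional information"}


@dataclass
class Citation:
    reference: str
    role: str

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown citation role: {self.role!r}")


@dataclass
class Location:
    lat: float
    lon: float
    precision_m: Optional[float] = None  # quantitative indicator (assumed unit: metres)
    certainty: Optional[str] = None      # qualitative indicator, e.g. "approximate"


@dataclass
class Place:
    name: str
    location: Optional[Location] = None
    start_year: Optional[int] = None     # duration of the association, if known
    end_year: Optional[int] = None
    subjects: List[str] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)

    def cited_for(self, role):
        """Return the references cited in the given role."""
        return [c.reference for c in self.citations if c.role == role]


# Invented example values, for illustration only.
example = Place(
    name="Example place",
    location=Location(37.0, 23.5, certainty="approximate"),
    start_year=-30,
    end_year=300,
    subjects=["settlement"],
    citations=[
        Citation("Example 1990, 12", "attestation"),
        Citation("Example 2001", "argumentation"),
    ],
)
print(example.cited_for("attestation"))
```

The open question raised by the list is not how to model such a record in one program, but how to express the same distinctions in formats and tools that other communities already use.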

To engage successfully with other communities in pursuit of these goals, we must not only attend the same conferences and invite outsiders to meetings convened for the purpose of collaboration, but we must also conduct case studies with real datasets and existing formats, and then publish the results of those experiments. Such work will inevitably involve review of more than data formats. Conceptual models, digital creation processes, editorial workflow and preservation strategies will all undergo evaluation and change. And we must seek more than a series of schemas and associated technical documentation. A body of published best practices, backed up by accessible, exemplary resources and services, is essential.

My talk will touch on these themes and needs. I hope to illustrate many of the underserved requirements outlined above with concrete examples from our project and its dataset. I'll also suggest some concrete steps that can be taken in the coming year to redress the gaps.

Case Study: The challenges of delivering geospatial data for archaeologists

Dr Stuart Jeffrey, AHDS Archaeology

Archaeology as a discipline generates a significant volume of geospatial data for the purposes of curation, cultural heritage management and academic research. In addition to the contested nature of much of this material and the significant legal and ethical problems it can raise, simply finding efficient methods of delivering these data can be challenging. The Archaeology Data Service/AHDS Archaeology has taken a lead in drawing together heterogeneous geospatial datasets and presenting them in appropriate contexts for research reuse. This talk will outline these approaches, look to the future and highlight the complex interaction between interpretational debates and data presentation.

Case Study: Ptolemy's Error: Truths and falsehoods in heterogeneous spatial data

Leif Isaksen, Oxford Archaeology
