Standards and Metadata
Topic leader: Dr William Kilbride, Glasgow Museums
Rapporteur: Stephen Stead, Paveprime Ltd
Standards and metadata in geospatial data: solutions for other people’s problems?
The topic of standards and metadata (and standards of metadata) in geospatial data is widely discussed by better commentators than this one, and it remains an area of active development. In this sense, the workshop will present us with the latest thinking and novel solutions to well-known problems. Rather than summarising, and doing considerable violence to, the work of the OGC and others with respect to standards, this short introductory paper focuses on three specific aspects of standards development. Purposefully contentious, it proposes that standards (and metadata) are characteristically extraneous: they are adopted neither for their own sake nor for any internal logic. Instead, data standards can be categorised as solutions to other people’s problems.
In the first instance it will be noted that the role of standards development, promotion and training comes with a discursive price tag: that being ‘in the know’ or ‘on the inside’ provides an ineluctable professional authority which is not easily assailed or assessed. Secondly we are told that data standards promote data sharing. Experience suggests that the presumption in favour of data sharing within the academic community, and the presumed role of the academic community in the context of a national spatial data infrastructure, are more contentious than policy documents might lead us to suppose. Institutional infrastructure and legal impediments are predisposed to disrupt that aspiration to data sharing, threatening to render the standards hypothetical. Finally we are told to adopt standards for the longue durée: that preservation requires documentation. This is undeniably true, but the complexities of geospatial data and the relative immaturity of the operational standards for trusted repositories mean that conventional, archive-based models will be sorely tested to provide the sort of long-term support that we quite evidently need.
This short essay is intended to be provocative. It is most certainly not the opinion of Glasgow Museums, nor of the JISC Geospatial Working Group. It is not even necessarily the opinion of the author.
Reduction to language
Knowledge, we are told, is power. This could be stated more elegantly and precisely: literacy is discursive. It establishes, maintains and disguises relationships of dependency and autonomy. In that sense there is no naïve literacy, and the infrastructure associated with the maintenance and monitoring of approved forms of literacy is not trivial. This is most apparent in historical contexts where access to literacy and thus knowledge was carefully managed, and more importantly where relationships of dependency were created and made self-evident, such as the relationship between literate clergy and illiterate laity.
Working on the assumption that modern information technology is simply the latest innovation in the long history of literacy and language, we should expect to find these same discursive realities hidden in our own information technology: who is deciding what can and cannot be known? Who is placed in a relationship of dependency from which they cannot easily escape? This issue can be explored in part through standards development. The need and desire to share information means, for example, an impetus towards controlled vocabulary in the humanities, and expectations about the coherence of geographical description. But the naming of things matters in the humanities, so the adoption of a shared language – someone else’s language – risks violence to the subject of study. Phenomenologically speaking, it’s not possible to put your foot into the same river once. Nor is this concern with language and meaning confined to the philosophical ramblings of humanists: cartography is exhaustingly political. So will the search for shared protocols and semantic interoperability wreck the interpretative project of the humanities? This is a moot point – it’s not clear that the external reality of the world can be reduced to language at all, let alone someone else’s language. Stepping away from the solipsistic cliff edge, and assuming that the world can be contained in language, it should be clear that there is more to the promotion and adoption of vocabulary controls and spatial syntax than might first appear. They are a solution to someone else’s problem, and the problem is getting you to do what they want. We need to have a great big row about standards.
Intellectual Rights and Wrongs
From the recondite planes of linguistics to the quotidian bustle of our offices and institutions, it’s not even clear that data sharing is a universal virtue. There is still more lip service than web service. Policy documents proudly proclaim the merits of open access and so we come to expect that data sets will be available to us after an appropriate interval. A few brave souls are good enough to provide instant access. But as the commercial value of the data increases so the ease of access declines. This is especially true of geospatial data. It is customary to criticise mapping agencies for their reluctance to sacrifice their one major asset on the altar of open access, but the reality is that many of our institutions play this game too. It is easy to trace the Ordnance Survey’s caution to a Thatcherite agenda of fiscal independence, but the research community is under the same pressures. On one hand universities are expected to collaborate in an open and sharing environment, on the other they are expected to compete and develop IPR-based business plans which turn ideas into ‘third leg funding’. Data sharing is fine so long as the integrity of data and the profits of the host institution are not compromised.
Problems of intellectual property rights are not insurmountable. The will to succeed and trust in colleagues are powerful forces that time and again mean the issues resolve themselves. But the inadequacy of current legislation is in stark contrast to the relative sophistication of what could be achieved. This is especially true when we consider the complexity of derived data sets in which it is no longer clear who owns what, and therefore difficult to be confident in what can and cannot be re-supplied. Clearer documentation and demarcation of sources is an obvious solution, but the quantities of metadata required and the complexities of licensing suggest that this is likely to be a temporary solution. The development of a national spatial data infrastructure implies data sharing, and for this to happen there has to be a simple and well-understood protocol for who is responsible for what. It’s unlikely that tools like Creative Commons could become the norm in the spatial domain – but the sort of empowering clarity which they imply is at least worth the protracted effort.
Institutional aspirations for repositories present another, and somewhat unexpected, challenge to the free flow of geospatial data. Still in their infancy, repositories have digital research papers as their stock-in-trade: theses, journal articles and the like. The scope of many repositories maps conveniently onto submissions for the Research Assessment Exercise, and somewhat inconveniently onto the range of digital outputs that research produces and consumes. Few if any are able to support the technical
Towards the digital heritage service
Intellectual property rights and institutional policies lead us seamlessly into strategies for the preservation of geospatial data. Institutional repositories are characteristically not designed for long-term preservation, and the complexity of intellectual property laws inhibits much reasonable short-term action for long-term gain.
The conventional wisdom of digital preservation envisages a trusted digital repository managing sets of files with appropriate administrative, technical and representational metadata to enable and ensure independent utility. Setting aside for the moment the technical complexity involved in rendering geospatial data – one should assume the designated community and technical sophistication of the archive managers can deal with such issues – the implications of so much derived data should become obvious. It is unreasonable to expect that a single repository will be able to manage all the components. Granted the Open Archival Information System allows for different functions within a single system to be distributed (an OAIS can span various institutions as AHDS has shown). But it does not envisage that the same functions be replicated in multiple agencies. Even allowing for this discrepancy the advent of change-only-updates and live sensor feeds confuses the argument even more. It is hard to imagine a single repository being able to take responsibility for the long term preservation of an integrated GIS project: it seems much more likely that a number of agencies will have to work together. For this to happen there needs to be a matrix of mutually understood and compatible responsibilities with each agent continually assessing the performance and viability of the partners. Mutually managed and distributed curation sounds attractive but the tools and standards for this sort of preservation are still only in draft. This sort of ‘preservation in situ’ is metaphorically closer to heritage management than archives, so is superficially attractive to archaeology and the historic environment, but the similarities are more imagined than real. Perhaps a family of law and practice could be imagined with scheduled ancient data sets of national importance protected as such. 
But if we can’t fix immediate and basic problems like permanent identifiers the prospects are not good in the medium term.
Standards and standards developers really need to address three issues. At a basic level we need to know why so few people are interested in standards development, and decide if we are happy with that. Do we need a much bigger effort to involve the whole community, or is it appropriate to leave it in the hands of special interests and technocrats (like me)? The standards for information exchange are well developed and are arguably way ahead of institutional policy. This means that they risk becoming hypothetical. Arguably we should stop developing standards and start redeveloping our institutions. Finally, the long term for geospatial data is far from secure and, with a proliferation of services and agencies, is likely to become less secure in time (not more). If the national spatial data infrastructure wants a future then it needs to get a history.
Available for disagreement at 
Case Study: Experiences at EDINA: the role of Standards and Metadata in SDIs
Dr Guy McGarva, EDINA
Over a number of years EDINA has gained experience in the management and dissemination of national spatial datasets, the development and operation of online metadata editor and publishing tools; engagement in academic and national spatial metadata initiatives to support data discovery, management and sharing; and, more recently, involvement in projects and activities relating to the sharing and curation of spatial data.
This talk will look at the use of relevant standards and metadata in these services and projects and their role in enabling the development of spatial data infrastructures.
* Scale
* Heterogeneity
* Standards and Metadata