Workshop report: enhancing and exploring epigraphic and archaeological data through e-Science

This report is a draft, and should be read as a work in progress. Any comments are very welcome.

Introduction and purpose

The workshop Enhancing and Exploring Epigraphic and Archaeological Data through e-Science was held at the e-Science Institute in Edinburgh on 10 and 11 February 2009. It was co-funded by the Arts and Humanities e-Science Theme and the British Academy. Attendees included editorial scholars engaged with the Inscriptiones Orae Septentrionalis Ponti Euxini (IOSPE; Ancient Inscriptions of the Northern Black Sea Coast) project, and computational specialists engaged in the digital representation, mark-up and analysis of epigraphic and archaeological data. The purpose was to scope and articulate practical next steps which the IOSPE project could take with respect to e-Science and digital research.

Epigraphic data is a critical source of information about the ancient Greek and Roman settlements in the Northern Pontic region and their interactions with indigenous, settled and nomadic populations. The inscriptions come from a range of time periods, and exist in both Greek and Latin. There has not been a new edition published for many decades.

The principal research questions considered by the workshop centred on the nature of the data, and the intellectual judgements needed in order to make meaningful connections between data. The principal technical entity discussed was EpiDoc, an open source XML standard used for marking up inscription data. As well as offering a flexible framework in which to publish inscriptions, it allows the text to be structured according to the semantic entities desired by the research community.

Multilingualism is a major issue. The inscriptions themselves were created in Latin and Greek, and much of the secondary literature (including commentaries) is written in several languages, including English, French and Russian. Besides the issues of understanding and translating, any digital platform will need to be able to deal with non-Latin scripts (e.g. Greek, Cyrillic).

Epigraphers already encode and structure their data. The Leiden system is one example of this, as is Petrae. Although perhaps an obvious point, it should be (re)stated that the digital environment does not impose on text the physical constraints of a print publication. On the first day in particular, the discussion dealt at length with the problems posed by cumbersome and expensive ‘traditional’ book publication. Most mainstream academic publishers will simply not be interested in publishing hundreds, or even thousands, of colour images for academic purposes. An interesting distinction was drawn with the physical sciences, where an intellectual tradition of electronic publishing has developed: it was noted that this has led to a democratization of knowledge. Producing a digital edition, however, is in and of itself an interpretive research act. Traditional indexes and contents pages simply refer to page numbers. Semantic, qualitative mark-up of the kind offered by EpiDoc, however, allows much richer searching and information retrieval, but deciding on the structure of such links is an interpretive rather than a technical process.
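The difference between a page-number index and semantic mark-up can be illustrated with a minimal sketch. It assumes a small TEI/EpiDoc-style fragment using the `lb`, `persName` and `placeName` elements; the fragment itself is invented for illustration and is not a real inscription, and the function name `find_entities` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only; not taken from any real edition.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition" xml:lang="grc">
  <ab>
    <lb n="1"/><persName>Διονύσιος</persName>
    <lb n="2"/><placeName>Ὀλβία</placeName>
    <lb n="3"/><persName>Ἀπολλώνιος</persName>
  </ab>
</div>
"""


def find_entities(xml_text, tag):
    """Return (line number, text) pairs for every element with the given
    local name, using the preceding <lb n="..."/> as the line reference."""
    root = ET.fromstring(xml_text)
    results = []
    current_line = None
    # iter() walks the tree in document order, so each <lb/> milestone
    # is seen before the text that follows it.
    for element in root.iter():
        local_name = element.tag.split("}")[-1]  # drop the namespace part
        if local_name == "lb":
            current_line = element.get("n")
        elif local_name == tag:
            text = "".join(element.itertext()).strip()
            results.append((current_line, text))
    return results


people = find_entities(SAMPLE, "persName")
places = find_entities(SAMPLE, "placeName")
```

Here a search for persons returns only the names on lines 1 and 3, which a flat full-text or page-number index could not distinguish from the place name on line 2. Which strings are tagged as persons or places remains the editor's interpretive decision.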

Participants agreed as a general principle that if you make detailed information about your research processes, your data, and (in computational terms) any code you write available, others can use it for analytical work in the future. The lack of reuse of information once it has been published is a major issue in epigraphic studies: in career terms, publication is often seen as an end in its own right, rather than a means of provoking more research questions. Such openness does not simply benefit future researchers. It adds a whole new layer of documentation, which enables the testing and (re)validation of one’s research outcomes.

It was noted that much early epigraphic recording practice focuses solely on transcribing the text of an inscription (the case of John Gandy Deering was cited), and takes no account of its physical context, which might profoundly affect the meaning and interpretation of the inscription. There is a need to include data that is as detailed as possible, both about the monument inscribed and about the text of the inscription.

Participants also considered that the application of digital methods to epigraphy invites the rethinking of those semantic classifications and boundaries which the subject has developed. For example ‘inscriptions’ and ‘literature’ have traditionally been considered to be separate; but if two examples are contextually relevant, then they should be considered so.

The importance of cultural processes within the computational and epigraphic disciplines was discussed at length. What is a 'finished product'? In particular, there is a resistance in epigraphy to publishing results that are not ‘perfect’. Although this reflects entirely legitimate professional concerns, it stifles the interpretive process, militates against the wider availability of research data, and limits the possibility for collaborative research that might be conducted on that data. Again, the importance of tracking the creative workflow was highlighted; and it was noted that cognate areas that have dealt with e-Science, for example digital philology, have experienced very similar issues.

Technical Breakout group

Crosswalking may be simply defined as going from one data format to another. The purpose of the discussion was to determine how different epigraphic databases, created for different reasons, by different researchers and at different times, can be most usefully integrated using crosswalking methods. A need was identified for rules for data entry: data entering any kind of digital system needs to have been captured with some degree of consistency. Key questions when formulating database search capabilities are therefore: what are the common vocabularies that the domain experts use? What sort of queries would they run? What would their questions be? It was also noted that what does not match when conducting cross-database searches is, potentially, just as interesting as what does. This points to the importance of linking a database with its domain context, since different databases are created for different reasons, at different times, by different people, for different purposes.

It was noted that external databases could dump content into EpiDoc easily; XQuery is another option. If two databases are using different XSLTs, the developer would need to be in touch with those responsible for each database.

Essentially the issues are intellectual, i.e. how one characterizes the mappings between different classes of information, rather than technical. Every epigraphic database contains fields with broadly equivalent meanings, e.g. fields describing dates, locations etc. This provides considerable scope for crosswalking approaches, but in turn requires academic judgments as to which field in database A most usefully matches, from an epigraphic point of view, which field in database B. Ultimately, the only way to resolve terms which do not agree is to consult the domain experts involved.
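As a rough sketch of what such a crosswalk might look like in practice, the example below maps two imaginary databases onto a shared vocabulary. All field names and records are hypothetical; the mapping tables stand for the academic judgments described above, and the code merely applies them. Fields with no agreed equivalent are returned separately so that they can be referred to the domain experts.

```python
# Hypothetical mappings from each database's field names to a shared vocabulary.
# Deciding these pairings is the scholarly step; nothing here is automatic.
CROSSWALK_A = {"findspot": "place", "dating": "date", "text": "transcription"}
CROSSWALK_B = {"locus": "place", "datum": "date", "textus": "transcription"}


def crosswalk(record, mapping):
    """Translate one record into the shared vocabulary.

    Returns (common, unmapped): `common` holds the translated fields,
    `unmapped` holds fields for which no equivalence has been agreed.
    """
    common = {}
    unmapped = {}
    for field_name, value in record.items():
        if field_name in mapping:
            common[mapping[field_name]] = value
        else:
            unmapped[field_name] = value
    return common, unmapped


# Invented records for illustration.
record_a = {"findspot": "Olbia", "dating": "2nd century", "letter_height": "2 cm"}
record_b = {"locus": "Olbia", "datum": "2nd century", "textus": "..."}

common_a, unmapped_a = crosswalk(record_a, CROSSWALK_A)
common_b, unmapped_b = crosswalk(record_b, CROSSWALK_B)
```

After translation both records expose `place` and `date` under the same names and can be compared directly, while `letter_height` is flagged as having no counterpart yet.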

The OGSA-DAI architecture offers the capacity for adding a layer to execute queries via XSLT. However, it was considered that this approach would not add a great deal of value, as the architecture of the underlying databases would also need to be altered. A key advantage of OGSA-DAI, however, is that you can integrate datasets, instead of just relying on some external standard. You therefore have the ability to generate data on the fly.

The Eagle database(s) illustrates the problem of intellectually similar, yet technically differing databases, and could be used as a concrete example of some of the technical solutions on offer. Eagle, a consortium of four databases, uses the MS Access format, and is therefore brittle, and vulnerable to data corruption. The Eagle search facility requires the user to select which of the constituent databases they wish to search.

A join query can be described as a query which matches two comparable database fields, such as date. The match could be absolute, such as =, or it could capture similarities between fields, and rank them according to the closeness of the match in each field. It was considered that this could be an extremely powerful methodology. For example, it could estimate the likelihood that ‘Antiochia’ in database A is the same ‘Antiochia’ in database B by relating time periods associated in database B with the toponym field containing Antiochia, where the time periods are not present at all in database A. A union query, on the other hand, simply sends a search to two fields in different databases, and returns records which match the search term. The join query concept is analogous to decision support research, e.g. that being undertaken by the AHRC-JISC-EPSRC e-Science project Image, Text, Interpretation: e-Science, Technology and Documents, and to probabilistic research being undertaken at Leuven.
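The contrast between the two query types can be sketched as follows. This is an illustration under stated assumptions, not a description of any existing system: the records, identifiers and periods are invented, the 0.8 threshold is arbitrary, and plain string similarity (Python's `difflib.SequenceMatcher`) stands in for whatever matching measure a real implementation would adopt.

```python
from difflib import SequenceMatcher

# Invented records. Database A has no period information; database B does.
DB_A = [
    {"id": "A1", "toponym": "Antiochia"},
    {"id": "A2", "toponym": "Olbia"},
]
DB_B = [
    {"id": "B1", "toponym": "Antiochia", "period": "Hellenistic"},
    {"id": "B2", "toponym": "Antiocheia", "period": "Roman"},
    {"id": "B3", "toponym": "Chersonesos", "period": "Roman"},
]


def union_query(term, *databases):
    """Send the same exact-match search to every database and pool the hits."""
    return [rec for db in databases for rec in db if rec["toponym"] == term]


def similarity_join(db_a, db_b, field, threshold=0.8):
    """Pair records across two databases whose `field` values are similar,
    ranked by closeness of match. Each pair carries the period from
    database B, which database A lacks."""
    pairs = []
    for a in db_a:
        for b in db_b:
            score = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
            if score >= threshold:
                pairs.append((round(score, 2), a["id"], b["id"], b.get("period")))
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


union_hits = union_query("Antiochia", DB_A, DB_B)
join_hits = similarity_join(DB_A, DB_B, "toponym")
```

The union query finds only the two records spelled exactly ‘Antiochia’. The join pairs A1 with both B1 and the variant spelling in B2, ranked by similarity, and brings along the periods recorded in database B. A high score only suggests candidates; whether two records refer to the same place remains a question for the epigrapher.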

It was asked what kind of use can be made of the data that will 'revolutionize' the field. It was suggested that an article, with substantial technical input, should be offered for publication describing join queries, and the benefits they could bring for epigraphic research. Such an article could look at what could be done if people could run join queries as opposed to union queries. It was also suggested that the group should monitor the next JISC-DFG joint call for funding possibilities.

Annotation of third-party databases was discussed. It could be very useful for an epigrapher to make notes and annotations on an external database, using that database’s ID fields etc., in their own database. Such annotations could (if the epigrapher wished) be rendered searchable and/or published, and linked to external sources of data. This could make significant use of collaborative social networking technologies.
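One possible shape for such annotations is sketched below, assuming each note is keyed by the external database's name and record ID. The class names, database name and record IDs are all hypothetical, and a real system would need persistent storage rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    source_db: str            # name of the external database (hypothetical)
    record_id: str            # that database's own ID for the record
    note: str                 # the epigrapher's comment
    public: bool = False      # private unless the epigrapher chooses to publish
    links: List[str] = field(default_factory=list)  # related external records


class AnnotationStore:
    """Holds an epigrapher's notes on records owned by other databases."""

    def __init__(self):
        self._annotations = []

    def add(self, annotation):
        self._annotations.append(annotation)

    def for_record(self, source_db, record_id):
        """All notes attached to one external record."""
        return [a for a in self._annotations
                if a.source_db == source_db and a.record_id == record_id]

    def search(self, term, public_only=True):
        """Case-insensitive search of note text; private notes are hidden
        unless public_only is False."""
        term = term.lower()
        return [a for a in self._annotations
                if term in a.note.lower() and (a.public or not public_only)]


store = AnnotationStore()
store.add(Annotation("ExampleDB", "rec-001", "Dating seems too early.", public=True))
store.add(Annotation("ExampleDB", "rec-001", "Check squeeze before citing."))
store.add(Annotation("ExampleDB", "rec-002", "Same hand as rec-001?",
                     links=["ExampleDB:rec-001"]))
```

Because notes refer to the external record only by its ID, the annotated database itself does not need to change.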

The technical breakout session concluded that there were five possible practical applications/issues for epigraphic research:

    An example of a join query using two Eagle databases.
    Demonstrating how one can determine if two references to (e.g.) Alexandria are referring to the same place.
    The articulation of different classes of outcome of queries, e.g. ‘equals’, ‘not equals’, ‘less than’ etc.
    Investigation of possible applications of text mining across fields.
    Cross-walking from software programmes commonly used by epigraphers. Most epigraphic scholars, for example, use MS Word. How can plain, or Word, text files be made interoperable with the kind of systems under discussion?
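The last point can be illustrated with a deliberately narrow sketch of converting plain text that follows Leiden conventions into EpiDoc-style XML. It handles only one convention, square brackets around text restored by the editor, which it renders as `<supplied reason="lost">`, and numbers each line with `<lb n="..."/>`. Bracketed gaps such as `[- - -]` or `[...]` are left untouched, and every other Leiden siglum is ignored. The function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Square brackets enclosing restored text. Brackets containing dashes or dots
# (conventionally used for unrestored gaps) are deliberately not matched.
RESTORED = re.compile(r"\[([^\[\]\-.]+)\]")


def leiden_to_epidoc(lines):
    """Convert a list of plain-text lines into a single <ab> element."""
    parts = []
    for number, line in enumerate(lines, start=1):
        text = escape(line)  # protect &, < and > before adding mark-up
        text = RESTORED.sub(r'<supplied reason="lost">\1</supplied>', text)
        parts.append(f'<lb n="{number}"/>{text}')
    return "<ab>" + "".join(parts) + "</ab>"


# Invented two-line example.
converted = leiden_to_epidoc(["abc[def]", "ghi [- - -]"])
```

Even this small case shows why the starting format matters: text typed in Word carries the brackets as ordinary characters, and the conversion depends on the editor having applied the convention consistently.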

Conclusions / recommendations

1. The main standard format with which participants currently have familiarity is the Petrae database.

2. The issue of multilingualism is essential. Where, in the translation process, should markup occur?

3. The markup should be added through a user-friendly interface.

4. Epigraphic editors should not have to learn coding.

5. Texts that have already been marked up should, wherever possible, interoperate with new systems. This is not likely to present significant barriers – for example, it is easier to transfer content from Petrae to EpiDoc than from Word to EpiDoc, because Petrae already provides a formal structure (although EpiDoc provides rich semantic markup, whereas Petrae is only capable of flat-field categorization).

6. There needs to be a greater understanding of the interpretative workflow in epigraphy, and of which parts of it would benefit most from digital support.
