This wiki service has now been shut down and archived

Questions related to the RDF, Ontologies and Meta-Data Workshop

From ESIWiki


On this page you will find a list of questions and answers raised during the RDF, Ontologies and Meta-Data Workshop. Participants are cited where possible, but many are missing - apologies. The wrap-up discussion session arose from asking the participants over lunch to give one question they still didn't know the answer to and would like to hear discussed. Please feel free to add to these if any questions or answers were missed, or if further details or responses to the answers are appropriate.


RDF Seminar (Jayant Sharama/Jurgen Angele)

  • How complete can mapping rules be?
    • Only simple ones, with the graphical interface
    • But can write directly in F-logic – more difficult but very powerful
  • What sort of DB connectivity?
    • JDBC
  • Malcolm mentioned that rules written by one user are very difficult for later users to interpret, re-use or edit.
  • Dave - only OWL-Lite is supported currently. What about OWL-DL? How is that going to develop?
    • OWL-DL cannot be handled with F-logic; the two are not compatible.

e.g. differences in cardinality reasoning mean we need different reasoning engines for i) F-logic and ii) OWL (kmz). Both can represent the ontologies in the same way, but the reasoner has to be different.

  • Malcolm - what do we do about erroneous data from databases? How fragile does this make inference etc.?
    • To integrate the data you have to resolve this. You need to re-engineer the data, which is hard, and it will never be 100% solved.
    • Malcolm - people can process errors automatically by string matching / approximation, but this is difficult with logic programming.
  • Ben - deploying services is simple (for an expert): a query's input variables and output variables can be represented in tables.

    • Import is more difficult to customise: going from XML Schema to an ontology is too complicated, since XML Schema is just a syntactic representation of data.

  • What about non enterprise users e.g. research or government community?
    • Vulcan Chemistry project is an example.
  • Mapping is difficult once you go to a larger user community than in one user group. The effort of mapping ontologies increases with the number of interfacing communities?
    • This is an unfortunate reality – more information sources – more work to integrate!
  • Andy Law raised the issue of DB schema vs data model. This is OntoStudio's approach, which needs a knowledge engineer to do the mapping.
    • DB schema: the idea seems to be to suck the schema out, re-engineer it and assign semantics.
    • Data model: why not use this instead? It already has the semantics.


Evaluation/Experiences of RDF triple stores

  • Malcolm - are all knowledge entities grid entities?
    • Oscar - in this world, yes.
  • Jessie - which of the requirements for S-OGSA are a priority?
    • Oscar - naming is very important
    • lifetime of semantic bindings: notification, timestamping and expiry.
  • Jessie - how do you decide what to cache locally if RDF resources are distributed and queried over the web?
    • Dave de R - unknown at the moment; maybe determined at the application level.
  • Privacy? - need to hide some data – semantic web ‘could’ help hide data because we can have provenance.
  • Steven Perry

Making distributed RDF stores from RDBs. Synchronisation: keep published RDF graphs in sync with the underlying DB. Create mapping programs to generate RDF from SQL queries and from files (flat, XML). Generation of RDF is not automated; it is programmatic, and mappings have to be set up manually.
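Steven Perry's point that RDF generation is programmatic, with hand-written mappings from SQL queries, can be illustrated with a minimal Python sketch. It uses only the standard library's `sqlite3` with an in-memory database; the table, the `example.org` URIs and the `MAPPING` structure are hypothetical and not taken from any tool mentioned at the workshop. Re-running the mapping after the database changes is one simple way to keep a published graph in sync.

```python
import sqlite3

# Hypothetical namespace and vocabulary, for illustration only.
BASE = "http://example.org/"

# A hand-written mapping: the SQL query to run, how to mint a subject URI
# from each row, and which column becomes which predicate.
MAPPING = {
    "query": "SELECT id, name, genus FROM species ORDER BY id",
    "subject": lambda row: f"{BASE}species/{row['id']}",
    "predicates": {"name": BASE + "vocab#name", "genus": BASE + "vocab#genus"},
}


def rows_to_triples(conn, mapping):
    """Run the mapping's query and emit one (s, p, o) triple per non-NULL mapped column."""
    conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
    triples = []
    for row in conn.execute(mapping["query"]):
        subject = mapping["subject"](row)
        for column, predicate in mapping["predicates"].items():
            if row[column] is not None:  # skip NULLs instead of emitting empty literals
                triples.append((subject, predicate, row[column]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, genus TEXT)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [(1, "Sus scrofa", "Sus"), (2, "Homo sapiens", "Homo")],
)
triples = rows_to_triples(conn, MAPPING)
for t in triples:
    print(t)
```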

  • Andy D - provenance
    • Discussed: it is important that data is seen as assertions that users can accept or trust etc. Still not solved, but critical.
  • Akiyoshi Matono - distributed RDF: querying a distributed hash table over a structured P2P network.
  • Benjamin H Szekely - developing a new API for triple stores
  • Andy L - you can apply triple patterns as a ‘filter’ on a graph, i.e. to search just a partition of the graph, or you can apply a triple pattern to a union of several graphs.
  • Ben - old or outdated versions are stored in a history table.
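Andy L's point about applying a triple pattern either to one graph or to a union of graphs can be sketched in plain Python, treating a graph as a set of (subject, predicate, object) tuples and `None` as a wildcard. The graph contents and prefixed names are made up for illustration; a real store would answer this through its query language (for example SPARQL's `FROM` / `FROM NAMED` and `GRAPH` clauses) rather than a linear scan.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# All identifiers below are illustrative.
pig = {
    ("pig:seq1", "rdf:type", "bio:Sequence"),
    ("pig:seq1", "bio:length", 1200),
}
human = {
    ("hum:seq9", "rdf:type", "bio:Sequence"),
    ("hum:seq9", "bio:length", 950),
}


def match(pattern, graph):
    """Return the triples in graph that match pattern; None in a slot matches anything."""
    return {
        triple
        for triple in graph
        if all(want is None or want == have for want, have in zip(pattern, triple))
    }


def match_union(pattern, *graphs):
    """Apply the same pattern to the union of several graphs."""
    return match(pattern, set().union(*graphs))


sequences = (None, "rdf:type", "bio:Sequence")
only_pig = match(sequences, pig)            # the pattern as a filter on one graph
both = match_union(sequences, pig, human)   # the same pattern over the union
print(len(only_pig), len(both))  # 1 2
```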

How can we do it?

1. Emphasise what the biologists know
2. Biological ontology: the only way to capture the “truth”
3. But do biologists care, and are all uses the same?


  • Shift from inferring properties of the domain vs inferring things about the data
  • What extra do we need from an ontology?
    • seems RDFS could be good enough - and there is tool support.


Evaluation/Experiences of RDF triple stores Discussion Session

  • Jessie - what will the speakers say now that they have heard each other’s talks?
  • Oscar - what about lifetime of RDF – how could you implement it on grid side or in application?
    • revisions in BOCA (IBM), backtracking – could easily implement this.
  • Do you remove triples, and how does this knock on? Can you timestamp expired or outdated graphs or triples so that they are only valid within a time span?
    • This will tie into how you deal with provenance and trust which are also ongoing research.
  • Dave - some users want to go back in time – see snapshot of data at that time.
  • Jessie - isn’t the lifetime up to users, caching what they need and updating it when they want?
  • What about when resources are lost / dropped? Who assures the quality of the data?
  • Andy - e.g. sequence comparison searches get different results at different times because data changes over time; you need versioning and history to allow analyses to be repeated.
  • Dave - says it is self regulating like the web!
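The ideas of timestamping triples rather than deleting them, and of letting users see a snapshot of the data as it was at an earlier time, can be sketched as follows. Each assertion carries a hypothetical validity interval `[valid_from, valid_to)`; this layout is an assumption for illustration and is not BOCA's or any other store's schema.

```python
from datetime import date

# Each entry: (subject, predicate, object, valid_from, valid_to).
# valid_to is None while the assertion is still current. Data is illustrative.
history = [
    ("taxon:42", "tax:acceptedName", "Aus bus", date(2004, 1, 1), date(2006, 3, 1)),
    ("taxon:42", "tax:acceptedName", "Aus cus", date(2006, 3, 1), None),
]


def retract(store, s, p, o, when):
    """Instead of deleting a current triple, close its validity interval at `when`."""
    for i, (ts, tp, to, start, end) in enumerate(store):
        if (ts, tp, to) == (s, p, o) and end is None:
            store[i] = (ts, tp, to, start, when)


def snapshot(store, as_of):
    """Triples that were valid on the given date, so an earlier analysis can be repeated."""
    return {
        (s, p, o)
        for s, p, o, start, end in store
        if start <= as_of and (end is None or as_of < end)
    }


def current(store):
    """Only the triples that are valid now."""
    return {(s, p, o) for s, p, o, _start, end in store if end is None}


print(snapshot(history, date(2005, 6, 1)))  # {('taxon:42', 'tax:acceptedName', 'Aus bus')}
print(current(history))                      # {('taxon:42', 'tax:acceptedName', 'Aus cus')}
```

A user who wants the latest view would call `current`; one who wants a fixed, dated version would call `snapshot` with that date.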

  • Oscar - are the semantics open world (you can never know the truth) or closed world (absence of data is taken as false)?
  • Steve - people will look to services and data sources that are stable and reliable (i.e. it is self-regulating).
  • Ben - why worry about trust now? We never did before, so why assume or demand that RDF can or will now address this?

ITIS species 2000 - Two classes of users

1. HEAD: wants the latest view (dynamic)
2. RELEASE: a valid, timestamped annual version

Some users want “name usage” (potential taxa); others want an authority list.

  • Would a google ranking system be as valid as an RDF search?
    • But this does not value new information
  • Oscar - the same problem as the web now: the difference between a semantic web result and a deep web search. If you want to search unstable, dynamic data, problems are inevitable. You need to be able to search stable data and/or volatile data.
  • Shaun questions the approach of searching on individual triples from a dataset, since they only make sense in the context of the dataset. It is better to expose metadata for the resource so that the dataset can be used as a whole rather than as individual data properties.
  • Dave - real-time metadata accumulates very fast; we used to keep documents and metadata stable, but that is not true now.
  • Ben – difference between RDF as data and RDF as metadata

Moderation: an object has its publisher as a property, but can other people add properties / assertions? You need to know.

  • Steve - thinks you should be able to integrate triples from individual stores on the basis of who made them / where they are from.

i.e. what are the extent and boundaries of named graphs, and do you join them?

Example from Andy:

If you have a pig sequence (from the pig RDF graph) and a human sequence (from the human RDF graph), and someone makes a relationship between the two sequences…

  • Who owns the relationship?
  • Where is it stored?
  • Is it all one big graph now?
  • Herbert - you have to be able to find the context of a triple i.e. the graph it originates from. (Moderation)
    • if you merge into one store you lose this
    • do you need reification?
      • A triple becomes an object – can stick properties on to say where they come from.
    • Andy says this is critical - you need to store who / why (evidence) for assertions
    • Oscar - this is above the data model layer; it is at the application level.
  • Wanda Halran - data can evolve and ontology can evolve - Therefore metadata that describes the data evolves

There are two aspects to the versioning. The answer is therefore to generate the RDF at run time: the ontological query goes directly onto the database, i.e. the semantic layer is separated from the data.
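Herbert's point that a triple's originating graph is lost when everything is merged into one store, and Andy's requirement to record who asserted a relationship and why, can be sketched with quads: each statement carries the name of the graph it came from, and graph names can in turn carry provenance. The graph names, the `bio:similarTo` predicate and the metadata keys are hypothetical.

```python
# Quads: (subject, predicate, object, graph). All names are illustrative.
store = {
    ("pig:seq1", "rdf:type", "bio:Sequence", "g:pig"),
    ("hum:seq9", "rdf:type", "bio:Sequence", "g:human"),
    # A third party's relationship between the two sequences, kept in its own graph.
    ("pig:seq1", "bio:similarTo", "hum:seq9", "g:annotations"),
}

# Statements about the graphs themselves: who made them and on what evidence.
graph_meta = {
    "g:annotations": {"dc:creator": "example-annotator", "evidence": "sequence comparison"},
}


def origins(quads, s, p, o):
    """Names of the graphs that assert the triple (s, p, o)."""
    return {g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o)}


def provenance(quads, meta, s, p, o):
    """Metadata for every graph that asserts the triple."""
    return {g: meta.get(g, {}) for g in origins(quads, s, p, o)}


def flatten(quads):
    """Merging into a plain triple store: the graph context is discarded."""
    return {(s, p, o) for s, p, o, _g in quads}


link = ("pig:seq1", "bio:similarTo", "hum:seq9")
print(provenance(store, graph_meta, *link))
print(link in flatten(store))  # True, but nothing records who asserted it any more
```

Reification, mentioned above, is the alternative of turning each triple into a resource that properties can be attached to; named graphs attach the context to a whole set of triples at once.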

  • Dave came back to saying that the web shows the simple answer wins (hard-coding URLs) rather than an indirection layer to look up resources, as the engineering was too hard. (More cyclical arguments about trust.)
  • Jessie – always going to need scientists to make decisions of trust. We just want to filter out junk.
  • Species 2000 - users / publishers will need to declare the “use” / “context” of data in metadata (Dublin Core ownership etc.). This is then a data discovery issue: the client application layer needs to explore this to find and use appropriate data sources.
  • Jessie - how do you choose a triple store?
    • Efficiency etc.?
  • Does it have to be fit for purpose?
    • e.g. Jena is very efficient for some queries because it puts the whole store in memory. The cynical view is that Oracle will win (just wait till they do it, then buy it), but that limits the user base / open source.

Semantic Mediation and Architecture session 1:

  • Andy -

A lot of the talk is about accessing and querying the ontology, but it is the data / individuals / instances that are important to the end user. The OGSA-DAI-RDF/ONT query has to do both: access ontology information so that it can then access instances matching ontology classes and properties.

  • Herbert -

Usability - need to get data providers to buy in, i.e. export data according to standards and protocols. Is OGSA-DAI user friendly? It offers the convenience of a pre-existing package of protocols, architecture, API etc., but it is not easy for a novice to know what to do with it all.

  • Exposure and discovery of services is critical

The OGSA-DAI protocol needs to discover resources and know the implementation they represent, so that the appropriate query language can be used for queries against a triple store etc. But RDF/S access can be transparent to the store: implement it if it only uses RDFS logic, and leave how this is serviced up to the local store.

    • Malcolm - it is the accepted OGSA-DAI paradigm that the OGSA services are transparent to the end implementation and can just pass through queries in the appropriate language.

  • Is OWL-S any good / popular for describing web services?
    • A W3C submission, but no good tools; WSIL-S is a competitor
    • myGrid looked at OWL-S, but it was too heavyweight for users and had no good tools; they implemented their own simple model and forms for annotating services


Semantic Mediation and Architecture session 2:

Well-developed UML modelling exists for meteorological data, but not for terminology, where multiple versioned lists are maintained.

  • Jessie - How do you distinguish between metadata and data?

Sometimes the terms you want to search on are in the data, not the metadata – e.g. species occurrence

    • Bryan – thinks it is up to the metadata describers to put all the relevant terms into the metadata

You have to support multiple search patterns. “Governance issues”? You must support multiple discovery formats and portals. Bryan argues that you have to accept less of the data if integrating globally.


Discussion Session for Semantic Mediation and Architecture

  • What tools need developing for support of OWL?
  • Shaun - e.g. Protégé problems:
    • How do you archive and version ontologies? CVS
    • Importing separate ontologies is a pain, and you then get a problem if there are too many concepts, i.e. we need tools to support groups collaborating on OWL ontology development
  • Funding problem: a proof of concept for data integration is needed (we need to get this working), but an ontology takes a long time to develop, so how do you justify the work?
  • Bryan - argues we should work on data integration first and not wait to decide what to integrate.
  • Asuncion - Editor for Network on Ontologies

    • Collaborative work
    • Maybe overlapping different languages
    • With context, i.e. can use some modules or not
    • The project is gathering all requirements to address these problems
    • Including metadata and ontology evolution, and interoperability tools for importing RDFS / OWL so you can move between applications and tools

Even XML editing tools are bad, e.g. for importing multiple schemas and namespaces. Other tools are also necessary for the whole workflow of making, editing, joining and marketing ontologies. Are there guidelines on this? Tools to explore different ontologies would also be useful.

  • Jessie – back to semantic mediation.

What about satisfying user communities – can we really deliver anything at the moment and what is the timescale for immediate possibilities?

    • Shaun - we do not have to do full classic DB integration through a common schema; you can do small-scale work with tools that assist data merging (e.g. his merge actor interpreter in Kepler)

Bioinformatics does not necessarily “merge” the separate datasets: you get related data and feed them into a workflow.

  • Bryan - how do you incentivise scientists to actually do this data markup?
    • It is difficult to develop domain-specific tools
    • Give citations on the basis of dataset use; you can then demand that datasets are only published if properly marked up.
  • Jessie - are there two camps, Semantic Web vs Grid? They seem to be developing separately.

    • Is this true?
    • Will they come together?
    • Is there something about them that means they are different?

    • Semantic Web will have a larger user base
    • Globus grid has a smaller more expert community
  • What makes work so special that it has to be Grid, not just Web? Is it performance, security etc.?

Experience of Building/Using Ontologies

  • How do you link between ontologies?
    • GO – Cell Ontology

e.g. process – cross reference on ID i.e. alignment – is this OK, is it maintainable?


Meta-Data, LSIDs, Ontologies for Schema/Data Integration discussion

  • LSIDs / URIs - discussed in the BioRDF working groups; anyone can join in (google to find the wiki). Next teleconference on URIs: 19 June, 4pm
  • Wadsworth - company classifications done semantically by an expert user; the approach has similarities to the Dave Thau / Bob Peet approach.
  • LSIDs

Mutability - metadata: so you end up putting pointers, references and data in here. This allows you, e.g., to point at the current preferred version of a set of resources.
Immutability - data returned cannot change over time.
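The split between immutable data and mutable metadata is what makes aggressive caching safe. The sketch below parses the LSID form `urn:lsid:authority:namespace:objectID[:revision]` and caches resolved data indefinitely; the `fetch` callable stands in for a real resolution service and is hypothetical, as are the example identifiers.

```python
from typing import Callable, Dict, NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:objectID[:revision] into its parts."""
    parts = text.split(":")
    # The "urn:lsid" prefix is matched case-insensitively in this sketch.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


class DataCache:
    """Because LSID data may never change, a resolved result can be cached forever."""

    def __init__(self, fetch: Callable[[LSID], bytes]):
        self._fetch = fetch  # stand-in for a real resolver call (hypothetical)
        self._cache: Dict[LSID, bytes] = {}
        self.fetches = 0

    def get_data(self, text: str) -> bytes:
        lsid = parse_lsid(text)
        if lsid not in self._cache:
            self.fetches += 1
            self._cache[lsid] = self._fetch(lsid)
        return self._cache[lsid]


cache = DataCache(lambda lsid: f"record {lsid.object_id}".encode())
cache.get_data("urn:lsid:example.org:taxa:1234")
cache.get_data("urn:lsid:example.org:taxa:1234")
print(cache.fetches)  # 1
```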

  • LSID use – what hinders
    • Too complex
    • Not sure of immu
    • Tool support flaky (Protégé)
    • Too scary
    • Put off
    • Lack of tool support generally – need more and better tools and better dissemination
  • Named graphs
    • persistence of RDF graphs
    • Poor tools: Oracle is expensive, BOCA / Sesame are under development, Jena has scalability problems; NG4J
  • Herbert – wants open source tools and want to know what to use – cannot all spend time evaluating / testing / developing tools
  • Stan - choosing LSID as the technology was not the real issue; the social part of the data markup / collections is more challenging, and you can always change the technology later.
  • Jessie - why aren’t we allowed mutable data in an LSID resource? We could flag whether it is mutable or immutable in the metadata.
    • Ben - the reason is so that you only need to get it once and cache it. Versioning allows change: use an abstract concept that points at the current correct record.
  • What is an ontology - how different to an object model?
    • Can reason about unstructured data, has semantics, has logic and proof theory
    • RDFS – more or less object model
    • OWL-DL – more expressive – has logic
  • Nick - said that individuals should only be used in an ontology when required to capture the constraints / definitions
  • Andy – confused why not just have individuals as data - in some sense ontology = schema
  • Jessie – where is the border between ontology class and individual?
    • Generally should use classes for extensibility and so can subclass etc.
  • OWL vs RDFS (can infer taxonomic classification using these only)

Using only OWL domain, range and subclass is equivalent to RDFS. Roger wants “same as” and transitive properties. If you want more complexity, then add more OWL features.
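The claim that subclass, domain and range stay within RDFS while transitive properties need OWL can be illustrated with a tiny forward-chaining reasoner. It applies a few RDFS entailment patterns (subclass transitivity, type inheritance, domain and range) plus transitivity for properties typed `owl:TransitiveProperty`, repeating until no new triples appear. This is a naive sketch, not a complete RDFS or OWL reasoner, and the `ex:` terms are made up.

```python
def infer(triples):
    """Naive forward chaining to a fixpoint over a small subset of RDFS/OWL rules."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        dom = {(s, o) for s, p, o in facts if p == "rdfs:domain"}
        rng = {(s, o) for s, p, o in facts if p == "rdfs:range"}
        trans = {s for s, p, o in facts
                 if p == "rdf:type" and o == "owl:TransitiveProperty"}
        new = set()
        for a, b in sub:  # subClassOf is itself transitive
            new.update((a, "rdfs:subClassOf", d) for c, d in sub if c == b)
        for s, p, o in facts:
            if p == "rdf:type":  # instances inherit their classes' superclasses
                new.update((s, "rdf:type", sup) for cls, sup in sub if cls == o)
            new.update((s, "rdf:type", cls) for prop, cls in dom if prop == p)
            new.update((o, "rdf:type", cls) for prop, cls in rng if prop == p)
            if p in trans:  # p(s, o) and p(o, x) entail p(s, x)
                new.update((s, p, x) for s2, p2, x in facts if p2 == p and s2 == o)
        if new <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= new


facts = infer({
    ("ex:Sus", "rdfs:subClassOf", "ex:Suidae"),
    ("ex:Suidae", "rdfs:subClassOf", "ex:Mammalia"),
    ("ex:pig1", "rdf:type", "ex:Sus"),
    ("ex:hasGenome", "rdfs:domain", "ex:Organism"),
    ("ex:pig1", "ex:hasGenome", "ex:genome1"),
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:nucleus", "ex:partOf", "ex:cell"),
    ("ex:cell", "ex:partOf", "ex:tissue"),
})
print(("ex:pig1", "rdf:type", "ex:Mammalia") in facts)    # True
print(("ex:nucleus", "ex:partOf", "ex:tissue") in facts)  # True
```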

  • Tools for relational DB to RDFS schema – what exists?
    • Not much, and often very naïve translations of tables to classes and columns to properties
  • What comes first data or data model?
    • data – but ontologies often created after data collected / for integration
    • datamodel - expect this first
  • Open World vs Closed World
    • Open World - You have to assume something to understand the true meaning of results
    • Closed World - with a DB / SPARQL query you have to treat the RDF store as having closed-world semantics
  • How do you do conditionals in triples? You need to entail them with a rule
    • Need to add a rule, e.g. using SWRL (Semantic Web Rule Language); a plugin for Protégé is coming
  • How do you model / capture fuzzy concepts?
    • OWL-DL - some domains really do have fuzzy and inconsistent classes (sometimes not true)
    • you do your best to define / classify
    • but cannot always (ever?) define it completely
    • could argue that all classes are “fuzzy” i.e. NOT COMPLETELY SPECIFIED
  • Building Ontologies
  • How do you get agreement?
  • What tools for
    • capturing ideas
    • showing ontology
    • editing by users
    • users demo of it
    • users using it
  • How to translate work on ontologies to agencies / providers etc., to get them out and publicise them
  • How do you find and re-use ontologies –
    • Swoogle (or even Google)
  • Modularisation – recently possible – how do you find these? How do you find an upper Bio-ontology for example
    • RDF / OWL imports are difficult to implement
  • Data that can be expressed as triples:

When is it worth doing this?, What will I gain?, Need rules to help decide…

    • Daniele - Allows: Re-use, publish, repurpose, exchange, standard – and if you want at some stage to do inference
  • What tool experience do people have
    • CVS; new WonderWeb with AP / agent deep this - versioning of ontologies
    • Tools to compare ontologies: there is some comparison functionality in Protégé
    • Collaborative tools – e.g. web based
  • Different Users
    • different labels / languages for concepts are supported
    • or can map between different ontologies
  • Cost: how do you get cash for this, and how do we cost and estimate it?
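The point that concepts can carry different labels for different users or languages corresponds to RDF literals with language tags, e.g. `rdfs:label "Schwein"@de`. A minimal lookup with a fallback language might look like this; the concept identifiers, labels and the `label_for` helper are illustrative.

```python
# (concept, language tag) -> label, as rdfs:label values with language tags would provide.
LABELS = {
    ("ex:Pig", "en"): "pig",
    ("ex:Pig", "de"): "Schwein",
    ("ex:Pig", "fr"): "cochon",
}


def label_for(concept, preferred, fallback="en"):
    """Return the first available label in the user's preferred languages, else the fallback."""
    for lang in (*preferred, fallback):
        label = LABELS.get((concept, lang))
        if label is not None:
            return label
    return concept  # no label at all: show the identifier itself


print(label_for("ex:Pig", ["de"]))        # Schwein
print(label_for("ex:Pig", ["es", "fr"]))  # cochon
print(label_for("ex:Cow", ["en"]))        # ex:Cow
```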



Wrap Up Discussion

GUIDs/LSIDs

  • Why is it that with Life Science IDentifiers (LSIDs) data must be immutable and meta data can be mutable?
    • Immutable data – once resolved never have to resolve it again, expectation is LSIDs will be cached
    • suggested use of abstract LSIDs to deal with data that you want to be mutable
  • Meta data versus data
    • Can return data as rdf. Conceptual graph might stay same but byte serialisation may change. Serialise once and return that when someone asks.
  • Who is using LSIDs?
    • LSIDs are used for actors in Kepler, Taxonomic community prototypes, Taverna
    • TDWG group - jumped in with both feet, then backsliding etc. The bottom line is that they have decided: choose it and move on. Social issues and semantic issues are the key issues that need to be addressed, and they are harder than the technology issues.


  • Issues of persistence with LSIDs
    • Persistence of named graphs… tools are either prototypes or not open source, so persistence is a problem
  • Why are credentials not supported in LSIDs?
    • chose to ignore security – assume the underlying system will deal with that.
    • related to this: might want Time to live on LSID data
      • There are expires headers etc. (maybe not part of the spec), but the convention is implemented consistently.
  • Is it sensible to use IDs for concepts and instances?
    • use LSIDs as the name space for ontologies.
  • What does the identifier identify? When do you need to give things an identifier or how small can things be to justify giving them an identifier?
  • If people are not using LSIDs - what are they using/is there something better?
  • If people are not using LSIDs - why not?
    • because tools support is flaky
    • scared to implement LSIDs in case it's not the way to go...
      • We're still at the point of trying to see what could be done. Try it and see; don’t worry about it, as it is not a huge investment of time.
    • Protégé – breaks with LSIDs. Problem if the tools don’t support the technology.
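The idea of a time to live on LSID responses can be sketched as a cache that keeps mutable metadata only for a fixed period before re-resolving it (immutable data, by contrast, could be kept indefinitely). The `MetadataCache` class, the injectable clock and the 60-second TTL are assumptions for illustration; this is not meant to describe anything the LSID specification defines.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Keep mutable LSID metadata for `ttl` seconds, then fetch it again."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical metadata resolution call
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, lsid: str) -> dict:
        now = self._clock()
        entry = self._entries.get(lsid)
        if entry is None or now >= entry[0]:  # missing or expired
            entry = (now + self._ttl, self._fetch(lsid))
            self._entries[lsid] = entry
        return entry[1]


# Usage with a fake clock and a fake fetch that counts calls.
now = [0.0]
calls = []


def fake_fetch(lsid):
    calls.append(lsid)
    return {"fetch_number": len(calls)}


cache = MetadataCache(fake_fetch, ttl=60, clock=lambda: now[0])
first = cache.get("urn:lsid:example.org:taxa:1")   # fetched
now[0] = 30.0
second = cache.get("urn:lsid:example.org:taxa:1")  # still cached
now[0] = 61.0
third = cache.get("urn:lsid:example.org:taxa:1")   # expired, fetched again
print(len(calls))  # 2
```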


  • How can we translate work on ontologies etc. into things that are operational and need to go out to agencies to use?
  • How do you gracefully handle evolving data
    • Have the notion of an abstract or conceptual LSID. It doesn’t have data but has rich metadata. Resources are anything; information resources are the subset that can be returned as digital things. Others might be abstract notions, which don’t have a real thing, just pointers.
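An abstract LSID with no data of its own, only metadata pointing at the current concrete version, can be sketched as below. The registry dictionaries, the `currentVersion` key and the helper functions are hypothetical; the point is that publishing a new version only changes the abstract LSID's metadata, while every concrete LSID keeps resolving to the same immutable data.

```python
# Hypothetical registry. Concrete LSIDs (with a revision) map to immutable data;
# the abstract LSID has metadata only.
DATA = {
    "urn:lsid:example.org:taxa:42:1": b"<record v1>",
    "urn:lsid:example.org:taxa:42:2": b"<record v2>",
}
METADATA = {
    "urn:lsid:example.org:taxa:42": {"currentVersion": "urn:lsid:example.org:taxa:42:2"},
}


def resolve(lsid):
    """Return data for a concrete LSID, or follow an abstract LSID to its current version."""
    if lsid in DATA:
        return DATA[lsid]
    target = METADATA.get(lsid, {}).get("currentVersion")
    if target is None:
        raise KeyError(f"no data or current version for {lsid}")
    return DATA[target]


def publish_new_version(abstract, concrete, payload):
    """Add new immutable data, then repoint the abstract LSID's metadata at it."""
    if concrete in DATA:
        raise ValueError("concrete LSID data is immutable")
    DATA[concrete] = payload
    METADATA.setdefault(abstract, {})["currentVersion"] = concrete


publish_new_version("urn:lsid:example.org:taxa:42",
                    "urn:lsid:example.org:taxa:42:3", b"<record v3>")
print(resolve("urn:lsid:example.org:taxa:42"))    # b'<record v3>'
print(resolve("urn:lsid:example.org:taxa:42:1"))  # b'<record v1>'
```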


Tools

  • Where are we with tools to work with ontologies?
  • What's missing from existing tools?
    • Versioning (BOCA has versioning, which is good)
    • Comparing ontologies would be useful (Protégé is going to work on this…)
    • Tools to aid collaboration and documentation of ontologies need to be built.
  • What particular tools have been found to be useful?
  • Are there tools to convert database schemas to RDFS?
    • Ontoprise has this facility we believe...
      • but a relational DB schema usually isn’t conceptual enough.
  • Are there tools to reason with ontologies that are distributed?
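The naïve table-to-class, column-to-property translation mentioned earlier, and the remark that a relational schema usually isn't conceptual enough, can both be seen in a short sketch using Python's built-in `sqlite3`. The `BASE` namespace and the example tables are hypothetical. Note how the foreign key `taxon_id` becomes a plain property rather than a relationship to the `taxon` class: that conceptual link would have to be added by hand.

```python
import sqlite3

BASE = "http://example.org/schema#"  # hypothetical namespace


def naive_schema_to_rdfs(conn):
    """Table -> rdfs:Class; column -> rdf:Property whose rdfs:domain is that class."""
    triples = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        cls = BASE + table
        triples.append((cls, "rdf:type", "rdfs:Class"))
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            prop = f"{BASE}{table}.{column}"
            triples.append((prop, "rdf:type", "rdf:Property"))
            triples.append((prop, "rdfs:domain", cls))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, "
             "taxon_id INTEGER REFERENCES taxon(id), collected TEXT)")
schema = naive_schema_to_rdfs(conn)
for triple in schema:
    print(triple)
```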

Translating Between Different Representation

  • What's the difference between an object model and an ontology?
    • OM used to structure your data but also to interpret unstructured data.

An ontology has semantics, logic theory, models and proof theory. RDFS (a taxonomy) ~= OM; grow it into an ontology by adding logic, and then you can have inference.

    • Correct transformation of data from open to closed world. If you go from open to closed – express the closure.

I have a DB of facts, which assumes closed-world semantics: does this employee have this ID? If it can't be found, the answer is false. You can't express such things in RDF; you need to make an interpretation based on what's there, and assume the open world.
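The employee example contrasts two readings of a missing fact. In the sketch below, a closed-world query treats absence as false, while an open-world query answers "unknown" unless a closure has been stated explicitly, here through a hypothetical `CLOSED` set of (subject, predicate) pairs whose values are declared complete. This three-valued helper illustrates the idea only; it is not how RDF or OWL reasoners are implemented.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Illustrative facts exported from a database.
FACTS = {("emp:1", "ex:hasId", "A17")}
# Explicitly stated closure: the listed IDs of emp:1 are declared to be complete.
CLOSED = {("emp:1", "ex:hasId")}


def ask_closed_world(s, p, o):
    """Database reading: anything not stored is false."""
    return Answer.TRUE if (s, p, o) in FACTS else Answer.FALSE


def ask_open_world(s, p, o):
    """RDF reading: absence only means false where a closure has been stated."""
    if (s, p, o) in FACTS:
        return Answer.TRUE
    if (s, p) in CLOSED:
        return Answer.FALSE
    return Answer.UNKNOWN


print(ask_closed_world("emp:2", "ex:hasId", "A17"))  # Answer.FALSE
print(ask_open_world("emp:2", "ex:hasId", "A17"))    # Answer.UNKNOWN
print(ask_open_world("emp:1", "ex:hasId", "B99"))    # Answer.FALSE
```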

  • How do we translate between ontologies represented in languages such as RDF(S) and OWL-DL?
  • Are there tools to convert database schemas to RDFS?
  • Can an RDFS parser make sense of OWL-DL?
    • Can OWL-DL make sense of RDFS, e.g. subClassOf?
    • If you write in OWL and use only subclassing, domain and range, then this is morally RDFS.
    • Want sameAs… and transitive properties…


Why and When do we use RDF/OWL

  • What will I gain if I use rdf to relate my concepts?
    • Use RDF when you want to reuse, publish, repurpose, exchange or standardise data, and eventually infer.


Building Ontologies

  • How do we build an ontology or ontologies that allow different users to communicate effectively?
  • What comes first the data model or the data?
    • The data model, from a DB point of view; the data, from an ontology point of view.

(OMG ODM - Ontology Definition Metamodel…)

  • When do you model things as classes or instances?
    • If things are to be repurposed then best to model things as classes…
    • Instances: an OWL ontology for a domain, a 60MB file using OWL classes to model their objects. Can OWL be used in this way?
  • In general, do instances form part of an ontology?
    • When describing classes, the fact that you can create individuals is only important if they help to define the classes (the terminology, or T-box). This applies to logic-based ontology languages; there are other types of ontology languages. No mental framework that you can use to define the classes.
  • What is the implication of using data to define classes…?
    • OWL instances: OWL oneOf takes a list of individuals, so the class is defined by a list of instances. You can use it for a controlled vocabulary.
  • How do you find/reuse existing ontologies?
  • How do we capture/represent/present fuzzy concepts?
    • Assume trying to do best to define concepts, but typically will never be able to capture everything. Accept that ontologies are incomplete (contradiction).

Ontology creation partly about coming to an agreement about what terms are going to use/mean.

  • How do you make a conditional triple? e.g. a->b if x
    • SWRL (Semantic Web Rule Language), or KRS or something similar.

Can tie the rule to the triple? Entailed triples?

    • Don’t try and do everything in the ontology….
  • How do we populate ontologies?
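The conditional-triple question ("a->b if x") is what rule languages such as SWRL address: a rule body of triple patterns entails a head triple. The sketch below applies one rule by matching the body against the data and instantiating the head, using the uncle rule commonly used to introduce SWRL, `hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z)`. It is plain Python, not SWRL syntax or a Protégé API, and the data is made up.

```python
def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant that does not match
            return None
    return extended


def apply_rule(body, head, triples):
    """Return the head triples entailed by every way of satisfying all body patterns."""
    bindings = [{}]
    for pattern in body:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(b.get(term, term) for term in head) for b in bindings}


data = {
    ("ex:Ann", "ex:hasParent", "ex:Bob"),
    ("ex:Bob", "ex:hasBrother", "ex:Carl"),
    ("ex:Dan", "ex:hasParent", "ex:Eve"),  # Eve has no brother, so no uncle for Dan
}
body = [("?x", "ex:hasParent", "?y"), ("?y", "ex:hasBrother", "?z")]
head = ("?x", "ex:hasUncle", "?z")
print(apply_rule(body, head, data))  # {('ex:Ann', 'ex:hasUncle', 'ex:Carl')}
```

The entailed triple can be added back to the store (forward chaining) or computed at query time; either way the rule, not the ontology, carries the condition.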

Exchange of information

  • How do we pass data around and retain the meaning?


User Issues

  • What are people's experiences of open versus closed world? Legacy data is traditionally closed world (in a database); what happens when we move to an open world, e.g. using RDF and the semantic web, which is assumed to be an open world?
  • What kind of ontology do we need to get started with taxonomic data?
  • How do you go about proposing, presenting and accepting ontologies to communities?
    • When talking to users don’t talk about ontologies ask them to explain the domain


  • In the genomics community, how do they relate to the taxonomy?
  • What's the users' take-up of these technologies, or what issues are associated with their take-up?
  • How do we cost the effort of building ontologies etc. so that funders can be mobilised to support the effort required?
    • very difficult - like costing a software project, need to scope very well.
  • Can we have different ontologies for different types of users?
    • use different versions of LSIDs – technical ontologies and Domain specific ontologies – split work between technical ontologies and domain ontologies.
  • Can you put different faces on ontologies so depending on who’s viewing the ontology you see things differently?
    • Modularise the ontology, then map or transform the ontologies. If we define the notion of a transform, we can transform it for different people’s views.


Semantic Web

  • What is the future of the semantic web?
  • Where are we currently in the development of the semantic web, and where are we going?

Security

  • What are the communities planning to do about managing security models across the community?


Semantic Integration

  • Currently the work on semantic integration seems too abstract; how do we make it more practical?