Naming conventions for ontology engineering
Daniel Schober, PhD
The European Bioinformatics Institute (EBI)
NET Project – Postdoctoral Ontologist
BioOntologies SIG, ISMB/ECCB 2007
Daniel Schober, EMBL-EBI
Collaborative Efforts – Scenario
• Metabolomics Standards Initiative (MSI)
– Describe metabolomics laboratory workflows
• Minimal requirements, augmenting exchange formats
– Ontology working group under OBI…
• Ontology for Biomedical Investigations (OBI)
– Larger collaborative, multi-domains effort
• Brings together various ‘omics’ and biomedical communities
– Describe general laboratory workflow
• Experimental Design, protocols, data analysis etc.
– Developed under OBO Foundry…
• Open Biomedical Ontologies (OBO) Foundry
– Provides best practices for ontology engineering
– Creates a complete suite of orthogonal and interoperable ontologies
• Over 60 ontologies and ~10 core Foundry ontologies
Collaborative Efforts – Challenges
• Create networked orthogonal ontologies
– Integrating MSI ontology with OBI
– Integrating OBI with BFO and other OBO-Foundry ontologies, e.g.
• PATO (qualities), ChEBI (chemicals), …
• Integrate modular developments
– Parallel branch development
– OWL-import, referencing
• Improve the communication among developers
– Database developers and biologists
– Semantic web and text miners
-> We need common naming conventions
- To harmonize the appearance and design of modules
Common Naming Conventions – Why?
• Representational artefacts built according to different
- Engineering methodologies
• MethOntology, Tove, Enterprise, …
– Engineering Tools
• Protégé, OBO-Edit, OntoEdit, …
– Representation languages and semantics
• OBO, OWL and CLIPS-Frames, …
- Engineering ‘schools’ and philosophies
• GO, Semantic Web, AI (Protégé Frames), …
• Manchester, Saarbruecken, Stanford, Trento, Karlsruhe, …
• Realists, Conceptualists, …
• The naming conventions applied are as diverse as these backgrounds!
– Diverse ad hoc ways to name what is represented
[Figure: annotated ontology excerpts contrasting naming practices]
• ID convention: uppercase prefix, underscore, number vs. lowercase prefix, colon, string, or no name, just an ID string
• Word separator: space vs. underscore vs. none
• UpperCamelCase vs. underscore
• Namespace prefix
• Compound names
• Administrative helper classes
• Singular vs. plural, xrefs
• Instance convention
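Some of these dimensions can be measured automatically. The following is a minimal sketch, assuming class labels are available as plain strings. `label_style` and `style_profile` are hypothetical helpers introduced here for illustration, and their regular expressions are simplifications rather than any published rule set.

```python
import re
from collections import Counter


# Hypothetical helper: classify how a label joins its words, covering the
# separator variants contrasted above (space, underscore, CamelCase, none).
def label_style(label: str) -> str:
    if " " in label:
        return "space"
    if "_" in label:
        return "underscore"
    if re.fullmatch(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+", label):
        return "UpperCamelCase"
    if re.fullmatch(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+", label):
        return "lowerCamelCase"
    return "single token"


def style_profile(labels):
    """Count how often each separator style occurs in a collection of labels."""
    return Counter(label_style(label) for label in labels)


if __name__ == "__main__":
    labels = ["plant cell", "DNA_hybridisation", "IndependentContinuant", "toBeDiscussed"]
    print(style_profile(labels))
```

A profile with more than one style for a single ontology indicates mixed conventions. Note that an acronym run such as ‘DNAHybridisation’ falls through to "single token" here, which is itself a small instance of the acronym-border problem.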
Existing Naming Conventions – Status
• Semantic Web Best Practices and Deployment Working Group website
– Format specific: OWL
– Limited visibility: information dispersed and embedded into many documents
• BioPAX manual
– Limited visibility: naming conventions only implicitly dealt with in
general documentation
– Implementation specific: naming conventions discussed at
implementation level (Protégé/OWL)
– Limited coverage: IDs addressed marginally (page 53, Technical Notes
RDF:ID), no conventions on relations
• GO developers style guide
– Format specific: mainly OBO; has its own definition for namespace which
differs from the one in OWL/semantic web
– Limited visibility: naming conventions dispersed throughout websites,
e.g. GO namespace, term names and identifiers are explained in
different documents
Existing Naming Conventions – Status
• ISO-Standards
– Information overflow: about 40 documents containing closely
related guidelines
– Limited access: commercial
• ANSI/NISO Z39.19-2005
– Semantics specific: Controlled vocabulary, e.g. about terms, not
– Limited coverage: No term ID handling or versioning addressed
• Zhang S, Bodenreider O: “Law and order: assessing and enforcing compliance with
ontological modeling principles in the Foundational Model of Anatomy (FMA)”,
Computers in Biology and Medicine 36 (2006)
– Scientific domain dependent: anatomy
– Hardly visible: paper access
-> Acceptance and visibility are ‘limited’ to specific target communities
-> We need universally applicable conventions
Our Goals
• Overcome diversity and fragmentation
– Collect existing naming conventions
• Make them accessible via repository
– Review and compare
• Create a single common document
– Distil universally valid aspects for OWL and OBO
– Ensure visibility for target domains
– Move towards a common resource for the OBO Foundry groups
• Provide best practice guidelines
– Provide robust names for ontology classes
– Not a ‘knowledge representation language’ for names, as HUGO provides
for gene symbols (e.g. awgTg(GBtslenv)832Pkw)
• Engage in discussion with other groups
– A two-phase approach …
Towards Common Naming Conventions
• Phase 1: Straw man document
- “Working towards naming conventions for use in controlled
vocabulary and ontology engineering”
• See Bio-Ontologies SIG Proceedings, p. 29-32
- Created for MSI Ontology WG, targeting the larger OBI group
- Implementation and format independent
• Phase 2: Survey OBO Foundry groups
- Questionnaire (work in progress)
• Ontology and engineering process
• Current practice in naming entities
• Envisioned benefits of common conventions
• In depth questions on particular conventions
– Results to be posted on the OBO Foundry wiki
Naming Convention Straw Man - Examples
• Explicit and concise names
– Avoid omissions and ellipses
• Plant Ontology (PO) used 'cell' for 'plant cell'
– Avoid negative names like ‘non-separation device’
– Avoid ambiguous words
• ‘set’ has about 30 meanings, e.g. plurality in ‘protocol set’ or action in ‘parameter set’
– Brand name convention: use [company name+brand name+superclass]
• ‘US 2’ becomes ‘Bruker US 2 NMR magnet’
-> To ensure a shared understanding of the intended meaning
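Two of these rules lend themselves to simple tooling: composing brand-name labels and flagging negative or ambiguous words. The sketch below assumes names are plain strings; `name_warnings`, `brand_name` and the word lists are hypothetical and illustrative only, since a real project would curate its own ambiguity list.

```python
import re

# Hypothetical, illustrative word lists; a real project would curate its own.
AMBIGUOUS_WORDS = {"set"}
NEGATIVE_PATTERN = re.compile(r"(?:^|[ _])non[-_ ]")


def name_warnings(name: str) -> list[str]:
    """Flag negative names and known ambiguous words in a class name."""
    lowered = name.lower()
    warnings = []
    if NEGATIVE_PATTERN.search(lowered):
        warnings.append("negative name")
    for word in re.split(r"[ _]+", lowered):
        if word in AMBIGUOUS_WORDS:
            warnings.append(f"ambiguous word: {word}")
    return warnings


def brand_name(company: str, brand: str, superclass: str) -> str:
    """Compose a label following [company name + brand name + superclass]."""
    return " ".join(part.strip() for part in (company, brand, superclass))


if __name__ == "__main__":
    print(brand_name("Bruker", "US 2", "NMR magnet"))  # Bruker US 2 NMR magnet
    print(name_warnings("non-separation device"))      # ['negative name']
    print(name_warnings("protocol set"))               # ['ambiguous word: set']
```

A flag for ‘set’ can only prompt a human review; the tool cannot tell which of its meanings was intended.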
• Typographical issues
– Use lowercase as in natural language
• most flexible, e.g. ‘pH’, ‘DNA_hybridisation’ (no acronym border problems)
– Avoid punctuation, sub/superscripts
– Resolve special characters consistently, e.g. ‘α’ -> ‘alpha’
-> To ensure readability and reduce diversity in appearance
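The typographical rules can be partly automated with the standard `unicodedata` module. This is a minimal sketch, assuming that only plain Greek letters are spelled out and that the underscore is the only permitted punctuation. Both functions are hypothetical helpers, not part of any existing tool.

```python
import re
import unicodedata

# Matches Unicode names such as "GREEK SMALL LETTER ALPHA".
_GREEK = re.compile(r"GREEK (?:SMALL|CAPITAL) LETTER ([A-Z]+)")


def spell_out_greek(name: str) -> str:
    """Replace plain Greek letters with their lowercase English names."""
    out = []
    for ch in name:
        match = _GREEK.fullmatch(unicodedata.name(ch, ""))
        out.append(match.group(1).lower() if match else ch)
    return "".join(out)


def typography_warnings(name: str) -> list[str]:
    """Flag punctuation (other than the underscore separator) and sub/superscripts."""
    warnings = []
    for ch in name:
        category = unicodedata.category(ch)
        if category.startswith("P") and ch != "_":
            warnings.append(f"punctuation: {ch!r}")
        elif category == "No":  # e.g. superscript/subscript digits such as '²', '₂'
            warnings.append(f"sub/superscript: {ch!r}")
    return warnings


if __name__ == "__main__":
    print(spell_out_greek("α-helix"))         # alpha-helix
    print(typography_warnings("CO₂ sensor"))  # ["sub/superscript: '₂'"]
```

Mixed-case tokens such as ‘pH’ pass unchanged because nothing is lowercased automatically. Hyphens, however, are reported as punctuation, which a project may choose to relax.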
Naming Convention Straw Man - Examples
• Lexical issues
– Reuse words and avoid synonyms within compound names
• ‘x_part_of_process’, ‘y_part_of_process’ and ‘z_part_of_process’
instead of ‘x_component_of_process’, ‘y_portion_of_process’, …
-> To decrease the learning and search burden on the user side, and to ease text
mining by reducing string variability
– Use underscore or space separator (instead of CamelCase)
• prevents distortions like ‘CapNMRProbe’ and ‘pHValue’, yet allows
brand names like ‘SampleJet’
-> To ease text mining and readability (demarcated word boundaries)
– Use singular nominal word form
• Avoid inconsistencies like ‘biphenyl’ (CHEBI:17097) under an IUPAC-
required ‘biphenyls’ (CHEBI:22888)
-> To harmonize appearance, avoid redundancy, and ease ontology
cross-referencing and import
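Why the underscore separator matters can be seen by trying to recover words from CamelCase names, which only works heuristically. The sketch below is illustrative. `split_camel_case`, `reuse_preferred_words` and the `PREFERRED_WORDS` table are hypothetical, and plural detection is left out because it needs a proper lexicon.

```python
import re

# Boundaries a CamelCase splitter must guess: lower->Upper, and the end of an
# uppercase run that is followed by a capitalised word.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

# Hypothetical project vocabulary: preferred word -> discouraged synonyms.
PREFERRED_WORDS = {"part": {"component", "portion"}}
_SYNONYM_TO_PREFERRED = {
    synonym: preferred
    for preferred, synonyms in PREFERRED_WORDS.items()
    for synonym in synonyms
}


def split_camel_case(name: str) -> list[str]:
    """Recover words from a CamelCase name (heuristic, hence lossy)."""
    return _CAMEL_BOUNDARY.split(name)


def reuse_preferred_words(name: str) -> str:
    """Rewrite an underscore-separated name using the preferred words."""
    return "_".join(_SYNONYM_TO_PREFERRED.get(w, w) for w in name.split("_"))


if __name__ == "__main__":
    print(split_camel_case("CapNMRProbe"))  # ['Cap', 'NMR', 'Probe']
    print(split_camel_case("pHValue"))      # ['p', 'H', 'Value']
    print(split_camel_case("SampleJet"))    # ['Sample', 'Jet']
    print(reuse_preferred_words("x_component_of_process"))  # x_part_of_process
```

The splitter tears ‘pH’ apart and breaks the brand ‘SampleJet’. With underscore-separated names, `name.split("_")` is exact, which is also what makes the synonym rewrite straightforward.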
Common Naming Convention – Open Issues
• Syntactic issues
– Qualifier order: put the qualifier term before the part being
qualified?
• ‘NMR_instrument’ in place of ‘instrument_for_NMR’
– ‘Helper’ strings in class names: establish general ones?
• e.g. the ‘sensu’ postfix in GO indicating species specificity: ‘fruiting body
development (sensu Bacteria)’ (GO:0030583)
• Semantic issues
– Administrative ‘helper’ classes: how to name these metadata bins?
• unclassified (OBI_200067), ChEBI_objects (OBI_336), toBeDiscussed, …
– Identifiers and namespace: are conventions useful?
• OBI uses [group prefix+underscore+unique number], e.g. OBI_334
• BFO uses [meaningful string], e.g. IndependentContinuant
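Whatever convention is chosen, mixed identifier styles within one ontology are easy to detect. This minimal sketch uses patterns derived only from the examples in this deck (OBI_334, GO:0030583, CHEBI:17097, IndependentContinuant). They are assumptions, not official OBO Foundry identifier rules, and `id_style` and `id_styles_used` are hypothetical helpers.

```python
import re

# Illustrative patterns (assumptions, not official OBO Foundry rules) for the
# identifier styles that appear in this deck; checked in insertion order.
ID_PATTERNS = {
    "prefix_underscore_number": re.compile(r"[A-Z][A-Za-z]*_\d+"),  # OBI_334
    "prefix_colon_number": re.compile(r"[A-Z][A-Za-z]*:\d+"),       # GO:0030583
    "meaningful_string": re.compile(r"[A-Za-z][A-Za-z0-9]*"),       # IndependentContinuant
}


def id_style(identifier: str) -> str:
    """Return the name of the first pattern the whole identifier matches."""
    for style, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return style
    return "unrecognised"


def id_styles_used(identifiers) -> set[str]:
    """Return the set of ID styles found; more than one signals mixed conventions."""
    return {id_style(i) for i in identifiers}


if __name__ == "__main__":
    ids = ["OBI_334", "OBI_200067", "IndependentContinuant"]
    print(sorted(id_styles_used(ids)))  # ['meaningful_string', 'prefix_underscore_number']
```

A label-like string such as ‘ChEBI_objects’ comes out as "unrecognised", which helps separate administrative bins from real identifiers.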
Common Naming Convention - Benefits
• Communication has improved …
- In geographically distributed, collaborative efforts
- Between developers from different domains and backgrounds
• Appearance of what we represent has been normalized
- Not just a matter of aesthetics
- Manoeuvring within the hierarchy became faster
… we further envision …
• Facilitated access to ontologies through meta-tools
• Reducing the diversity that ontology libraries and tools have
to cope with, e.g. OLS, BioPortal, PROMPT and text mining tools
• Facilitating ontology integration and cross-referencing
• Comparison, alignment (OWL-import) and mapping
• Serving as guideline for new communities
Acknowledgements and Resources
• Authors and those contributing to the discussion
– Susanna-Assunta Sansone, Philippe Rocca-Serra, Suzi Lewis, Waclaw
Kusnierczyk, Barry Smith, Chris Mungall, Jane Lomax, Robert Stevens,
Frank Gibson, Luisa Montecchi-Palazzi, Dietrich Rebholz
• Members of MSI, PSI, OBI groups and OBO Foundry coordinators
• Further info
- “Working towards naming conventions for use in controlled vocabulary
and ontology engineering”, Bio-Ontologies SIG Proceedings, p. 29-32
• Funding sources (supporting my work)
– UK BBSRC e-Science BB/D524283/1 and BB/E025080/1
– Semantic Mining NoE (visits to IFOMIS and Manchester)