Data models using Shape Expressions in the Medical domain

Scientific Research Project Number: MA 2018 040
Place: WESO (Web Semantics Oviedo) SPAIN


Interoperability between different applications is one of the main challenges in information technology, and it especially affects medical informatics applications, which need to handle vast amounts of data from heterogeneous sources. RDF is a knowledge representation language with a graph-based model that facilitates the automatic integration of heterogeneous data. It can be considered the lingua franca of the Semantic Web and has been employed to represent linked data and knowledge in multiple domains, such as e-Health. The Shape Expressions (ShEx) language was designed as an intuitive, human-friendly language that allows domain experts to describe data models for RDF. It has already been applied to generate the FHIR-HL7 RDF data models and has helped in the development of the Gene Wiki project. Although ShEx was designed to be intuitive, further work is needed both to demonstrate that it is in fact intuitive and usable, and to improve existing ShEx tooling so that it can be used not only by computer scientists but also by domain experts.

Description of the SRP Project/Problem

The aim of the project is to develop tools that allow medical practitioners to design data models and schemas in the health domain using the ShEx language, and to assess its efficiency as a data modelling language. Although ShEx has already been employed in several projects, further work is needed to improve its usability for domain experts and to assess its efficiency in real scenarios.

Research Questions

Is it possible to develop languages and tools that allow medical domain experts to define data models and schemas in their domain of expertise?


The research will focus on identifying the features, from both a language design and a user interface point of view, that allow domain experts to capture knowledge about data models using Shape Expressions.

Time period

7 months


Iovka Boneva, José Emilio Labra Gayo, Eric G. Prud'hommeaux: Semantics and Validation of Shapes Schemas for RDF. International Semantic Web Conference (1) 2017: 104-120

Jose E. Labra Gayo, Eric Prud’hommeaux, Iovka Boneva, Dimitris Kontokostas (2018) Validating RDF Data, Synthesis Lectures on the Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328, DOI: 10.2200/S00786ED1V01Y201707WBE016, Morgan & Claypool

Tim E. Putman, Sebastien Lelong, Sebastian Burgstaller-Muehlbacher, Andra Waagmeester, Colin M. Diesh, Nathan A. Dunn, Monica C. Munoz-Torres, Gregory S. Stupp, Chunlei Wu, Andrew I. Su, Benjamin M. Good: WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database 2017: bax025 (2017)

Harold R. Solbrig, Eric Prud'hommeaux, Grahame Grieve, Lloyd McKenzie, Joshua C. Mandel, Deepak K. Sharma, Guoqian Jiang: Modeling and validating HL7 FHIR profiles using semantic web Shape Expressions (ShEx). Journal of Biomedical Informatics 67: 90-100 (2017)


Ronald Cornet, Amsterdam UMC, location AMC, department KIK
Jose Labra, WESO (Web Semantics Oviedo) SPAIN, +34-985103394