Scientific Research Project

[MA 2021 07] Machine Learning for Prediction of Acute Kidney Injury Using Clinical Notes and medical terminologies

Amsterdam UMC, location AMC, department of Medical Informatics

Proposed by: Iacopo Vagliano [i.vagliano@amsterdamumc.nl]

Scientific Research Project Number: MA 2021 07

Place: Amsterdam UMC, location AMC, department of Medical Informatics

Introduction

Machine learning applied to electronic health records (EHRs) can generate actionable insights, in particular quantifying patient health and predicting future outcomes in the context of critical care. Moreover, with the progress in natural language processing (NLP), extracting valuable information from biomedical texts has gained popularity among researchers.

Description of the SRP Project/Problem

The study will start with traditional machine learning models that use only structured data, without text, as a baseline for assessing whether adding clinical notes improves predictive performance. The focus will then shift to deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The goal is to first build models that rely exclusively on clinical features and then extend them to make use of clinical notes. For example, the student may build one RNN with clinical notes and one without, and one CNN with clinical notes and one without. Data will come from the MIMIC and/or eICU datasets, which contain both clinical features and clinical notes. The performance of models with and without text will be evaluated with discrimination measures (area under the precision-recall curve, area under the receiver operating characteristic curve) and calibration measures (Brier score, calibration curves).
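
To make the evaluation measures named above concrete, here is a minimal sketch in plain Python. The function names and the toy labels and probabilities are illustrative assumptions, not part of the project. In practice an established library would normally be used; this sketch only shows what each measure computes.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.
    Assumption: tied scores are not grouped in this simplified version."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def calibration_bins(y_true: Sequence[int], y_prob: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Points of a calibration curve: for each non-empty equal-width bin,
    (mean predicted probability, observed event rate, number of cases)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[idx].append((y, p))
    points = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            points.append((mean_pred, observed, len(b)))
    return points


# Hypothetical toy data: 1 = developed AKI, 0 = did not.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print("AUROC:", auroc(y_true, y_prob))
print("AUPRC (average precision):", average_precision(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Calibration points:", calibration_bins(y_true, y_prob, n_bins=2))
```

Running the same functions on the predictions of a model with text and a model without text, on the same held-out patients, gives the paired comparison the project asks for.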

Furthermore, we are interested in evaluating whether incorporating external structured medical terminologies, in the form of knowledge graphs, into the developed models improves predictive performance. A knowledge graph is a graph-based abstraction for representing data, built to explicitly represent knowledge extracted from diverse data sources at large scale. Examples of knowledge graphs in the medical domain are SNOMED CT and the Unified Medical Language System (UMLS). Once the general results on model performance with and without clinical notes are obtained, knowledge graphs such as SNOMED CT or UMLS might be combined with the clinical notes as a means to further improve predictive performance. Methods such as retrofitting and cui2vec can be used to obtain better text embeddings for clinical notes. Such methods may also provide further insights into relationships between clinical concepts. The same performance metrics introduced for clinical notes will be applied to evaluate these models.
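
As an illustration of how a knowledge graph could refine text embeddings, the following is a minimal sketch of the retrofitting idea: each concept vector is repeatedly pulled toward the average of its graph neighbours while staying close to its original value. The uniform weighting, the function name retrofit, and the toy concepts and vectors are assumptions made for this example. They are not taken from SNOMED CT, UMLS or cui2vec, and a real study would use concept identifiers and pretrained embeddings instead.

```python
from typing import Dict, List

Vector = List[float]


def retrofit(embeddings: Dict[str, Vector],
             graph: Dict[str, List[str]],
             n_iters: int = 10) -> Dict[str, Vector]:
    """Return new vectors that balance closeness to the original embedding
    with closeness to graph neighbours.

    Assumed weights: 1 for the original vector and 1/degree for each
    neighbour, so every update is the midpoint between the original
    vector and the mean of the current neighbour vectors.
    """
    new = {concept: list(vec) for concept, vec in embeddings.items()}
    for _ in range(n_iters):
        for concept, neighbours in graph.items():
            if concept not in new:
                continue  # concept has no embedding to adjust
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue  # isolated concepts keep their original vector
            dim = len(new[concept])
            mean_nbr = [sum(new[n][k] for n in nbrs) / len(nbrs)
                        for k in range(dim)]
            new[concept] = [(m + o) / 2
                            for m, o in zip(mean_nbr, embeddings[concept])]
    return new


# Hypothetical 2-dimensional embeddings and a one-edge toy graph.
embeddings = {
    "acute kidney injury": [1.0, 0.0],
    "renal failure": [0.0, 1.0],
    "fever": [-1.0, -1.0],
}
graph = {
    "acute kidney injury": ["renal failure"],
    "renal failure": ["acute kidney injury"],
}

retrofitted = retrofit(embeddings, graph)
for concept, vec in retrofitted.items():
    print(concept, [round(x, 3) for x in vec])
```

In this toy example the two linked concepts move toward each other, while the unlinked concept keeps its original vector. The retrofitted vectors could then replace the original embeddings as input to the text-based models.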

It is recommended to have followed the course Special topics in data science in medicine (MAM11).

Research questions

1. Does text allow models to better predict the likelihood of developing AKI on the MIMIC/eICU dataset?

2. Does using further external structured knowledge (knowledge graphs) improve the prediction performance?

Expected results

An algorithm and a trained prediction model using clinical notes and knowledge graphs.

A validation approach and performance results of this algorithm, with and without text, and with and without knowledge graphs.

A master's thesis written in the form of a scientific article.

Time period:

7 months

References:

Y. Li, L. Yao, C. Mao, A. Srivastava, X. Jiang and Y. Luo, "Early Prediction of Acute Kidney Injury in Critical Care Setting Using Clinical Notes," 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 2018, pp. 683-686. doi: 10.1109/BIBM.2018.8621574

MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35. Available at: http://www.nature.com/articles/sdata201635

Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG and Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data (2018). DOI: http://dx.doi.org/10.1038/sdata.2018.178

J. Rogers, O. Bodenreider. SNOMED CT: Browsing the browsers. KR-MED (2008), pp. 30-36.

Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.

A.L. Beam, B. Kompa, I. Fried, N. Palmer, X. Shi, T. Cai, I.S. Kohane, Clinical Concept Embeddings Learned from Massive Sources of Medical Data, arXiv, 2018, pp. 1–27 arXiv:1804.01486. https://arxiv.org/abs/1804.01486

Faiza Khan Khattak, Serena Jeblee, Chloé Pou-Prom, Mohamed Abdalla, Christopher Meaney, Frank Rudzicz. A survey of word embeddings for clinical text. Journal of Biomedical Informatics: X, Volume 4, 2019, 100057, ISSN 2590-177X. https://doi.org/10.1016/j.yjbinx.2019.100057

Contact

Iacopo Vagliano, Amsterdam UMC, location AMC, department of Medical Informatics, i.vagliano@amsterdamumc.nl