[MA 2022 10] Assessing portability of prediction models across data sources via Out-Of-Distribution (OOD) detection

Amsterdam UMC, location AMC. Department of Medical Informatics
Proposed by: Giovanni Cinà [g.cina@amsterdamumc.nl]

Title

Assessing portability of prediction models across data sources via Out-Of-Distribution (OOD) detection


Place of the SRP Project

Amsterdam UMC, location AMC. Department of Medical Informatics.


Introduction

Most evaluation frameworks for supervised learning tasks assume the data to be identically distributed (part of what is known as the iid assumption). Unfortunately this assumption is often violated in practice: for various reasons, deployed models often receive data points that are distributed differently from the training data. In these circumstances models can suffer severe performance drops, a contingency that is especially worrisome in high-stakes environments such as healthcare [1]. This danger has been underscored by some high-profile failures in ML applications [2].

This can happen, for example, when data is collected from different sources, e.g. when a predictive model is fed data from a new hospital. One approach to tackle this problem is OOD detection: building detectors that assess whether incoming data points are 'in distribution', in which case the predictive model can be expected to perform as intended, or out of distribution, in which case it cannot. The problem of finding the best OOD detector is still open, and most likely task-dependent [3, 4].
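As a minimal illustration of the idea (not the detector the project will necessarily use), the sketch below flags points as OOD with a simple Mahalanobis-distance detector fitted on training features; the function names, the synthetic data, and the 95th-percentile threshold are illustrative assumptions.

```python
# Minimal sketch of a distance-based OOD detector for tabular data.
# Illustrative assumption, not the detector prescribed by the project.
import numpy as np

def fit_detector(X_train):
    """Fit a Mahalanobis-distance detector on in-distribution features."""
    mean = X_train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))  # pseudo-inverse for stability
    return mean, cov_inv

def ood_scores(X, mean, cov_inv):
    """Higher score = further from the training distribution."""
    diff = X - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Example: flag points whose distance exceeds the 95th percentile of training scores.
rng = np.random.default_rng(0)
X_in = rng.normal(size=(1000, 10))           # stand-in for 'in distribution' data
X_new = rng.normal(loc=2.0, size=(200, 10))  # stand-in for data from a 'new' hospital
mean, cov_inv = fit_detector(X_in)
threshold = np.percentile(ood_scores(X_in, mean, cov_inv), 95)
is_ood = ood_scores(X_new, mean, cov_inv) > threshold
print(f"Fraction flagged as OOD: {is_ood.mean():.2f}")
```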



Description of the SRP Project/Problem

This project aims at leveraging the OOD detection framework to assess whether predictive models can generalize to new hospitals. Concretely, the student will be studying datasets comprising data from several hospitals, in particular eICU (a public repository of ICU data from the US) and the NICE registry (a registry of Dutch ICU data).

The first predictive task to consider will be in-hospital mortality; the suitability of this task will be re-assessed after a preliminary analysis of the available datasets. Different OOD detection setups will be investigated, in which certain subsets of hospitals are selected as the training set ('in distribution') while the remaining hospitals are treated as 'new' and scored for out-of-distribution-ness. The level of novelty of each hospital's data will then be compared with the performance of the predictive model on that hospital.
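A possible shape for this leave-hospitals-out analysis is sketched below: train a mortality model on a subset of hospitals, compute a dataset-level OOD score for each held-out hospital, and relate that score to the model's performance there. The column names ('hospital_id', 'mortality') and the model choices (logistic regression, IsolationForest as the OOD scorer) are illustrative assumptions, not the project's prescribed method.

```python
# Sketch of the leave-hospitals-out setup under the assumptions stated above.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def hospital_level_analysis(df, feature_cols, train_hospitals,
                            label_col="mortality", hospital_col="hospital_id"):
    # Fit the predictive model and the OOD scorer on the 'in distribution' hospitals.
    train_df = df[df[hospital_col].isin(train_hospitals)]
    model = LogisticRegression(max_iter=1000).fit(train_df[feature_cols], train_df[label_col])
    detector = IsolationForest(random_state=0).fit(train_df[feature_cols])

    rows = []
    for hosp, hosp_df in df[~df[hospital_col].isin(train_hospitals)].groupby(hospital_col):
        # Dataset-level novelty: average anomaly score of the hospital's patients
        # (higher = further from the training hospitals).
        ood = -detector.score_samples(hosp_df[feature_cols]).mean()
        auc = roc_auc_score(hosp_df[label_col], model.predict_proba(hosp_df[feature_cols])[:, 1])
        rows.append({"hospital": hosp, "ood_score": ood, "auc": auc})

    results = pd.DataFrame(rows)
    corr, p_value = spearmanr(results["ood_score"], results["auc"])
    return results, corr, p_value
```

In this setup, a strong negative correlation between the dataset-level OOD score and the held-out AUC would suggest that the score can forecast where the model's performance is likely to drop.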

The ultimate goal is to build a reliable mechanism that predicts whether a model will generalize to a new hospital or whether re-training is needed, thereby preventing dangerous applications of predictive models in settings where performance is likely to drop.


Research questions

1. To what extent can we use Out-Of-Distribution detection at a dataset level to forecast whether a given predictive model will generalize to a new hospital?

2. Is the OOD detection framework useful in practice:

a. Can it point to data discrepancy issues?

b. Can it handle datasets with different levels of granularity?

c. Can it handle datasets with a highly imbalanced outcome?

d. Can it handle datasets with many categorical features?


Time period

7 months.


Contact

Giovanni Cinà, Amsterdam UMC, location AMC, Department of Medical Informatics, g.cina@amsterdamumc.nl

References

[1] Ulmer, Dennis, Lotta Meijerink, and Giovanni Cinà. "Trust issues: Uncertainty estimation does not enable reliable OOD detection on medical tabular data." Machine Learning for Health. PMLR, 2020.

[2] Wong, Andrew, et al. "External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients." JAMA Internal Medicine 181.8 (2021): 1065-1070.

[3] Zadorozhny, Karina, et al. "Out-of-Distribution Detection for Medical Applications: Guidelines for Practical Evaluation." arXiv preprint arXiv:2109.14885 (2021).

[4] Hendrycks, Dan, et al. "Scaling out-of-distribution detection for real-world settings." arXiv preprint arXiv:1911.11132 (2019).