[MA 2022 07] Predicting recovery of functioning after COVID-19 infection: analysis of structured data from electronic patient records

Amsterdam UMC, location VUmc, Dept of Rehabilitation Medicine
Proposed by: Edwin Geleijn, physiotherapist / innovation manager [e.geleijn@amsterdamumc.nl]

Introduction

COVID-19 leads to long-term functional limitations in 5% of the population. To date, it is unclear which factors are associated with, or might predict, these limitations, and why some patients recover within a short period while others take more than a year. Our study aims to better understand and predict functional recovery after COVID-19 infection.

So far, we have trained NLP-based classifiers on unstructured text in the electronic health record (EHR) to determine the level of functioning in nine functional domains throughout the course of the disease and rehabilitation. These domains were derived from the International Classification of Functioning, Disability and Health (ICF) and comprise mood, breathing, fatigue, walking, eating, maintaining body weight, attention, exercise capacity and work/education. Applying the classifiers to EHR text data has yielded a structured dataset. We currently have access to a dataset of over 250,000 patients, including approximately 3,500 COVID-19 patients, from two hospitals (Amsterdam UMC and UMCU).
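As an illustration of what one row of the structured dataset could look like, here is a minimal Python sketch that models the classifier output for a single EHR note as a record with one entry per ICF domain. The class name `FunctioningRecord`, the field names, the integer level scale and the use of `None` for domains a note does not mention are all hypothetical assumptions; the actual output format of the pipeline may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# The nine ICF-derived domains listed in the proposal (identifiers are hypothetical).
DOMAINS = (
    "mood", "breathing", "fatigue", "walking", "eating",
    "body_weight", "attention", "exercise_capacity", "work_education",
)


@dataclass
class FunctioningRecord:
    """Hypothetical row: classifier output for one EHR note of one patient."""
    patient_id: str
    note_date: date
    # Level per domain; None means the note does not mention the domain.
    # An integer scale is assumed here; the real pipeline may use another scale.
    levels: Dict[str, Optional[int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.levels = dict(self.levels)  # copy so the caller's dict is not mutated
        unknown = set(self.levels) - set(DOMAINS)
        if unknown:
            raise ValueError(f"unknown domains: {sorted(unknown)}")
        # Store every domain explicitly so all records share the same nine keys.
        for domain in DOMAINS:
            self.levels.setdefault(domain, None)

    def observed_domains(self) -> List[str]:
        """Domains that received a level in this note, in DOMAINS order."""
        return [d for d in DOMAINS if self.levels[d] is not None]
```

For example, `FunctioningRecord("p001", date(2021, 3, 1), {"walking": 3, "fatigue": 2}).observed_domains()` returns `['fatigue', 'walking']`. Keeping all nine keys in every record, with an explicit `None` for unmentioned domains, makes later per-domain aggregation simpler than relying on missing keys.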


Description of the SRP Project/Problem

Now that we know the course of functioning over time, we want to learn which factors might be related to a particular course. To analyse this, approximately 40 GB of structured data is available, which can be used to answer a variety of research questions.


Research questions

How do subgroups of COVID-19 patients differ in their level of functioning at several predetermined event dates (e.g. hospital admission, discharge and follow-up)? Which factors are associated with the level of functioning within and between these subgroups?

1) Stratification

a. In-hospital

i. ICU

ii. Non-ICU

b. Outpatient care after hospitalization at Amsterdam UMC (secondary outpatients)

c. Outpatient care only (primary outpatients)

d. COVID-19 as primary diagnosis vs. non-COVID-19 (COVID-19 as secondary diagnosis, or no COVID-19)

e. Hospital: Amsterdam UMC vs. UMCU


2) Alignment to event date

a. Date of hospital admission

b. Date of ICU admission

c. Date of diagnosis

d. Observation date


3) Comparisons (covariance matrices)

a. Per stratum

b. Per event date and observation period


4) Cross-sectional associations

a. At hospital admission

b. At hospital discharge

c. Per event date and observation period


5) Potential determinants to evaluate

a. Age

b. Comorbidity

c. Number of medications

d. Ventilation (yes/no; flow in L/min) as an indicator of disease severity
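The alignment, comparison and association steps above can be sketched in plain Python under assumed inputs. Each observation is taken to be a hypothetical `(patient_id, stratum, observation_date, levels)` tuple, where `levels` maps domain or determinant names (for example `"walking"` or `"age"`) to numbers; each patient has one event date (for example hospital admission); and observation periods are fixed windows of `window_days` days relative to that event. The 30-day window, the pairwise deletion of missing values and the use of Pearson correlation as the association measure are illustrative choices, not methods specified in this proposal.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Hypothetical observation layout: (patient_id, stratum, observation_date, levels).
Row = Dict[str, Optional[float]]
Observation = Tuple[str, str, date, Row]


def align_to_event(observations: List[Observation],
                   event_dates: Dict[str, date],
                   window_days: int = 30) -> Dict[Tuple[str, int], List[Row]]:
    """Group rows by (stratum, period), where period counts whole windows
    between the patient's event date and the observation date."""
    grouped: Dict[Tuple[str, int], List[Row]] = defaultdict(list)
    for patient_id, stratum, obs_date, levels in observations:
        event = event_dates.get(patient_id)
        if event is None:
            continue  # no event date recorded for this patient: skip
        # Floor division puts observations before the event in negative periods.
        period = (obs_date - event).days // window_days
        grouped[(stratum, period)].append(levels)
    return dict(grouped)


def _complete(rows: List[Row], a: str, b: str) -> List[Row]:
    """Rows in which both variables are observed (pairwise deletion)."""
    return [r for r in rows if r.get(a) is not None and r.get(b) is not None]


def covariance(rows: List[Row], a: str, b: str) -> Optional[float]:
    """Sample covariance of a and b over rows where both are observed."""
    pairs = [(r[a], r[b]) for r in _complete(rows, a, b)]
    n = len(pairs)
    if n < 2:
        return None  # not enough data to estimate a covariance
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in pairs) / (n - 1)


def covariance_matrix(rows: List[Row],
                      names: List[str]) -> Dict[Tuple[str, str], Optional[float]]:
    """Symmetric covariance matrix, keyed by (name, name) pairs."""
    matrix: Dict[Tuple[str, str], Optional[float]] = {}
    for name in names:
        matrix[(name, name)] = covariance(rows, name, name)
    for a, b in combinations(names, 2):
        matrix[(a, b)] = matrix[(b, a)] = covariance(rows, a, b)
    return matrix


def pearson(rows: List[Row], a: str, b: str) -> Optional[float]:
    """Pearson correlation, e.g. between a determinant and a domain level.
    All three moments use the same complete rows, so r stays within [-1, 1]."""
    rows = _complete(rows, a, b)
    cov_ab = covariance(rows, a, b)
    var_a = covariance(rows, a, a)
    var_b = covariance(rows, b, b)
    if cov_ab is None or not var_a or not var_b:
        return None  # undefined with fewer than two rows or zero variance
    return cov_ab / sqrt(var_a * var_b)
```

Pairwise deletion keeps as many observations as possible for each pair of variables, but a covariance matrix assembled this way is not guaranteed to be positive semidefinite; complete-case analysis or multiple imputation are common alternatives when that matters.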


Expected results

• A report that clearly describes the structure of the database, including a code book

• A cleaned and prepared dataset

• Results of the analyses within and between strata


Time period (7 months)

Start: immediately


Contact

Edwin Geleijn, physiotherapist / innovation manager

e.geleijn@amsterdamumc.nl


References

Verkijk, S., & Vossen, P. (2021). MedRoBERTa.nl: A language model for Dutch electronic health records. https://clinjournal.org/clinj/article/view/132

World Health Organization. (2013). How to use the ICF: A practical manual for using the International Classification of Functioning, Disability and Health (ICF). Exposure draft for comment. Geneva: WHO. https://www.who.int/publications/m/item/how-to-use-the-icf---a-practical-manual-for-using-the-international-classification-of-functioning-disability-and-health

Meskers, C., et al. Automated recognition of functioning, activity and participation in COVID-19 from electronic patient records by natural language processing: a proof-of-concept.


Machine Learning Pipeline

• GitHub: https://github.com/cltl/aproof-icf-classifier