[MA 2025 03] Private knowledge distillation by copying the behaviour of prediction models with differential privacy
Department of Medical Informatics (KIK)
Proposed by: Iacer Calixto, assistant professor of artificial intelligence [i.coimbra@amsterdamumc.nl]
Introduction
Recently, researchers have developed an efficient method to “copy” the behaviour of a prediction model ‘A’ into another prediction model ‘B’. For instance, ‘A’ can be a neural network trained to predict mortality given a patient’s medical history, and ‘B’ can be any kind of machine learning model (e.g., a decision tree, support vector machine, random forest, logistic regression, neural network, etc.) [1, 2]. We refer to model A as the teacher model and to model B as the student model. The method proposed in [1,2] allows one to distil the knowledge of the teacher model into the student model so that the two models achieve similar performance. In the original formulation in [1,2], the researchers assume no access to the original training data used to train the teacher model; this is not the case in this SRP project. There are many reasons to distil knowledge from a teacher into a student, for example interpretability (the student belongs to a family of machine learning models that is more interpretable than the teacher's) or compute efficiency (the student requires less specialised hardware to run and is therefore cheaper to deploy in terms of energy consumption).
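To make the copying idea concrete, the minimal sketch below fits an interpretable student on labels produced by a trained teacher. The toy data, the MLP teacher, and the decision-tree student are illustrative assumptions, not the specific iterative procedure of [1,2]:

    # Minimal sketch of model "copying"/distillation: the student is fit on
    # labels produced by the teacher rather than on the ground-truth labels.
    # The teacher here is a stand-in; in the project it would be a trained
    # neural network, and the query set could be the original MIMIC-IV data.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 10))                       # toy patient features
    y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # toy binary outcome

    # Teacher: an opaque neural network trained on the original data.
    teacher = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)

    # Student: an interpretable model trained to imitate the teacher's labels.
    X_query = rng.normal(size=(2000, 10))   # points used to query the teacher
    y_teacher = teacher.predict(X_query)    # the teacher's behaviour
    student = DecisionTreeClassifier(max_depth=4).fit(X_query, y_teacher)

    print("agreement with teacher:", (student.predict(X_query) == y_teacher).mean())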
Description of the SRP Project/Problem
In healthcare applications, simply copying the behaviour of a prediction model is suboptimal if we cannot guarantee the privacy of the subjects whose data were used to train it. In this SRP, you will use the MIMIC-IV [3] dataset and work on the problems of predicting length-of-stay (which can be modelled as a regression task) and the risk of death (which can be modelled as a classification task) for an intensive care unit (ICU) patient using differential privacy [4]. MIMIC-IV includes data for over 40,000 ICU patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) in the United States, including demographics, laboratory results, medications, and more. You will build neural network-based teacher models using all relevant data available for a patient (including structured data and, optionally, free-text clinical notes). You will then adapt the methodology introduced in [1,2] to distil the knowledge of the teacher into different student models by incorporating differential privacy into the learning mechanism. You will compare differentially private optimisation algorithms (such as DP-SGD [5] or DP-Adam [6]) with ensemble-based algorithms such as PATE [7], and investigate in which scenarios the knowledge distillation works well and in which scenarios it fails.
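One possible way to incorporate differential privacy into student training is sketched below: a neural student is trained on teacher-produced length-of-stay targets with DP-SGD [5], using the open-source Opacus library. The data, architecture, and hyperparameters (noise multiplier, clipping norm, delta) are illustrative placeholders, not the project's actual configuration:

    # Hedged sketch: training a neural student on teacher outputs with DP-SGD,
    # using Opacus. All shapes and hyperparameters are toy placeholders.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from opacus import PrivacyEngine

    # Stand-ins: X are patient features, y_teacher are the teacher's
    # length-of-stay predictions (regression targets for the student).
    X = torch.randn(1000, 32)
    y_teacher = torch.randn(1000, 1)
    loader = DataLoader(TensorDataset(X, y_teacher), batch_size=64)

    student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.SGD(student.parameters(), lr=0.05)

    # Opacus wraps the model/optimizer/loader so that per-sample gradients are
    # clipped and Gaussian noise is added, i.e. DP-SGD [5].
    privacy_engine = PrivacyEngine()
    student, optimizer, loader = privacy_engine.make_private(
        module=student,
        optimizer=optimizer,
        data_loader=loader,
        noise_multiplier=1.0,   # more noise -> stronger privacy, lower utility
        max_grad_norm=1.0,      # per-sample gradient clipping bound
    )

    loss_fn = nn.MSELoss()
    for epoch in range(5):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(student(xb), yb)
            loss.backward()
            optimizer.step()

    # Privacy budget spent so far, for a chosen delta.
    print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))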
Research questions
RQ 1) To what extent does the knowledge distillation method of [1,2], extended with differential privacy, transfer the knowledge of a teacher model trained on a regression task, e.g., predicting length-of-stay using MIMIC-IV?
RQ 1.1) And to what extent does it do so for a classification task, e.g., predicting mortality using MIMIC-IV?
RQ 2) How do differentially private optimisation algorithms (e.g., DP-SGD) compare to ensemble-based algorithms (e.g., PATE) in this setting of private knowledge distillation within the framework of [1,2]? (See the sketch of PATE's label aggregation below.)
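For contrast with DP-SGD, the sketch below illustrates the core of PATE [7]: an ensemble of teachers trained on disjoint data partitions votes on each query, Laplace noise is added to the vote counts, and the student only ever observes the noisy winning label. Teacher training and the student's semi-supervised training are omitted, and the noise scale is an illustrative choice:

    # Hedged sketch of PATE's noisy-max label aggregation [7].
    import numpy as np

    def pate_noisy_label(teacher_votes, n_classes, gamma=0.05, rng=None):
        """Aggregate one query's teacher votes into a single noisy label.

        teacher_votes: array of class indices, one vote per teacher.
        gamma:         inverse noise scale; smaller gamma -> stronger privacy.
        """
        rng = rng or np.random.default_rng()
        counts = np.bincount(teacher_votes, minlength=n_classes).astype(float)
        counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=n_classes)
        return int(np.argmax(counts))

    # Example: 100 teachers voting on a binary mortality label for one patient.
    votes = np.random.default_rng(0).integers(0, 2, size=100)
    print(pate_noisy_label(votes, n_classes=2))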
Expected results
The main outcome of this SRP project is a scientific paper. We aim to publish the results of your work at a top-tier machine learning workshop, and you will use this paper as your thesis when defending your SRP.
You will also deliver a publicly available code base in which all experiments conducted during your SRP are shared with the research community.
Time period (please tick at least one time period)
☐ November – June
☐ May – November
Contact
Iacer Calixto, assistant professor of artificial intelligence, KIK, i.coimbra@amsterdamumc.nl
References
[1] N. Statuto, I. Unceta, J. Nin, and O. Pujol. A scalable and efficient iterative method for copying machine learning classifiers. Journal of Machine Learning Research, 24(390):1–34, 2023. URL http://jmlr.org/papers/v24/23-0135.html.
[2] I. Unceta, J. Nin, and O. Pujol. Copying machine learning classifiers. IEEE Access, 8:160268–160284, 2020. URL https://api.semanticscholar.org/CorpusID:67877026.
[3] A. E. W. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gow, et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, 2023.
[4] C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, Automata, Languages and Programming (ICALP 2006), Lecture Notes in Computer Science, vol. 4052. Springer, Berlin, Heidelberg, 2006. https://doi.org/10.1007/11787006_1.
[5] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16), pages 308–318. Association for Computing Machinery, New York, NY, USA, 2016. https://doi.org/10.1145/2976749.2978318.
[6] R. Gylberth, R. Adnan, S. Yazid, and T. Basaruddin. Differentially private optimization algorithms for deep neural networks. In 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), pages 387–394, Bali, Indonesia, 2017. doi: 10.1109/ICACSIS.2017.8355063.
[7] N. Papernot, M. Abadi, Ú. Erlingsson, I. Goodfellow, and K. Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In International Conference on Learning Representations (ICLR), 2017.