[MA 2024 01] The cost of added privacy: the impact of privacy-preserving federated learning methods on model performance

Amsterdam UMC, location AMC, Department of Medical Informatics
Proposed by: Sebastian van der Voort [s.r.vandervoort@amsterdamumc.nl]

Introduction

Federated learning is a method to train algorithms on data from multiple parties without that data having to be shared [1]. This opens up larger data collections for training, as hospitals can keep control of their (sensitive) data. Although the data itself does not have to be shared, federated learning does require the exchange of model parameters, and a bad actor could use these parameters to infer something about the original data. There is therefore interest in adding privacy to federated learning through privacy-preserving methods such as Secure Multi-Party Computation, Fully Homomorphic Encryption, and Differential Privacy [2]. These methods constrain the amount of information that can be gained from the shared model parameters; however, this may come at the cost of reduced model performance.
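As a minimal sketch of how such a method can work, the snippet below shows the core of Differential Privacy as often applied in federated learning: a client clips its model update to bound its influence, then adds Gaussian noise before sharing. The function name and constants are illustrative, not taken from any particular framework.

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, seed=42):
    """Clip a client's model update to a maximum L2 norm, then add
    Gaussian noise scaled to that bound, so the shared parameters
    reveal less about any individual training record."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [w * scale for w in update]  # bounded sensitivity
    sigma = noise_multiplier * clip_norm   # noise calibrated to the bound
    return [w + rng.gauss(0.0, sigma) for w in clipped]

# Hypothetical local update; only the noisy version would be shared.
noisy_update = privatize_update([0.8, -2.4, 1.5])
```

The noise directly perturbs the parameters the server aggregates, which is exactly why model performance can degrade as the privacy guarantee is strengthened.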


Description of the SRP Project/Problem

The focus of this project is the evaluation of different privacy-preserving methods that can be applied within federated learning. Initially, a small literature search is required to get an overview of the available privacy-preserving methods. Based on this overview, the most promising methods are selected for further investigation. These methods will then be implemented (in Python) in an existing federated learning framework such as Flower [3]. Models will be trained on data from intensive care unit (ICU) patients to predict mortality for these patients. Other data can also be used, if of interest, to investigate the generalizability of these methods.
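Independent of the chosen framework, the training loop rests on federated averaging (FedAvg): clients train locally and the server averages their parameters, weighted by local dataset size. A minimal sketch, with hypothetical hospital parameters for illustration:

```python
def federated_average(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg):
    each client's contribution is proportional to its local
    dataset size. Only parameters are exchanged, never raw data."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical hospitals with locally trained parameters.
hospital_a = [0.2, 0.5]   # trained on 100 patients
hospital_b = [0.6, 0.1]   # trained on 300 patients
global_params = federated_average([hospital_a, hospital_b], [100, 300])
# → [0.5, 0.2]
```

Frameworks such as Flower implement this aggregation on the server side; the privacy-preserving methods studied in this project intervene in exactly this parameter exchange.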

After the privacy-preserving methods have been implemented and the models trained, the performance of these models is compared against a baseline trained without privacy-preserving methods. This evaluation should take into account model performance as well as a measure of the added privacy gained by using the privacy-preserving methods.
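For Differential Privacy, one common measure of the added privacy is the budget epsilon (smaller means more private). As a rough sketch under the classical single-shot Gaussian-mechanism bound (valid for epsilon at most 1; real multi-round training would use a tighter privacy accountant), the function name and parameter values below are illustrative:

```python
import math

def gaussian_mechanism_epsilon(sigma, sensitivity, delta):
    """Rough epsilon for one application of the Gaussian mechanism,
    inverted from the classical bound
        sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / sigma

# More noise (larger sigma) yields a smaller epsilon, i.e. more privacy,
# at the likely cost of model performance.
eps = gaussian_mechanism_epsilon(sigma=6.0, sensitivity=1.0, delta=1e-5)
```

Plotting model performance (e.g. AUC for the mortality model) against such a privacy measure is one way to make the trade-off in research question 4 concrete.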


Research questions

1. What kinds of privacy-preserving methods can be applied for federated learning?

2. To what extent do commonly available federated learning frameworks support the available privacy-preserving methods?

3. How is model performance affected by the different privacy-preserving methods?

4. If there is a negative impact on model performance, can we quantify the trade-off between reduced model performance and increased privacy preservation?


Expected results

The primary expected result is a master thesis in the form of a report that addresses the research questions above. This requires the implementation and analysis of different federated learning models (using existing frameworks) in Python. Depending on the quality and outcome of the analysis, the report may also form the basis of a publishable scientific article.


Time period

X November – June

X May – November


Contact

Sebastian van der Voort, s.r.vandervoort@amsterdamumc.nl

Amsterdam UMC, location AMC, Department of Medical Informatics


References

1. Zhang, Fan, et al. "Recent Methodological Advances in Federated Learning for Healthcare." arXiv preprint arXiv:2310.02874 (2023).

2. Prayitno, et al. "A systematic review of federated learning in the healthcare area: From the perspective of data properties and applications." Applied Sciences 11.23 (2021): 11191.

3. Flower federated learning framework: https://www.flower.dev