WCSE 2022
ISBN: 978-981-18-3959-7 DOI: 10.18178/wcse.2022.06.014

Differentially Private, Federated Learning on Gene Expression Data for Tumour Classification

Souhail Meftah, Meenatchi S. M. S. Annamalai, Dominic J. F. Byrne, Khin Mi Mi Aung, Bharadwaj Veeravalli

Abstract— In recent years, machine learning (ML) methods have enabled considerable progress in a variety of data-rich research domains. Genomics is one prominent field, with ML-based approaches achieving strong results on a range of historically difficult tasks. One area where ML approaches have been particularly successful is the classification of cancerous tissues using gene expression data. Despite this success, recent advances share a common issue with many other areas of ML research: state-of-the-art performance relies on the aggregation of large training datasets. Many have raised concerns over the implications of such large-scale data aggregation for data privacy, and the nature of genomic data makes these concerns all the more pertinent in a cancer classification setting. Because of the sensitivity of such data, strict policies are enacted to protect patient privacy. Whilst currently unavoidable, such policies are a key roadblock to advances in ML research in this domain. Federated learning (FL) aims to solve this issue by allowing ML models to be trained on decentralized data. Despite the promise of this approach, recent work has demonstrated that FL alone is not sufficient for fully private training of ML models. The incorporation of differential privacy (DP) into an FL framework has previously been shown to enable decentralized model training in a privacy-preserving manner. We introduce a method for incorporating custom differentially private algorithms, which are not currently implemented by privacy libraries such as Opacus, directly into federated learning workflows.

Index Terms—differential privacy, federated learning, gene expression

Souhail Meftah
Institute for Infocomm Research, A*STAR, SINGAPORE
The National University of Singapore, SINGAPORE
Meenatchi S. M. S. Annamalai
Institute for Infocomm Research, A*STAR, SINGAPORE
Dominic J. F. Byrne
Institute for Infocomm Research, A*STAR, SINGAPORE
The University of Manchester, Manchester, UK
Khin Mi Mi Aung
Institute for Infocomm Research, A*STAR, SINGAPORE
Bharadwaj Veeravalli
The National University of Singapore, SINGAPORE

Cite: Souhail Meftah, Meenatchi S. M. S. Annamalai, Dominic J. F. Byrne, Khin Mi Mi Aung, Bharadwaj Veeravalli, "Differentially Private, Federated Learning on Gene Expression Data for Tumour Classification," Proceedings of 2022 the 12th International Workshop on Computer Science and Engineering (WCSE 2022), pp. 86-95, June 24-27, 2022.