DEVELOPING PRIVACY-PRESERVING FEDERATED LEARNING MODELS FOR COLLABORATIVE HEALTH DATA ANALYSIS ACROSS MULTIPLE INSTITUTIONS WITHOUT COMPROMISING DATA SECURITY
DOI: https://doi.org/10.60087/jklst.vol3.n3.p139-164

Keywords: Developing Privacy-Preserving Federated Learning Models, Health Data Analysis, Data Security

Abstract
Federated learning is an emerging distributed machine learning technique that enables collaborative training of models across devices and servers without exchanging private data. However, several privacy and security risks associated with federated learning must be addressed before it can be adopted safely. This review provides a comprehensive analysis of the key threats in federated learning and the mitigation strategies used to counter them. The major threats identified include model inversion, membership inference, data attribute inference, and model extraction attacks. Model inversion aims to reconstruct raw data values from model parameters, which can breach participant privacy. Membership inference determines whether a particular data sample was used to train the model. Data attribute inference discovers private attributes such as age and gender from the model, whereas model extraction steals intellectual property by reconstructing the global model from participant updates. The review then discusses the mitigation strategies proposed against these threats. Controlled-use protections such as secure multiparty computation, homomorphic encryption, and confidential computing enable privacy-preserving computation on encrypted data without decryption. Differential privacy adds noise to query responses to limit privacy leakage from aggregate statistics. Privacy-aware objectives modify the loss function so that the model learns representations that protect privacy. Information obfuscation strategies hide inferences that could otherwise be drawn about the training data.
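To make the interplay between federated aggregation and differential-privacy noise concrete, the sketch below shows a toy federated-averaging round in which each simulated institution clips its model update and adds Gaussian noise before sending it to the server. It is a minimal illustration under assumed choices (a linear model, the local_update and privatize helpers, the clip_norm and noise_multiplier values, and synthetic client data), not the implementation of any particular system cited in this review.

```python
# Illustrative sketch only: federated averaging with per-client update clipping
# and Gaussian noise, in the spirit of the differential-privacy mitigation
# described in the abstract. All names, model choices, and parameters here are
# hypothetical assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """One client's gradient-descent update for a linear model; raw data stays local."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w - global_weights                # share only the update, never the records

def privatize(update, clip_norm=1.0, noise_multiplier=0.5):
    """Clip the update's L2 norm and add Gaussian noise before it leaves the client."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

def federated_round(global_weights, client_datasets):
    """Server averages the noisy client updates to produce the next global model."""
    updates = [privatize(local_update(global_weights, X, y)) for X, y in client_datasets]
    return global_weights + np.mean(updates, axis=0)

# Toy run with three synthetic "institutions".
d = 5
true_w = rng.normal(size=d)
clients = []
for _ in range(3):
    X = rng.normal(size=(100, d))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=100)))

w = np.zeros(d)
for _ in range(20):
    w = federated_round(w, clients)
print("recovered weights:", np.round(w, 2))
```

In practice the clipping norm and noise scale would be calibrated to a target privacy budget, and the noisy updates could additionally be protected from the server itself with secure aggregation, secure multiparty computation, or homomorphic encryption, as discussed above.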
License
Copyright (c) 2024 Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online)
This work is licensed under a Creative Commons Attribution 4.0 International License.
©2024 All rights reserved by the respective authors and JKLST.