Distributed Machine Learning: Improved Data Protection for AI Applications?

Munich, 17 September 2022

Artificial Intelligence (AI) supports people in medicine, mobility and everyday work. AI systems are built by training on data, which often includes personal information. Distributed Machine Learning (ML) can improve data protection in the development of AI applications because the data used is not pooled centrally but remains on the end devices and thus with the users. However, it can also create new points of attack for cybercriminals. The first issue of AI at a Glance, a new publication series of Plattform Lernende Systeme, provides a concise overview of the potential and risks of distributed Machine Learning.

AI systems analyze large volumes of – in some cases sensitive – data. Companies face major legal uncertainties in developing AI applications using personal data; the hurdles to compliance with data protection and the right to informational self-determination are high. The distributed Machine Learning method offers a technical solution to create privacy-preserving AI applications: instead of being trained centrally on a server, Machine Learning (ML) models are trained in a decentralized manner on many end devices. Thus, personal data remains with the users.
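The idea of training on end devices and sharing only model parameters can be illustrated with a small sketch. The announcement does not name a specific algorithm at this point; the example below assumes federated averaging, one common scheme, applied to a toy linear model. All function names and the synthetic data are hypothetical and serve illustration only.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train on one device's private data; only the weights are returned."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error of the linear model
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Combine client models, weighted by each client's number of samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(rng, n):
    """Hypothetical private data following y = 2x + 1 plus small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

def run_rounds(clients, rounds=50):
    """The server never sees raw data, only the weights each client sends."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

rng = random.Random(0)
clients = [make_client_data(rng, n) for n in (20, 30, 50)]
w, b = run_rounds(clients)
print(f"learned slope={w:.2f}, intercept={b:.2f}")
```

In this sketch only model parameters cross the device boundary, never the individual records. That alone does not rule out attacks on the exchanged parameters.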

“Distributed Machine Learning opens up new possibilities for effective and scalable use of data without having to share it. This makes many useful applications with sensitive data possible in the first place,” says Ahmad-Reza Sadeghi, Professor of Computer Science at Technical University of Darmstadt and member of the IT Security, Privacy, Legal and Ethical Framework working group of Plattform Lernende Systeme.

Potential for Medical AI Solutions

Current technical approaches to distributed Machine Learning include Split Learning, Federated Learning, and Swarm Learning. AI-based healthcare solutions in particular can benefit from the method, for example those that use personal patient data to detect diseases such as Covid-19 or leukemia.

“AI systems in medicine can only be successful if they have the necessary amounts of data to achieve high accuracy. Distributed Machine Learning represents one of the most important technical options for making this possible while preserving the informational self-determination of the individual,” says Björn Eskofier, Professor of Machine Learning and Data Analytics at the Friedrich-Alexander-University of Erlangen-Nürnberg and member of the Health Care, Medical Technology, Care working group of Plattform Lernende Systeme.

However, according to AI at a Glance, distributed Machine Learning can also open new gateways for attackers and may create a false sense of security. How these emerging attack vectors can be closed without limiting performance is still the subject of research.

About the Format: AI at a Glance

AI at a Glance provides a concise, well-founded overview of current developments in the field of Artificial Intelligence and highlights potential, risks and open questions. The analyses are produced with the support of experts from Plattform Lernende Systeme and are published by the Managing Office. The new publication series kicks off with the issue “Distributed Machine Learning: Improved Data Protection for AI Applications?”, which is available for download free of charge.
