Federated learning: embedding privacy in the learning algorithm
The future of global data protection regulation remains a subject of debate, but tighter rules can be expected as the uptake of AI and big data systems accelerates, a trend that the COVID-19 crisis pushed along faster than most people expected. What does the future hold for data-driven organizations, and how do we start building AI systems that protect users' privacy by design? Federated learning addresses the issues of data ownership and privacy.
The advent of 5G and edge Artificial Intelligence platforms creates the opportunity for distributed AI systems. Devices can be connected in a network of nodes, each generating its own local data from sensors or users and each with its own computational resources. The network makes it possible to share locally learned knowledge between all nodes.
First of all, these nodes perform predictions based on machine learning models. In the case of iterative learning, nodes also generate data from which the model learns. The model thus provides predictions at the edge while improving its performance by learning continuously. The main challenge is to share what the different edge nodes have learned across the entire distributed system.
The current approach to distributed learning raises critical issues for distributed edge AI, namely data privacy, data security, data access rights and access to heterogeneous data. We dive deeper here to understand why, and how federated learning can address these issues.
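To make iterative learning on a single node concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `EdgeNode` class, the one-dimensional linear model, the learning rate and the synthetic sensor stream are all made up, and a real edge model would typically be a neural network. The point is only the loop itself: the node first predicts on each new sample, then learns from it.

```python
class EdgeNode:
    """Hypothetical edge node holding a tiny linear model y = w * x + b."""

    def __init__(self, learning_rate=0.5):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn(self, x, y):
        # One stochastic gradient descent step on the squared error.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def run_stream(node, stream):
    """Serve a prediction for each incoming sample, then learn from it."""
    errors = []
    for x, y in stream:
        errors.append(abs(node.predict(x) - y))  # prediction at the edge
        node.learn(x, y)                         # continuous local learning
    return errors


# Synthetic local readings following y = 2x + 1 (made-up data),
# replayed 200 times to imitate a long-running sensor stream.
readings = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(10)]
stream = readings * 200

node = EdgeNode()
errors = run_stream(node, stream)
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.6f}")
```

In this sketch the data never leaves the `EdgeNode` object; what remains open is how the learned `w` and `b` could be shared with other nodes.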
Classical approach to distributed learning
The classical approach to iterative, distributed learning relies on data that is typically generated locally by sensors or users (mobile phone input, health bands, etc.). Over the last decade, the general approach has been to host a centralized machine learning model on a server. The distributed nodes send their data in full to this central server, where it is aggregated and fed to the model. The model learns from the data, and the improved model is then sent back to the nodes, which use it locally to perform predictions.
This architecture requires the owner of the edge device to send their data in full to the main server. This creates privacy and security risks and means giving up data ownership. Federated learning addresses these issues.
Federated learning provides a solution
Federated learning solves the issues of data ownership and privacy by making sure the data never leaves the distributed node devices, whilst ensuring that a central model is updated and shared to all nodes in the network.
Federated learning enables multiple nodes to build a common, robust machine learning model without sharing data. This is achieved in 4 general steps:
- Randomized central model sharing: The central model is shared with all edge nodes, each of which stores a local copy of the network.
- Local optimization: The local networks are updated by learning on the local data samples.
  - Security measure 1: The “learning” is encoded in the neural network weights of the model. This step is hard to reverse, meaning it is hard to take these weights and decode them to recover the original data.
  - Security measure 2: The learned model weights are then encrypted before being sent to the central server.
- Local update sharing: All local updates are sent in encrypted form to the central server.
- Secure model update: The central server receives the weight updates of all local models and aggregates them without undoing the encryption (based on homomorphic encryption) or decoding the weights to reconstruct the original data (which is very difficult). During aggregation, the central model takes into account the number of samples each local node used to update its model: more data means a higher impact on the global model.
The cycle then restarts at step 1 by sharing the updated network with all local devices.
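The cycle described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not a production implementation: the model is a one-dimensional linear model rather than a neural network, the data is synthetic, all function names are hypothetical, and the encryption steps (including homomorphic aggregation) are left out entirely. What it does show is the sample-weighted averaging of local weights, commonly known as federated averaging, and the fact that only weights and sample counts reach the server.

```python
import random


def local_update(weights, data, learning_rate=0.1, epochs=5):
    """Local optimization: refine a copy of the global weights on private data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return (w, b)


def aggregate(updates):
    """Model update: average local weights, weighted by each node's sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


def federated_round(global_weights, node_datasets):
    # Each node receives the global weights, trains locally and returns only
    # its new weights plus its sample count. The raw data stays on the node.
    updates = [(local_update(global_weights, data), len(data))
               for data in node_datasets]
    return aggregate(updates)


def make_node_data(rng, n_samples):
    # Synthetic private data following y = 3x - 1 with a little noise.
    data = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x - 1.0 + rng.gauss(0.0, 0.01)))
    return data


rng = random.Random(0)
# Three hypothetical nodes with very different amounts of local data.
node_datasets = [make_node_data(rng, n) for n in (20, 50, 200)]

# Step 1 starts from a randomized central model.
global_weights = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
for _ in range(30):
    global_weights = federated_round(global_weights, node_datasets)

print("global model after 30 rounds:", global_weights)
```

Because the average is weighted by `len(data)`, the node with 200 samples influences the global model ten times as much as the node with 20, which matches the rule that more data means a higher impact.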
State of the art (SotA)
Federated learning is not yet widely used. One of the most prominent uses is the Gboard application, in which Google uses federated learning to improve the performance of your smartphone keyboard. It learns from the data generated by users on their own phones and improves the central spelling corrector without ever sending that data to the central server.
Federated learning has also been applied to Unmanned Aerial Vehicles (UAV). UAV swarms must exploit machine learning in order to execute various tasks ranging from coordinated trajectory planning to cooperative target recognition. However, due to the lack of continuous connections between the UAV swarm and ground base stations, using centralized machine learning will be challenging, particularly when dealing with a large volume of data. In the referenced paper, a novel framework is proposed to implement distributed federated learning algorithms within a UAV swarm that consists of a leading UAV and several following UAVs.
The low adoption rate of federated learning is in stark contrast to the technology readiness. The big deep learning frameworks already support federated learning:
- TensorFlow Federated: An extension to the TensorFlow deep learning framework to enable federated learning.
- PySyft: An extension to the PyTorch deep learning framework to enable federated learning.
Future applications of federated learning
There are 4 ingredients to unleash the power of federated learning:
- A distributed network of nodes.
- The nodes run a local algorithm for prediction.
- The nodes can gather local data for learning.
- Data privacy, security and ownership are critical.
Any machine learning model used on personal devices like smartphones, cars, smart home applications or even smart appliances can make use of federated learning to improve the user experience of each individual user:
- In the coming decade, transportation will become more autonomous. The driving factor in this technology will be the data that is generated and gathered by car makers to develop their algorithms. Federated learning makes it possible to learn from the cameras in your Tesla without sharing your actual footage with the main model.
- Imagine a smart fridge that tracks your consumption and advises you in your grocery planning. The model might learn consumption patterns that are common to human behaviour in general. These general learnings can be shared safely with all smart fridge users without sharing your personal consumption data.
- Imagine a smartwatch that is used for workout tracking. The application can detect your level of exertion by analyzing your movement and heart rate. The data you generate locally could be used to improve the application for all users by means of machine learning. Federated learning makes it possible to share these learnings with all smartwatches without the data ever leaving your device.
- Imagine a smart thermostat that can learn patterns that are common to all users to improve the overall experience, without ever sharing your personal data with a central point.
AI and big data systems will need to adapt in the future. Wide acceptance and adoption will only happen when end users feel that AI and data systems are trustworthy and handle their data safely, which means having the utmost respect for the consumer.
This requires organisations to take ownership of how end user data is used by having a strong data strategy and governance framework in place. Federated learning can be an important part of this and help deliver on what we, as end users, expect from these systems:
- Protecting the individual’s right to privacy
- Making the individual aware of how their data is being used
- Giving the individual greater control over their data
- Ensuring the individual’s data is stored securely