Improving Data Privacy Through Federated Machine Learning
February 19, 2024
An introduction to the concept of federated ML and Additive Secret Sharing.
The AI market is dominated by tech giants such as Google, Amazon, and Microsoft, which offer cloud-based AI solutions and APIs. In the traditional approach, sensitive user data is sent to central servers where the models are trained.
This centralized model comes with a problem: user privacy. User or customer data is exposed to the central server alongside everyone else's raw labeled data, and the local user has no control over how it is used.
The solution? Federated Machine Learning
With the advent of billions of smartphones with impressive computing power, machine learning models are no longer confined to high-powered centralized CPUs.
It was about time to devise an architecture in which a central model makes use of all the local models to form a highly capable model that every contributing device can use.
This decentralizes the model training phase, which works as a simple loop (a runnable sketch follows the list):
1. A central model-sharing platform/cloud holds the initial model, which is broadcast to all participating devices.
2. Each device connected to the central platform optimizes the model locally using its own data.
3. The central platform updates the global model by aggregating the locally trained models.
4. Step 1 is repeated with the updated model.
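To make the loop concrete, here is a minimal, self-contained sketch of federated averaging in Python. The helper names, the toy linear-regression task, and the simulated devices are all illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # Steps 1-2: a device receives the broadcast weights and runs a few
    # steps of gradient descent on its own private data
    # (toy linear regression with squared loss).
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(global_w, devices):
    # Step 3: the server collects the locally trained models and
    # averages them, weighted by each device's dataset size.
    local_models, sizes = [], []
    for X, y in devices:
        local_models.append(local_update(global_w, X, y))
        sizes.append(len(y))
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(local_models, sizes))

# Simulate three devices, each holding private samples of y = 2 * x.
rng = np.random.default_rng(0)
devices = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    devices.append((X, X @ np.array([2.0])))

global_w = np.zeros(1)
for _ in range(10):              # Step 4: repeat the round.
    global_w = federated_round(global_w, devices)
print(global_w)                  # converges toward [2.0]; raw data never pooled
```

In a real deployment, each local_update would run on a physically separate device, and only the trained weights would travel to the server; the raw data never leaves the device.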
Figure 1: Federated Machine Learning (Image Credit: Medium Article)
Why is this an incredible design?
This design lets you fetch the latest model from the cloud, continuously fine-tuned by the other collaborators building it.
Not only can you use your own local model to make predictions, but you can also use the aggregated, more refined model from others to make better ones.
Even if your device contributes little to the central model, you can still consume the best available model whenever you connect to the cloud.
Sounds great, doesn’t it?
What is the connection to data privacy?
While the design above is very efficient, it does not, by itself, preserve privacy.
The model that devices fetch from the cloud for local optimization carries information from every contributing device, and reverse engineering it can put data privacy substantially at risk:
- A local device can infer information about other devices' data when it fetches the central model.
- The central server is exposed to information about each device's data through the updates it aggregates.
Additive Secret Sharing to the rescue?
To solve this problem, a cryptographic technique called Additive Secret Sharing (ASS) is introduced.
The global model update requires calculations over many model parameters. ASS provides a way to perform these calculations on encrypted values: the central model aggregator (cloud) combines the models from the different devices without ever seeing them in the clear, and passes the encrypted outcome back to the connected devices, which decrypt it privately.
In this way, no device learns anything about the weights of the other models used in the calculation.
This mitigates the basic privacy concern: user data is protected from the cloud, and the calculations become far less likely to leak information about the underlying data.
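Here is a minimal sketch of the idea in plain Python. The weights, the fixed-point scaling, and the three-party setup are illustrative assumptions; production systems encode values more carefully and add further protections.

```python
import random

Q = 2**61 - 1  # a large prime; all share arithmetic is modulo Q

def make_shares(secret, n_parties):
    """Split an integer secret into n random shares that sum to it mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Three devices each hold a private model weight, encoded as a
# fixed-point integer (here: weight * 1000).
private_weights = [1517, 2052, 989]

# Each device splits its weight into one share per aggregation server.
shares_per_device = [make_shares(w, n_parties=3) for w in private_weights]

# Each server sums the shares it received. Individually these sums look
# random, so no server learns any single device's weight.
server_sums = [sum(col) % Q for col in zip(*shares_per_device)]

# Combining the servers' partial sums reveals only the aggregate.
aggregate = reconstruct(server_sums)
print(aggregate / 1000)       # 4.558 -> the sum of all the weights
print(aggregate / 1000 / 3)   # the average weight, as used in model averaging
```

The key property is that each individual share is uniformly random, so nothing short of combining all the partial sums reveals anything about a single device's weight.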
How can an organization benefit?
After going through the article, you might wonder how an organization can benefit from the concepts discussed above. What if an organization wants to use your model but doesn't want to share its real data? What if it wants to run your model locally rather than on the server where the model is deployed? That's where concepts like Federated Learning and Encrypted Machine Learning become handy tools, both for model tuning and for model sharing.
Lastly, there is huge scope for organizations like Omdena to explore this concept, since they often face requirements to apply machine learning to data they are not allowed to see. For those situations, encrypted machine learning and federated learning are exactly the right tools.
You can learn more about Additive Secret Sharing from this article.
Google uses federated machine learning in Gboard, while Apple uses it in Siri.
You can visit OpenMined, which provides the Python library PySyft for implementing federated learning, a great tool for getting started with data privacy.
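As a taste of what this looks like in practice, here is a small example using the legacy PySyft 0.2.x API. PySyft's interface has changed substantially in recent releases, so treat this as a historical sketch of the additive-sharing workflow rather than current usage.

```python
import torch
import syft as sy

# Hook PyTorch so tensors gain PySyft's privacy operations (PySyft 0.2.x API).
hook = sy.TorchHook(torch)

# Simulated parties holding the shares, plus a crypto provider for randomness.
alice = sy.VirtualWorker(hook, id="alice")
bob = sy.VirtualWorker(hook, id="bob")
crypto = sy.VirtualWorker(hook, id="crypto_provider")

x = torch.tensor([1, 2, 3])
y = torch.tensor([10, 20, 30])

# Split each tensor into additive secret shares held by alice and bob.
x_sh = x.share(alice, bob, crypto_provider=crypto)
y_sh = y.share(alice, bob, crypto_provider=crypto)

# Arithmetic happens on the shares; no single party sees the plaintext values.
z_sh = x_sh + y_sh

# Reconstruct the result by recombining the shares.
print(z_sh.get())  # expected: tensor([11, 22, 33])
```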
Udacity has a course on Private AI, which is a must-have on your list if you want to implement PySyft in your applications.
Federated Learning is still in its early stages and faces numerous design and deployment challenges, but it is already yielding promising results.