Through the use of artificial intelligence applications, we now have the ability to transform healthcare for all children, including those who are not geographically or socio-economically lucky (https://timothy-ck-chou.medium.com/why-artificial-intelligence-for-childrens-medicine-bd95236dd2d7). That said, traditional centralized architectures will not work to train and deploy accurate, real-time, privacy-preserving applications for cardiology, orthopedics, neurology, cancer, or a host of other specialties. We’ve seen this challenge play out in consumer AI as well, where companies faced similar tensions between preserving privacy and increasing accuracy.
One of the early innovations on the iPhone’s Siri application was the personalization of its “Hey Siri” wake phrase. More specifically, Siri needed to be trained to respond only to you and not to your friends or family. You might think that Apple would need to collect a lot of your audio data in order to personalize Siri’s response to your voice and your voice alone. But surprisingly, it didn’t.
Had Apple used a traditional centralized architecture approach, your Siri voice request would have had to be sent to Apple’s central cloud, where engineers would have applied neural network technology similar to what was used for Stanford’s 2010 ImageNet competition, which also used a centralized architecture for training an image recognition application. Doing so would have posed two significant challenges. First, having your voice in the Apple cloud would not preserve your privacy. Second, all issues with privacy aside, you wouldn’t be willing to pay for the network bandwidth required to send your voice commands to the Apple cloud every time you wanted Siri to answer a question, would you?
Instead, Apple employed a new technique called federated learning to train and improve Siri’s accuracy. First introduced by Google in 2017, federated learning is a privacy-preserving machine-learning method that has allowed Apple to train individual copies of a speaker recognition model across all its users’ devices using only locally available audio data (i.e., the data on each individual device). Rather than transmitting the original data (in this instance, audio), each iPhone sends only the updated neural network model (not your audio) back to an aggregation server, where it is combined with everyone else’s updates into a master model. In this way, audio of your Siri requests never leaves your iPhone or iPad, while Siri is nevertheless able to continuously improve its accuracy.
So how does Siri do this?
Federated learning takes advantage of a decentralized architecture, which enables Siri’s learning to take place on each of the millions of iPhones in parallel. Training on just your voice, on just your phone, your device computes updated neural network weights. These weights are then sent to an aggregation server, where the results of each of these parallel training sessions are combined to create a new consensus model. Each iteration of this process, including parallel training on individual phones, update aggregation on a centralized server, and distribution of new, improved parameters, is known as a federated learning round.
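The round described above can be sketched in a few lines of code. This is a minimal, illustrative example, not Apple's actual implementation: the model here is a toy linear model, each "device" takes one gradient step on its own local data, and the server combines the resulting weights with a weighted average (in the style of Google's FedAvg algorithm).

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.3):
    """One gradient step of linear regression on this device's local data.

    Toy stand-in for on-device training; the model, learning rate, and
    single-step update are illustrative assumptions.
    """
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def federated_round(global_weights, device_datasets):
    """One federated learning round: parallel local training, then
    aggregation of the weights (never the raw data) into a consensus model."""
    updates = [local_update(global_weights, X, y) for X, y in device_datasets]
    sizes = [len(y) for _, y in device_datasets]
    # Aggregation server: average the updates, weighting each device
    # by the size of its local dataset
    return np.average(updates, axis=0, weights=sizes)
```

Note that `federated_round` only ever sees weight vectors; the arrays `X` and `y` stay inside `local_update`, which is the code that would run on each phone.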
Federated learning is privacy-preserving, since only the neural network weights, rather than the data that informs them, are shared. It is also network-preserving, because each weight that is shared is just 4 bytes, rather than the entire audio recording. For comparison, a 1-minute audio clip is about 1,000,000 bytes.
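To put rough numbers on that comparison: assuming a hypothetical on-device model of 100,000 weights (Apple does not publish the real figure), a full model update is still smaller than a single minute of audio.

```python
# Hypothetical model size; the real "Hey Siri" model size is not public.
num_weights = 100_000
bytes_per_weight = 4                           # one 32-bit float per weight
update_bytes = num_weights * bytes_per_weight  # size of one model update

audio_bytes_per_minute = 1_000_000             # ~1 MB per minute, per the text

print(update_bytes)                            # 400000 bytes
print(update_bytes < audio_bytes_per_minute)   # True
```

And unlike audio, the update size is fixed: it does not grow no matter how many hours of local audio informed the training.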
Given these benefits, you might be left wondering why everyone isn’t switching from centralized to decentralized learning. In the case of consumer applications, it’s because decentralized federated learning can pose some specific challenges. Three important considerations in the application of decentralized federated learning include:
1. Slow Communication
Federated networks have, to date, comprised millions of individual devices (such as smartphones or tablets). This number can be massive, placing heavy demands on the network. As a result, communication is much slower than communication between servers in a traditional centralized architecture. To account for this challenge, it is necessary to develop communication-efficient methods that send small messages or model updates more frequently as part of the training process, rather than waiting to send a large payload in its entirety over the network.
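One common family of communication-efficient methods compresses each update before it leaves the device, for example by quantizing 32-bit floating-point weights down to 8-bit integers. The linear quantization scheme below is a toy illustration of the idea, not Apple's or Google's actual method:

```python
import numpy as np

def quantize(update, levels=127):
    """Compress a float32 update to int8 plus one scale factor (~4x smaller)."""
    scale = np.max(np.abs(update)) or 1.0     # guard against an all-zero update
    q = np.round(update / scale * levels).astype(np.int8)
    return q, scale

def dequantize(q, scale, levels=127):
    """Server-side reconstruction of the (slightly lossy) update."""
    return q.astype(np.float32) * scale / levels
```

Each weight now costs 1 byte on the wire instead of 4, at the cost of a small, bounded rounding error that the averaging across many devices tends to wash out.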
2. Complex Power Management
Unlike training in a centralized architecture, training consumer AI applications with federated learning must ensure that it doesn’t consume too much power on the individual devices (phones or tablets). It also has to contend with the reality of consumers powering off their devices.
3. Variable Hardware

The storage, computational, and communication capabilities of each device in a federated network may differ due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, WiFi), and power (battery level). Constraints on each phone or tablet typically result in only a small fraction of the devices being active at once. For example, only hundreds of phones or tablets at any given time may be active in a million-device network. Each device may also be unreliable, and it is not uncommon for an active device to unexpectedly drop out during a given iteration. These system-level characteristics make technical issues such as stragglers and fault tolerance significantly more prevalent than in a typical centralized architecture.
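These constraints shape the server's side of the protocol: rather than waiting on every device, it samples a small fraction of them each round and aggregates whatever updates actually arrive. The sketch below is illustrative only; the toy `Device` class, the sampling fraction, and the dropout rate are all assumptions made for the example.

```python
import random
from statistics import fmean

class Device:
    """Toy stand-in for a phone; its 'local update' is just one number here."""
    def __init__(self, value):
        self.value = value

    def train_locally(self):
        return self.value

def sample_and_aggregate(devices, fraction=0.001, dropout_prob=0.1):
    # Only a small fraction of devices is active at once, e.g. hundreds
    # out of a million-device network.
    k = max(1, int(len(devices) * fraction))
    active = random.sample(devices, k)

    updates = []
    for device in active:
        if random.random() < dropout_prob:
            continue  # device powered off or lost connectivity mid-round
        updates.append(device.train_locally())

    if not updates:
        return None  # every sampled device dropped out; retry next round
    return fmean(updates)
```

Because the server averages only the updates that arrive, a straggling or dropped device delays nothing; it simply sits out the round and may be sampled again later.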
Despite the above challenges, federated learning is nevertheless generating significant enthusiasm for its ability to provide privacy-preserving, network-preserving training for consumer AI applications. Next up: how can this enthusiasm and this privacy-preserving, network-preserving technology be harnessed and applied to training AI for children’s medicine applications?