Outshift | Democratizing federated learning

What is federated learning?

Federated learning (FL) is a machine learning (ML) approach in which multiple parties jointly train a global model without sharing their training data with one another. You can think of federated machine learning as a group project in which everyone helps build a model but never has to share the private data used to build it.

Making federated learning work

Federated learning is a subset of machine learning capabilities, similar to how natural language understanding can fall under the “ML” umbrella.

While there are several different federated learning training modes, a typical setting consists of two types of computing nodes: (1) trainer and (2) aggregator. Each trainer node trains a model on its local dataset, and the trainer nodes then send their model parameters to the aggregator node.

Upon receiving these model updates, the aggregator node builds a global model by aggregating the model parameters. The global model is then shared with all the trainer nodes. This process can be repeated for multiple rounds.
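
The trainer and aggregator roles described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular FL framework: the post does not say how parameters are aggregated, so the sketch assumes a sample-count-weighted average (the FedAvg-style rule commonly used in FL), and the tiny linear model and the names `local_update`, `aggregate`, and `run_rounds` are hypothetical.

```python
from typing import List, Tuple

Params = List[float]                 # [w, b] for the toy model y = w * x + b
Dataset = List[Tuple[float, float]]  # private (x, y) pairs held by one trainer


def local_update(params: Params, data: Dataset, lr: float = 0.1) -> Params:
    """Trainer side: one pass of gradient descent over the local dataset.

    Only the resulting parameters leave the trainer, never `data`.
    """
    w, b = params
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def aggregate(updates: List[Tuple[Params, int]]) -> Params:
    """Aggregator side: average parameters, weighted by local sample count.

    Assumption: sample-count weighting (FedAvg-style); other rules are possible.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(p[i] * n for p, n in updates) / total for i in range(dim)]


def run_rounds(global_params: Params, trainer_datasets: List[Dataset],
               rounds: int = 5) -> Params:
    """Repeat: broadcast the global model, train locally, aggregate."""
    for _ in range(rounds):
        updates = [(local_update(list(global_params), d), len(d))
                   for d in trainer_datasets]
        global_params = aggregate(updates)
    return global_params
```

Calling `run_rounds([0.0, 0.0], [dataset_a, dataset_b])` with two private datasets returns the global parameters after five rounds. The aggregator only ever sees parameter lists and sample counts.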

The primary goal of federated learning is to preserve data privacy. For that, datasets in a trainer node are not shared with any other node, and only model parameters of the locally trained model are shared via a secure connection. (Note that there is still a risk of leaking private information via reverse engineering of model parameters. Hence, new techniques like differential privacy and homomorphic encryption have been proposed to further enhance privacy preservation, but we leave the discussion of these topics out of this post.)

Federated machine learning vs. distributed learning

Federated learning differs from distributed learning. In distributed learning, privacy is not a main concern. Instead, the key goal is to maximize the parallelism of computation over a large dataset so that a model can be trained as quickly as possible. In that setting, the dataset is often owned by one organization and kept in a centralized store, and trainer nodes fetch equal-size subsets of it and carry out the ML training task in parallel. In contrast, in federated learning, datasets are heterogeneous by nature because they are collected and curated by different organizations. Thus, these datasets tend to be non-IID (not independent and identically distributed), unlike the datasets used in distributed learning.
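
To make the contrast concrete, here is a small sketch of the two data layouts. It is illustrative only: the synthetic `samples` list and the function names are hypothetical, and sorting by label is just one simple way to mimic non-IID data. Real federated datasets are skewed by how each organization collects data, not by a deliberate sort.

```python
import random
from collections import Counter


def iid_shards(samples, num_nodes, seed=0):
    """Distributed-learning style: shuffle a central dataset, cut equal shards."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // num_nodes
    return [shuffled[i * size:(i + 1) * size] for i in range(num_nodes)]


def label_skewed_shards(samples, num_nodes):
    """FL-style illustration: each node ends up with a narrow slice of labels."""
    ordered = sorted(samples, key=lambda s: s[1])  # group samples by label
    size = len(ordered) // num_nodes
    return [ordered[i * size:(i + 1) * size] for i in range(num_nodes)]


def label_counts(shard):
    """Count how many samples of each label a shard holds."""
    return Counter(label for _, label in shard)


# Synthetic (feature, label) pairs with four evenly represented labels.
samples = [(i, i % 4) for i in range(400)]

for name, shards in [("iid", iid_shards(samples, 4)),
                     ("skewed", label_skewed_shards(samples, 4))]:
    print(name, [dict(label_counts(s)) for s in shards])
```

In the shuffled layout every shard holds a mix of all four labels, while in the skewed layout each shard holds a single label. A model trained on only one skewed shard would see a very different distribution from the overall population.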

Federated learning use cases

FL can be broadly applied to ML training tasks where data movement is highly discouraged or prohibited due to data privacy or operational costs. Healthcare is therefore a natural fit for FL. For example, consider an ML application that detects heart disease (e.g., aortic stenosis) from a patient's electrocardiogram (ECG) signals. Training such a model accurately requires data from a broad set of patients across various hospitals, and sharing patients' private data is not an option. FL can work well under these kinds of constraints.

The insurance industry can benefit from FL too. For example, ML training for insurability or risk assessment of insurance underwriting can take place without sharing customer data across different insurance institutions.

Another good example is ML training in remote areas with limited network access (e.g., fault prediction in an offshore wind turbine farm). In this case, the volume of data that would have to be transferred may greatly slow down ML training in a centralized location, and FL becomes a viable alternative.

Open questions and challenges in federated learning

Given the broad applicability of FL, democratizing FL is key to its success. However, there are still open challenges and missing building blocks in several areas, such as systems, communication cost, security, and bias.

Ease of use: First and foremost, ease of use is often neglected in developing technology, and FL is no exception. Several existing FL frameworks pay little attention to the complexity involved in managing underlying heterogeneous infrastructures, especially given that those infrastructures may be owned by different organizations or entities (e.g., different hospitals). Ease of use also means that an FL framework should holistically support a set of core functionalities such as model lineage tracking and training observability. While there are many isolated solutions for individual features, no solution approaches FL's requirements holistically. A holistic FL framework may involve rethinking architectural designs and extensive system integration with existing solutions.
Incentives and trust: The next issue relates to incentives and trust. Since FL involves different parties whose interests and motivations may or may not be aligned, it is crucial to ensure that all parties participate in the FL training process honestly and genuinely. What would be good ways (or incentives) to keep participants in the FL training? What would be the right ways to detect and discourage cheaters who want to benefit from a global model while contributing little? These are the questions that we need to answer for a meaningful FL framework.
Data management: Data management is also a bigger issue in FL than in other ML training settings. In many cases, training datasets are private, and a model training module must not leak private data, directly or indirectly. An FL system needs to provide some assurance that data is, at a minimum, not leaked in an obvious way. Also, if the private data needs to be loaded (or streamed) into a trainer node from a data source over the network, the FL framework should offer a secure means of accessing it.
Bias detection and management: Datasets in FL are likely to be non-IID because participants can hold datasets of different sizes drawn from different populations. Therefore, bias detection and management mechanisms should be incorporated into the system throughout the entire lifecycle of data management and training. In addition, it's equally important to use lineage tracking to trace how bias creeps into each model version.
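
As one possible starting point for the bias detection item above, the sketch below compares each participant's label distribution with the pooled distribution using total variation distance. This is an assumption-laden illustration rather than a recommended mechanism: it presumes participants are willing to share per-label counts (a summary statistic, not raw data), the participant names are hypothetical, and label skew is only one of several kinds of bias.

```python
from collections import Counter


def label_distribution(labels):
    """Turn a list of labels into a {label: fraction} mapping."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two label distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def skew_report(participant_labels):
    """Distance of each participant's label mix from the pooled label mix."""
    pooled = label_distribution(
        [label for labels in participant_labels.values() for label in labels]
    )
    return {
        name: total_variation(label_distribution(labels), pooled)
        for name, labels in participant_labels.items()
    }


# Hypothetical participants with very different label mixes.
report = skew_report({
    "hospital_a": ["healthy"] * 90 + ["disease"] * 10,
    "hospital_b": ["healthy"] * 20 + ["disease"] * 80,
})
print(report)
```

A value near 0 means a participant's labels resemble the pooled population, and a value near 1 means they barely overlap. Recording such a report alongside each model version is one way lineage tracking could show where skew entered.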

Making federated learning work through “democratization”

To truly democratize FL, the systems challenges must be taken out of the equation entirely so that data scientists can focus solely on the ML parts and not worry about systems issues. While some of the challenges are not unique to FL, they are harder there because of the heterogeneous nature of FL. Therefore, building a holistic FL system is necessary to put FL truly at the disposal of data scientists and machine learning engineers.

Learn more about how we’re marching toward the democratization of federated learning.


by

Myungjin Lee

Published on 09/07/2021

Last updated on 02/12/2025
