What is Flame?
Flame is an open-source project for federated learning (FL): an end-to-end FL system that covers all aspects of the FL lifecycle, including compute resource and dataset management, monitoring, privacy, and job scheduling. Flame's vision is to democratize FL. Its primary goal is therefore to make FL training as easy and productive as possible for data scientists and machine learning engineers, and it provides end-to-end, systematic support for Federated Learning Operations (FLOps). In addition, as applications and use cases emerge in the edge computing space, Flame also aims to facilitate FL at the edge. Hence the name, which stands for Federated Learning, AI/ML at the Edge (FLAME).
* Note that the UI is under development and will be released later in the open-source repository.
Federated learning is a fast-evolving technology, and the many variants of state-of-the-art FL techniques make it challenging for FLOps engineers to keep pace with innovation. Flame is highly flexible and extensible, and its developer interface is intuitive, which enables fast prototyping of state-of-the-art FL techniques.
Flame offers a systematic way of managing datasets and compute resources so that FLOps tasks can be automated. Unlike several well-known use cases on smartphones, an FL job is typically not executed where the data is generated. Instead, datasets are curated and stored in locations separate from the data sources, and the compute used for model training is unlikely to be co-located with the data store. Existing federated learning solutions largely leave the assignment of datasets to compute as a manual process. This tightly couples compute and dataset: a federated learning job cannot be defined until both its compute and its datasets are identified and associated with each other. Flame removes this restriction by decoupling the composition of a machine learning job from its deployment and management.
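To make the decoupling idea concrete, here is a minimal Python sketch. It is not Flame's API: `Dataset`, `Compute`, `JobSpec`, `bind_job`, and the `realm` label are hypothetical names, and the binding policy (pick the first compute registered in the same realm as each dataset) is an assumption chosen only for illustration. The point is that the job specification names only the datasets it needs; which compute trains on each dataset is resolved later, at deployment time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    realm: str  # hypothetical label for where the dataset is stored


@dataclass(frozen=True)
class Compute:
    name: str
    realm: str  # hypothetical label for where the compute cluster runs


@dataclass(frozen=True)
class JobSpec:
    # The job names only the datasets it needs; it never refers to compute.
    name: str
    datasets: tuple


def bind_job(job, datasets, computes):
    """Map each dataset used by `job` to a compute at deployment time.

    Hypothetical policy: choose the first registered compute whose realm
    matches the dataset's realm; fail if no such compute exists.
    """
    by_name = {d.name: d for d in datasets}
    plan = {}
    for ds_name in job.datasets:
        ds = by_name[ds_name]
        matches = [c for c in computes if c.realm == ds.realm]
        if not matches:
            raise LookupError(f"no compute in realm {ds.realm!r} for dataset {ds_name!r}")
        plan[ds_name] = matches[0].name
    return plan


if __name__ == "__main__":
    datasets = [Dataset("mnist-a", "us-west"), Dataset("mnist-b", "eu-north")]
    computes = [Compute("k8s-west", "us-west"), Compute("k8s-eu", "eu-north")]
    job = JobSpec("mnist-fedavg", ("mnist-a", "mnist-b"))
    print(bind_job(job, datasets, computes))
    # {'mnist-a': 'k8s-west', 'mnist-b': 'k8s-eu'}
```

Because `JobSpec` never mentions compute, the same job definition can be redeployed after compute resources are added or replaced; only the registrations passed to `bind_job` change.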
Flame also supports a diverse set of topologies (e.g., central, hierarchical, vertical, and hybrid) and different backend communication protocols (e.g., MQTT and point-to-point) out of the box. New FL mechanisms for new topologies can be developed without modifying Flame's SDK, which offers a plug-and-play capability.
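The sketch below illustrates, in simplified form, how a single role-level description of a hierarchical topology can be expanded into concrete workers once the datasets are known. It is an assumption-laden toy, not Flame's topology schema or its expansion algorithm: the role names, the `expand_hierarchical` function, and the one-trainer-per-dataset, one-aggregator-per-group policy are all hypothetical.

```python
def expand_hierarchical(groups):
    """Expand a three-tier role graph (trainer -> intermediate aggregator ->
    global aggregator) into concrete workers and links.

    groups: dict mapping a group name to the list of datasets in that group.
    Returns (workers, links): workers maps worker id -> role, and links is a
    list of (child_id, parent_id) pairs.
    """
    workers = {"global-agg": "global-aggregator"}
    links = []
    for group, datasets in groups.items():
        # Hypothetical policy: one intermediate aggregator per group ...
        agg_id = f"agg-{group}"
        workers[agg_id] = "intermediate-aggregator"
        links.append((agg_id, "global-agg"))
        # ... and one trainer per dataset, reporting to its group's aggregator.
        for ds in datasets:
            trainer_id = f"trainer-{ds}"
            workers[trainer_id] = "trainer"
            links.append((trainer_id, agg_id))
    return workers, links


if __name__ == "__main__":
    workers, links = expand_hierarchical({"west": ["ds1", "ds2"], "east": ["ds3"]})
    print(len(workers), len(links))  # 6 5
    for child, parent in links:
        print(f"{child} -> {parent}")
```

Adding a dataset or a group changes only the input to the expansion; the role-level description (which role aggregates which) stays the same.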
Algorithms and Mechanisms
This document is a good starting point. The project supports a single-box test/dev environment called FIAB (Flame In A Box), which users can set up on macOS or Linux (e.g., Ubuntu 20.04). For other OS platforms, a Linux VM can be used.
Jaemin Shin (PhD student at KAIST, Korea, Internship period: Mar-present 2023)
Gustav Baumgart (MS student at University of Minnesota, Internship period: Dec-present 2023)
Ganghua Wang (PhD student at University of Minnesota, Internship period: May-Aug 2022)
He worked on the problem of group bias in the federated learning setting and proposed a globally fair training method, FedGFT, for group-based fairness metrics.
Dhruv Garg (Master's student at Georgia Institute of Technology, Internship period: May-Aug 2022)
Contributed to the design and implementation of Flame's multi-cluster architecture on top of Kubernetes.
Gaoxiang Luo (PhD student at University of Pennsylvania, Internship period: Feb-Jul 2022)
Contributed to the implementation of various federated learning algorithms (e.g., FedYogi and FedAdam) and mechanisms (e.g., hierarchical FL and hybrid FL).
Harshit Daga (PhD student at Georgia Institute of Technology, Internship period: May-Aug 2021)
Contributed to the design and implementation of Flame's initial architecture. He also worked on the topology abstraction graph (TAG) expansion algorithm, which enables a diverse set of topologies, and made numerous suggestions to improve the quality of the project's codebase.