PhD defense Aël Quélennec: Energy and Memory-Efficient AI for On-Device Learning
Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, Estaunié amphitheater, and by videoconference
Jury
- Ngoc-Son Vu, Lecturer, ENSEA, Reviewer
- Dewey Yi, Associate Professor, University of Aberdeen, Reviewer
- Amel Bouzeghoub, Professor, Télécom SudParis, Examiner
- Umut Şimşekli, Researcher, INRIA, Examiner
- Van-Tam Nguyen, Professor, Télécom Paris (LTCI), Thesis Director and Examiner
- Pavlo Mozharovskyi, Professor, Télécom Paris (LTCI), Thesis Co-Supervisor and Examiner
- Enzo Tartaglione, Associate Professor, Télécom Paris (LTCI), Thesis Co-Supervisor and Invited Member
- Vito Paolo Pastore, Assistant Professor, MaLGa, Invited Member
Abstract
On-device learning enables neural networks to continuously adapt on edge devices, offering enhanced privacy, reduced latency, and improved energy efficiency. However, limited memory and computational resources pose significant challenges, particularly during backpropagation. This thesis addresses these bottlenecks through two complementary approaches: strategic subnetwork selection for efficient fine-tuning and activation map compression for memory-efficient training.
The second line of work addresses the activation memory bottleneck in backpropagation through tensor decomposition-based compression. We propose compressing activation maps with High-Order Singular Value Decomposition (HOSVD), with controlled information loss and convergence guarantees. To overcome HOSVD’s computational overhead, we develop ASI (Activation Subspace Iteration), which exploits the stability of activation maps during training. By performing rank selection once before training and using a single subspace iteration with a warm start at each step, ASI achieves significant memory reduction (up to 120× compression) and speedup (up to 91× faster) while maintaining comparable performance.
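As an illustration of the ideas in this paragraph, here is a minimal NumPy sketch, not the thesis implementation. It shows a truncated HOSVD of an activation tensor, followed by a single warm-started subspace iteration that reuses the factors from the previous step. The function names, the hand-picked ranks, and the choice of QR for re-orthonormalization are all assumptions made for this example.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # contracted axis comes out first
    return np.moveaxis(out, 0, mode)

def project(tensor, factors):
    """Core tensor: project every mode onto the column space of its factor."""
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core

def reconstruct(core, factors):
    """Approximate the original tensor from the stored core and factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

def truncated_hosvd(tensor, ranks):
    """Truncated HOSVD: one full SVD per mode, keeping `ranks[mode]` vectors.

    `ranks` is assumed to be chosen once, before training starts.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return project(tensor, factors), factors

def subspace_iteration_step(tensor, prev_factors):
    """One subspace iteration per mode, warm-started from `prev_factors`.

    This avoids a full SVD: each factor is refined by a single
    multiplication with A A^T, then re-orthonormalized with QR.
    """
    factors = []
    for mode, u_prev in enumerate(prev_factors):
        a = unfold(tensor, mode)
        q, _ = np.linalg.qr(a @ (a.T @ u_prev))
        factors.append(q)
    return project(tensor, factors), factors

def stored_elements(core, factors):
    """Number of values kept for the backward pass instead of the full tensor."""
    return core.size + sum(u.size for u in factors)
```

In this sketch, `truncated_hosvd` would be called once on the first batch of activations, and `subspace_iteration_step` on each later batch with the factors returned by the previous call. Only the core and the factors are stored for the backward pass; comparing `stored_elements(core, factors)` with the size of the original tensor gives the compression ratio for the chosen ranks.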
Theoretical contributions include formal analysis of fine-tuning dynamics, convergence guarantees for compressed activation training, and complexity analysis of tensor decomposition methods. Extensive validation across diverse architectures, datasets, and real-world scenarios, including Raspberry Pi implementations, demonstrates practical effectiveness.