PhD defense of Victor Quetu: From Weights to Layers: Deep Neural Network Compression for Efficient Inference
Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, amphi 2, and via videoconference
Jury
- Vincent Gripon, Research Director, IMT Atlantique, France (Reviewer)
- Holger Fröning, Full Professor, Universität Heidelberg, Germany (Reviewer)
- Giovanni Iacca, Associate Professor, Università di Trento, Italy (Examiner)
- Florence d’Alché-Buc, Full Professor, Télécom Paris, France (Examiner)
- Gaël Richard, Full Professor, Télécom Paris, France (Thesis Director)
- Enzo Tartaglione, Full Professor, Télécom Paris, France (Thesis Co-Supervisor)
Abstract
Deep learning models continue to grow in depth and computational cost, yet modern inference pipelines remain constrained by latency, memory, and energy budgets. This thesis investigates where redundant computation hides in overparameterized architectures, and how to remove it safely. We first analyze the Sparse Double Descent phenomenon and show how aggressive sparsification can paradoxically enhance generalization.
We characterize this behavior and propose regularization- and distillation-based approaches supported by an entropy-based metric. Building on this metric, we introduce three families of depth-reduction strategies: entropy-based pruning (EGP, EASIER), BatchNorm-guided layer collapse (TLC), and Optimal Transport–based inductive regularization (LaCoOT). Together, these methods reduce network depth by up to 70% across CNNs, Transformers, and diffusion models, often with minimal performance degradation and sometimes even accuracy gains. Finally, we extend the notion of redundancy from the network's weights and layers to its inputs by proposing FOLDER, a training-free token-pruning module that accelerates multimodal LLMs by up to 2.4x while preserving or improving performance.
Collectively, these contributions advance the understanding of redundancy in deep networks and propose general strategies for improving inference efficiency, paving the way toward more sustainable and adaptive deep learning models.
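To give a flavor of the kind of entropy-based signal such depth-reduction methods rely on, here is a minimal illustrative sketch, not the thesis's actual EGP/EASIER criterion: score a ReLU layer by the binary entropy of its units' firing pattern. Units that are almost always off (or almost always on) carry little information, so a layer whose mean entropy is near zero is a natural candidate for removal. The function name and the synthetic data below are assumptions made for this example only.

```python
import numpy as np

def activation_entropy(activations, eps=1e-12):
    """Mean binary entropy (in bits) of a ReLU layer's firing pattern.

    activations: array of shape (num_samples, num_units), post-ReLU.
    A value near 0 means units are nearly deterministic (always on or
    always off), i.e. the layer looks redundant under this heuristic.
    """
    p_on = (activations > 0).mean(axis=0)   # per-unit firing rate
    p_on = np.clip(p_on, eps, 1 - eps)      # avoid log(0)
    h = -(p_on * np.log2(p_on) + (1 - p_on) * np.log2(1 - p_on))
    return float(h.mean())

rng = np.random.default_rng(0)
# ~50% of units fire on any sample -> high entropy, informative layer
informative = rng.standard_normal((1000, 64)).clip(min=0)
# every unit always fires -> near-zero entropy, redundant layer
saturated = np.abs(rng.standard_normal((1000, 64)))

print(activation_entropy(informative))  # close to 1 bit
print(activation_entropy(saturated))    # close to 0 bits
```

In practice, criteria like those studied in the thesis combine such per-layer statistics with retraining or regularization; this toy score only illustrates why low-entropy layers are plausible removal targets.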