Agenda

PhD defense Victor Letzler: Multiple Choice Learning from Ambiguous Signals

Thursday, 24 March 2026, at 14:30 (Paris time), at Télécom Paris

Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau [getting there], in amphi 5 and via videoconference

Jury

  • Reviewer: François Fleuret, Professor and Scientist, University of Geneva and Meta (FAIR), Switzerland
  • Reviewer: Alain Rakotomamonjy, Principal Researcher, Criteo AI Lab, France
  • President: Rémi Flamary, Professor, École Polytechnique (CMAP), Institut Polytechnique de Paris, France
  • Examiner: Irina Rish, Professor, University of Montreal (MILA), Canada
  • Thesis Director: Gaël Richard, Professor, Télécom Paris (LTCI), Institut Polytechnique de Paris, France
  • Thesis Advisor: Andrei Bursuc, Senior Scientist and Deputy Scientific Director, Valeo.ai, France
  • Invited: Mathieu Fontaine, Associate Professor, Télécom Paris (LTCI), Institut Polytechnique de Paris, France
  • Invited: Slim Essid, Senior Scientist, NVIDIA, France

Abstract

Machine-learning-based predictive systems face limitations when the data are ambiguous. When a one-to-many relationship exists between inputs and outputs, a single prediction may be insufficient and multiple predictions are required. In this context, practical constraints often lead one to produce a small set of representative samples from the conditional output distribution using a trained neural network.


Multiple Choice Learning (MCL) addresses this by using a multi-head network that outputs one hypothesis per head and is trained with a Winner-Takes-All (WTA) scheme. While MCL has been applied to numerous tasks, its probabilistic interpretation is not yet fully understood. Furthermore, MCL is known to suffer from limitations such as overconfidence and collapse.
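For readers unfamiliar with the WTA scheme, here is a minimal sketch (not taken from the thesis; function and variable names are illustrative): for each training example, only the head whose hypothesis is closest to the target receives the loss, and hence the gradient.

```python
import numpy as np

def wta_loss(hypotheses, target):
    """Winner-Takes-All: only the head closest to the target is penalized.

    hypotheses: (n_heads, dim) array of per-head predictions
    target:     (dim,) array, the observed output
    Returns the winning head's squared error and its index.
    """
    # Squared-error loss of each head with respect to the target
    errors = np.sum((hypotheses - target) ** 2, axis=1)
    winner = int(np.argmin(errors))  # only this head would be updated
    return errors[winner], winner

# Toy example: three heads; the second lies closest to the target
heads = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
loss, winner = wta_loss(heads, np.array([0.9, 1.1]))
# winner == 1
```

Because the gradient flows only through the winning head, the heads specialize on different regions of the output space, which is what lets MCL represent one-to-many mappings.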
To mitigate overconfidence at inference time, whereby hypotheses corresponding to rare events tend to be over-represented, we learn score heads that predict the probability of each scenario. Viewing MCL as a quantization method, we show that the resulting model can be interpreted as a geometry-aware conditional density estimator with truncated kernels. We validate this on synthetic data and on sound event localization.
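One common way to train such score heads (a hypothetical sketch, not the thesis's exact formulation; all names are illustrative) is to use the WTA winner indicator as a supervision signal and fit each head's score with a binary cross-entropy, so each score converges toward that head's selection frequency:

```python
import numpy as np

def winner_indicator(hypotheses, target):
    """One-hot indicator of the winning head, used as the training
    target for the score heads."""
    errors = np.sum((hypotheses - target) ** 2, axis=1)
    onehot = np.zeros(len(hypotheses))
    onehot[int(np.argmin(errors))] = 1.0
    return onehot

def score_head_loss(score_logits, indicator, eps=1e-12):
    """Binary cross-entropy pushing each head's predicted probability
    of being selected toward its empirical selection frequency."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(score_logits, dtype=float)))
    return float(-np.mean(indicator * np.log(p + eps)
                          + (1.0 - indicator) * np.log(1.0 - p + eps)))

# Toy example: the second head wins, so its score should be pushed up
heads = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
ind = winner_indicator(heads, np.array([0.9, 1.1]))
```

At inference time, the learned scores can then down-weight hypotheses that cover rare events, countering the over-representation mentioned above.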
To prevent collapse during training, whereby only a few heads are ever selected and the others are never updated, we use deterministic annealing, which enhances exploration of the hypothesis space through a temperature parameter. This is validated on synthetic datasets, where we observe phase transitions: performance improves abruptly at predictable temperature levels.
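A minimal sketch of how a temperature parameter can soften the winner selection (an illustration of the general annealed-WTA idea, not the thesis's exact scheme): at high temperature every head shares the loss, so all heads keep receiving gradients, and as the temperature is lowered the assignment sharpens back into hard WTA.

```python
import numpy as np

def annealed_wta_weights(errors, temperature):
    """Soft head-assignment weights via a temperature-scaled softmax
    over negative errors. High temperature spreads the loss across
    heads (exploration); temperature -> 0 recovers hard WTA."""
    logits = -np.asarray(errors, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

errors = [2.02, 0.02, 8.02]
print(annealed_wta_weights(errors, 100.0))  # near-uniform weights
print(annealed_wta_weights(errors, 0.01))   # near one-hot on head 1
```

Annealing the temperature downward during training keeps under-selected heads alive early on, which is precisely what prevents the collapse described above.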
With these tools, we demonstrate the applicability of MCL to sequence modeling. We apply MCL to general time-series and motion forecasting, showcasing its performance at a light computational cost. Finally, we apply MCL to language modeling: we show how it can be adapted to the fine-tuning of Large Language Models with multiple low-rank adapters, and we demonstrate that the method captures the modes of synthetically generated mixtures of Markov chains. We then apply our method to audio and visual captioning, as well as machine translation, showing that it achieves both high diversity and high relevance in the generated outputs.