[Ideas] When AI learns to listen: sound revolution and machine listening

Gaël Richard, professor at Télécom Paris, audio processing specialist and Hi! Paris scientific co-director, March 2025

Machine listening is growing rapidly in popularity. This field of research, which sits at the intersection of machine learning and deep learning, handles a wide variety of sound signals, with uses as varied as speech synthesis, sound source separation and automatic recognition of instruments and voices.

While its applications are becoming more widely available to the general public, they are rooted in complex scientific advances, driven in particular by figures such as Gaël Richard, who was awarded an ERC grant in 2022 for his Hi-Audio project. This ambitious programme aims to develop hybrid approaches combining signal processing and deep learning to analyse and understand sounds with unrivalled precision.

Hi-Audio (Hybrid and Interpretable Deep Neural Audio Machines) is a European Research Council “Advanced Grant” (AdG) project supported by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 101052978.

Interview by Isabelle Mauriac, in French with English subtitles

Hybrid deep models and asynchronous recording platform
