Agenda

ICE seminar: « From Markov to Laplace: A Markovian Tale of Large Language Models »

Thursday 15 Jan. 2026, 14.00 (Paris time), Télécom Paris, amphi 7 and online

Abstract

Large Language Models based on transformers and state-space models (Mamba) are at the forefront of recent breakthroughs in a variety of disciplines, including natural language processing. Despite their remarkable empirical success, our theoretical understanding of how these models learn and represent sequential structure, and how their in-context learning (ICL) capabilities emerge, remains limited. This raises a fundamental question: how do modern sequence models learn from sequential data?
To address this question, in this talk I will present our new framework for a principled theoretical and empirical analysis of LLMs via Markov chains. The key idea underpinning our approach is to model the sequential input data as a Markov process, inspired by the Markovianity of natural language. We use this framework to systematically study the interplay between Markov order and model depth, revealing sharp contrasts between transformers and state-space models. In particular, our analysis reveals two curious phenomena: (i) even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator, whereas transformers require two layers, and (ii) while a single-layer transformer cannot efficiently represent a conditional $k$-gram model, a two-layer transformer can represent it for any Markov order $k \geq 1$. Together, our results provide the first formal connection between Mamba and optimal statistical estimators, and yield the tightest known characterization of the interplay between transformer depth and Markov order for ICL. Deepening our understanding of LLMs and their ICL capabilities, we believe our framework opens a new avenue for a principled study of LLMs, with many interesting open questions that I will discuss at the end.
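For readers unfamiliar with the estimator mentioned above, a minimal illustrative form of add-$\beta$ (Laplacian) smoothing for a binary order-$k$ Markov source is sketched below; the notation $n_0$, $n_1$, $\beta$ is introduced here for illustration and the exact form studied in the talk may differ:
\[
\hat{P}\left(x_{n+1} = 1 \,\middle|\, x_{n-k+1}^{n}\right) = \frac{n_1 + \beta}{n_0 + n_1 + 2\beta},
\]
where $n_0$ and $n_1$ count how often the current length-$k$ context $x_{n-k+1}^{n}$ has so far been followed by $0$ and $1$, respectively; taking $\beta = 1$ recovers classical Laplace (add-one) smoothing.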

 

Bio

Ashok Vardhan Makkuva is an Associate Professor of Mathematical Data Science at Télécom Paris, Institut Polytechnique de Paris. His main research vision is to build Foundations of Reliable and Interpretable AI, bridging information theory and large language models (LLMs).
Prior to joining Télécom Paris, he was a Postdoctoral Researcher at EPFL. He received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana–Champaign (UIUC) and his B.Tech. in Electrical Engineering with a Minor in Mathematics from the Indian Institute of Technology Bombay. He is the co-inventor of a U.S. patent on KO codes and has delivered invited talks at Stanford University, UC Berkeley, and Microsoft Research, as well as invited tutorials at ICTS ’25 and NeurIPS ’24.
His research has appeared in leading machine learning venues such as NeurIPS, ICLR, and ICML, and has been recognized with the DAAD AInet Fellowship, Spotlight Awards at ICLR and NeurIPS, the ACM MobiHoc Best Paper Award, the Joan and Lalit Bahl Fellowship (twice), the Sundaram Seshu International Student Fellowship, and the Qualcomm Innovation Fellowship (for two mentored students).
He is actively looking for motivated students, both Master's and PhD, for exciting new projects on AI Reasoning and Interpretable AI. Please feel free to reach out to him at ashok.makkuva@telecom-paris.fr to hear more!