Agenda

PhD defense of Alisa Barkar: Automatically interpreting LLM judgments using linguistic insights: case of public speaking

Wednesday 10 December 2025 at 10:00 (Paris time) at Télécom Paris

Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, Amphi 3, and by videoconference

Jury

  • Thomas François, Associate Professor, Université catholique de Louvain, Belgium (Reviewer)
  • Vincent Guigue, Professor, AgroParisTech, France (Reviewer)
  • Iris Eshkol-Taravella, Professor, Université Paris Nanterre, Laboratoire MoDyCo UMR 7114, France (Examiner)
  • Lynda Tamine Lechani, Professor, Université Toulouse III – Paul Sabatier, France (Examiner)
  • Mathieu Chollet, Senior Lecturer, University of Glasgow / IMT Atlantique (LS2N), United Kingdom / France (Co-supervisor)
  • Chloe Clavel, Senior Researcher, Inria (ALMAnaCH), France (Thesis supervisor)

Invited members:

  • Matthieu Labeau, Assistant Professor, Télécom Paris (LTCI), France (Invited co-supervisor)
  • Beatrice Biancardi, Associate Professor, CESI (LINEACT), France (Invited co-supervisor)

Abstract

Generative AI is increasingly integrated into everyday tasks, and Large Language Models (LLMs) now power commercial tools that claim to evaluate public speaking and provide personalised feedback. Yet their judgments remain poorly understood, especially in subjective domains where “ground truth” is hard to define. This thesis examines how LLMs evaluate public speaking and which linguistic cues shape their decisions.


First, we propose a criterion-based framework for textual public speaking assessment, grounded in theory and expert practice, and we build a French corpus annotated in detail by a professional coach. Second, using this corpus, we introduce a three-step protocol to evaluate LLMs: analysis of systematic evaluative biases and self-consistency, agreement with expert annotations, and case studies of model-generated explanations. Third, we design an extended set of automatically extractable linguistic features to interpret LLM evaluations and compare them with expert preferences.
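To make the evaluation protocol more concrete, the sketch below illustrates one plausible way to quantify two of its components: an LLM's self-consistency across repeated scoring runs and its agreement with expert annotations on an ordinal criterion. The data, criterion, and choice of quadratic-weighted Cohen's kappa are illustrative assumptions for this announcement, not the exact measures or code used in the thesis.

```python
# Illustrative sketch (hypothetical data): self-consistency of repeated LLM
# scores and LLM-vs-expert agreement on one ordinal public speaking criterion.
from collections import Counter

from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 scores assigned to the same six transcripts.
llm_runs = [
    [4, 4, 3, 5, 4, 3],  # run 1
    [4, 3, 3, 5, 4, 4],  # run 2
    [4, 4, 3, 4, 4, 3],  # run 3
]
expert = [3, 2, 3, 5, 4, 2]  # hypothetical expert annotations

# Self-consistency: share of transcripts where all runs give the same score.
all_agree = sum(len(set(scores)) == 1 for scores in zip(*llm_runs))
self_consistency = all_agree / len(expert)

# Agreement with the expert: quadratic-weighted Cohen's kappa between the
# per-transcript majority score across runs and the expert's scores.
majority = [Counter(scores).most_common(1)[0][0] for scores in zip(*llm_runs)]
kappa = cohen_kappa_score(majority, expert, weights="quadratic")

print(f"self-consistency: {self_consistency:.2f}, weighted kappa: {kappa:.2f}")
```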

Our results show that LLMs favour neutral or positive judgments, exhibit low agreement with the expert, and overemphasise surface-level emotional and structural cues, whereas the expert relies more on global rhetorical organisation, topic framing, and effective openings and conclusions. Overall, LLMs implement a distinct persuasion model that only partially overlaps with expert practice, with important implications for research on automatic public speaking assessment and for commercial training systems that deploy LLMs “off the shelf” to evaluate communication skills.