AAAI-22 outstanding student paper
01 March 2022
Pierre Colombo, PhD in computer science at Télécom Paris, is a post-doctoral researcher at CNRS/MILA.
Chloé Clavel is a professor in emotional computing at Télécom Paris.
Pablo Piantanida is an associate professor in information theory at CentraleSupélec.
Abstract: Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and involve non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy for quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics and that address the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, making it possible to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and double-digit correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.
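To make the contrast in the abstract concrete, here is a minimal Python sketch. It is not the InfoLM implementation. It only illustrates why exact-match scoring penalises synonyms, and how comparing probability distributions with an information measure can be more forgiving. The function names, the toy vocabulary and the probability values are all hypothetical: the numbers are made up by hand and stand in for what a masked language model might output. KL divergence is used simply as one familiar information measure.

```python
import math
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate tokens that match a reference token exactly (clipped counts)."""
    cand_tokens = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand_tokens).items())
    return matched / len(cand_tokens) if cand_tokens else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions defined over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


reference = "the film was great"
paraphrase = "the movie was excellent"

# Exact matching only credits "the" and "was", although the meaning is preserved.
exact_score = unigram_precision(paraphrase, reference)

# Hypothetical distributions over the toy vocabulary
# ["film", "movie", "great", "excellent", "terrible"].
# Assumption: hand-written values, not produced by any real model.
ref_dist = [0.30, 0.25, 0.25, 0.19, 0.01]
paraphrase_dist = [0.25, 0.30, 0.19, 0.25, 0.01]
unrelated_dist = [0.05, 0.05, 0.02, 0.03, 0.85]

kl_paraphrase = kl_divergence(ref_dist, paraphrase_dist)
kl_unrelated = kl_divergence(ref_dist, unrelated_dist)

print(f"exact-match precision for the paraphrase: {exact_score:.2f}")
print(f"KL(reference || paraphrase): {kl_paraphrase:.3f}")
print(f"KL(reference || unrelated):  {kl_unrelated:.3f}")
```

In this toy setting the paraphrase loses half of its exact-match credit, while its distribution stays much closer to the reference than the unrelated one does. Swapping in a different information measure changes how differences between distributions are weighted, which is the kind of adaptability the abstract refers to.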
Header image source: Kjpargeter/Freepik