Maria Boritchev, Associate Professor, Télécom Paris, Institut Polytechnique de Paris

Contact

Research Team:
Signal, Statistics and Learning (S2A)
Laboratory:
Information Processing and Communication Laboratory (LTCI)
Department:
Image, Data, Signal (IDS)

Short Biography

Maria Boritchev is an associate professor in the Signal, Statistics and Learning (S2A) team of the Information Processing and Communication Laboratory (LTCI) at Télécom Paris. She is a computational linguist interested in all aspects of the linguistics of human-generated data. She works on the semantics and meaning representation of natural languages (mainly French and English, but also Russian, Italian, Polish, Spanish, and Chinese) and has lately been specialising in Abstract Meaning Representation.

After studying at ENS Lyon and completing a Master of Science in Natural Language Processing at the Université de Lorraine, she defended a PhD in Computer Science, entitled "Modeling dialogues in a dynamic theory of types", in November 2021 at Loria, under the supervision of Maxime Amblard and Philippe de Groote. After her PhD, she was a postdoctoral researcher at the Mathematical Institute of the Polish Academy of Sciences in Warsaw, Poland, and then at Orange Labs in Lannion, France.

Activities: Teaching, Research, Projects

Research activities:

Mapping and exploring ethics of AI: interdisciplinary work by a team of sociologists, computer scientists, and computational linguists. Collection and analysis of charters, documents, and manifestos about "ethical AI", and their use as a defining corpus for AI ethics topics of interest. More information here: https://mapaie.telecom-paris.fr/
AMR annotation of the DinG corpus: annotation of a semantic corpus in French. The DinG corpus of transcriptions of spontaneous dialogues between players of Catan (https://gitlab.inria.fr/semagramme-public-projects/resources/ding/) is annotated in Abstract Meaning Representation (AMR), one of the most popular meaning representation frameworks. Since AMR does not cover some features of dialogue dynamics, the framework is extended to better represent spoken language as well as sentence structures specific to French. Annotated corpus here: https://zenodo.org/records/16638515 and browsable version here: https://semantics.grew.fr/?corpus=ding-01
Error analysis, questions, and AMR: error analysis of AMR parsers’ outputs, in order to characterise the errors and devise pre- and post-processing heuristics to patch them. Data in English, French, German, Spanish, Italian, and Polish. Analysis of how adding a few sentence-type-specific annotations can steer the model towards better parsing of questions in English.

Teaching activities:

Télécom Paris:

- [CSC_5AI01_TP] Logics and Symbolic AI
- [CSC_5DS25_TP] Natural Language Processing and Sentiment Analysis
- [PDV_5DA05_TP] Soft skills seminar
- [APM_0EL06_TP] Bases de l’apprentissage : approche algorithmique
- [APM_4AI12_TP] Machine Learning for Text Mining
- [BGD709] Données du Web
- [BGDIA701] Statistiques
- [IA717] Natural Language Processing

CPES Lycée International de Palaiseau:

- [SD1] Data Science first year

Publications
