PhD defense François Amat: Mining Patterns on Tabular Data
Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, amphi 3 and by videoconference
Jury
PhD Advisors
- Prof. Fabian Suchanek, Télécom Paris
- Prof. Pierre-Henri Paris, Université Paris-Saclay
Reviewers
- Prof. Fatiha Sais, Université Paris-Saclay
- Prof. Pierre Senellart, École normale supérieure
Examiners
- Prof. Bernd Amann, Sorbonne Université
- Prof. Nathalie Pernelle, Université Sorbonne Paris Nord
Abstract
This thesis addresses the gap between what relational database schemas declare and the richer set of cross-table rules that actually govern real-world data. It introduces MATILDA, the first deterministic system capable of mining expressive first-order tuple-generating dependencies (FO-TGDs) with multi-atom heads, existential witnesses, and recursion directly from arbitrary relational databases, using principled, database-native definitions of support and confidence.
MATILDA uncovers FO-TGDs expressing hidden business rules, workflow constraints, and multi-relation regularities that schemas alone cannot capture, and it ensures reproducible results through canonicalized search and tractable pruning guided by a constraint graph. To understand when simpler formalisms suffice, the thesis also presents MAHILDA, a relational Horn-rule baseline equipped with disjoint semantics to prevent self-justifying recursion. Overall, the work shows that expressive rule mining on realistic databases is both feasible and insightful, enabling more systematic, explainable, and schema-grounded analyses of complex relational data.