A new way to make black box models interpretable

Recent advances in Artificial Intelligence (AI) have provided us with very powerful predictive models, which can outperform humans in certain tasks. These models usually come with one major flaw: a lack of transparency. This means that it is not possible to understand their decision-making process.

In their recent work, Nedeljko Radulovic and his advisors Albert Bifet and Fabian Suchanek from Télécom Paris introduce a new method to interpret these opaque models.

The rapid improvements in computing power in recent years and the large amounts of available training data have led to very powerful predictive models. These models take as input a dataset (say, a set of data records about patients) and a label for each data point (say, the illness each patient was diagnosed with). After the model has been trained on this dataset, it can also predict the label for unseen data points (i.e., diagnose illnesses in other patients). These models are so powerful that they can often perform a task better than a human.

The main drawback of these models is that they are opaque: a human cannot understand how the model arrived at a certain conclusion. This poses fundamental problems in areas where decisions have to be explainable: finance, medicine, justice, traffic, etc. We thus have very powerful techniques at our disposal, but we cannot use them in these areas, because they are opaque.

Several approaches try to provide an explanation for a model result a posteriori, i.e., they try to come up with reasons that could explain why the model predicted a certain label. For example, an explanation for a prediction of “obesity” could be that the patient’s weight is above a specific value, while at the same time the height is below another specific value. This line of research is known as “Explainable AI” (xAI).

In their latest work, the authors systematically investigate the criteria that such explanations have to fulfill in order to be satisfactory. These are:

  • high fidelity: the explanation should correspond to what the opaque model predicted.
  • low complexity: the explanation should be simple, i.e., it should not have too many conditions.
  • high confidence: all data points covered by the explanation should carry the same label.
  • high generality: the explanation should apply to many data points.
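
To make the four criteria more concrete, here is a minimal Python sketch that scores one rule-style explanation on a toy dataset. The records, the rule, and the exact formulas (for instance, confidence as the share of covered data points on which the opaque model agrees with the rule) are assumptions made for illustration; the paper may define the measures differently.

```python
# Hypothetical toy data: patient records with the label that the opaque
# model assigned to each one. None of these values come from the paper.
records = [
    {"weight": 110, "height": 165, "model_label": "obesity"},
    {"weight": 105, "height": 170, "model_label": "obesity"},
    {"weight": 120, "height": 168, "model_label": "healthy"},
    {"weight": 70, "height": 175, "model_label": "healthy"},
    {"weight": 65, "height": 160, "model_label": "healthy"},
]

# An explanation: a conjunction of conditions plus the label it explains.
rule = {
    "conditions": [("weight", ">", 100), ("height", "<", 172)],
    "label": "obesity",
}

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b}


def covers(rule, record):
    """True if the record satisfies every condition of the rule."""
    return all(OPS[op](record[feat], value)
               for feat, op, value in rule["conditions"])


def evaluate(rule, records, target):
    """Score the rule as an explanation for the data point `target`."""
    covered = [r for r in records if covers(rule, r)]
    agreeing = [r for r in covered if r["model_label"] == rule["label"]]
    return {
        # Fidelity (assumed reading): the rule applies to the target and
        # states the label that the opaque model predicted for it.
        "fidelity": covers(rule, target)
        and rule["label"] == target["model_label"],
        # Complexity: number of conditions in the rule.
        "complexity": len(rule["conditions"]),
        # Confidence: share of covered points that carry the rule's label.
        "confidence": len(agreeing) / len(covered) if covered else 0.0,
        # Generality: share of all data points that the rule covers.
        "generality": len(covered) / len(records),
    }


scores = evaluate(rule, records, records[0])
print(scores)
```

In this toy example the rule covers three of the five records, and the opaque model labels two of those three as "obesity", so the rule is fairly general but not fully confident.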

A user study with real and synthetic explanations shows that these criteria are indeed what users prefer. The authors then develop a method for xAI that targets exactly these desiderata. The central idea is to train several decision trees. These are simple predictive models that classify a data point according to a learned set of (human-interpretable) conditions. The crucial idea is to train not one such tree (as previous works have done), but several. In this way, the method can choose for each data point the explanation that best satisfies the four desiderata. The method is called STACI (Surrogate Trees for A posteriori Confident Interpretations).
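
The benefit of having several trees can be illustrated with a small selection step. The Python sketch below assumes that each surrogate tree has already produced one candidate explanation for a given data point; the candidate values, the tree names, and the ranking order (confidence first, then generality, then fewer conditions) are hypothetical choices for illustration and not necessarily how STACI selects its explanation.

```python
# Hypothetical candidate explanations for a single data point,
# one per surrogate tree. All numbers are made up for illustration.
candidates = [
    {"tree": "tree_obesity", "label": "obesity",
     "conditions": 2, "confidence": 0.95, "generality": 0.30},
    {"tree": "tree_healthy", "label": "healthy",
     "conditions": 1, "confidence": 0.60, "generality": 0.50},
    {"tree": "tree_diabetes", "label": "obesity",
     "conditions": 4, "confidence": 0.95, "generality": 0.10},
]


def pick_explanation(candidates, model_label):
    """Pick the best candidate that agrees with the opaque model.

    Fidelity acts as a filter: only candidates that state the label the
    opaque model predicted are kept. The remaining ones are ranked by
    confidence, then generality, then by having fewer conditions.
    """
    faithful = [c for c in candidates if c["label"] == model_label]
    if not faithful:
        return None
    return max(
        faithful,
        key=lambda c: (c["confidence"], c["generality"], -c["conditions"]),
    )


best = pick_explanation(candidates, "obesity")
print(best["tree"])
```

With a single tree there would be only one candidate per data point, so there would be nothing to choose from.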

The training algorithm ensures that each tree overestimates the data points belonging to one class, thereby optimizing the confidence and generality criteria for that class at the same time.
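
One possible reading of "overestimating" a class is that mistakes on that class cost more during training, so the tree prefers to assign the class too often rather than to miss it. The Python sketch below shows this effect on a one-condition tree (a decision stump) over a single feature. The function name, the weighting scheme, and the data are assumptions for illustration; this is not the authors' training algorithm.

```python
def fit_weighted_stump(xs, labels, target, target_weight=3.0):
    """Find a threshold t for the rule 'x >= t => target'.

    Missing a point of the target class costs `target_weight`, while
    wrongly assigning the target class costs 1. A weight above 1 pushes
    the stump to overestimate the target class.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs)):
        cost = 0.0
        for x, y in zip(xs, labels):
            predicted_target = x >= t
            if y == target and not predicted_target:
                cost += target_weight  # missed a target point (expensive)
            elif y != target and predicted_target:
                cost += 1.0            # false alarm (cheap)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


# Hypothetical data: patient weights and the opaque model's labels.
weights = [60, 70, 80, 85, 90, 100, 110]
labels = ["healthy", "healthy", "obesity", "healthy",
          "healthy", "obesity", "obesity"]

balanced = fit_weighted_stump(weights, labels, "obesity", target_weight=1.0)
overestimating = fit_weighted_stump(weights, labels, "obesity", target_weight=3.0)
print(balanced, overestimating)
```

On this data the balanced stump places its threshold at 100 and misses the "obesity" point at 80, while the up-weighted stump lowers the threshold to 80: it now covers every "obesity" point, at the price of also covering two "healthy" ones.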

Experiments and the user study show that STACI provides explanations that outperform those of other state-of-the-art approaches. For further information on this work, please consult the paper and the GitHub repository.



By Nedeljko Radulovic, Fabian Suchanek, Albert Bifet, Télécom Paris, Institut Polytechnique de Paris