Reliability & Robustness

Image recognition systems and confidence indicators

A trustworthy classifier should be able to indicate whether its prediction is reliable or not, i.e. when it is preferable to refrain from using the model’s prediction and hand over to a human.

Several cases should lead to abstention at test time:

  • in-distribution outliers,
  • data coming from a rare class,
  • out-of-distribution data, and
  • highly noisy data.

It is important to stress that each of these cases may call for a different pre- or post-processing. For instance, if a subpopulation is underrepresented at training time, a sample from this subpopulation can be considered at test time as an in-distribution outlier and should raise an alarm.

This problem is well identified and is related to biased training samples. Out-of-distribution data may result from a drift of the distribution (covariate shift) and requires some form of transfer learning. Highly noisy data may be passed through a denoiser before entering the decision process.


For each of these cases, we will develop test scenarios and evaluation metrics to properly measure the local reliability of classifiers with a reject option.
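As an illustration of the kind of metric meant here, the sketch below computes a risk-coverage curve: samples are accepted in decreasing order of confidence and, for each number of accepted samples, we record the coverage (fraction of samples not rejected) and the selective risk (error rate among accepted samples). It assumes only NumPy; the function names `risk_coverage_curve` and `aurc` are hypothetical, and the area under the curve is one common summary among several, not necessarily the metric the project will retain.

```python
import numpy as np


def risk_coverage_curve(confidence, correct):
    """Coverage and selective risk when accepting the k most confident samples.

    confidence: 1-D array, higher means the model is more confident.
    correct:    1-D boolean array, True where the prediction is correct.
    Returns (coverage, risk), two arrays of length n for k = 1..n.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = (~correct[order]).astype(float)
    n = len(errors)
    accepted = np.arange(1, n + 1)
    coverage = accepted / n                  # fraction of samples not rejected
    risk = np.cumsum(errors) / accepted      # error rate among accepted samples
    return coverage, risk


def aurc(confidence, correct):
    """Area under the risk-coverage curve (lower is better), averaged over k."""
    _, risk = risk_coverage_curve(confidence, correct)
    return float(risk.mean())


conf = np.array([0.9, 0.8, 0.7, 0.6])
correct = np.array([True, True, False, True])
coverage, risk = risk_coverage_curve(conf, correct)
print(coverage, risk)  # coverage 0.25, 0.5, 0.75, 1.0; risk 0, 0, 1/3, 1/4
print(aurc(conf, correct))
```

Computing such a curve separately on test sets built for each abstention case above would show for which cases a given confidence score is informative.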

Special attention will be paid to the counterpart of local reliability, namely local robustness.

A model should also be robust to small transformations of the input, whether adversarial or resulting from the acquisition context. An overly stringent abstention mechanism will severely damage accuracy. The abstention mechanism itself should also be robust, up to some level.
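One simple way to quantify whether an abstention mechanism is itself robust is to measure how often its accept/reject decision flips when the input is slightly perturbed. The sketch below assumes a maximum-softmax-probability threshold as the abstention rule and isotropic Gaussian noise as the perturbation; `model` is a hypothetical callable returning logits, and `abstention_flip_rate` is an illustrative name. Gaussian noise only mimics acquisition-like perturbations: an adversarial evaluation would replace it with an attack that actively searches for decision flips.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def abstention_flip_rate(model, x, threshold, sigma, n_draws=10, seed=0):
    """Fraction of (input, noise draw) pairs whose accept/reject decision changes.

    model:     hypothetical callable mapping an (n, d) array to (n, K) logits.
    threshold: a sample is rejected when its maximum softmax probability is below it.
    sigma:     standard deviation of the isotropic Gaussian perturbation.
    """
    rng = np.random.default_rng(seed)
    accept_clean = softmax(model(x)).max(axis=1) >= threshold
    flips = 0
    for _ in range(n_draws):
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        accept_noisy = softmax(model(x_noisy)).max(axis=1) >= threshold
        flips += np.count_nonzero(accept_noisy != accept_clean)
    return flips / (n_draws * len(x))


# Hypothetical two-class linear model and two inputs: one far from the
# rejection boundary, one close to it.
w = np.array([[1.0, -1.0], [-1.0, 1.0]])


def model(inputs):
    return inputs @ w


x = np.array([[2.0, 0.0], [0.1, 0.0]])
print(abstention_flip_rate(model, x, threshold=0.8, sigma=0.0))  # 0.0: no perturbation, no flips
print(abstention_flip_rate(model, x, threshold=0.8, sigma=0.5))
```

A high flip rate at small `sigma` signals an abstention rule that is too sensitive, which is the situation the paragraph above warns against.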

Methods for assessing the reliability of a model’s predictions fall into two categories:

  • methods that capture the model’s confidence in a prediction once the model is trained, and use it a posteriori to decide on rejection; and
  • methods that learn a predictive model together with a reject function, referred to here as abstention by design.
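For the first (a posteriori) category, the confidence is read off a model that is already trained. The sketch below assumes only NumPy and access to the model’s logits; it computes two widely used scores, the maximum softmax probability and the negative entropy of the softmax distribution, and rejects when the former falls below a threshold. The threshold value and the function names are illustrative, not a recommendation.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def posthoc_scores(logits):
    """Two a posteriori confidence scores computed from a trained model's logits."""
    p = softmax(np.asarray(logits, dtype=float))
    msp = p.max(axis=1)  # maximum softmax probability
    # Negative entropy: 0 for a one-hot p, -log K for a uniform p over K classes.
    neg_entropy = (p * np.log(np.clip(p, 1e-12, None))).sum(axis=1)
    return msp, neg_entropy


def predict_or_reject(logits, threshold):
    """Predicted labels, with -1 meaning 'reject and defer to a human'."""
    msp, _ = posthoc_scores(logits)
    labels = np.asarray(logits).argmax(axis=1)
    return np.where(msp >= threshold, labels, -1)


logits = np.array([[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
print(predict_or_reject(logits, threshold=0.8))  # first input -> class 0, second -> -1 (rejected)
```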


A promising framework consists of incorporating a reject option within the design of the predictive model, either as an additional “reject” class or, more interestingly, as a separate function. This framework has its roots in the early work of Chow (1970), which emphasized that prediction with rejection is governed by the trade-off between error rate and rejection rate, i.e. accuracy versus coverage. However, learning a classifier with an additional reject class amounts to focusing only on the boundary between the real classes.
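Chow’s trade-off can be made concrete. If a misclassification costs 1, a correct prediction costs 0 and abstaining costs d, the expected cost of predicting the most probable class is 1 − max_k p(k|x), so the cost-minimizing rule is to reject whenever max_k p(k|x) < 1 − d. The sketch below assumes the posteriors are given (in practice they are only estimated) and shows how varying d moves along the error/rejection trade-off; the function names are hypothetical.

```python
import numpy as np


def chow_rule(posteriors, reject_cost):
    """Chow's rule with 0/1 error costs and an abstention cost d = reject_cost.

    Predicting the most probable class has expected cost 1 - max_k p(k | x);
    abstaining costs d. The rule keeps the cheaper option (ties go to predicting).
    Returns predicted labels, with -1 for rejected samples.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    best = posteriors.max(axis=1)
    return np.where(1.0 - best <= reject_cost, posteriors.argmax(axis=1), -1)


def error_and_reject_rates(pred, y):
    """Error rate (accepted and wrong, over all samples) and rejection rate."""
    pred, y = np.asarray(pred), np.asarray(y)
    rejected = pred == -1
    errors = ~rejected & (pred != y)
    return float(errors.mean()), float(rejected.mean())


posteriors = np.array([[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]])
y = np.array([0, 1, 0])
for d in (0.3, 0.5):
    print(d, error_and_reject_rates(chow_rule(posteriors, d), y))
# d = 0.3: no error, two of three samples rejected; d = 0.5: one error, nothing rejected
```

Lowering d makes abstention cheaper, so more samples are rejected and fewer errors are made: this is exactly the accuracy versus coverage trade-off.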


Many works (Corbiere et al. 2019; Wang et al. 2017) have reported that the confidence measure given by the classifier itself, based on the posterior probabilities produced by a neural network (even a calibrated one), fails to cover all the cases listed above. A higher confidence from the model does not necessarily imply a higher probability that the classifier is correct (Nguyen et al. 2015).
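Whether a confidence score actually separates correct from incorrect predictions can be checked directly with the area under the ROC curve of the score used as a detector of correct predictions. The sketch below computes it as the probability that a randomly chosen correct prediction receives a higher confidence than a randomly chosen error, ties counting one half; a value near 0.5 means the confidence carries little information about correctness. It is a NumPy-only illustration with a hypothetical function name, and its pairwise comparison uses memory proportional to the number of correct times incorrect predictions, which is fine for a sketch but not for large test sets.

```python
import numpy as np


def failure_detection_auroc(confidence, correct):
    """P(confidence of a random correct prediction > that of a random error),
    with ties counting 1/2. 0.5: no ranking ability; 1.0: perfect ranking."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = confidence[correct], confidence[~correct]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both correct and incorrect predictions")
    diff = pos[:, None] - neg[None, :]  # all (correct, incorrect) pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


conf = np.array([0.9, 0.8, 0.7, 0.6])
correct = np.array([True, True, False, True])
print(failure_detection_auroc(conf, correct))  # 2/3: the error outranks one of three correct predictions
```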


A more general framework, learning with abstention (LA), emerged from the work of Cortes et al. (2016) and differs from the confidence-based approach. In LA, a pair of functions, one for prediction and one for abstention, are jointly learned to minimize a loss that accounts for both the price of abstaining and the price of misclassification. This approach, developed so far for large margin classifiers, boosting and structured prediction (Garcia et al. 2018), has been explored very recently in the context of deep neural networks (Croce et al. 2018), while the recent ConfidNet (Corbiere et al. 2019) also combines related ideas with confidence estimation. Interestingly, the loss function proposed in LA has not yet been revisited from the angle of robustness by design, nor tested on the different abstention cases listed above.
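In the binary setting of Cortes et al. (2016), with labels y ∈ {−1, +1}, a prediction function h and an abstention function r, the model abstains when r(x) ≤ 0 and otherwise predicts sign(h(x)); the loss is 1 when it predicts and is wrong, c when it abstains, and 0 otherwise. The sketch below evaluates the empirical version of this loss with NumPy; the function name is hypothetical. This loss is not convex, so in practice it is minimized through convex surrogates, whose exact form should be taken from the paper rather than from this sketch.

```python
import numpy as np


def abstention_loss(h, r, y, c):
    """Empirical learning-with-abstention loss, binary case (labels in {-1, +1}).

    h: scores of the prediction function; the predicted label is sign(h).
    r: scores of the abstention function; the model abstains when r <= 0.
    c: price of abstaining (a misclassification costs 1).
    """
    h, r, y = (np.asarray(a, dtype=float) for a in (h, r, y))
    misclassified = (y * h <= 0) & (r > 0)  # predicted and wrong
    abstained = r <= 0
    return float(np.mean(misclassified + c * abstained))


h = np.array([2.0, -1.0, 0.5])
r = np.array([1.0, 1.0, -1.0])
y = np.array([1.0, 1.0, -1.0])
print(abstention_loss(h, r, y, c=0.3))  # (0 + 1 + 0.3) / 3: one correct, one error, one abstention
```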

Building on this general framework, we will design a learning-with-abstention approach dedicated to deep neural networks, with new criteria leveraging robustness as well as reliability.

Robustness of face encoding to blurry and non-face images

DCNNs for face encoding learn to project face images into a high-dimensional space. This projection should not map all blurry images or non-faces to the same region of the space; otherwise, images from this category would obtain high impostor scores when compared with each other. The influence of noisy images (blur, occlusion …) on DCNNs has been widely studied, for instance in Hendrycks et al. (2019).
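The failure mode described above can be illustrated with synthetic vectors standing in for DCNN encodings; no real network is involved. Impostor scores are taken here as cosine similarities between encodings of different identities: if all blurry images are mapped near a single point, their mutual impostor scores approach 1, whereas well-spread encodings give scores near 0. The vectors, dimensions and the function name `impostor_scores` are assumptions made for illustration.

```python
import numpy as np


def impostor_scores(embeddings, identities):
    """Cosine similarities for every pair of images belonging to different identities."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # compare directions only
    sim = e @ e.T
    ids = np.asarray(identities)
    different = ids[:, None] != ids[None, :]
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)  # count each pair once
    return sim[different & upper]


rng = np.random.default_rng(0)
n, dim = 50, 128
identities = np.arange(n)                 # one image per identity
spread = rng.standard_normal((n, dim))    # synthetic encodings spread over the space
collapsed = rng.standard_normal(dim) + 0.1 * rng.standard_normal((n, dim))  # all near one point
print(impostor_scores(spread, identities).mean())     # close to 0
print(impostor_scores(collapsed, identities).mean())  # close to 1
```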

However, for face recognition, which is a projection problem rather than a classification problem, the influence of blur on the position of the encoding in the N-dimensional space, and loss functions that constrain this position, have not been widely studied.
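A first measurement of how blur moves an encoding is the cosine distance between the encoding of an image and that of increasingly blurred versions of it. The sketch below uses a box blur written in NumPy and, as a purely hypothetical stand-in for a DCNN face encoder, a fixed random linear projection followed by L2 normalization; with a real encoder, `encode` would be replaced by the network’s forward pass, and the random image by aligned face crops. All function names are illustrative.

```python
import numpy as np


def box_blur(img, k):
    """Mean filter over a k x k window (k odd); borders are padded by edge replication."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def make_toy_encoder(shape, dim=64, seed=0):
    """HYPOTHETICAL stand-in for a face encoder: random linear map + L2 normalization."""
    rng = np.random.default_rng(seed)
    n_pixels = shape[0] * shape[1]
    w = rng.standard_normal((dim, n_pixels)) / np.sqrt(n_pixels)

    def encode(img):
        v = w @ img.ravel()
        return v / np.linalg.norm(v)

    return encode


def embedding_drift(encode, img, kernel_sizes):
    """Cosine distance between the encoding of img and that of each blurred version."""
    ref = encode(img)
    return [float(1.0 - ref @ encode(box_blur(img, k))) for k in kernel_sizes]


rng = np.random.default_rng(1)
img = rng.random((32, 32))  # placeholder for an aligned face crop
encode = make_toy_encoder(img.shape)
print(embedding_drift(encode, img, [1, 3, 7, 15]))  # first entry ~0: k = 1 leaves the image unchanged
```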