The ACPR’s guidelines on explainability: clarifications and ambiguities

Machine learning has the potential to significantly improve the effectiveness of anti-money laundering models. It could help uncover complex patterns of financial crime that current AML systems cannot detect. It could also reduce the currently high false positive rate of these systems, and/or streamline treatment of alerts.

The problem is that the most accurate models, which would be the most helpful for AML, are often also the hardest to explain. As the performance of an AI model increases, its opacity tends to increase as well. A deep neural network, for example, can achieve very impressive accuracy, but the reasons behind its predictions are impossible for a human to fully understand. Additional tools are needed to make educated guesses about how the model reached its decision. Many therefore fear that highly accurate but inscrutable black-box models cannot be used in critical bank functions such as AML, which creates a tradeoff between accuracy and explainability.

The latest publication of the French regulator, the ACPR, on AI governance introduces new guidance on this subject. The report provides recommendations for AI evaluation in the financial services industry, including explainability requirements. It provides a framework for assessing the level of explainability required for an AI model and presents two examples of AI use in AML processes. The ACPR’s guidelines, which were until recently open to public consultation, are the most detailed we have seen on algorithmic governance and explainability in the financial services sector.

Among other things, these guidelines address the tradeoff between explainability and prediction accuracy, and suggest innovative ways for supervisory authorities to scrutinize black-box models used by banks, including through the use of “challenger models”.

What is the ACPR framework for AI explainability?

The ACPR defines four levels of explainability, from the simplest to the most detailed: “level 1 – observation”, “level 2 – justification”, “level 3 – approximation” and “level 4 – replication”. The idea is that a given algorithm can be assigned a particular level of explainability based on the algorithm’s impact and the intended audience. Tailoring the explanation to the level of impact and intended audience is similar to the approach proposed in a recent paper by Beaudouin et al. (2020).[1]

The audience of the explanation is the first factor to take into account. For end consumers, a simple level 1 explanation suffices; for internal control officers, a level 2-3 explanation is needed; and for auditors, a level 4 explanation would typically be required.

The second factor is the level of risk associated with “the replacement of a human-controlled process by an AI component”. We can think of this as the level of impact and potential responsibility (i.e. accountability) associated with the algorithm. If the algorithm makes a decision that raises a significant compliance risk, or a risk for individuals, the level of impact and responsibility is high. If it does low-impact work (e.g. identifying targets for a marketing campaign), the risk is low. For AML, the risk will generally be a compliance risk, for example the risk of a false negative in a sanctions/asset freeze scenario.
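The two factors can be pictured as a simple lookup. The sketch below is only an illustration: the audience-to-level ranges come from the paragraphs above, while the names (`ExplanationLevel`, `AUDIENCE_LEVELS`, `required_level`) and the rule that a high-risk use case takes the top of the audience's range are our own assumptions, not something the ACPR prescribes.

```python
from enum import IntEnum


class ExplanationLevel(IntEnum):
    OBSERVATION = 1
    JUSTIFICATION = 2
    APPROXIMATION = 3
    REPLICATION = 4


# Ranges taken from the text above; the dictionary keys are hypothetical labels.
AUDIENCE_LEVELS = {
    "end_consumer": (ExplanationLevel.OBSERVATION, ExplanationLevel.OBSERVATION),
    "internal_control": (ExplanationLevel.JUSTIFICATION, ExplanationLevel.APPROXIMATION),
    "auditor": (ExplanationLevel.REPLICATION, ExplanationLevel.REPLICATION),
}


def required_level(audience: str, high_risk: bool) -> ExplanationLevel:
    """Pick a level within the audience's range.

    Assumption (ours, not the ACPR's): a high-risk use case takes the top
    of the range, a low-risk one the bottom.
    """
    low, high = AUDIENCE_LEVELS[audience]
    return high if high_risk else low


print(required_level("internal_control", high_risk=True).name)
```

In a real assessment the two factors would interact in less mechanical ways, and a single algorithm may need several levels at once for different audiences.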

Clarifications brought by the new guidelines

  • Who needs what explanation?

The ACPR framework stresses the importance of context in the design of an explainability solution, including the importance of the audience. It goes further and specifies the needs of particular populations. This is a significant clarification, as it was previously unclear who could be satisfied with a simple, partial explanation and who would require an in-depth one. For example, in the ACPR’s approach, the people likely to review the algorithm’s predictions, such as compliance officers, are not the ones requiring the highest level of explanation. In the AML example, they can be satisfied with a level 1-2 explanation. This means they do not necessarily have to be data scientists or have a data science background. More controversially, the ACPR indicates that end consumers, such as citizens impacted by the algorithm, only need the lowest explanation level. This is certainly not always the case, as we will see later in the post.

  • How to implement the explanation? When does it intervene?

Given the novelty of AI deployment, financial institutions may be unsure how and when to implement explainability. For the ACPR, explainability comes into play at model validation, during monitoring of the deployed model, for corrective maintenance and for periodic reviews such as internal or external audits. The diagram in section 4 of the ACPR’s document does not specify a need to integrate explainability in the design and training stage. This suggests that the ACPR does not intend to impose pre-modelling or modelling explanatory approaches, leaving open the possibility of using post-hoc methods.

  • AI models for AML do not always need to be 100% explainable

Level 4, the highest level of explainability in the ACPR’s approach, does not correspond to a requirement of full transparency: “explanation level 4 (replication) aims to identically reproduce the model’s behaviour, and not to understand its inner workings in their fullest detail – which may prove impossible for certain models, typically deep neural networks.” That means even black boxes could potentially satisfy this criterion. This is a major clarification, because many compliance professionals have questioned whether black-box models can be used at all for critical bank functions. For the ACPR, the black-box nature of the algorithm is not the key determinant. What matters instead is whether the algorithm can meet the four evaluation criteria defined by the ACPR (data management, explainability, performance, stability), and whether sufficient governance mechanisms are in place.
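To make the idea of replication more concrete, here is a minimal sketch of one possible reading of a level 4 check: replaying logged inputs through the model and verifying that the outputs are identical to those recorded at decision time. Everything in it is hypothetical, including the `DecisionRecord` structure, the `country_risk` feature and the toy `score_transaction` function standing in for an opaque model. The ACPR report does not prescribe this procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    amount: float
    country_risk: int   # hypothetical feature, 0 (low) to 3 (high)
    logged_alert: bool  # output recorded when the decision was made


def score_transaction(amount: float, country_risk: int) -> bool:
    """Toy deterministic stand-in for an opaque model."""
    return amount * (1 + country_risk) > 10_000


def replication_mismatches(records):
    """Return the records whose replayed output differs from the logged one."""
    return [
        r for r in records
        if score_transaction(r.amount, r.country_risk) != r.logged_alert
    ]


log = [
    DecisionRecord(amount=2_000.0, country_risk=0, logged_alert=False),
    DecisionRecord(amount=6_000.0, country_risk=1, logged_alert=True),
    # This logged output disagrees with the replay and should be flagged.
    DecisionRecord(amount=3_000.0, country_risk=3, logged_alert=False),
]

mismatches = replication_mismatches(log)
print(len(mismatches))
```

The point of the sketch is that such a check treats the model purely as an input-output function: it can be run on a neural network as easily as on a rule set, without anyone needing to understand the model's inner workings.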

An example in the report illustrates that explainability constraints do not necessarily prevent the introduction of a black-box model. The AI use case concerns the filtering of transactional messages to determine whether or not a transaction involves a person on a sanctions list. A machine learning algorithm is introduced to route alerts directly to the appropriate level of review. In this example, the algorithm used is a neural network, a type of model known to be very difficult to explain. However, in this case, the algorithm does not take sensitive decisions and the associated risks are low. According to the ACPR, “there is no requirement to motivate the decisions made by the algorithm which impacts an individual.”

Limits of the ACPR framework

The ACPR is the first to admit that its four-level system of explainability is an over-simplification, and that a given algorithm and situation may require more than one level of explanation. Each explanation will have to take multiple factors into account, including legal cases that determine what level of explanation is required for a given situation. Our recent paper on AI-based AML systems and fundamental rights discusses a legal case in the Netherlands that links explainability to fundamental rights, finding that individuals have a right to understand the model even if its recommendations are validated by humans[2]. Legal explanation requirements go well beyond the GDPR, and judges will not feel bound by the ACPR’s four-level classification. Nevertheless, the ACPR’s approach has the merit of focusing attention on the right factors that will dictate the required level of explanation.

Even setting aside these admitted simplifications, a few points in the ACPR framework for explainability could be adjusted. First, the ACPR underestimates, in our view, the need for an explanation for an end user, such as an individual impacted by the algorithm. As mentioned above, a Dutch court recently stated that individuals have the right to understand how an AI model works and how it affects them. Second, the ACPR does not specify how the data used for explainability and evaluation purposes should be stored and accessed. Explainability may require storage of decision logs, which could generate significant costs for financial institutions and create friction with GDPR data disposal principles. Third, the ACPR does not discuss the explainability needs of other regulators, such as the CNIL (the French data protection authority), which may want to verify that AI algorithms deployed by banks do not discriminate against certain categories of the population.

The ACPR report on AI governance also raises other questions that are unrelated to explainability. For example, the report discusses AML use cases where AI adds relatively little to existing systems. A discussion of more audacious AI use cases, for example AI to detect new criminal patterns, would have been welcome. Also, the ACPR suggests the possibility of testing banks’ models, yet this appears difficult to set up in practice, given the multiplicity of AI use cases.

_____________________________________________________________________

By Astrid Bertrand, Télécom Paris – Institut Polytechnique de Paris

_____________________________________________________________________

[1] Beaudouin, V., Bloch, I., Bounie, D., Clémençon, S., d’Alché-Buc, F., Eagan, J., Maxwell, W., Mozharovskyi, P., Parekh, J., Identifying the ‘Right’ Level of Explanation in a Given Situation (May 13, 2020). Available at SSRN: https://ssrn.com/abstract=3604924.

[2] Maxwell, W., Bertrand, A., Vamparys, X., Are AI-based systems compatible with European fundamental rights? (2020).