Data sharing issues in the finance industry

The 3rd session of AI and finance with the ACPR and Télécom Paris tackled data sharing issues in the finance industry.

David Bounie, Economics Professor at Télécom Paris, introduced the seminar by highlighting companies’ increasing reliance on data to create market power. Data sharing is therefore critical to avoid anti-competitive practices. The seminar looked at the opportunities and constraints of sharing data, as well as the technical approaches that make collaboration possible.

1st roundtable: Sharing data for the general public interest

Olivier Fliche, Director of the Fintech-Innovation Department of the ACPR, opened the first roundtable of this webinar, on the public-interest incentives for sharing data.

As our first speaker, we had the pleasure of hearing from Kristen Alma, Policy Analyst at the Financial Action Task Force. The FATF is the global money laundering and terrorist financing watchdog that sets international standards for AML/CFT (Anti-Money Laundering and Countering the Financing of Terrorism). Ms. Alma first presented the opportunity that data pooling offers for AML/CFT. By combining data from different sources and parties, financial institutions can rely on a more complete data set and improve the performance of their models in detecting money-laundering schemes, which typically involve multiple financial institutions in multiple countries. The FATF published a detailed report on data sharing by private entities in 2017. Ms. Alma then presented the first results of the FATF’s survey on the technologies being deployed by banks for data sharing. The survey report, which is to be made public by June 2021, highlights how recently institutions have started to consider data sharing, and the conflict between data sharing needs and privacy requirements. It also lists the technologies being considered to overcome this barrier (homomorphic encryption, zero-knowledge proofs, secure multiparty computation, etc.). The FATF is planning a second phase of study to look at ways and environments to foster data sharing under privacy constraints.

Next, we welcomed Bertrand Pailhès, Director of Innovation and Technology at the CNIL, the French data protection authority, as the second speaker of the roundtable. Mr. Pailhès presented the 21st-century trends shaping data sharing, including increased transparency, data-driven business models and advanced analytics that feed on large amounts of data. He gave an overview of how data protection authorities address these new issues and develop data sharing initiatives.

In particular, he presented four scenarios for data sharing in which data produced outside the scope of the public service would be valuable to fulfill missions of general interest:

  • “mandatory private open data”,
  • “enhanced general private data”,
  • “data reuse platform”, and
  • “citizen portability”.

He also pointed out that companies are usually reluctant to share data, and that sectoral approaches are more effective at fostering collaboration.

The last panelist to join this roundtable was Antoine Dubus, Economist at ECARES at the Free University of Brussels. He talked us through the economic principles of data sharing. He began by presenting data as a competitive asset, quoting Crémer, de Montjoye and Schweitzer (2019): “The competitiveness of firms will increasingly depend on timely access to relevant data and the ability to use that data to develop new, innovative applications and products.” Firms with a dominant data position therefore have an undeniable advantage, and some go further and adopt exclusionary practices by denying competitors access to data (e.g. Facebook in 2017). As a result, only a few dominant firms are left in charge of shaping market competition. In this context, data pooling and sharing can help enhance market competition and boost cooperation in pursuit of general public interest goals such as combating fraud. Antoine Dubus explained that doing so while complying with data protection regulation will require more coordination between data protection agencies and competition authorities.

2nd roundtable: Technical approaches to sharing sensitive data

The second part of this webinar presented a few technical approaches that make data sharing possible while respecting privacy constraints. Leading the debate of this session was Laurent Dupont, Senior Data Scientist at the ACPR. He introduced our first speaker, Dr. Catuscia Palamidessi, Researcher at the LIX-Inria lab of Ecole Polytechnique, who walked through the concepts underlying differential privacy methods.

Dr. Palamidessi first outlined the trade-off between privacy and utility. To develop accurate models that can be useful for society, algorithms need data to train on, sometimes at the expense of privacy. What we are looking for is therefore mechanisms that optimize this utility-privacy trade-off. One such mechanism is differential privacy (DP). Consider a dataset that comprises Bob’s and Alice’s data. An ε-DP mechanism adds a small amount of random noise, calibrated by the privacy parameter ε, such that you can still exploit the results (the data remains useful) but can no longer tell Bob’s input from Alice’s. Standard differential privacy (also known as the central model) is the typical architecture of the DP mechanism: you first pool all your datasets into one and then apply an ε-differentially private mechanism. Local differential privacy (LDP) is another architecture, used by Apple and Google for instance, in which each individual data input n is made private with its own privacy level ε_n, and the collected dataset is the union of these noised inputs. Catuscia Palamidessi also explained the k-Randomized Response (kRR) and d-privacy (d stands for distance) mechanisms, and finally presented her hybrid approach, which exhibits a better privacy-utility trade-off than LDP.
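
As an illustration of the local model, here is a minimal Python sketch of k-ary randomized response in its standard textbook form; the summary does not give the exact formulation used in the talk, so the function names, the toy domain and the choice ε = 1 are assumptions. Each participant reports their true value with probability p = e^ε / (e^ε + k − 1) and any other value with probability q = 1 / (e^ε + k − 1), which bounds the ratio of report probabilities for any two inputs by e^ε. The collector then corrects the noisy counts to obtain unbiased frequency estimates.

```python
import math
import random
from collections import Counter


def krr_perturb(value, domain, epsilon, rng):
    """Locally randomize one input: keep it with probability p,
    otherwise report one of the k - 1 other values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def krr_estimate_counts(reports, domain, epsilon):
    """Unbiased estimate of the true counts from the noised reports.
    Requires epsilon > 0 (otherwise p == q and the reports carry no signal)."""
    k = len(domain)
    n = len(reports)
    denom = math.exp(epsilon) + k - 1
    p = math.exp(epsilon) / denom   # probability of reporting the true value
    q = 1.0 / denom                 # probability of reporting one given other value
    observed = Counter(reports)
    return {v: (observed[v] - n * q) / (p - q) for v in domain}


# Hypothetical toy data: 1,000 participants holding one of four categories.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
domain = ["A", "B", "C", "D"]
truth = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
reports = [krr_perturb(v, domain, 1.0, rng) for v in truth]
print(krr_estimate_counts(reports, domain, 1.0))
```

A smaller ε makes each report less informative about the individual (more privacy) and the estimated counts noisier (less utility), which is the trade-off described above.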

The next speaker of this technical roundtable was Sandrine Murcia, CEO and Co-Founder at Cosmian, an enterprise software vendor that provides confidential collaborative data processing secured by cryptography. She explained Cosmian’s approach: providing companies with a tool that incorporates advanced cryptographic techniques from research, so that they can draw insights from data pooled with other parties. More precisely, Cosmian provides two ways to compute collaboratively. The first is for companies that want to run collaborative computations over protected data without revealing the underlying data they own. The technique used by Cosmian in that case is Fully Encrypted Data Processing. The second is for companies that want to move their data to another environment, such as the cloud, and compute there, but do not want either their data or their queries to be exposed to that environment. Multi-Party Computation (MPC) would be used there.
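
The summary does not describe the protocols Cosmian actually uses. As a generic illustration of the idea behind multi-party computation, the Python sketch below uses textbook additive secret sharing to let several parties compute the sum of their private values without any of them seeing another party’s input. The modulus, the function names and the example values are all assumptions made for this illustration.

```python
import secrets

# Public modulus (a Mersenne prime); inputs and their sum must stay below it.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n random shares that sum to it mod PRIME.
    Any n - 1 of the shares together look uniformly random."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate the protocol in one process.
    Row i holds the shares produced by party i; column j is what party j receives."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical example: three institutions each hold a private count.
print(secure_sum([120, 45, 300]))
```

In a real deployment each party would run on its own machine and exchange shares over secure channels; the single-process simulation only shows why the published partial sums reveal the total and nothing else.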

Laurent Dupont then welcomed Maxime Agostini, CEO and Co-Founder at Sarus, a company that helps organizations extract value from any sensitive data using differential privacy. Maxime began by outlining the reasoning behind the use of differential privacy. The first idea that comes to mind to guarantee the anonymity of data is simply to remove PII (Personally Identifiable Information). For instance, you would remove the names, birthdates and addresses of the clients in a Netflix database. However, doing only that leaves you with the unique parts of the data, in our example the unique ordered list of films watched by each client, which makes it possible to trace back to the clients’ identities. This led scientists to look for a better, mathematical definition of anonymous information: differential privacy. This definition guarantees that anonymous information cannot lead back to information on any given individual, no matter what the attacker already knows. Finally, Maxime shared the idea that Sarus likes to put forward: forget about sharing data, even anonymized data; what matters most is sharing knowledge.
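
The Python sketch below illustrates both halves of this argument on hypothetical toy records: after names are stripped, some viewing histories are still unique and act as fingerprints, whereas an aggregate released through the Laplace mechanism (a standard way to obtain ε-differential privacy for counting queries) shares knowledge rather than rows. The data, function names and ε value are assumptions for illustration and do not represent Sarus’s product.

```python
import random
from collections import Counter

# Hypothetical toy records: (name, ordered list of films watched).
records = [
    ("Alice", ("Alien", "Amelie", "Heat")),
    ("Bob", ("Alien", "Heat")),
    ("Carol", ("Amelie", "Heat", "Alien")),
    ("Dan", ("Alien", "Heat")),
]


def strip_pii(rows):
    """Naive anonymization: drop the name, keep the viewing history."""
    return [films for _name, films in rows]


def unique_fraction(histories):
    """Share of records whose history appears only once, i.e. a potential fingerprint."""
    counts = Counter(histories)
    return sum(1 for h in histories if counts[h] == 1) / len(histories)


def dp_count(histories, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1 / epsilon.
    A count changes by at most 1 when one person is added or removed."""
    true_count = sum(1 for h in histories if predicate(h))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponential draws follows a Laplace distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


histories = strip_pii(records)
print(unique_fraction(histories))  # fraction of histories that remain unique

rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(dp_count(histories, lambda h: "Amelie" in h, epsilon=1.0, rng=rng))
```

On such a small dataset the noise dominates the answer; the guarantee becomes practical on large populations, where the same noise scale is small relative to the counts being released.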

The panelists’ presentation decks are available here, along with the replay of the entire webinar, including the roundtable discussions.

Watch the replay

Useful links to complete the reading: