Télécom PhD researcher speaks about algorithmic bias in the United States

As artificial intelligence becomes increasingly commonplace in industries that affect all facets of everyday life, AI experts and watchdogs are calling attention to the inherent biases present in the different stages of AI projects, and for good reason. It is no secret that discrimination is a widespread problem in countries like the United States, where many of the algorithms and data sets used in mainstream AI are developed.

I grew up in the United States as a Hispanic black immigrant – unfortunately, the racial division, xenophobia, and daily injustices were impossible to ignore and often overwhelming. My experiences in the United States inform much of my work and motivate me to take on the challenge of combating discrimination in my field. These issues seep into the world of AI, often in the most innocuous ways.

Inadvertent discrimination can occur in prison sentencing, medical diagnoses and treatment, bank loans, and insurance through decisions made by AI algorithms – all of which have an immense impact on the lives of everyday citizens. Discrimination by an algorithm – albeit unintentional – is still discrimination, and it unfairly penalizes groups of people that are already systemically marginalized and disadvantaged.

One of the main reasons that racial bias occurs in AI predictions is that it is present in the data itself. The infamous “smart” soap dispenser did not recognize darker-skinned hands because the data set used to train the algorithm did not include enough images of darker-skinned people to properly register them.

An algorithm for validating loan applications that uses historical data would penalize black Americans because it reflects historical decisions made by bank loan officers with personal biases.
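One simple way to see whether historical decisions carry this kind of imbalance is to compare approval rates across groups before any model is trained. The sketch below is only an illustration: the group labels, the `history` records, and the function names are all hypothetical, and a gap in rates is a signal to investigate rather than proof of discrimination.

```python
from collections import defaultdict


def approval_rates(records):
    """Return the share of approved applications for each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, was_approved in records:
        total[group] += 1
        if was_approved:
            approved[group] += 1
    return {group: approved[group] / total[group] for group in total}


def disparate_impact_ratio(rates):
    """Ratio of the lowest approval rate to the highest (1.0 means parity).

    Assumes at least one group has a non-zero approval rate.
    """
    return min(rates.values()) / max(rates.values())


# Hypothetical historical decisions: (group label, approved?)
history = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

rates = approval_rates(history)
ratio = disparate_impact_ratio(rates)
print(rates)
print(round(ratio, 2))
```

A model trained to reproduce these past decisions would learn the same gap, which is why the check belongs before training rather than after deployment.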

An algorithm that determines prison sentences would heavily penalize young black men because they are sentenced far more often and more harshly than their white peers; this scenario has already occurred in the United States. The model makes decisions without accounting for the effects that segregation, cultural vilification, and personal prejudice could have had on the decisions of the juries or judges involved in the cases used for the training data.

When analyzing employee data within a company, machine learning models may find gender, age, and parenthood to be the most determinant factors in employee absenteeism. The model works only with the data it receives. It does not know that worldwide, mothers still traditionally carry the daily burdens of raising children, from doctor’s visits to school events to the constant work of running a home. It is a data scientist’s job not only to create the model, but also to properly interpret the results. Someone who is not aware of the social issues behind the historical data could improperly interpret the results of the models as “mothers are more absent,” and be tempted to adjust the company’s recruitment policies accordingly. Such an adjustment would violate employment and non-discrimination laws, but the point is that without an ethical filter, conclusions from machine learning models can amplify prejudices and imbalances already present in society and in the data. The tech world in the West is still overwhelmingly white and male, but we can no longer plead ignorance as an excuse. Without intervention, we risk perpetuating the worst of society’s ills.

Already in Europe, certain personal characteristics, such as race, ethnic background, medical information, and political affiliation, are officially designated sensitive data, and their usage is strictly controlled by the General Data Protection Regulation (GDPR) passed in 2016. These characteristics align closely with those protected by Article 14 of the Human Rights Act passed in 1998. In many cases, this sensitive data is not necessary for a machine-learning model, and GDPR requires us data scientists not to use it at all in these instances. But what happens when sensitive data is relevant to the subject at hand, or if sensitive data is inferred by the model from other data? Our safest bet is to assume that there is inherent bias in the data, and automatically take steps to remedy it.
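The question of inferred sensitive data can be probed with a quick check: if a supposedly neutral feature predicts a sensitive attribute much better than a blind guess, a model can reconstruct that attribute even when the sensitive column has been removed. The following is a minimal sketch under stated assumptions; the zone labels, group labels, and function names are hypothetical, and the accuracy is measured on the same records used to build the guess, so it is an optimistic estimate.

```python
from collections import Counter, defaultdict


def proxy_accuracy(rows):
    """Accuracy of guessing the sensitive value from the proxy feature alone.

    rows: list of (proxy_value, sensitive_value) pairs. For each proxy value,
    the guess is the sensitive value most often seen with it.
    """
    by_proxy = defaultdict(Counter)
    for proxy, sensitive in rows:
        by_proxy[proxy][sensitive] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_proxy.values())
    total = sum(sum(counts.values()) for counts in by_proxy.values())
    return correct / total


def baseline_accuracy(rows):
    """Accuracy of always guessing the overall most common sensitive value."""
    counts = Counter(sensitive for _, sensitive in rows)
    return counts.most_common(1)[0][1] / sum(counts.values())


# Hypothetical records: (residential zone, group label)
rows = (
    [("zone_1", "A")] * 9 + [("zone_1", "B")] * 1
    + [("zone_2", "A")] * 2 + [("zone_2", "B")] * 8
)

print(proxy_accuracy(rows))     # guess using the zone
print(baseline_accuracy(rows))  # guess without any feature
```

A large gap between the two numbers suggests the feature acts as a proxy, so dropping the sensitive column alone would not remove the information from the model.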

Fortunately, researchers all over the world, including here at Télécom Paris, have dedicated themselves to finding solutions to these shortcomings in AI. Explainable AI and the correction of algorithmic bias are fields of research that aim to render AI algorithms transparent and their conclusions ethical and justifiable. We aim not only to improve the algorithms we use, but also to establish proper awareness and good practice guidelines for data scientists in all industries and fields to help make the future of AI ethical and equitable.


By Dilia Carolina Olivo, Institut Polytechnique de Paris


Dilia Carolina Olivo, PhD student in explainability of artificial intelligence in financial security

Can you describe your educational background? Why did you choose this path? And why France in particular?

I’ll preface this by saying that there was no set path to where I am today. I wish I could say that I always dreamed of doing a PhD in Economics, but that wouldn’t be true. Truth be told, I started this whole thing thinking I would become a radiologist.

Like many other nerdy immigrant kids in the United States, my career options were decided for me by my parents. I would be either a doctor or a lawyer when I grew up. While in high school, I decided that lawyers were sleazy (clearly not a fair assessment, but in my defense, I was in high school), so I would have to become a doctor. Except that once I began reading about the machines doctors used – MRI machines in particular – I became fascinated by the physics behind the technology. So when I got to Stanford, I declared physics as my major as soon as I was allowed. I quickly developed a passion for astrophysics, and I dreamt of doing a PhD in astrophysics so I could study the universe and maybe one day become a professor to teach what I’d learned to other curious minds.

So what am I doing here?

The truth is that, for various reasons I won’t get into here, I never built up the confidence to apply for that physics PhD, and even up to the very end of senior year, I had no idea what to do after college. By the end of senior year, I had not finished my degree on time, my dream of going to graduate school was dead and buried, and though job prospects in the Bay Area were generally good for a Stanford graduate, I had become disillusioned with start-up culture in Silicon Valley. I felt it was trite, insincere, and manipulative (again, not the fairest assessment, but not unfounded). I was actually just afraid I wasn’t smart or competitive enough to make it in tech. In short, I had become jaded, which for a self-declared optimist like me is the ultimate nightmare. To make things worse, this was 2015, and racial conflict in the US was becoming overwhelming for those of us emotionally invested in ending racial injustice. The start of the presidential primaries that summer did not help the situation.

And so I found myself that summer with no degree, no job, completely lost, and losing faith in a country that had treated me so well. You can imagine my state of mind when I received an email – a real-life deus ex machina – informing me that I’d been accepted to participate in a program to teach English in France, and not just in France: I was going to Paris! Suddenly I found myself buying a one-way ticket to Charles de Gaulle airport and making a Pinterest board of places to visit in France.

But why France?

This question deserves a longer answer – there are definitely cultural and moral factors at play – but in the interest of time, I’ll focus simply on my genuine curiosity regarding the French approach to problem-solving, in particular as a foil to the American mindset. Americans tend to have a “go-go-go” mentality toward solving problems – try something, anything, and see if it sticks. If it doesn’t work, try something else. Just get in and fix it, dammit. And nine times out of ten, someone will be creative, innovative, or ingenious enough to find the answer, but the actual reasons for the problem in the first place are not discussed or understood. The French, on the other hand, will debate an issue to within an inch of its life, and nine times out of ten, after much (and I mean much) debate, there is still no solution on the table. However, thanks to the constant debating and corrections and philosophizing, everyone leaves the table with a much more thorough understanding of the root causes of the problem at hand.

I personally believe that there is an optimal balance between these two approaches – a juste milieu, you might say. I knew that whatever work I ended up doing, I wanted my thought process to function in this space.

So I got to work finding work.

As it turns out, an American four-year bachelor’s is not enough to find a job in STEM here in France; I had to get a master’s to do that. A friend of mine from my physics class back in California seemed to be doing well working as a data scientist with only a physics degree, so I thought I could give data science a shot. At the time, it was only starting to gain ground, even in the U.S., so it seemed like a good field in which to apply the analytical thinking I’d learned as a physics major. But I first had to get a “masters spécialisé” in Data Science at business school (my dream of going to grad school came true, just not in the way I thought), which required a six-month internship. I didn’t really know what industry I wanted to work in, and accepted an interview for an internship a friend of mine recommended me for, in the newly formed Data Lab at La Banque Postale.

I never envisioned working in a bank. I don’t really like banks. But I understood that La Banque Postale provided a necessary service to the most vulnerable among us, and I could genuinely support their mission. Most importantly, my boss was going to be an exceptionally good-natured and intelligent woman. It was a win all around for me.

That is how I found myself working on anti-money laundering algorithms for La Banque Postale, and how I was eventually offered this post to conduct research on the subject.

What is your PhD topic? What are the issues and applications?

In recent years, the concept of data privacy has gone mainstream – Facebook in the U.S. and Cambridge Analytica in the U.K. are two of the most well-known names, but the truth is that data science and “big data” have become increasingly ubiquitous in nearly every industry. Everyone from your grocery store to your bank is using your data in some way or another.

EU regulators have been some of the most reactive to the backlash against data-farming and have imposed restrictions on the kinds of data that can be used by companies. “Sensitive data” – or data that denotes race, gender, age, and political status – is very strictly regulated. More recently, attention has turned to the algorithms themselves and our ability to interpret them. It’s no surprise that the algorithms used in detecting fraud and money laundering are incredibly complex and uninterpretable to a human. In the industry we call these types of algorithms “black boxes” – you put your data in and the predictions pop out, but you don’t really know what is happening in between. The opaque nature of these algorithms poses problems down the line when regulators ask for the exact reasoning behind a particular decision, especially if the model uses sensitive data.

My research is exploring the economic and social costs and benefits associated with increased or decreased transparency of these algorithms and the associated fluctuations in performance. In particular, I’m focusing on the AI algorithms used in anti-money laundering and anti-terrorism financing operations and the economic and social implications particular to the success of these activities.

A bank’s decision to freeze a small business’s account, for example, can have serious financial implications, and if AI informed that decision in any way, the bank should be able to thoroughly defend it. However, making these algorithms transparent enough for a human to interpret usually means decreasing their performance, which can mean the difference between detecting and missing fraud or a money laundering scheme in the first place. For La Banque Postale, a mistake like that two years ago cost 50 million euros in fines, and a terrorist cell received a large sum of money to further its operations.

The goal would be to find the optimal balance between explainability and performance for different use cases in anti-money laundering operations, such that banks, regulators, and clients alike are satisfied.
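One very simplified way to picture this balance is as a constrained choice: each use case sets a minimum level of explainability, and among the models that meet it, the best-performing one is kept. The sketch below is an assumption-laden toy and not the method used in this research; the model names, the scores, and the per-use-case thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    explainability: float  # hypothetical score in [0, 1]; higher is easier to interpret
    detection_rate: float  # hypothetical share of illicit cases caught


def best_candidate(candidates, min_explainability):
    """Pick the highest-detection model that meets the explainability floor."""
    eligible = [c for c in candidates if c.explainability >= min_explainability]
    if not eligible:
        return None  # no model is transparent enough for this use case
    return max(eligible, key=lambda c: c.detection_rate)


# Hypothetical candidates, ordered from most to least interpretable
candidates = [
    Candidate("rule list", 0.9, 0.70),
    Candidate("shallow tree", 0.7, 0.80),
    Candidate("boosted ensemble", 0.2, 0.92),
]

# Hypothetical explainability floors per use case
requirements = {"account freeze": 0.8, "internal alert triage": 0.1}

choice = {
    use_case: best_candidate(candidates, floor)
    for use_case, floor in requirements.items()
}
for use_case, model in choice.items():
    print(use_case, "->", model.name if model else "no eligible model")
```

In this toy setting, a decision that must be defended to a client ends up with the more transparent model, while a purely internal step can tolerate a more opaque one.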

What are your research interests?

Working out the “cost” of explainability – so to speak – transcends multiple fields. There are implications for fundamental human rights and equality, not just laws and regulations. The solution requires exploration of ethical, political, and economic theories, as well as technical developments. To me, this question touches on a basic philosophical idea – the social contract. This is a question of collective safety or financial benefit at the cost of individual privacy or social discrimination, and vice versa. The specific field to which we are applying these questions might be rather niche and technical, but the concepts are global and the implications large. I hope my research can follow these questions further and spread awareness of these ideas in the data science and AI community.

What are your plans for the future?

I can’t pretend I know where I’ll be and what I’ll be doing. History has taught me that plans are more like guidelines, anyways. I do know that I want to stay in France, and I hope to delve deeper into the economic and regulatory policy around data science, ethical AI, or even anti-terrorism work. I still dream of teaching someday, of course.



Photo by GR Stocks on Unsplash