As ChatGPT enthralls the public with promises of limitless potential, there is no better time to talk about what AI tools might also erase. Enter Joy Buolamwini, an MIT researcher, poet, and emphatic social activist, who studies how artificial intelligence inserts bias into its results. Her research focuses on what she calls the “coded gaze.” For example, she found AI used by IBM, Facebook, and Face++ did fairly well at identifying genders from looking at a face — but was more successful when those images were of people who were lighter-skinned and male. That certainly creates an equity bias, but it’s a business problem too, Buolamwini believes.
“With over-representation, you might be masking important problems,” she says. “You could also be overshadowing key insights that can give you an edge to know your data and to know your customers.”
Signal360’s John Battelle spoke with Buolamwini at Signal2021 about her work. Watch here or read the lightly edited conversation below:
As we discussed, artificial intelligence is a major force in society and only growing. So it’s good to know that there are people who are really keeping their focus on the impact of AI, and we have one of those people with us next. I’m pleased to welcome as our next guest Joy Buolamwini, a computer scientist and a digital activist from the MIT Media Lab. She founded the Algorithmic Justice League, which is an organization looking to challenge bias in decision-making software like AI. She is featured in the documentary “Coded Bias,” which can be seen on Netflix, and I suggest you all check it out. We spoke earlier this summer, as Joy has been very focused on new research in the past few weeks. Joy, welcome to the Signal stage.
Hi, I’m Joy Buolamwini, the founder of the Algorithmic Justice League, an Olay brand ambassador and also a poet of code. I tell stories that make daughters of diasporas dream and sons of privilege pause. Oftentimes when I say I’m a poet of code, I get a puzzled look. People wonder how it is that I blend research with art. So I want to start this talk with a poem called “AI, Ain’t I A Woman?” which is both a spoken word piece and an algorithmic audit that shows algorithmic bias.
So as you see in the video that just played, AI is not all that neutral, and it reflects what I call the coded gaze. So you might have heard of the male gaze or the white gaze or the postcolonial gaze. Well, to that lexicon, I add the coded gaze, and it reflects the priorities, preferences, and at times prejudices of those who have the power and privilege to shape technology. In this clip that’s rolling, you’ll see that my own encounter with the coded gaze came when I literally had to put a white mask on my dark-skinned face to have my face detected while working on a project as a student at MIT.
I shared this experience of coding in a white mask on the TED platform. I thought people might check my claim. So let me check myself. So I took my TED profile image and ran it through many different AI systems. Some didn’t detect my face, and others misgendered me. This actually gave me something quite in common with the women of Wakanda. You see, when I ran their faces, either they weren’t detected, or they were misgendered. Or if you see the red column, the age guesses weren’t quite on track. So it’s algorithmically verified, Black don’t crack. But beyond this being a bit amusing, where I became concerned was the use of facial recognition technologies by law enforcement. We see with the case of Robert Williams, who was arrested in front of his two young girls and held in jail for around 30 hours, that there are real-world consequences with facial identification mismatches. It’s not just people like Robert Williams.
We really have to think about all of our faces when you have companies like Clearview AI scraping billions of photos from social media that can be searched by all kinds of entities, including law enforcement. So I like to say if you have a face, you have a place in this conversation. Not just because of law enforcement, but because algorithmic decision making is infiltrating so many parts of our lives. So deciding if you get hired or fired, even what kind of medical treatment you might have access to, where your kids go to school. Because so much is at stake, this is why I started the Algorithmic Justice League to build a movement towards more equitable and accountable AI, to mitigate the risk of these technologies.
For this talk, I want to focus on my own research, and lessons we can learn from the face space, as we think towards mitigating the risk of artificial intelligence. So before diving in deep, I want to give you a little bit of an overview of terms you might hear me using. I’m going to be talking about different kinds of facial analysis tasks. Just think of various ways in which machines can read faces in images or videos. And there are a few fundamental questions that are being asked. The first question is: is there a face? This falls under face detection. The coding in a white mask, that was a face detection fail. Next you might ask what kind of face it is. Think of the women of Wakanda and different face attributes. So you might try to guess the age or gender. My own work focuses on guessing gender, or gender classification.
Then we get to the big behemoth, which is known as facial recognition, technically speaking, and it comes in two flavors. So we have facial verification, one-to-one matching, so think about you opening up an iPhone, perhaps. Then the big one, which is one-to-many facial identification. Those are two forms of facial recognition, technically speaking. The work that I do, and the work I’ll speak to, covers a range of different tasks. But regardless of the task, the question often remains the same: how accurate is a particular system at accomplishing a specific task? What I’ve learned in my research is accuracy is relative, and we really have to question gold standards.
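Editor’s sketch: the difference between one-to-one verification and one-to-many identification can be illustrated with a few lines of Python. This is a minimal toy under stated assumptions, not how any vendor’s system works: faces are assumed to be already reduced to short numeric vectors, similarity is assumed to be cosine similarity, and the names `verify`, `identify`, `gallery`, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, enrolled, threshold=0.8):
    """One-to-one: does the probe match a single enrolled template?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=0.8):
    """One-to-many: return the best-scoring name at or above the threshold, else None."""
    best_name, best_score = None, threshold
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical toy vectors standing in for face templates.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]

print(verify(probe, gallery["alice"]))  # compared against one template
print(identify(probe, gallery))         # searched against every template
```

The one-to-many search compares the probe against every entry, so each additional person in the gallery is another chance for a false match. That is one reason mismatches in identification carry the kind of consequences described above.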
Case in point: in 2014, there was much rejoicing in the computer vision field. Why? Well, Facebook, using some faces that might have been borrowed from you, created and released a paper that used some of the most advanced techniques to detect faces in images and then to actually verify these faces. On their particular benchmark, the gold standard at the time, they achieved 97% accuracy, which was a major leap from prior techniques. But again, we have to question gold standards.
When we look at this gold standard, we see that it was majority male and majority white, or what I like to call pale male data. So if we are measuring progress with data or benchmarks that don’t reflect the rest of society, we’re actually destined to exclude and destined to discriminate. We really have to be asking: does our data reflect the world?
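Editor’s sketch: asking whether a benchmark reflects the world starts with counting who is in it. The Python below is a minimal illustration under stated assumptions. The ten records are invented for the example and are not drawn from any real benchmark, and the field names `gender` and `skin` and the function `composition` are hypothetical.

```python
from collections import Counter


def composition(records, key):
    """Share of records in each group for one attribute."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Invented toy benchmark: heavily lighter-skinned and male.
benchmark = (
    [{"gender": "male", "skin": "lighter"}] * 6
    + [{"gender": "female", "skin": "lighter"}] * 2
    + [{"gender": "male", "skin": "darker"}] * 1
    + [{"gender": "female", "skin": "darker"}] * 1
)

print(composition(benchmark, "gender"))
print(composition(benchmark, "skin"))
```

A high score on a benchmark with this kind of skew says little about how a system performs on the groups that barely appear in it.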
As I learned about the need to be more inclusive, I also started looking at how I might curate more inclusive data. And this informed my MIT research called Gender Shades. With Gender Shades, I decided to test the accuracy of AI systems that try to guess the gender of a face. To begin, I had to create a dataset that was a little less male and a little less pale. I was able to achieve that. And I labeled that dataset with binary gender labels, not because gender is binary, but because those are the labels the companies were using. I also labeled by the skin type, not the race. Of course skin can definitely correlate with arbitrary racial categories. But since I was focused on computer vision, I wanted a more objective measure. This was as close as I could get. I could finally get to my research question: how accurate are different companies when it comes to guessing the gender of a face?
If we look at the overall accuracy, which is oftentimes where analysis stops, we see that Microsoft got an A, 94%, IBM, maybe a B, at around 88%. And Face++, a billion-dollar tech company in China, depending on the grading curve at a 90%, an A or B. Where it gets interesting is starting to do the breakdown. When we break it down by gender, we see that male-labeled faces had higher accuracy than female-labeled faces for all of the companies evaluated. When we break it down by skin type, we see that lighter faces had higher accuracy overall than darker-skinned faces.
Then I took the analysis a step further, learning from Kimberlé Crenshaw’s foundational work on the importance of moving beyond single-axis analysis, so we can get a fuller picture of how different technologies or different policies impact different groups of people. What do we see here with the intersectional analysis? Well, perfection is possible, 100% for the pale males, not so much for everybody else. We see that the worst case here is darker females, and these are the good results.
Moving on to China with Face++, in this case, we see that darker male faces had the higher performance just slightly. It’s also important to point out that we have to look at each system individually and not assume the trends are the same. What does remain the same is worse performance here for darker females. Then we moved to IBM, pale males take the lead again, darker females last place, a little switcheroo with darker males, and lighter females.
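Editor’s sketch: the breakdown described above, overall accuracy, then by gender, then by skin type, then by both at once, can be expressed as one small function. The Python below is illustrative only. The 16 records are invented and are not the Gender Shades data or any company’s results, and the field names and the helpers `accuracy_by` and `make` are hypothetical.

```python
from collections import defaultdict


def accuracy_by(records, keys):
    """Accuracy per group, where a group is the tuple of values for `keys`.

    Passing an empty tuple of keys yields a single overall group, ().
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        hits[group] += int(r["predicted"] == r["label"])
    return {g: hits[g] / totals[g] for g in totals}


def make(label, skin, n_correct, n_wrong):
    """Build invented records for one subgroup (toy data only)."""
    other = "female" if label == "male" else "male"
    return (
        [{"label": label, "skin": skin, "predicted": label}] * n_correct
        + [{"label": label, "skin": skin, "predicted": other}] * n_wrong
    )


records = (
    make("male", "lighter", 4, 0)
    + make("female", "lighter", 3, 1)
    + make("male", "darker", 3, 1)
    + make("female", "darker", 2, 2)
)

print(accuracy_by(records, ()))                 # overall
print(accuracy_by(records, ("label",)))         # by gender
print(accuracy_by(records, ("skin",)))          # by skin type
print(accuracy_by(records, ("label", "skin")))  # intersectional
```

In this toy data the overall figure is 75%, while the intersectional view ranges from 100% for one subgroup down to 50% for another. The single number hides exactly the gap the talk is pointing at.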
I decided to share these results with the companies and I got a range of responses from no response, to Microsoft and IBM coming back to us. I want to focus on IBM for time. What we saw with IBM is that when I presented the paper, they actually released a new system that internally they reported had much better accuracy. We did an external evaluation, and we had different results. But the overall trend was an improvement. I point this out to say that, yes, we want to do internal evaluation, but external evaluation can also give us important insights as well.
What are some key lessons to take away from the study that might inform some of the work you do? One, intersectionality matters, right? Aggregate statistics can mask important information. I should certainly not be representing all of humanity. Another major lesson is change is a matter of priority. The laws of physics did not change between when I sent IBM the first results and when they released the new systems. What did change was making these issues more of a priority. We also saw that companies can choose not to proceed with specific technologies. Three of the companies that we audited actually stepped away from selling facial recognition to the police in varying degrees.
I do want to end this by leaving you with some inclusion imperatives that draw on what we’ve learned from our work with facial analysis technologies, and in particular, the Gender Shades series of work. The first imperative is to dare to ask. Dare to ask uncomfortable questions about who’s being excluded, about what kinds of biases are lurking in the products and services you have, and also ask if the systems are being evaluated for harms. The other thing is to dare to ask intersectional questions. With over-representation, you might be masking important problems. You could also be overshadowing key insights that can give you an edge to know your data and to know your customers.
Finally, we want to dare to listen to silenced voices. We also want to understand why people aren’t being heard. For AJL, one of the ways we dare to listen is we have Bias In The Wild reports that are submitted to us.
One I’d like to share with you is an AJL Bias In The Wild report, where it goes, ‘A friend of mine working at some large tech company, won’t be named, had issues being recognized with their facial recognition system for their teleconference solution. And while there were some units that worked on her face, she had to specifically reserve those rooms.’
So this is something I like to call the exclusion overhead. It’s not that without some extra effort, you can’t get a system to work. But the fact you have to put in that extra effort, that extra overhead, can definitely impede your ability to contribute and also lead to more of a feeling of exclusion. That’s also something I would highly encourage all of you to do, is to think about how you might reduce the exclusion overhead that may exist wherever you find yourself. I wish we had more time, but we are going to get into some questions and answers. I just want to end by saying, please, if this work resonates with you, learn more about the Algorithmic Justice League. Follow us on social media. And if you have Netflix, check out “Coded Bias.” Thank you for your time.
Thank you so much. Do you have time for a question or two?
Awesome. You’ve made so much progress. You even got the biggest platforms involved in this AI-driven facial recognition to respond to your research and improve their results. Do you hold out hope that this problem can be solved? Entirely or at least in parity? And if so, what are the steps that need to be taken to get there?
I think it really depends on what we view as the problem. Because if we think solving for problems around algorithmic bias or algorithmic risk is just focused on accuracy, we miss the point. Because how these systems are used is just as important as how well they work. When we look at the companies stepping away from selling facial recognition technologies, even if you had accurate systems right here and now, you could create a world with mass surveillance. You can think of drones with guns with facial recognition. The question isn’t just how accurate is it? It’s should we have it in the first place? The normative questions. Really, we do have to be thinking about what questions we’re addressing and why.
As far as whether we can fix issues of algorithmic bias, I try to think of bias mitigation as opposed to bias elimination. We’re humans. As long as humans are involved, there will be problems. So I’ve been thinking about an approach more like algorithmic hygiene. You wouldn’t just floss once, shower once, maybe in 2020. But in 2021 and beyond, you want something a bit more continuous. So is there affirmative consent? Do people have a voice and a choice in using these systems, particularly in more of the private sector deployment? Is there meaningful transparency, so we actually have a good understanding of the limitations and the capabilities? What we were seeing with some of the benchmarks is that they gave us a false sense of progress. So we didn’t actually truly have a good understanding of those limitations. Then we need continuous oversight. If it’s out there, and systems fail, you want to know that that is going on, instead of having the Algorithmic Justice League descend upon you in some way. So think about what does it look like to have affirmative consent? What does it look like to have meaningful transparency? And what does it look like to have continuous oversight? So we are looking at processes and not thinking just end products.
One of the pillars of the conference is government and policy, and we’re exploring the intersection of that with business. It’s clear that it has become a pretty high-priority issue for a lot of businesses to understand where policy might be going. Because increasingly, it’s intersecting with business realities. Do you think there is a policy remedy here at a federal level? We’ve already seen some policy steps being taken by cities, municipalities, around the issue of facial recognition. Do you have ideas for what policies might make sense from a larger government point of view?
Yes. I actually had the opportunity to present some of my research at a few congressional hearings and give very specific recommendations there. I’m also encouraged to see the Facial Recognition and Biometric Technology Moratorium Act of 2020. Because I think it takes a very important precautionary approach, which is to say, we’ve already seen the ways in which these systems fail, we’ve already seen the real-world harms. Let’s put a moratorium for now on high-risk uses, as we sort through what could be appropriate and what’s out of bounds. Because right now, we’re not clear on the red lines, nor do we have guidelines. And it’s too risky, there’s too much at stake to be operating in that kind of environment. I’ve also done a much deeper dive into what we can learn from the FDA when it comes to thinking about governance of some of these highly complex systems. But we need to understand the various risk levels, and we also definitely want a situation where you verify before you deploy.
There we go. Absolutely. Finally, a short question here, but I’ve seen the film, I highly recommend it. What is the best way for our audience to go check out “Coded Bias”?
If you have a Netflix login, search “Coded Bias,” and it should come up there. Or if you want to host a screening, you can go to Women Make Movies or Codedbias.com, where we’ll have information on how to do a watch party. You can always visit the Algorithmic Justice League at AJL.org as well, where we have our own little breakdown of the “Coded Bias” film.
Thank you so much, Joy, for joining us at Signal. That was really enlightening, and we appreciate your work.
Absolutely. And I would be remiss if I didn’t point out that I’m also an ambassador for Olay, and I’ve truly appreciated that opportunity, because part of thinking through how we address these problems is thinking about who is able to address them. Part of the goals of actually closing the STEM gap, getting more women at the table, getting more women of color, more people of color, is so crucial to these issues. I’m really excited to continue that partnership and we have something really big related to algorithmic bias and inclusive and just AI coming out in the fall. So please stay tuned.
We very much look forward to that. Again, thanks so much for being with us, Joy.