Incident 103: Twitter’s Image Cropping Tool Allegedly Showed Gender and Racial Bias
Twitter’s algorithm for automatically cropping images attached to tweets often doesn’t focus on the important content in them. A bother, for sure, but it seems like a minor one on the surface. However, over the weekend, researchers found that the cropping algorithm might have a more serious problem: white bias.
Several users posted photos to show that when an image contains people with different skin tones, Twitter’s crop, made to fit the display parameters of its site and embeds, tends to show the people with lighter skin. Some of them even tried to reproduce the results with fictional characters and dogs.
If you tap on these images, you’ll see an uncropped version of the image which includes more details such as another person or character. What’s odd is that even if users flipped the order of where dark-skinned and light-skinned people appeared in the image, the results were the same.
However, some people noted that factors other than skin color might be at play, and those who tried different methods found inconsistent results.
Twitter’s Chief Design Officer (CDO), Dantley Davis, said that the choice of cropping sometimes takes brightness of the background into consideration.
In a thread, Bianca Kastl, a developer from Germany, explained that Twitter’s algorithm might be cropping the image based on saliency — an important point or part in an image that you’re likely to look at first when you see it.
Probably Twitters Crop algorithm is a pretty simple Saliency. We will see… pic.twitter.com/q4R0R8h3vh
— Bianca Kastl (@bkastl) September 20, 2020
Her theory is backed by Twitter’s 2018 blog post that explained its neural network built for image cropping. The post notes that earlier, the company took facial detection into account to crop images. However, that approach didn’t work for images that didn’t have a face in them. So the social network switched to a saliency-based algorithm.
Even if Twitter’s algorithm is not ‘racist,’ enough people have posted examples showing that it appears biased towards lighter skin tones, and the results are problematic. The company definitely needs to dig into the algorithm to understand the bias in its neural network. Anima Anandkumar, Director of AI research at Nvidia, pointed out that the saliency algorithm might have been trained using eye-tracking of straight male participants, which would insert more bias into the algorithm.
Recording straight men where their eyes veer when they view female pictures is encoding objectification and sexualization of women in social media @Twitter No one asks whose eyes are being tracked to record saliency. #ai #bias https://t.co/coXwngSjiW
— Prof. Anima Anandkumar (@AnimaAnandkumar) September 20, 2020
Twitter spokesperson Liz Kelly tweeted that the firm tested the model and didn’t find any bias. She added that the company will open-source its work for others to review and replicate. It might be possible that Twitter has ignored some factors while testing, and open-sourcing the study might help them find those blind spots.
The company’s Chief Technology Officer, Parag Agrawal, said that the model needs continuous improvements and the team is eager to learn from this experience.
Light skin bias in algorithms is well documented in fields ranging from healthcare to law enforcement. So large companies like Twitter need to continuously work on their systems to get rid of it. They also need to start an open dialog with the AI community to understand their blind spots.
A study of 10,000 images found bias in what the system chooses to highlight. Twitter has stopped using it on mobile, and will consider ditching it on the web.
Last fall, Canadian student Colin Madland noticed that Twitter’s automatic cropping algorithm continually selected his face, not his darker-skinned colleague’s, from photos of the pair to display in tweets. The episode ignited accusations of bias as a flurry of Twitter users published elongated photos to see whether the AI would choose the face of a white person over a Black person or if it focused on women’s chests over their faces.
At the time, a Twitter spokesperson said assessments of the algorithm before it went live in 2018 found no evidence of race or gender bias. Now, the largest analysis of the AI to date has found the opposite: that Twitter’s algorithm favors white people over Black people. That assessment also found that the AI for predicting the most interesting part of a photo does not focus on women’s bodies over women’s faces.
Previous tests by Twitter and researcher Vinay Prabhu involved a few hundred images or fewer. The analysis released by Twitter research scientists Wednesday is based on 10,000 image pairs of people from different demographic groups to test whom the algorithm favors.
Researchers found bias when the algorithm was shown photos of people from two demographic groups. Ultimately, the algorithm picks one person whose face will appear in Twitter timelines, and some groups are better represented on the platform than others. When researchers fed a picture of a Black man and a white woman into the system, the algorithm chose to display the white woman 64 percent of the time and the Black man only 36 percent of the time, the largest gap for any demographic groups included in the analysis. For images of a white woman and a white man, the algorithm displayed the woman 62 percent of the time. For images of a white woman and a Black woman, the algorithm displayed the white woman 57 percent of the time.
On May 5, Twitter did away with image cropping for single photos posted using the Twitter smartphone app, an approach Twitter chief design officer Dantley Davis favored since the algorithm controversy erupted last fall. The change led people to post tall photos and signaled the end of “open for a surprise” tweets.
The so-called saliency algorithm is still in use on Twitter.com as well as for cropping multi-image tweets and creating image thumbnails. A Twitter spokesperson says excessively tall or wide photos are now center cropped, and the company plans to end use of the algorithm on the Twitter website. Saliency algorithms are trained by tracking what people look at when they look at an image.
Other sites, including Facebook and Instagram, have used AI-based automated cropping. Facebook did not respond to a request for comment.
Accusations of gender and race bias in computer vision systems are, unfortunately, fairly common. Google recently detailed efforts to improve how Android cameras work for people with dark skin. Last week the group Algorithm Watch found that image-labeling AI used on an iPhone labeled cartoon depictions of people with dark skin as “animal.” An Apple spokesperson declined to comment.
Regardless of the results of fairness measurements, Twitter researchers say algorithmic decisionmaking can take choice away from users and have far-reaching impact, particularly for marginalized groups of people.
In the newly released study, Twitter researchers said they did not find evidence that the photo cropping algorithm favors women’s bodies over their faces. To determine this, they fed the algorithm 100 randomly chosen images of people identified as women, and found that only three centered bodies over faces. Researchers suggest this is due to the presence of a badge or jersey numbers on people’s chests. To conduct the study, researchers used photos from the WikiCeleb dataset; identity traits of people in the photos were taken from Wikidata.
The Twitter paper acknowledges that by limiting the analysis to Black or white or male and female comparisons, it can exclude people who identify as nonbinary or mixed race. Researchers said they had hoped to use the Gender Shades dataset created to assess the performance of facial recognition systems based on skin tone, but licensing issues arose.
Twitter published the study on the preprint repository arXiv. A Twitter spokesperson said it had been submitted to a research conference to be held in October.
Twitter research scientists suggest that the racial bias found in the analysis may be a result of the fact that many images in the WikiCeleb database have dark backgrounds and the saliency algorithm is drawn to the higher contrast of photos showing people with light skin against a dark background. They also suggest that dark eye color on light skin played a role in saliency algorithms favoring people with light skin.
Coauthors of the paper come from Twitter’s ML Ethics, Transparency, and Accountability (META) Team, which Twitter launched last month. Rumman Chowdhury, founder of algorithm auditing startup Parity and a former adviser to tech companies and governments, directs the team.
In a blog post last month, Twitter said it created the team to take responsibility for Twitter’s use of algorithms, provide transparency into internal decisionmaking about AI that impacts hundreds of millions of people, and hold the company accountable. Some questions remain about how the META team will operate, such as who makes the final call about whether Twitter uses certain kinds of AI.
The Twitter spokesperson said cross-functional teams decide what actions are taken on algorithms, but did not address the question of who has final authority to decide when a form of AI is considered too unfair for use.
In the coming months, META plans to assess how the Twitter home page recommendation algorithms treat certain racial groups and how Twitter’s AI treats people based on political ideologies.
The creation of the META team came amid questions about the independence and viability of ethical AI teams in corporate environments. In a move that has since led AI groups to turn down funding and thousands of Googlers to revolt, Google and former Ethical AI team lead Timnit Gebru parted ways in December 2020. In an interview shortly after, Chowdhury said the episode has consequences for responsible AI and the entire AI industry.
As Chowdhury pointed out earlier this year, there are many ways to define an algorithm audit. What’s not included in Twitter’s saliency research: analysis of data used to train Twitter’s saliency algorithm, or more detailed information about the analysis Twitter carried out before the saliency algorithm came into use.
When asked how the Twitter saliency controversy changed company policy, a company spokesperson said that the company conducts risk assessments around privacy and security, and that the META team is creating fairness metrics for the company’s model experimentation platform and standards for the ethical review of algorithms.
In October 2020, we heard feedback from people on Twitter that our image cropping algorithm didn’t serve all people equitably. As part of our commitment to address this issue, we also shared that we'd analyze our model again for bias. Over the last several months, our teams have accelerated improvements for how we assess algorithms for potential bias and improve our understanding of whether ML is always the best solution to the problem at hand. Today, we’re sharing the outcomes of our bias assessment and a link for those interested in reading and reproducing our analysis in more technical detail.
The analysis of our image cropping algorithm was a collaborative effort together with Kyra Yee and Tao Tantipongpipat from our ML Ethics, Transparency, and Accountability (META) team and Shubhanshu Mishra from our Content Understanding Research team, which specializes in improving our ML models for various types of content in tweets. In our research, we tested our model for gender and race-based biases and considered whether our model aligned with our goal of enabling people to make their own choices on our platform.
How does a saliency algorithm work and where might harms arise?
Twitter started using a saliency algorithm in 2018 to crop images. We did this to improve consistency in the size of photos in your timeline and to allow you to see more Tweets at a glance. The saliency algorithm works by estimating what a person might want to see first within a picture so that our system could determine how to crop an image to an easily-viewable size. Saliency models are trained on how the human eye looks at a picture as a method of prioritizing what's likely to be most important to the most people. The algorithm, trained on human eye-tracking data, predicts a saliency score on all regions in the image and chooses the point with the highest score as the center of the crop.
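The last step described above, choosing the highest-scoring point as the center of the crop, can be illustrated with a minimal sketch. It is not Twitter's implementation: the function name `crop_around_most_salient`, the toy saliency grid, and the clamping of the window to the image bounds are all assumptions made for illustration, and the saliency scores are simply given rather than predicted by a trained model.

```python
from typing import List, Tuple


def crop_around_most_salient(
    saliency: List[List[float]], crop_w: int, crop_h: int
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a crop window centered on the
    highest-scoring cell of a saliency grid.

    Hypothetical helper: the window is shifted, if needed, so that it stays
    inside the image. Ties go to the first maximum in row-major order.
    """
    height = len(saliency)
    width = len(saliency[0])
    if crop_w > width or crop_h > height:
        raise ValueError("crop window is larger than the saliency grid")

    # Locate the most salient cell (argmax over the whole grid).
    best_y, best_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda pos: saliency[pos[0]][pos[1]],
    )

    # Center the window on that cell, then clamp it to the image bounds.
    left = min(max(best_x - crop_w // 2, 0), width - crop_w)
    top = min(max(best_y - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h


# Toy 4x6 saliency grid (made-up scores); the peak is at row 1, column 4.
toy_map = [
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.3, 0.9, 0.2],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
print(crop_around_most_salient(toy_map, crop_w=2, crop_h=2))  # (3, 0, 5, 2)
```

Because only the single highest-scoring point decides the crop, a small difference in score between two faces in one image is enough to leave one of them out of the preview entirely.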
In our most recent analysis of this model, we considered three places where harms could arise:
- Unequal treatment based on demographic differences: People on Twitter noted instances where our model chose white individuals over Black individuals in images and male-presenting images over female-presenting images. We tested the model on a larger dataset to determine if this was a problem with the model.
- Objectification biases, also known as “male gaze”: People on Twitter also identified instances where image cropping chose a woman’s chest or legs as a salient feature. We tested the model on a larger dataset to determine if this was a systematic flaw.
- Freedom to take action: An algorithmic decision doesn't allow people to choose how they'd like to express themselves on the platform, resulting in representation harm.
How did we test it and what did we find?
To quantitatively test the potential gender and race-based biases of this saliency algorithm, we created an experiment of randomly linked images of individuals of different races and genders. (Note: In our paper, we share more details around the tradeoffs between using identity terms and skin tone annotations in our analysis.) If the model is demographically equal, we'd see no difference in how many times each image was chosen by the saliency algorithm. In other words, demographic parity means each image has a 50% chance of being salient.
Here’s what we found:
- In comparisons of men and women, there was an 8% difference from demographic parity in favor of women.
- In comparisons of black and white individuals, there was a 4% difference from demographic parity in favor of white individuals.
- In comparisons of black and white women, there was a 7% difference from demographic parity in favor of white women.
- In comparisons of black and white men, there was a 2% difference from demographic parity in favor of white men.
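The figures above can be read as percentage points away from a 50% selection rate. The sketch below shows one way such a number could be computed from paired comparisons. It assumes that the image with the higher peak saliency score in a pair is the one the crop keeps, and that ties are ignored; the function names and the example scores are hypothetical, and the study's actual procedure is described in the paper.

```python
from typing import Iterable, Tuple


def selection_rate(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Fraction of decided pairs in which the first image wins.

    Each pair holds the peak saliency score of image A and of image B
    (hypothetical inputs). Pairs with equal scores are skipped.
    """
    wins_a = 0
    wins_b = 0
    for score_a, score_b in pair_scores:
        if score_a > score_b:
            wins_a += 1
        elif score_b > score_a:
            wins_b += 1
    decided = wins_a + wins_b
    if decided == 0:
        raise ValueError("no decided pairs to compute a rate from")
    return wins_a / decided


def parity_gap_points(rate: float) -> float:
    """Signed distance from demographic parity (50%), in percentage points."""
    return (rate - 0.5) * 100.0


# Made-up peak scores for four image pairs: image A wins three of them.
example_pairs = [(0.9, 0.4), (0.3, 0.8), (0.7, 0.2), (0.6, 0.5)]
rate = selection_rate(example_pairs)
print(rate, parity_gap_points(rate))  # 0.75 25.0
```

Under this reading, a group chosen in 57 percent of pairs sits about 7 points from parity, which matches the white women versus Black women comparison reported in both the blog post and the news coverage.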
We also tested for the “male gaze” by randomly selecting 100 male- and female-presenting images that had more than one area in the image identified by the algorithm as salient and observing how our model chose to crop the image. We didn't find evidence of objectification bias — in other words, our algorithm did not crop images of men or women on areas other than their faces at a significant rate. Here’s what we found:
- For every 100 images per group, about three cropped at a location other than the head.
- When images weren't cropped at the head, they were cropped to non-physical aspects of the image, such as a number on a sports jersey.
We qualitatively considered the saliency algorithm within the fairness in ML literature, including those on technological harms to society. Even if the saliency algorithm were adjusted to reflect perfect equality across race and gender subgroups, we’re concerned by the representational harm of the automated algorithm when people aren't allowed to represent themselves as they wish on the platform. Saliency also holds other potential harms beyond the scope of this analysis, including insensitivities to cultural nuances.
What actions are we taking?
We considered the tradeoffs between the speed and consistency of automated cropping with the potential risks we saw in this research. One of our conclusions is that not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people.
In March, we began testing a new way to display standard aspect ratio photos in full on iOS and Android, meaning without the saliency algorithm crop. The goal of this was to give people more control over how their images appear while also improving the experience of people seeing the images in their timeline. After getting positive feedback on this experience, we launched this feature to everyone. This update also includes a true preview of the image in the Tweet composer field, so Tweet authors know how their Tweets will look before they publish. This release reduces our dependency on ML for a function that we agree is best performed by people using our products. We’re working on further improvements to media on Twitter that build on this initial effort, and we hope to roll them out to everyone soon.
We want to thank you for sharing your open feedback and criticism of this algorithm with us. As we discussed in our recent blog post about our Responsible ML initiatives, Twitter is committed to providing more transparency around the ways we’re investigating and investing in understanding the potential harms that result from the use of algorithmic decision systems like ML. You can look forward to more updates and published work like this in the future.
How can you be involved?
We know there's a lot of work to do and we appreciate your feedback in helping us identify how we can improve. Tweet us using the hashtag #AskTwitterMETA. You can also access our code, and our full academic paper is available on arXiv.
Twitter has laid out plans for a bug bounty competition with a difference. This time around, instead of paying researchers who uncover security issues, Twitter will reward those who find as-yet undiscovered examples of bias in its image-cropping algorithm.
Back in April, Twitter said it would study potential “unintentional harms” created by its algorithms, beginning with its image-cropping one. It started using the algorithm in 2018 in an attempt to focus on the most interesting parts of images in previews. Some users criticized how Twitter handled automated cropping, claiming that the algorithm tends to focus on lighter-skinned people in photos.
"In May, we shared our approach to identifying bias in our saliency algorithm (also known as our image cropping algorithm), and we made our code available for others to reproduce our work," Twitter wrote in a blog post. "We want to take this work a step further by inviting and incentivizing the community to help identify potential harms of this algorithm beyond what we identified ourselves."
Twitter says this is the "industry’s first algorithmic bias bounty competition" and it's offering cash prizes of up to $3,500. Rumman Chowdhury, director of Twitter's Machine Learning Ethics, Transparency and Accountability team, wrote in a tweet that the company is running the contest "because we believe people should be rewarded for identifying these issues, and we can’t solve these challenges alone." The winners will be announced at a Twitter-hosted DEF CON AI Village workshop on August 8th.
Twitter's first bounty program for AI bias has wrapped up, and there are already some glaring issues the company wants to address. CNET reports that grad student Bogdan Kulynych has discovered that photo beauty filters skew the Twitter saliency (importance) algorithm's scoring system in favor of slimmer, younger and lighter-skinned (or warmer-toned) people. The findings show that algorithms can "amplify real-world biases" and conventional beauty expectations, Twitter said.
This wasn't the only issue. Halt AI learned that Twitter's saliency algorithm "perpetuated marginalization" by cropping out the elderly and people with disabilities. Researcher Roya Pakzad, meanwhile, found that the saliency algorithm prefers cropping Latin writing over Arabic. Another researcher spotted a bias toward light-skinned emojis, while an anonymous contributor found that almost-invisible pixels could manipulate the algorithm's preferences.
Twitter has published the code for winning entries.
The company didn't say how soon it might address algorithmic bias. However, this comes as part of a mounting backlash to beauty filters over their tendency to create or reinforce unrealistic standards. Google, for instance, turned off automatic selfie retouching on Pixel phones and stopped referring to the processes as beauty filters. It wouldn't be surprising if Twitter's algorithm took a more neutral stance on content in the near future.