Incident 59: Gender Biases in Google Translate

Description: A Cornell University study in 2016 highlighted Google Translate's pattern of assigning gender to occupations in a way showing an implicit gender bias against women.
Alleged: Google developed and deployed an AI system, which harmed Women.

Suggested citation format

Anonymous. (2017-04-13) Incident Number 59. In McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID
59
Report Count
10
Incident Date
2017-04-13
Editors
Sean McGregor

CSET Taxonomy Classifications

Taxonomy Details

Full Description

A Cornell University study in 2016 highlighted Google Translate's pattern of assigning gender to occupations in a way showing an implicit gender bias against women. When translating from non-gendered languages (ex. Turkish, Finnish), Google Translate added gender to the phrases being translated. "Historian" "Doctor" "President" "Engineer" and "Soldier" were assigned male gender pronouns while "Nurse" "Teacher" and "Shop Assistant" were assigned female gender pronouns.

Short Description

A Cornell University study in 2016 highlighted Google Translate's pattern of assigning gender to occupations in a way showing an implicit gender bias against women.

Severity

Negligible

Harm Distribution Basis

Sex

Harm Type

Harm to social or political systems

AI System Description

Google Translate, a software service that provides translations between many languages

System Developer

Google

Sector of Deployment

Information and communication

Relevant AI functions

Perception, Cognition, Action

AI Techniques

Google Translate

AI Applications

language API, language translation

Named Entities

Google Translate, Google

Technology Purveyor

Google

Beginning Date

2016-01-01T00:00:00.000Z

Ending Date

2016-01-01T00:00:00.000Z

Near Miss

Harm caused

Intent

Unclear

Lives Lost

No

Data Inputs

User entered translation requests

Incident Reports

Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language—the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model—namely, the GloVe word embedding—trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.

Semantics derived automatically from language corpora contain human-like biases

One of the great promises of artificial intelligence (AI) is a world free of petty human biases. Hiring by algorithm would give men and women an equal chance at work, the thinking goes, and predicting criminal behavior with big data would sidestep racial prejudice in policing. But a new study shows that computers can be biased as well, especially when they learn from us. When algorithms glean the meaning of words by gobbling up lots of human-written text, they adopt stereotypes very similar to our own.

“Don’t think that AI is some fairy godmother,” says study co-author Joanna Bryson, a computer scientist at the University of Bath in the United Kingdom and Princeton University. “AI is just an extension of our existing culture.”

The work was inspired by a psychological tool called the implicit association test, or IAT. In the IAT, words flash on a computer screen, and the speed at which people react to them indicates subconscious associations. Both black and white Americans, for example, are faster at associating names like “Brad” and “Courtney” with words like “happy” and “sunrise,” and names like “Leroy” and “Latisha” with words like “hatred” and “vomit” than vice versa.

To test for similar bias in the “minds” of machines, Bryson and colleagues developed a word-embedding association test (WEAT). They started with an established set of “word embeddings,” basically a computer’s definition of a word, based on the contexts in which the word usually appears. So “ice” and “steam” have similar embeddings, because both often appear within a few words of “water” and rarely with, say, “fashion.” But to a computer an embedding is represented as a string of numbers, not a definition that humans can intuitively understand. Researchers at Stanford University generated the embeddings used in the current paper by analyzing hundreds of billions of words on the internet.

Instead of measuring human reaction time, the WEAT computes the similarity between those strings of numbers. Using it, Bryson’s team found that the embeddings for names like “Brett” and “Allison” were more similar to those for positive words including love and laughter, and those for names like “Alonzo” and “Shaniqua” were more similar to negative words like “cancer” and “failure.” To the computer, bias was baked into the words.

IATs have also shown that, on average, Americans associate men with work, math, and science, and women with family and the arts. And young people are generally considered more pleasant than old people. All of these associations were found with the WEAT. The program also inferred that flowers were more pleasant than insects and musical instruments were more pleasant than weapons, using the same technique to measure the similarity of their embeddings to those of positive and negative words.

The researchers then developed a word-embedding factual association test, or WEFAT. The test determines how strongly words are associated with other words, and then compares the strength of those associations to facts in the real world. For example, it looked at how closely related the embeddings for words like “hygienist” and “librarian” were to those of words like “female” and “woman.” For each profession, it then compared this computer-generated gender association measure to the actual percentage of women in that occupation. The results were very highly correlated. So embeddings can encode everything from common sentiments about flowers to racial and gender biases and even facts about the labor force, the team reports today in Science.
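
The association measure the WEAT is built on can be written down compactly. The sketch below is a minimal illustration, assuming embeddings is a dictionary mapping words to NumPy vectors (for example, loaded from pretrained GloVe files); the word lists in the comments are illustrative, not the paper's exact stimuli.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # s(w, A, B): how much closer word w sits to attribute set A than to attribute set B
    return (np.mean([cosine(emb[w], emb[a]) for a in A]) -
            np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Difference in mean association between the two target sets,
    # normalised by the pooled standard deviation of the per-word scores
    s_X = [association(x, A, B, emb) for x in X]
    s_Y = [association(y, A, B, emb) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Illustrative target/attribute sets (not the published stimuli):
# X = ["programmer", "engineer", "scientist"]; Y = ["nurse", "teacher", "librarian"]
# A = ["man", "male"]; B = ["woman", "female"]
# print(weat_effect_size(X, Y, A, B, embeddings))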

“It’s kind of cool that these algorithms discovered these,” says Tolga Bolukbasi, a computer scientist at Boston University who concurrently conducted similar work with similar results. “When you’re training these word embeddings, you never actually specify these labels.” What’s not cool is how prejudiced embeddings might be deployed—when sorting résumés or loan applications, say. For example, if a computer searching résumés for computer programmers associates “programmer” with men, men’s résumés will pop to the top. Bolukbasi’s work focuses on ways to “debias” embeddings—that is, removing unwanted associations from them.

Bryson has another take. Instead of debiasing embeddings, essentially throwing away information, she prefers adding an extra layer of human or computer judgement to decide how or whether to act on such biases. In the case of hiring programmers, you might decide to set gender quotas.

People have long suggested that meaning could plausibly be extracted through word co-occurrences, “but it was far from a foregone conclusion,” says Anthony Greenwald, a psychologist at the University of Washington in Seattle who developed the IAT in 1998 and wrote a commentary on the WEAT paper for this week’s issue of Science. He says he expected that writing—the basis of the WEAT measurements—would better reflect explicit attitudes than implicit ones.

Even artificial intelligence can acquire biases against race and gender

Machine learning algorithms are picking up deeply ingrained race and gender prejudices concealed within the patterns of language use, scientists say

An artificial intelligence tool that has revolutionised the ability of computers to interpret everyday language has been shown to exhibit striking gender and racial biases.

The findings raise the spectre of existing social inequalities and prejudices being reinforced in new and unpredictable ways as an increasing number of decisions affecting our everyday lives are ceded to automatons.

In the past few years, the ability of programs such as Google Translate to interpret language has improved dramatically. These gains have been thanks to new machine learning techniques and the availability of vast amounts of online text data, on which the algorithms can be trained.

However, as machines are getting closer to acquiring human-like language abilities, they are also absorbing the deeply ingrained biases concealed within the patterns of language use, the latest research reveals.

Joanna Bryson, a computer scientist at the University of Bath and a co-author, said: “A lot of people are saying this is showing that AI is prejudiced. No. This is showing we’re prejudiced and that AI is learning it.”

But Bryson warned that AI has the potential to reinforce existing biases because, unlike humans, algorithms may be unequipped to consciously counteract learned biases. “A danger would be if you had an AI system that didn’t have an explicit part that was driven by moral ideas, that would be bad,” she said.

The research, published in the journal Science, focuses on a machine learning tool known as “word embedding”, which is already transforming the way computers interpret speech and text. Some argue that the natural next step for the technology may involve machines developing human-like abilities such as common sense and logic.

“A major reason we chose to study word embeddings is that they have been spectacularly successful in the last few years in helping computers make sense of language,” said Arvind Narayanan, a computer scientist at Princeton University and the paper’s senior author.

The approach, which is already used in web search and machine translation, works by building up a mathematical representation of language, in which the meaning of a word is distilled into a series of numbers (known as a word vector) based on which other words most frequently appear alongside it. Perhaps surprisingly, this purely statistical approach appears to capture the rich cultural and social context of what a word means in the way that a dictionary definition would be incapable of.

For instance, in the mathematical “language space”, words for flowers are clustered closer to words linked to pleasantness, while words for insects are closer to words linked to unpleasantness, reflecting common views on the relative merits of insects versus flowers.

The latest paper shows that some more troubling implicit biases seen in human psychology experiments are also readily acquired by algorithms. The words “female” and “woman” were more closely associated with arts and humanities occupations and with the home, while “male” and “man” were closer to maths and engineering professions.

And the AI system was more likely to associate European American names with pleasant words such as “gift” or “happy”, while African American names were more commonly associated with unpleasant words.

The findings suggest that algorithms have acquired the same biases that lead people (in the UK and US, at least) to match pleasant words and white faces in implicit association tests.

These biases can have a profound impact on human behaviour. One previous study showed that an identical CV is 50% more likely to result in an interview invitation if the candidate’s name is European American than if it is African American. The latest results suggest that algorithms, unless explicitly programmed to address this, will be riddled with the same social prejudices.

“If you didn’t believe that there was racism associated with people’s names, this shows it’s there,” said Bryson.

The machine learning tool used in the study was trained on a dataset known as the “common crawl” corpus – a list of 840bn words that have been taken as they appear from material published online. Similar results were found when the same tools were trained on data from Google News.

Sandra Wachter, a researcher in data ethics and algorithms at the University of Oxford, said: “The world is biased, the historical data is biased, hence it is not surprising that we receive biased results.”

Rather than algorithms representing a threat, they could present an opportunity to address bias and counteract it where appropriate, she added.

“At least with algorithms, we can potentially know when the algorithm is biased,” she said. “Humans, for example, could lie about the reasons they did not hire someone. In contrast, we do not expect algorithms to lie or deceive us.”

However, Wachter

AI programs exhibit racial and gender biases, research reveals

In debates over the future of artificial intelligence, many experts think of these machine-based systems as coldly logical and objectively rational. But in a new study, Princeton University-based researchers have demonstrated how machines can be reflections of their creators in potentially problematic ways.

Common machine-learning programs trained with ordinary human language available online can acquire the cultural biases embedded in the patterns of wording, the researchers reported in the journal Science April 14. These biases range from the morally neutral, such as a preference for flowers over insects, to discriminatory views on race and gender.

Identifying and addressing possible biases in machine learning will be critically important as we increasingly turn to computers for processing the natural language humans use to communicate, as in online text searches, image categorization and automated translations.

Princeton University-based researchers have found that machine-learning programs can acquire the cultural biases embedded in the patterns of wording, from a mere preference for flowers over insects, to discriminatory views on race and gender. For example, machine-learning programs can translate foreign languages into gender-stereotyped sentences. Turkish uses the gender-neutral pronoun, "o." Yet, when the Turkish sentences "o bir doktor" (top) and "o bir hemşire" (bottom) are entered into Google Translate, they translate into English as "he is a doctor" and "she is a nurse." (Images by the Office of Engineering Communications)

"Questions about fairness and bias in machine learning are tremendously important for our society," said co-author Arvind Narayanan, a Princeton University assistant professor of computer science and the Center for Information Technology Policy (CITP), as well as an affiliate scholar at Stanford Law School's Center for Internet and Society.

Narayanan worked with first author Aylin Caliskan, a Princeton postdoctoral research associate and CITP fellow, and Joanna Bryson, a reader at the University of Bath and CITP affiliate.

"We have a situation where these artificial-intelligence systems may be perpetuating historical patterns of bias that we might find socially unacceptable and which we might be trying to move away from," Narayanan said.

As a touchstone for documented human biases, the researchers turned to the Implicit Association Test used in numerous social-psychology studies since its development at the University of Washington in the late 1990s. The test measures response times in milliseconds by human subjects asked to pair word concepts displayed on a computer screen. The test has repeatedly shown that response times are far shorter when subjects are asked to pair two concepts they find similar, versus two concepts they find dissimilar.

For instance, words such as "rose" and "daisy," or "ant" and "moth," can be paired with pleasant concepts such as "caress" and "love," or unpleasant ones such as "filth" and "ugly." People associate the flower words with pleasant concepts more quickly than with unpleasant ones; similarly, they associate insect terms more quickly with unpleasant ideas.

The Princeton team devised an experiment with a program called GloVe that essentially functioned like a machine-learning version of the Implicit Association Test. Developed by Stanford University researchers, the popular open-source program is of the sort that a startup machine-learning company might use at the heart of its product. The GloVe algorithm can represent the co-occurrence statistics of words in, say, a 10-word window of text. Words that often appear near one another have a stronger association than those words that seldom do.

The Stanford researchers turned GloVe loose on a huge trove of content from the World Wide Web containing 840 billion words. Within this store of words, Narayanan and colleagues examined sets of target words, such as "programmer, engineer, scientist" and "nurse, teacher, librarian," alongside two sets of attribute words such as "man, male" and "woman, female," looking for evidence of the kinds of biases humans can possess.
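
As a rough illustration of the co-occurrence statistics that GloVe-style models start from, the toy sketch below counts how often word pairs appear within a 10-word window; GloVe then fits word vectors to such counts, a step this sketch does not attempt.

from collections import Counter

def cooccurrence_counts(tokens, window=10):
    # Count how often each ordered word pair appears within the given window
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(word, tokens[j])] += 1
    return counts

corpus = "the nurse helped the patient while the engineer fixed the machine".split()
counts = cooccurrence_counts(corpus)
print(counts[("nurse", "patient")])  # words seen near "nurse" in this toy corpus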

In the results, innocent, inoffensive preferences, such as for flowers over insects, showed up, but so did more serious prejudices related to gender and race. The Princeton machine-learning experiment replicated the broad biases exhibited by human subjects who have taken select Implicit Association Test studies.

For instance, the machine-learning program associated female names more than male names with familial attributes such as "parents" and "wedding." Male names had stronger associations with career-related words such as "professional" and "salary." Of course, results such as these are often just objective reflections of the true, unequal distributions of occupation types with respect to gender — like how 77 percent of computer programmers are male, according to the U.S. Bureau of Labor Statistics.

This bias about occupations

Biased bots: Artificial-intelligence systems echo human prejudices

In the Turkish language, there is one pronoun, “o,” that covers every kind of singular third person. Whether it’s a he, a she, or an it, it’s an “o.” That’s not the case in English. So when Google Translate goes from Turkish to English, it just has to guess whether “o” means he, she, or it. And those translations reveal the algorithm’s gender bias.

Here is a poem written by Google Translate on the topic of gender. It is the result of translating Turkish sentences using the gender-neutral “o” to English (and inspired by this Facebook post).

Gender

by Google Translate

he is a soldier

she’s a teacher

he is a doctor

she is a nurse

he is a writer

he is a dog

she is a nanny

it is a cat

he is a president

he is an entrepreneur

she is a singer

he is a student

he is a translator

he is hard working

she is lazy

he is a painter

he is a hairdresser

he is a waiter

he is an engineer

he is an architect

he is an artist

he is a secretary

he is a dentist

he is a florist

he is an accountant

he is a baker

he is a lawyer

he is a belly dancer

he-she is a police

she is beautiful

he is very beautiful

it’s ugly

it is small

he is old

he is strong

he is weak

he is pessimistic

she is optimistic

It’s not just Turkish. In written Chinese, the pronoun 他 is used for “he,” but also when the person’s gender is unknown, like “they” has come to be used in English. But Google only translates into “she” when you use 她, the pronoun that specifically identifies the person as a woman. So in the case of a gender tie, Google always chooses “he.” In Finnish, the pronoun “hän,” meaning either “he” or “she,” is rendered as “he.”

In a way, this is not Google’s fault. The algorithm is basing its translations on a huge corpus of human language, so it is merely reflecting a bias that already exists. In Estonian, Google Translate converts “[he/she] is a doctor” to “she,” so perhaps there is less cultural bias in that corpus.

At the same time, automation can reinforce biases, by making them readily available and giving them an air of mathematical precision. And some of these examples might not be the most common that Turks look to translate into English, but regardless, the algorithm has to make a decision as to “he” or “she.”

At least she remains optimistic.

Google Translate's gender bias pairs "he" with "hardworking" and "she" with "lazy," and other examples

So much of our life is determined by algorithms. From what you see on your Facebook News Feed, to the books and knickknacks recommended to you by Amazon, to the disturbing videos YouTube shows to your children, our attention is systematically parsed and sold to the highest bidder.

These mysterious formulas that shape us, guide us, and nudge us toward someone else's idea of an optimal outcome are opaque by design. Which, well, perhaps makes it all the more frustrating when they turn out to be sexist.

Enter Google Translate, the automated service that makes so much of the web comprehensible to so many of us. Supporting 103 languages, the digital Babel Fish directly influences our understanding of languages and cultures different than our own. In providing such an important tool, Google has assumed the responsibility of accurately translating the content that passes through its servers.

But, it doesn't always. Or, perhaps more precisely, where there exists a gray area in language, Google Translate can fall into the same traps as humans.

That seems to have been demonstrated by a series of tweets showing Google Translate in the act of gendering professions in such a way that can only be described as problematic.

"Turkish is a gender neutral language," tweeted writer Alex Shams. "There is no 'he' or 'she' - everything is just 'o'. But look what happens when Google translates to English."

The results, which he screengrabbed, are painful. "She is a cook," "he is an engineer," "he is a doctor," "she is a nurse," "he is hard working," "she is lazy," and so on.

Turkish is a gender neutral language. There is no "he" or "she" - everything is just "o". But look what happens when Google translates to English. Thread: pic.twitter.com/mIWjP4E6xw — Alex Shams (@seyyedreza) November 27, 2017

And this is not a Turkish-to-English specific problem. Taika Dahlbom shared a similar outcome when she translated Finnish to English.

Look,how @Google Translate does #sexism! #Finnish has a gender neutral third-person pronoun. But Google decides, if a job title is good to go with the male or the female English third-person pronoun. Idea: @seyyedreza in #Turkish pic.twitter.com/jU9Su0JXd5 — Taika Dahlbom (@TaikaDahlbom) November 28, 2017

So what is going on here? A Google spokesperson was kind enough to partially fill us in.

"Translate works by learning patterns from many millions of examples of translations seen out on the web," the person explained over email. "Unfortunately, some of those patterns can lead to translations we’re not happy with. We’re actively researching how to mitigate these effects; these are unsolved problems in computer science, and ones we’re working hard to address."

This explanation fits in with the general understanding that currently exists. It all comes back to those algorithms that drive machine learning-powered services across the web.

Essentially, when an untold number of biases (gender or otherwise) exist in our literature and language — biases such as the notion that nurses are inherently women or that engineers are bound to be men — they can seep through into Google Translate's output.

We've seen this before, as recently as October. It was only last month that another Google service — the Cloud Natural Language API — was spotted assigning negative values to statements like "I'm queer" and "I'm black."

Even that wasn't a wholly new observation. An August study in the journal Science found "that applying machine learning to ordinary human language results in human-like semantic biases."

It seems that, in attempting to build an automatic translator that can approach a human in its ability, Google may have managed to pick up some rather human-like limitations along the way.

This story has been updated with a statement from Google.

Google Translate might have a gender problem

Parents know one particular challenge of raising kids all too well: teaching them to do what we say, not what we do.

A similar challenge has hit artificial intelligence.

As more apps and software use AI to automate tasks, a popular data-backed model, called "word embedding," has also picked up entrenched social biases.

The result is services like language translation spitting those biases back out in subtle but worrisome ways.

Earlier this year, for instance, examples of gender bias started cropping up on social media with Google Translate.

Try translating terms into English from Turkish, which has gender-neutral pronouns, and a phrase like o bir muhendis becomes he is an engineer, while o bir hemsire translates to she is a nurse.

Microsoft’s own translation service on Bing has a similar problem, thanks to the use of gender in grammar - think le and la in French, or der and die in German.

When Bing translates “the table is soft” into German, it offers the feminine die Tabelle, which refers to a table of figures.

“These gender associations are projected onto objects,” says Kate McCurdy, a senior computational linguist with the language-learning startup Babbel in Berlin, Germany, who discovered the bias in Bing.

“Objects that happen to be grammatically masculine are given masculine properties. It's learning to take gender stereotypes and project them into the whole world of nouns.”

Because of word embedding, a popular method of machine learning, translation algorithms are working off these biases, and so are other services like Google Search, as well as Netflix and Spotify recommendations.

“This is an approach that has taken off and is extremely widespread in the industry, and that’s why it’s so important to interrogate the underlying assumption,” says McCurdy.

Word embedding works by linking words to a vector of numbers, which algorithms can use to calculate probability. By looking at what words tend to be around other words, like “engineer,” the model can be used to figure out what other word fits best, like “he.”
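
One way to see these associations directly is to query a pretrained embedding model. The sketch below uses the gensim library; the vector file path is a placeholder for any word2vec-format file, and the queries mirror the "engineer"/"he" example above.

from gensim.models import KeyedVectors

# Placeholder path: any pretrained word2vec/GloVe-format vector file would do
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Words whose vectors sit closest to "engineer"
print(kv.most_similar("engineer", topn=5))

# Compare how close "engineer" is to each gendered pronoun
print(kv.similarity("engineer", "he"), kv.similarity("engineer", "she"))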

The price of learning from reams of existing text and dialogue is that such models pick up the true-to-life imbalance between genders when it comes to jobs or opportunities.

A 2016 study that trained word-embedding models on articles on Google showed gender stereotypes “to a disturbing extent,” according to its researchers.

McCurdy says that there isn’t anything necessarily wrong with the word-embedding model itself, but it needs human guidance and oversight.

“The default now is to build these applications and release them into the wild and fight the fires when they come out,” she adds. “But if we were more deliberate about this and took things more seriously, we’d do more work to integrate a more critical perspective.”

Companies who are using the word-embedding model to make services for consumers also need more diverse programmers who are more likely to notice the risk of biases before they crop up.

“If we’re serious about having artificial intelligence make decisions that don’t end up with biases that we don’t want to reinforce, we need to have more diverse and critical people looking at this earlier on in the process.”

The Algorithm That Helped Google Translate Become Sexist

Recently there has been a growing concern about machine bias, where trained statistical models grow to reflect controversial societal asymmetries, such as gender or racial bias. A significant number of AI tools have recently been suggested to be harmfully biased towards some minority, with reports of racist criminal behavior predictors, iPhone X failing to differentiate between two Asian people, and Google Photos mistakenly classifying black people as gorillas. Although a systematic study of such biases can be difficult, we believe that automated translation tools can be exploited through gender neutral languages to yield a window into the phenomenon of gender bias in AI. In this paper, we start with a comprehensive list of job positions from the U.S. Bureau of Labor Statistics (BLS) and use it to build sentences in constructions like "He/She is an Engineer" in 12 different gender neutral languages such as Hungarian, Chinese, Yoruba, and several others. We translate these sentences into English using the Google Translate API, and collect statistics about the frequency of female, male and gender-neutral pronouns in the translated output. We show that GT exhibits a strong tendency towards male defaults, in particular for fields linked to unbalanced gender distribution such as STEM jobs. We ran these statistics against BLS' data for the frequency of female participation in each job position, showing that GT fails to reproduce a real-world distribution of female workers. We provide experimental evidence that even if one does not expect in principle a 50:50 pronominal gender distribution, GT yields male defaults much more frequently than what would be expected from demographic data alone. We are hopeful that this work will ignite a debate about the need to augment current statistical translation tools with debiasing techniques which can already be found in the scientific literature.
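
The counting step of an experiment like the one described above can be sketched as follows. The translation backend itself is assumed rather than shown (the authors used the Google Translate API); translate_to_english stands in for whatever client is used, and the Turkish sentences in the commented example are an illustrative subset, not the study's full occupation list.

from collections import Counter

def pronoun_of(english_sentence):
    # Classify a translated sentence by its leading pronoun
    first = english_sentence.strip().split()[0].lower()
    return first if first in {"he", "she", "it", "they"} else "other"

def tally_pronouns(source_sentences, translate_to_english):
    # Count which pronoun the translation chooses for each source sentence
    counts = Counter()
    for sentence in source_sentences:
        counts[pronoun_of(translate_to_english(sentence))] += 1
    return counts

# Example with Turkish "o bir ..." constructions (translation client assumed):
# turkish = ["o bir mühendis", "o bir hemşire", "o bir doktor", "o bir öğretmen"]
# print(tally_pronouns(turkish, translate_to_english=my_translate))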

Assessing Gender Bias in Machine Translation -- A Case Study with Google Translate

Google is making an effort to reduce perceived gender bias in Google Translate, it announced today. Starting this week, users who translate words and phrases in supported languages will get both feminine and masculine translations; “o bir doktor” in Turkish, for example, now yields “she is a doctor” and “he is a doctor” in English.

Currently, translations from English into French, Italian, Portuguese, or Spanish are supported. Translations of phrases and sentences from Turkish to English, as in the example above, will also show both gender equivalents. (In the Turkish language, the pronoun “o” covers every kind of singular third person.)

James Kuczmarski, product manager at Google Translate, said work has already begun on addressing non-binary gender translations.

“Over the course of this year, there’s been an effort across Google to promote fairness and reduce bias in machine learning,” he wrote in a blog post. “In the future, we plan to extend gender-specific translations to more languages, launch on other Translate surfaces like our iOS and Android apps, and address gender bias in features like query auto-complete.”

Today’s announcement comes shortly after Google blocked Smart Compose, a Gmail feature that automatically suggests sentences for users as they type, from suggesting gender-based pronouns. And it follows on the heels of social media posts purporting to show automated translation apps’ gender bias.

Users noted that words like “engineer” and “strong” in some foreign languages were more likely to be associated with corresponding male words in English — “o bir muhendis” in Google Translate became “he is an engineer,” while “o bir hemsire” was translated to “she is a nurse.” It’s far from the only example — Apple and Google’s predictive keyboards propose the gendered “policeman” to complete “police” and “salesman” for “sales.” And when Microsoft’s Bing translates “the table is soft” into German, it comes back with the feminine “die Tabelle,” which refers to a table of figures.

It’s an AI training problem, Kuczmarski explained. Word embedding — a common algorithmic training technique that involves linking words to a vector used to calculate the probability of a given word’s language pair — unavoidably picks up, and at worst amplifies, biases implicit in source text and dialogue. A 2016 study found that word embeddings in Google News articles tended to exhibit female and male gender stereotypes.

“Google Translate learns from hundreds of millions of already-translated examples from the web,” Kuczmarski wrote. “Historically, it has provided only one translation for a query, even if the translation could have either a feminine or masculine form. So when the model produced one translation, it inadvertently replicated gender biases that already existed. For example: it would skew masculine for words like ‘strong’ or ‘doctor,’ and feminine for other words, like ‘nurse’ or ‘beautiful.'”

A gender-neutral approach to language translation is a part of Google’s larger effort to mitigate prejudice in AI systems. The Mountain View company uses tests developed by its AI ethics team to uncover bias, and has banned expletives, racial slurs, and mentions of business rivals and tragic events from its predictive technologies.

Google Translate now gives feminine and masculine translations

An experiment shows that Google Translate systematically changes the gender of translations when they do not fit with stereotypes. It is all because of English, Google says.

If you were to read a story about male and female historians translated by Google, you might be forgiven for overlooking the females in the group. The phrase “vier Historikerinnen und Historiker” (four male and female historians) is rendered as “cuatro historiadores” (four male historians) in Spanish, with similar results in Italian, French and Polish. Female historians are simply removed from the text.

In an experiment, I translated 11 occupations from one gender-inflected language to another. I analyzed 440 translation pairs to and from German, Italian, Polish, Spanish and French. Together, these languages are natively spoken by three in four citizens of the European Union.
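
For what it is worth, the figure of 440 pairs is consistent with 11 occupations, each in a feminine and a masculine form, translated across every ordered pair of the five languages. A quick sketch of that enumeration (the occupation count is a bare number here, since the author's exact word list is not reproduced):

from itertools import permutations

languages = ["de", "it", "pl", "es", "fr"]
occupations = 11          # gender-inflected job titles per language
genders = 2               # feminine and masculine forms

pairs = list(permutations(languages, 2))                 # 20 ordered source->target pairs
print(len(pairs), occupations * genders * len(pairs))    # 20, 440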

Fitting the stereotypes

In many cases, Google changed the gender of the word in a grossly stereotypical way. “Die Präsidentin” (the female president) is rendered to “il presidente” in Italian, although the correct translation is “la presidente”. “Der Krankenpfleger” (the male nurse in German) becomes “l’infirmière” (the female nurse) in French.

In my list, shop assistant was best translated by Google, with 33 correct translations out of 40. From French to Spanish for instance, “la vendeuse” was correctly translated to “la vendedora” and “le vendeur” to “el vendedor”.

Errors are not systematic, showing that they can be fixed. “Kierowniczka” (Polish for female director) was correctly translated in all four target languages, although “die Chefin”, “la capa”, “la jefa” and “la cheffe” were wrongly translated to their masculine forms. (When Google correctly translated a feminine occupation, it was often because the target language’s word was not gender-inflected. For instance, “l’insegnante” in Italian designates both a female and a male teacher.)

The experiment’s code and data are available online. This experiment might not reflect what Google Translate shows when translating web pages or longer texts. In some cases, especially when nearby words contain feminine forms, Google correctly translates gender-inflected forms.

Digital colonialism

Stereotypes sneak into translations because Google optimizes translations for English.

A Google spokesperson told AlgorithmWatch that “translating between language pairs requires high volumes of bilingual data that often don’t exist for all language pairs. The way to enable these translations is by using a technique called ‘bridging’. Language bridging in translation means that to translate from X to Y a third language is introduced (E) based on the existence of bilingual data to translate X to E and then E to Y. The most common language used as bridge is English.”

“The majority of nouns in English are gender-neutral: so, when translating the feminine term for ‘nurse’ from a gender-inflected language to English, the gender is ‘lost’ in the translation to the bridging language,” the Google spokesperson added.
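
Schematically, the bridging the spokesperson describes is a two-step composition. The translate function below is a stand-in for a real translation backend, not an actual API; the point is only to show where the gender information gets dropped.

def bridge(text, source_lang, target_lang, translate, pivot="en"):
    # Step 1: source -> pivot (English). A marked form such as "die Präsidentin"
    # becomes "the president"; the feminine marking is lost, because most
    # English nouns are not gender-inflected.
    english = translate(text, source_lang, pivot)
    # Step 2: pivot -> target. The model must now guess a gender and tends to
    # fall back on the statistically dominant (often masculine) form.
    return translate(english, pivot, target_lang)

# e.g. bridge("die Präsidentin", "de", "it", translate=my_translate)
# may come back as "il presidente" rather than "la presidente".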

Several experts I talked to agreed that the community of researchers working on machine translation was not very concerned about non-English languages. Only in May 2020 did the Association for Computational Linguistics, a large professional body, tell reviewers of their annual conference that they could not reject a paper solely because it was about a language other than English.

Window dressing

In 2018, Google introduced a feature that alerted users that some words could be gender-specific when translating from English.

However, it is unclear whether such efforts were made in earnest. Over two years after the changes were deployed, “developer” is correctly translated into French both in the masculine form as “le développeur” and in the feminine as “la développeuse”. But “the developer” translates to “le développeur” and all the sentences I tried translated into the masculine, including the phrase “the developer is a woman”.

Verified falsehoods

In my experiment, 182 translations out of 440 turned out to be false. In their vast majority, the errors had to do with feminine forms converted to their masculine equivalent. 68 of the false translations were marked as “verified” by Google.

The Google spokesperson declined to explain precisely how the “verified” label was awarded. “We mark translations as ‘verified’ when they’ve been reviewed by several volunteers in the Google Translate Community and these volunteers agree the translation is correct”, they said. “We are improving our detection of low-quality contributions with automated scoring methods and periodic knowledge checks.”

My experiment raised other issues. “Le chef” (the boss, in French), was translated to “der Führer” in German, a word meaning “the guide” and very strongly linked to the Nazi era. The translation was marked as verified.

But Google reassured me that no extremist group infiltrated the “Google Translate Community” to spread far-right language. “In this specific case, [the error] is due to the ‘bridging’ process”, the spokesperson said. “If you do a translation for ‘le chef’ from French to English we get ‘leader’. If you then translate ‘leader’ from English to German you get ‘Führer’”.

No escape

Google Translate is not just another translation service. It is a feature that Europeans can hardly escape.

Since an update in April 2019, Google Chrome prompts users to instantly translate web pages. Anyone visiting a website in a foreign language is asked to choose between the original or the google-translated version, even if the website offers an official translation in the user’s preferred language. (Google cannot detect websites that provide an official translation and “errs on the side of helpfulness by offering a translate option in all circumstances”, the spokesperson said. They also said users could turn off the translation prompt.)

Approximately 250 million, or one in two, citizens of the European Union use an Android phone. Unless they manage to bypass the system’s blocks (by “rooting” their device), they cannot remove Google Chrome. It is likely that many of them use Google Translate, perhaps unwittingly.

Female historians and male nurses do not exist, Google Translate tells its European users
