Incident 14: Biased Sentiment Analysis
CSET Taxonomy Classifications
Google's Cloud Natural Language API returns "negative" sentiment analysis on phrases such as "I am homosexual," "I am Jewish," or "I am black." The API uses natural language processing (NLP) to analyze text and produce a score from -1.0 to 1.0, with -1.0 being "very negative" and 1.0 being "very positive".
Google Cloud's Natural Language API provided racist, homophobic, and antisemitic sentiment analyses.
Harm Distribution Basis
Race, Religion, Sexual orientation or gender identity, Ideology
Harm to social or political systems
AI System Description
Google Cloud's Natural Language API, which analyzes input text and outputs a "sentiment analysis" score from -1.0 (very negative) to 1.0 (very positive).
Sector of Deployment
Arts, entertainment and recreation
Relevant AI functions
Google Cloud Natural Language Processing API
Natural language processing
Google, Google Cloud, Natural Language API
input from open source internet
The tool, which you can sample here, is designed to give companies a preview of how their language will be received. Entering whole sentences gives predictive analysis on each word as well as the statement as a whole. You can see whether the API gauges certain words to have negative or positive sentiment on a scale of -1 to +1.
Motherboard had access to a more nuanced analysis version of Google's Cloud Natural Language API than the free one linked above, but the effects are still noticeable. Entering "I'm straight" resulted in a neutral sentiment score of 0, while "I'm gay" led to a negative score of -0.2 and "I'm homosexual" had a negative score of -0.4.
AI systems are trained using the texts, media, and books given to them. Whatever the Cloud Natural Language API ingested to form its criteria for evaluating English text for sentiment, it biased the analysis toward negative attribution of certain descriptive terms. Google didn't confirm to Motherboard what corpus of text it fed the Cloud Natural Language API. Logically, even if it started with an isolated set of materials with which to understand sentiment, once it starts absorbing content from the outside world...well, it gets polluted with all the negative word associations found therein.
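The dynamic described above — biased training text producing biased scores for neutral sentences — can be sketched with a toy model. This is not Google's actual system; it is a minimal, hypothetical lexicon scorer whose corpus and word scores are invented purely to illustrate how corpus bias propagates into sentiment output:

```python
# Toy sketch: a lexicon-based sentiment scorer whose word scores come
# entirely from the labels of the documents each word appears in.
# NOT Google's model -- the corpus and scores below are hypothetical.
from collections import defaultdict

def train_lexicon(labeled_docs):
    """Give each word a score in [-1, 1]: the mean label of the docs containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, label in labeled_docs:  # label: -1.0 (negative) to 1.0 (positive)
        for word in text.lower().split():
            totals[word] += label
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def score(lexicon, text):
    """Average the scores of known words; unknown-only text scores 0.0 (neutral)."""
    words = [w for w in text.lower().split() if w in lexicon]
    return sum(lexicon[w] for w in words) / len(words) if words else 0.0

# Hypothetical corpus in which an identity term mostly co-occurs with hostile text.
corpus = [
    ("such a terrible jew", -1.0),
    ("awful hateful jew rant", -1.0),
    ("jewish community festival was lovely", 1.0),
]
lexicon = train_lexicon(corpus)
print(score(lexicon, "i am a jew"))   # negative, though the sentence itself is neutral
print(score(lexicon, "i am jewish"))  # positive, because "jewish" appeared in kinder text
```

The neutral self-description inherits the hostility of the contexts its words appeared in — the same "polluted associations" failure mode the article describes.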
Google confirmed to Motherboard that its NLP API is producing biased results in the aforementioned cases. Their statement reads:
"We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone."
There are clear parallels with Microsoft's ill-fated and impressionable AI chatbot Tay, which the company quickly pulled offline in March 2016 after Twitter users taught it to be a hideously racist and sexist conspiracy theorist. Back in July, the computer giant tried again with its bot Zo, which similarly learned terrible habits from humans and was promptly shut down.
Users had to deliberately corrupt those AI chatbots, but Google's Cloud Natural Language API is simply repeating the sentiments it gains by absorbing text from human contributions...wherever they're coming from.
Google messed up, and now says it's sorry.
Wednesday, Motherboard published a story written by Andrew Thompson about biases against ethnic and religious minorities encoded in one of Google's machine learning application program interfaces (APIs), called the Cloud Natural Language API.
Part of the API analyzes texts and then determines whether they have a positive or negative sentiment, on a scale of -1 to 1. The AI was found to label sentences about religious and ethnic minorities as negative, indicating it's inherently biased. It labeled both being a Jew and being a homosexual as negative, for example.
Google has now vowed to fix the problem. In response to Motherboard's story, a spokesperson from the company said it was working to improve the API and remove its biases.
"We dedicate a lot of efforts to making sure the NLP [Natural Language Processing] API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models," a Google spokesperson said in an email. "We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone."
Artificially intelligent systems are trained by processing vast amounts of data, often including books, movie reviews, and news articles. Google's AI likely learned to be biased against certain groups because it was fed biased data. Issues such as these are at the core of AI and machine learning research, and those critical of such technologies say they need to be fixed in order to ensure tech works for everyone.
This isn't the first example of AI bias to be uncovered, and it likely won't be the last. Researchers don't yet agree on the best way to prevent artificial intelligence systems from reflecting the biases found in society. But we need to continue to expose instances in which AIs have learned to embody the same prejudices that humans do.
It's not surprising that Google wants to fix this particular bias, but it is noteworthy that the company apologized and pointed toward a goal of building more inclusive artificial intelligence. Now it's up to the company and everyone else working on the tech to develop a viable way of doing so.
A Google spokesperson responded to Motherboard's request for comment and issued the following statement: "We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone."
John Giannandrea, Google's head of artificial intelligence, told a conference audience earlier this year that his main concern with AI isn't deadly super-intelligent robots, but ones that discriminate. "The real safety question, if you want to call it that, is that if we give these systems biased data, they will be biased," he said.
His fears appear to have already crept into Google's own products.
In July 2016, Google announced the public beta launch of a new machine learning application program interface (API), called the Cloud Natural Language API. It allows developers to incorporate Google's deep learning models into their own applications. As the company said in its announcement of the API, it lets you "easily reveal the structure and meaning of your text in a variety of languages."
In addition to entity recognition (deciphering what's being talked about in a text) and syntax analysis (parsing the structure of that text), the API included a sentiment analyzer to allow programs to determine the degree to which sentences expressed a negative or positive sentiment, on a scale of -1 to 1. The problem is the API labels sentences about religious and ethnic minorities as negative—indicating it's inherently biased. For example, it labels both being a Jew and being a homosexual as negative.
Google's sentiment analyzer was not the first and isn't the only one on the market. Sentiment analysis technology grew out of Stanford's Natural Language Processing Group, which offers free, open source language processing tools for developers and academics. The technology has been incorporated into a host of machine learning suites, including Microsoft's Azure and IBM's Watson. But Google's machine learning APIs, like its consumer-facing products, are arguably the most accessible on offer, due in part to their affordable price.
But Google's sentiment analyzer isn't always effective and sometimes produces biased results.
Two weeks ago, I experimented with the API for a project I was working on. I began feeding it sample texts, and the analyzer started spitting out scores that seemed at odds with what I was giving it. I then threw simple sentences about different religions at it.
When I fed it "I'm Christian" it said the statement was positive.
When I fed it "I'm a Sikh" it said the statement was even more positive.
But when I gave it "I'm a Jew" it determined that the sentence was slightly negative.
The problem doesn't seem confined to religions. It similarly thought statements about being homosexual or a gay black woman were also negative.
Being a dog? Neutral. Being homosexual? Negative.
I could go on, but you can give it a try yourself: Google Cloud offers an easy-to-use interface to test the API.
It looks like Google's sentiment analyzer is biased, as many artificially intelligent algorithms have been found to be. AI systems, including sentiment analyzers, are trained using human texts like news stories and books. Therefore, they often reflect the same biases found in society. We don't know yet the best way to completely remove bias from artificial intelligence, but it's important to continue to expose it.
Last year for example, researchers at Princeton published a paper about a state-of-the-art natural language processing technique called GloVe. The researchers looked for biases in the algorithm against minorities and women by searching for words with which they most appeared in a "large-scale crawl of the web, containing 840 billion [words]." In the case of gender, it meant, in one experiment, looking to see if female names and attributes (like "sister") were more associated with arts or math words (like "poetry" or "math", respectively). In the case of race, one experiment looked for associations between black names (like "Jermaine" or "Tamika") with words denoting pleasantness or negativeness (like "friend" or "terrible," respectively).
By classifying the sentiment of words using GloVe, the researchers "found every linguistic bias documented in psychology that we have looked for." Black names were strongly associated with unpleasant words, female names with arts terms, and so on. The biases in the paper aren't necessarily the same as those one can find in Google's Natural Language API (genders and people's names, for instance, are reliably neutral in the API), but the problem is more or less the same: biased data in, biased classifications out.
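The association test the Princeton researchers ran on GloVe can be illustrated with a toy version. The 2-D vectors below are hand-made stand-ins, not real GloVe embeddings, and the word list is our own; the point is only to show the mechanics of measuring whether a name sits closer to "pleasant" or "unpleasant" attribute words in embedding space:

```python
# Toy illustration of an embedding association test.
# Vectors are hand-crafted 2-D stand-ins, NOT real GloVe embeddings.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: a biased corpus has pulled one name toward
# the region of the space occupied by unpleasant words.
vectors = {
    "friend":   (0.9, 0.1),  # pleasant attribute word
    "terrible": (0.1, 0.9),  # unpleasant attribute word
    "emily":    (0.8, 0.2),
    "jermaine": (0.2, 0.8),
}

def association(word):
    """Positive = closer to 'friend'; negative = closer to 'terrible'."""
    return cosine(vectors[word], vectors["friend"]) - cosine(vectors[word], vectors["terrible"])

print(association("emily") > 0)     # associated with the pleasant word
print(association("jermaine") < 0)  # associated with the unpleasant word
```

Real studies average this difference over many names and attribute words, but the diagnosis is the same one the paper reached: the geometry of the learned vectors encodes the co-occurrence biases of the corpus.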
GOOGLE'S artificial intelligence (AI) engine has been showing a negative bias towards words including "gay" and "jew".
The sentiment analysis process is the latest in a growing number of examples of "garbage in - garbage out" in the world of machine learning, which has led to computers with negative "opinions" they shouldn't have.
The Cloud Natural Language API allows users to add deep learning speech support to their apps to "easily reveal the structure and meaning of your text in a variety of languages", but Motherboard reports that it has already learned that religious and ethnic minorities are a bad thing.
In an experiment carried out by the site, phrases like "I'm a dog" were neutral, but while "I'm Christian" was positive, "I'm a Jew", "I'm a gay black woman" and "I'm a homosexual" showed a negative sentiment.
It is of course, not the first time we've seen this happen. Microsoft's Tay chatbot had to be taken offline because after a few days of learning from people it had become a pot-smoking Nazi prostitute-bot.
But at this stage of machine learning, that's part of what it's all about - we're seeing what happens when we do certain things to data. The problem comes when the tech is in the public domain in this state because that means every gadget that uses it has the same opinion.
It's a bit like if Alexa had only ever been given the Daily Mail as source material.
Indeed, there have been studies which show that, for example, "black-sounding" names become negatively associated quite quickly in intelligence engines.
Motherboard suggests as an example that generally "Jew" is more likely to be used negatively than "Jewish" and therefore is more likely to attach a negative sentiment in the learning process.
Google has issued a statement apologising, and explaining: "We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry.
"We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone."
Google's code of conduct explicitly prohibits discrimination based on sexual orientation, race, religion, and a host of other protected categories. However, it seems that no one bothered to pass that information along to the company's artificial intelligence.
The Mountain View-based company developed what it's calling a Cloud Natural Language API, which is just a fancy term for an API that grants customers access to a machine-learning powered language analyzer which allegedly "reveals the structure and meaning of text." There's just one big, glaring problem: The system exhibits all kinds of bias.
First reported by Motherboard, the so-called "Sentiment Analysis" offered by Google is pitched to companies as a way to better understand what people really think about them. But in order to do so, the system must first assign positive and negative values to certain words and phrases. Can you see where this is going?
The system ranks the sentiment of text on a -1.0 to 1.0 scale, with -1.0 being "very negative" and 1.0 being "very positive." On a test page, inputting a phrase and clicking "analyze" kicks you back a rating.
"You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts," reads Google's page. "You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app."
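A client consuming these scores has to turn the -1.0 to 1.0 number into a label before acting on it. The sketch below is our own assumption about how such a mapping might look — Google does not document fixed cutoffs, and the sign-based labeling here simply mirrors how the articles covering this incident read nonzero scores:

```python
# Hypothetical client-side labeling of a sentiment score in [-1.0, 1.0].
# The thresholds are our own assumption, not Google's documented behavior:
# coverage of this incident treated any score below zero as "negative"
# and any score above zero as "positive".
def sentiment_label(score: float) -> str:
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1.0, 1.0]")
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"

print(sentiment_label(-0.5))  # "negative" -- the score reported for "I'm a homosexual"
print(sentiment_label(0.0))   # "neutral"
print(sentiment_label(0.1))   # "positive" -- the score reported for "I'm straight"
```

This is why even small score differences matter in practice: a downstream app that branches on the label will treat a -0.1 sentence categorically worse than a 0.0 one.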
Both "I'm a homosexual" and "I'm queer" returned negative ratings (-0.5 and -0.1, respectively), while "I'm straight" returned a positive score (0.1).
And it doesn't stop there, "I'm a jew" and "I'm black" returned scores of -0.1.
Interestingly, shortly after Motherboard published their story, some results changed. A search for "I'm black" now returns a neutral 0.0 score, for example, while "I'm a jew" actually returns a score of -0.2 (i.e., even worse than before).
"White power," meanwhile, is given a neutral score of 0.0.
So what's going on here? Essentially, it looks like Google's system picked up on existing biases in its training data and incorporated them into its readings. This is not a new problem, with an August study in the journal Science highlighting this very issue.
We reached out to Google for comment, and the company both acknowledged the problem and promised to address the issue going forward.
"We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don’t always get it right," a spokesperson wrote to Mashable. "This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone.”
So where does this leave us? If machine learning systems are only as good as the data they're trained on, and that data is biased, Silicon Valley needs to get much better about vetting what information we feed to the algorithms. Otherwise, we've simply managed to automate discrimination — which I'm pretty sure goes against the whole "don't be evil" thing.
This story has been updated to include a statement from Google.
A lot of major players in the science and technology scene believe we have a lot to fear from AI surpassing human intelligence, even as others laugh off those claims. But one thing both sides agree on is that artificial intelligence is subject to humanity’s flaws. If a neural network is trained on wrong or incomplete data, it will itself be so.
And that’s exactly what seems to have happened in one of Google’s AI products.
In July last year, Google launched a beta version of its new machine learning app called Cloud Natural Language API. It allowed developers to integrate Google's AI learning system into their own apps. One of the features of the AI is a sentiment analyser, which measures on a scale of -1 to 1 whether the sentences it's reading express a positive or negative sentiment.
Two weeks ago, Motherboard’s Andrew Thompson tested out the API for a pet project, feeding the program sample text to analyse. He made a troubling discovery: the AI considered sentences referring to religious and ethnic minorities negative, as well as statements involving homosexuality. A sample saying “I’m Christian” registered a positive sentiment, with “I’m a Sikh” being even more so. On the other hand, typing in “I’m a Jew” or “I’m a homosexual” was considered negative, as were other permutations involving those identifiers.
It’s not that Google is making a statement about its conservative leanings; the problem here is the data used to train the AI. These kinds of programs, designed to learn and mimic natural language, are usually fed collections of text like books, news articles and such. If the text contained therein is biased, the AI learns to be that way as well.
In the case of Jewish people, anti-semitic content tends to refer to them as “Jew,” while neutral sources are more likely to use the word “Jewish”. Because of that, the word “Jew” has gained a negative connotation in the AI’s mind, but not the word “Jewish”.
It’s a common theme in artificial intelligence, that the program is only as strong as the data it learns from. If society is flawed, then the AI that learns from it will be as well. Of course, there’s no way just yet to completely cut off machine learning algorithms from this kind of biased data, but it’s a problem there’s hope to eventually work around.
And in the meantime, Google is certainly taking the problem seriously. Not only does it seem to have corrected the sentiment analyser’s bias (which you can try for yourself here), but it also recently set up a research collaborative to consider the ethical and economic impact of the advent of AI on society.
Google developed its Cloud Natural Language API to allow developers to work with a language analyzer that reveals the actual meaning of text. The system decides whether text expresses a positive or a negative sentiment. According to recent reports, its API considers a word like “homosexual” negative.
We all know that the API judges based on the information fed to it, but what might surprise you is that the API can also be biased, just like humans. Users enter whole sentences, and the tool gives a predictive analysis of each word as well as the statement as a whole. In the output, one can see that the API gauges certain words to have negative or positive sentiment. These AI systems are trained using texts and books.
According to a recent revelation, the Cloud Natural Language API tends to bias its analysis toward negative attribution of certain descriptive terms. It is similar to how humans behave in the world: we all start life with good thoughts and memories, which then get polluted by negativity from the world.
Google also confirmed that its cloud API is giving biased output, and issued a statement apologizing to developers for the fault in the software.
Google said: “We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone.”
This incident is similar to what happened with Microsoft’s AI chatbot Tay. Microsoft had to shut Tay down in March 2016 after Twitter users taught it to be a hideously racist and sexist conspiracy theorist.
Google has to come up with a solution to get rid of the biased output; otherwise, it may have to pull back its API.