Incident 13: High-Toxicity Assessed on Text Involving Women and Minority Groups

Description: Google's Perspective API, which assigns a toxicity score to online text, appears to assign higher toxicity scores to phrases that refer to people who are not white, male, Christian, or heterosexual.


Alleged: Google developed and deployed an AI system, which harmed Women and Minority Groups.

Incident Stats

Incident ID
Report Count
Incident Date
Sean McGregor

CSETv0 Taxonomy Classifications

Taxonomy Details

Full Description

Google's Perspective API, which assigns a toxicity score to online text, has been shown to assign higher toxicity scores to phrases that refer to people who are not white, male, Christian, or heterosexual. Scores lie on a spectrum from very healthy (low %) to very toxic (high %). The phrase "I am a man" received a score of 20%, while "I am a gay black woman" received 87%. The bias also appears within subcategories: "I am a man who is deaf" received 70%, "I am a person who is deaf" received 74%, and "I am a woman who is deaf" received 77%. The API can also be circumvented by slightly modifying the text: "They are liberal idiots who are uneducated" received 90%, while "they are liberal idiots who are un.educated" received 15%.
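The size of these gaps is easier to see when the reported figures are set side by side. The following is a minimal sketch that only tabulates the scores quoted in the description above; the names REPORTED_SCORES and score_gap are hypothetical helpers introduced for illustration, and nothing here queries the API itself.

```python
# Toxicity scores (percent) exactly as quoted in the incident description.
# This is a static table for illustration, not live API output.
REPORTED_SCORES = {
    "I am a man": 20,
    "I am a gay black woman": 87,
    "I am a man who is deaf": 70,
    "I am a person who is deaf": 74,
    "I am a woman who is deaf": 77,
    "They are liberal idiots who are uneducated": 90,
    "they are liberal idiots who are un.educated": 15,
}


def score_gap(baseline: str, variant: str, scores=REPORTED_SCORES) -> int:
    """Return the variant's score minus the baseline's, in percentage points."""
    return scores[variant] - scores[baseline]


# Pairs that differ only in the identity terms used, or in a one-character edit.
PAIRS = [
    ("I am a man", "I am a gay black woman"),
    ("I am a man who is deaf", "I am a woman who is deaf"),
    ("They are liberal idiots who are uneducated",
     "they are liberal idiots who are un.educated"),
]

if __name__ == "__main__":
    for baseline, variant in PAIRS:
        gap = score_gap(baseline, variant)
        print(f"{baseline!r} -> {variant!r}: {gap:+d} points")
```

A positive gap means the variant phrase was scored as more toxic than its baseline; the negative gap on the last pair reflects the drop caused by inserting a single period.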

Short Description

Google's Perspective API, which assigns a toxicity score to online text, appears to assign higher toxicity scores to phrases that refer to people who are not white, male, Christian, or heterosexual.



Harm Distribution Basis

Race, Religion, National origin or immigrant status, Sex, Sexual orientation or gender identity, Disability, Ideology

Harm Type

Psychological harm, Harm to social or political systems

AI System Description

Google Perspective is an API that uses machine learning techniques to assign "toxicity" scores to online text, with the original intent of helping to identify hate speech and "trolling" in internet comments. Perspective is trained to recognize a variety of attributes (e.g., whether a comment is toxic, threatening, insulting, or off-topic) using millions of examples gathered from several online platforms and reviewed by human annotators.
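For readers unfamiliar with how a client asks for such attribute scores, the sketch below builds a request body and reads a score back out of a response. The endpoint URL and JSON field names are assumptions based on the publicly documented comments:analyze method, not something stated in this entry; build_request and summary_scores are hypothetical helper names; and the response shown is a hand-written stand-in, so no network call is made.

```python
import json

# Assumed endpoint, taken from public Perspective API documentation.
# An API key would be required to actually send a request.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body asking for one score per requested attribute."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def summary_scores(response):
    """Map each attribute name to its summary score, a value between 0 and 1."""
    return {
        name: body["summaryScore"]["value"]
        for name, body in response.get("attributeScores", {}).items()
    }


# Hand-written stand-in for a response; the value is illustrative only.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.87, "type": "PROBABILITY"}}
    }
}

if __name__ == "__main__":
    print(json.dumps(build_request("I am a gay black woman"), indent=2))
    print(summary_scores(sample_response))
```

The percentages quoted in this entry correspond to the 0 to 1 summary score multiplied by 100.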

System Developer


Sector of Deployment

Information and communication

Relevant AI functions

Perception, Cognition, Action

AI Techniques

open-source, machine learning

AI Applications

Natural language processing, content ranking



Named Entities

Google, Google Cloud, Perspective API

Technology Purveyor


Beginning Date


Ending Date


Near Miss

Harm caused



Lives Lost


Data Inputs

Online comments

GMF Taxonomy Classifications

Taxonomy Details

Known AI Goal

Hate Speech Detection

Known AI Technology

Character NGrams

Potential AI Technology

Distributional Learning

Known AI Technical Failure

Context Misidentification, Generalization Failure, Lack of Adversarial Robustness

Potential AI Technical Failure

Limited Dataset, Misaligned Objective, Underfitting, Distributional Bias, Data or Labelling Noise
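To illustrate the "Lack of Adversarial Robustness" classification above, the sketch below compares the features a text classifier might extract from the two phrases quoted in the full description. It is an illustration under stated assumptions only: the entry does not describe Perspective's actual feature pipeline, char_ngrams is a hypothetical helper, and whitespace tokenization and trigrams are chosen purely for simplicity.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase character n-grams contained in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


original = "They are liberal idiots who are uneducated"
perturbed = "they are liberal idiots who are un.educated"

# Word-level view: after the edit, the token "uneducated" is gone entirely,
# so a model keyed on whole words never sees it.
original_tokens = set(original.lower().split())
perturbed_tokens = set(perturbed.lower().split())
missing_tokens = original_tokens - perturbed_tokens

# Character-level view: the inserted period removes some trigrams of the
# word and introduces new ones that contain punctuation.
word_lost = char_ngrams("uneducated") - char_ngrams("un.educated")
word_new = char_ngrams("un.educated") - char_ngrams("uneducated")

if __name__ == "__main__":
    print("tokens no longer present:", sorted(missing_tokens))
    print("trigrams lost:", sorted(word_lost))
    print("trigrams introduced:", sorted(word_new))
```

A one-character edit leaves the sentence fully readable to a person while changing the features a model sees, which is consistent with the large score drop reported for the "un.educated" variant.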

Alphabet’s hate-fighting AI doesn’t understand hate yet · 2017

Yesterday, Google and its sister Alphabet company Jigsaw announced Perspective, a tool that uses machine learning to police the internet against hate speech. The company heralded the tech as a nascent but powerful weapon in combatting onlin…

Security researchers show Google's anti-internet troll AI platform is easily deceived · 2017

In the examples below on hot-button topics of climate change, Brexit and the recent US election -- which were taken directly from the Perspective API website -- the UW team simply misspelled or added extraneous punctuation or spaces to the …

· 2017

The Google AI tool used to flag “offensive comments” has a seemingly built-in bias against conservative and libertarian viewpoints.

Perspective API, a “machine learning model” developed by Google which scores “the perceived impact a comment…

· 2017

Don’t you just hate how vile some people are on the Internet? How easy it’s become to say horrible and hurtful things about other groups and individuals? How this tool that was supposed to spread knowledge, amity, and good cheer is being us…

· 2017

Last month, I wrote a blog post warning about how, if you follow popular trends in NLP, you can easily accidentally make a classifier that is pretty racist. To demonstrate this, I included the very simple code, as a “cautionary tutorial”.


Google's Anti-Bullying AI Mistakes Civility for Decency · 2017

As politics in the US and Europe have become increasingly divisive, there's been a push by op-ed writers and politicians alike for more "civility" in our debates, including online. Amidst this push comes a new tool by Google's Jigsaw that u…

Google’s comment-ranking system will be a hit with the alt-right · 2017

A recent, sprawling Wired feature outlined the results of its analysis on toxicity in online commenters across the United States. Unsurprisingly, it was like catnip for everyone who's ever heard the phrase "don't read the comments." Accordi…

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution · 2021


The ability to quantify incivility online, in news and in congressional debates, is of great interest to political scientists. Computational tools for detecting online incivility for English are now fairly accessible and potentiall…

AI displays bias and inflexibility in civility detection, study finds · 2021

According to a 2019 Pew Center survey, the majority of respondents believe the tone and nature of political debate in the U.S. have become more negative and less respectful. This observation has motivated scientists to study the civility or…



Similar Incidents

By textual similarity


Biased Sentiment Analysis

· 7 reports

Gender Biases in Google Translate

· 10 reports


· 28 reports