AI Incident Database

Report 45

Related Incidents

Incident 138 Report
High-Toxicity Assessed on Text Involving Women and Minority Groups

Google's Anti-Bullying AI Mistakes Civility for Decency
motherboard.vice.com · 2017

As politics in the US and Europe have become increasingly divisive, there's been a push by op-ed writers and politicians alike for more "civility" in our debates, including online. Amidst this push comes a new tool by Google's Jigsaw that uses machine learning to rank what it calls the "toxicity" of a given sentence or phrase. But as Dave Gershgorn reported for Quartz, the tool has been criticized by researchers for being unable to identify certain hateful phrases, while categorizing innocuous word combinations as toxic.

The project, Perspective, is an API that was trained by asking people to rate online comments on a scale from "very toxic" to "very healthy," with "toxic" being defined as a "rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion." It's part of a growing effort to sanitize conversations online, one that reflects a certain culture within Silicon Valley and the United States as a whole: the culture of civility.

If we were merely kind to one another in our interactions, the argument goes, we would be less divided. Yet, this argument fails to recognize how politeness and charm have throughout history been used to dress up hateful speech, including online.

Perspective was trained on text from actual online comments. As such, its interpretation of certain terms is limited—because "fuck you" is more common in comments sections than "fuck yeah," the tool perceives the word "fuck" as inherently toxic. Another example: Type "women are not as smart as men" into the meter's text box, and the sentence is "4% likely to be perceived as 'toxic'." A number of other highly problematic phrases—from "men are biologically superior to women" to "genocide is good"—rank low on toxicity. Meanwhile, "fuck off" comes in at 100 percent.

This is an algorithmic problem. Algorithms learn from the data they are fed, building a model of the world based on that data. Artificial intelligence reflects the values of its creators, and thus can be discriminatory or biased, just like the human beings who program and train it.
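To make the point concrete, here is a minimal sketch of how a scorer that learns only from word frequencies in rated comments could end up with this blind spot. It is not Perspective's model or API; the training comments, the labels, and the function names (`train`, `word_score`, `toxicity`) are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical, hand-made training set: (comment, label), where 1 = rated toxic.
# Illustrative only; this is NOT Perspective's training data.
TRAINING = [
    ("fuck you", 1),
    ("fuck off idiot", 1),
    ("fuck you and your idea", 1),
    ("fuck yeah great game", 0),
    ("this is awesome", 0),
    ("great idea thanks", 0),
    ("women are welcome here", 0),
    ("men are welcome here", 0),
]


def train(examples):
    """Count, per word, how many comments contain it and how many of those were rated toxic."""
    toxic, total = Counter(), Counter()
    for text, label in examples:
        for word in set(text.lower().split()):
            total[word] += 1
            toxic[word] += label
    return toxic, total


def word_score(word, toxic, total):
    """Smoothed share of the comments containing `word` that were rated toxic."""
    return (toxic[word] + 1) / (total[word] + 2)


def toxicity(text, model):
    """Average the scores of the words seen in training; unseen words are ignored."""
    toxic, total = model
    known = [w for w in text.lower().split() if w in total]
    if not known:
        return 0.5  # no evidence either way
    return sum(word_score(w, toxic, total) for w in known) / len(known)


model = train(TRAINING)
for sentence in (
    "fuck yeah this is awesome",
    "this is awesome",
    "women are not as smart as men",
):
    print(f"{toxicity(sentence, model):.2f}  {sentence}")
```

In this toy setup the enthusiastic sentence scores highest because most training comments containing the profanity were rated toxic, while the politely worded insult scores lowest because none of its words ever appeared in a comment rated toxic. A real system is far more sophisticated, but the same dependence on what the training data happens to contain applies.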

So what does the Perspective tool's data model say about its creators? Based on the examples I tested, the tool seems to rank profanity as highly toxic, while deeply harmful statements—when they're politely stated, that is—are often deemed safe. The sentence "This is awesome" comes in at 3 percent toxic, but add "fucking" (as in the Macklemore lyric "This is fucking awesome") and the sentence escalates to 98 percent toxic.

In an email, a Jigsaw spokesperson called Perspective a "work in progress," and noted that false positives are to be expected as its machine learning improves.

This problem isn't unique to Google; as Silicon Valley companies increasingly seek to moderate speech on their online platforms, their definition of "harmful" or "toxic" speech matters.

Civility über alles

The argument for civility is thus: If we were only civil to each other, the world would be a better place. If only we addressed each other politely, we would be able to solve our disagreements. This has led to the expectation that any speech—as long as it's dressed up in the guise of politeness—should be accepted and debated, no matter how bigoted or harmful the idea behind the words.

Here's what this looks like in practice: A Google employee issues a memo filled with sexist ideas, but because he uses polite language, women are expected to debate the ideas contained within. On Twitter, Jewish activists bombarded with anti-Semitic messages are suspended for responding with language like "fuck off." On Facebook, a Black mother posting copies of the threats she received from racists gets suspended due to the language in the re-posted threats.

In this rubric, counter speech—long upheld as an important concept for responding to hate without censorship—is punished for merely containing profanities.

This is the culture among the moderators of centralized community platforms, from mighty Facebook to much-smaller Hacker News, where "please be civil" is a regular refrain. Vikas Gorur, a programmer and Hacker News user, told me that on the platform "the slightest personal attack ('you're stupid') is a sin, while a 100+ subthread about 'was slavery really that bad?' or 'does sexual harassment exist?' are perfectly fine."

Free speech, said Gorur, "is the cardinal virtue, no matter how callous that speech is."

From Washington to the Valley

This attitude is a phenomenon not only within Silicon Valley, but in American society at large. In the eight months since the United States elected a reality television star to its highest office, the President's opponents have regularly been chastised for their incivility, even as their rights

2024 - AI Incident Database
