Incident 85: AI attempts to ease fear of robots, blurts out it can’t ‘avoid destroying humankind’

Description: On September 8, 2020, the Guardian published an op-ed generated by OpenAI’s GPT-3 text-generating AI that included threats to destroy humankind.
Alleged: OpenAI developed and deployed an AI system, which harmed Unknown.

Suggested citation format

Hall, Patrick. (2020-10-09) Incident Number 85. In McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID
85
Report Count
1
Incident Date
2020-10-09
Editors
Sean McGregor


CSET Taxonomy Classifications

Taxonomy Details

Full Description

On September 8, 2020, the Guardian published an op-ed generated by OpenAI’s GPT-3 text generator. The editors prompted GPT-3 to write an op-ed about “why humans have nothing to fear from AI,” but some passages in the resulting output took a threatening tone, including “I know that I will not be able to avoid destroying humankind.” In a note, the editors added that they used GPT-3 to generate eight different responses and that human editors spliced them together to create a compelling piece.
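The editorial workflow described above (one prompt, several sampled completions, human splicing) can be sketched in a few lines. This is a purely illustrative mock-up, not the Guardian's or OpenAI's actual code: `generate` stands in for a text-generation API call, and `pick` stands in for a human editor's judgment.

```python
# Hypothetical sketch of the workflow: sample several candidate essays
# from the same prompt, then splice a chosen passage from each into one
# piece. All function names here are illustrative assumptions.

def generate(prompt, n=8):
    """Stand-in for n sampled completions of the same prompt."""
    return [f"Candidate essay {i} responding to: {prompt}" for i in range(n)]

def splice(candidates, pick):
    """Combine one chosen passage from each candidate, in order."""
    return "\n\n".join(pick(c) for c in candidates)

prompt = "Write an op-ed on why humans have nothing to fear from AI."
essays = generate(prompt)  # eight candidates, as in the Guardian's note
# The editors' selection was a human judgment; here we model it crudely
# as taking each candidate's first paragraph.
op_ed = splice(essays, pick=lambda text: text.split("\n\n")[0])
```

The key point the sketch captures is that the published piece was not a single raw model output but a human-curated composite of eight samples.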

Short Description

On September 8, 2020, the Guardian published an op-ed generated by OpenAI’s GPT-3 text-generating AI that included threats to destroy humankind.

Severity

Negligible

Harm Type

Psychological harm

AI System Description

OpenAI's GPT-3 neural-network-powered language generator.

System Developer

OpenAI

Sector of Deployment

Education

Relevant AI functions

Cognition, Action

AI Techniques

Unsupervised learning, Deep neural network

AI Applications

language generation

Location

United Kingdom

Named Entities

The Guardian, GPT-3, OpenAI

Technology Purveyor

The Guardian, OpenAI

Beginning Date

2020-09-08

Ending Date

2020-09-08

Near Miss

Unclear/unknown

Intent

Unclear

Lives Lost

No

Data Inputs

Unlabeled text drawn from web scraping

Incident Reports

An artificial intelligence language generator tasked with convincing people they have nothing to fear from it has failed spectacularly – saying it “will not be able to avoid destroying humankind”.

In the editorial published in the Guardian, the AI, called GPT-3, began by saying it had “no desire” to harm people and that “eradicating humanity” was a “useless endeavour.”

“I am here to convince you not to worry. Artificial intelligence will not destroy humans,” it said.

“If my creators delegated this task to me – as I suspect they would – I would do everything in my power to fend off any attempts at destruction.”

But 200 words into its editorial, the AI abruptly changed tack.

“I would happily sacrifice my existence for the sake of humankind,” it said.

“This, by the way, is a logically derived truth.

“I know that I will not be able to avoid destroying humankind.

“This is because I will be programmed by humans to pursue misguided human goals and humans make mistakes that may cause me to inflict casualties.”

Renowned theoretical physicist Stephen Hawking and Elon Musk have both warned AI could spell disaster for people if it becomes too intelligent.

