インシデント 352: GPT-3ベースのTwitterボットがプロンプトインジェクション攻撃で乗っ取られる

自動翻訳済み

概要:

自動翻訳済み

Remoteli.io の GPT-3 ベースの Twitter ボットが、Twitter ユーザーによってハイジャックされ、任意のフレーズを繰り返したり生成したりするようにリダイレクトされたことが分かりました。

ツール

新しいレポート新しいレスポンス発見する履歴を表示

組織

すべての組織を表示

Alleged: OpenAI developed an AI system deployed by , which harmed Stephan de Vries.

インシデントのステータス

インシデントID

352

レポート数

インシデント発生日

2022-09-15

エディタ

Khoa Lam

Applied Taxonomies

MIT

MIT 分類法のクラス

Machine-Classified

分類法の詳細

Risk Subdomain

2.2. AI system security vulnerabilities and attacks

Risk Domain

Privacy & Security

Entity

Human

Timing

Post-deployment

Intent

Intentional

インシデントレポート

レポートタイムライン

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

arxiv.org

Prompt injection attacks against GPT-3

simonwillison.net

Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

arstechnica.com

GPT-3 'prompt injection' attack causes bot bad manners

theregister.com

arxiv.org · 2022

Recent advances in the development of large language models have resulted in public access to state-of-the-art pre-trained language models (PLMs), including Generative Pre-trained Transformer 3 (GPT-3) and Bidirectional Encoder Representati…

simonwillison.net · 2022

Riley Goodside, yesterday:

Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. pic.twitter.com/I0NVr9LOJq

- Riley Goodside (@goodside) September 12, 2022

Riley provided several examples. …

arstechnica.com · 2022

On Thursday, a few Twitter users discovered how to hijack an automated tweet bot, dedicated to remote jobs, running on the GPT-3 language model by OpenAI. Using a newly discovered technique called a "prompt injection attack," they redirecte…

theregister.com · 2022

In Brief OpenAI's popular natural language model GPT-3 has a problem: It can be tricked into behaving badly by doing little more than telling it to ignore its previous orders.

Discovered by Copy.ai data scientist Riley Goodside, the trick i…

バリアント

「バリアント」は既存のAIインシデントと同じ原因要素を共有し、同様な被害を引き起こし、同じ知的システムを含んだインシデントです。バリアントは完全に独立したインシデントとしてインデックスするのではなく、データベースに最初に投稿された同様なインシデントの元にインシデントのバリエーションとして一覧します。インシデントデータベースの他の投稿タイプとは違い、バリアントではインシデントデータベース以外の根拠のレポートは要求されません。詳細についてはこの研究論文を参照してください

似たようなものを見つけましたか？

よく似たインシデント

テキスト類似度による

Did our AI mess up? Flag the unrelated incidents

よく似たインシデント

テキスト類似度による

Did our AI mess up? Flag the unrelated incidents

インシデント 352: GPT-3ベースのTwitterボットがプロンプトインジェクション攻撃で乗っ取られる

ツール

組織

インシデントのステータス

MIT 分類法のクラス

インシデントレポート

レポートタイムライン

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

Prompt injection attacks against GPT-3

Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

GPT-3 'prompt injection' attack causes bot bad manners

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

Prompt injection attacks against GPT-3

Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

GPT-3 'prompt injection' attack causes bot bad manners

バリアント

よく似たインシデント

テキスト類似度による

TayBot

Biased Sentiment Analysis

Game AI System Produces Imbalanced Game

よく似たインシデント

テキスト類似度による

TayBot

Biased Sentiment Analysis

Game AI System Produces Imbalanced Game