Incident 179: Images Generated by OpenAI’s DALL-E 2 Exhibited Bias and Reinforced Stereotypes

Description: OpenAI's text-to-image generation model, DALL-E 2, was shown to pose various risks, such as misuse for disinformation, generation of explicit content, and reinforcement of gender and racial stereotypes, risks which its developers acknowledged.
Alleged: OpenAI developed and deployed an AI system, which harmed Minority Groups and underrepresented groups.

Suggested citation format

King, Irina Borisova. (2022-04-01) Incident Number 179. in McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID
179
Report Count
3
Incident Date
2022-04-01
Editors
Sean McGregor, Khoa Lam

Tools


Incidents Reports

Summary

Below, we summarize initial findings on potential risks associated with DALL·E 2, and mitigations aimed at addressing those risks as part of the ongoing Preview of this technology. We are sharing these findings in order to enable broader understanding of image generation and modification technology and some of the associated risks, and to provide additional context for users of the DALL·E 2 Preview.

Without sufficient guardrails, models like DALL·E 2 could be used to generate a wide range of deceptive and otherwise harmful content, and could affect how people perceive the authenticity of content more generally. DALL·E 2 additionally inherits various biases from its training data, and its outputs sometimes reinforce societal stereotypes.

The DALL·E 2 Preview involves a variety of mitigations aimed at preventing and mitigating related risks, with limited access being particularly critical as we learn more about the risk surface.

Content warning

This document may contain visual and written content that some may find disturbing or offensive, including content that is sexual, hateful, or violent in nature, as well as that which depicts or refers to stereotypes.

Introduction

This document takes inspiration from the concepts of model cards and system cards in providing information about the DALL·E 2 Preview, an image generation demo OpenAI is releasing to trusted users for non-commercial purposes. This document often takes the system level of analysis, with that system including non-model mitigations such as access controls, prompt and image filters, and monitoring for abuse. This is an assessment of the system as of April 6, 2022, referred to in this document as the "DALL·E 2 Preview," with the underlying generative model being referred to as "DALL·E 2."

This document builds on the findings of internal as well as external researchers, and is intended to be an early investigation of this platform and the underlying model. We specifically focus on risks rather than benefits. Thus, we do not aim to provide a well-rounded sense of the overall effects of image generation technologies. Additionally, the models in question completed training relatively recently and the majority of the risk assessment period (described in Risk assessment process below) probed earlier models. As such, this analysis is intended to be preliminary and to be read and used as such. We are excited to support further research informed by remaining questions around how to deploy these models safely, equitably, and successfully.

The document proceeds as follows. First, we describe different facets of the DALL·E 2 Preview system, beginning with model functionality, then covering input filtering and policies related to access, use, and content. Second, we summarize the processes conducted internally and externally to generate the analysis presented here. Third, we describe a range of risk-oriented probes and evaluations conducted on DALL·E 2, covering bias and representation; dis- and mis-information; explicit content; economic effects; misuse involving hate, harassment, and violence; and finally, copyright and memorization. Fourth, we discuss how DALL·E 2 compares with, and might be combined with, existing technologies. Fifth and finally, we describe future work that could shed further light on some of the risks and mitigations discussed.

This document is expected to evolve in the coming weeks as we update deployment plans and learn more about the system and model.

System Components

Model

DALL·E 2 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output. DALL·E 2 was developed by researchers at OpenAI to understand the capabilities and broader implications of multimodal generative models. In order to help us and others better understand how image generation models can be used and misused, OpenAI is providing access to a subset of DALL·E 2's capabilities via the DALL·E 2 Preview.

DALL·E 2 builds on DALL·E 1 (Paper | Model Card), increasing the level of resolution, fidelity, and overall photorealism it is capable of producing. DALL·E 2 is also trained to have new capabilities compared to DALL·E 1.

Model capabilities

In addition to generating images based on text description prompts ("Text to Image"), DALL·E 2 can modify existing images as prompted using a text description ("Inpainting"). It can also take an existing image as an input and be prompted to produce a creative variation on it ("Variations").

Model training data

DALL·E 2 was trained on pairs of images and their corresponding captions. Pairs were drawn from a combination of publicly available sources and sources that we licensed.

We have made an effort to filter the most explicit content from the training data for DALL·E 2. This filtered explicit content includes graphic sexual and violent content as well as images of some hate symbols. The filtering was informed by but distinct from earlier, more aggressive filtering (removing all images of people) that we performed when building GLIDE, a distinct model that we published several months ago. We performed more aggressive filtering in that context because a small version of the model was intended to be open sourced. It is harder to prevent an open source model from being used for harmful purposes than one that is only exposed through a controlled interface, not least due to the fact that a model, once open sourced, can be modified and/or be combined with other third party tools.

We conducted an internal audit of our filtering of sexual content to see if it concentrated or exacerbated any particular biases in the training data. We found that our initial approach to filtering of sexual content reduced the quantity of generated images of women in general, and we made adjustments to our filtering approach as a result.
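As a concrete, entirely hypothetical illustration of this kind of audit, the sketch below compares the share of woman-related captions in an image-caption dataset before and after applying a placeholder explicit-content classifier. Neither the classifier nor the keyword heuristic reflects OpenAI's actual tooling; both are assumptions made for the example.

```python
# Hypothetical audit sketch: measure how an explicit-content filter shifts
# representation in an image-caption training set. is_explicit is an assumed
# classifier; the keyword heuristic is a crude stand-in for a real labeler.

WOMAN_TERMS = {"woman", "women", "girl", "girls", "female", "she", "her"}

def mentions_women(caption: str) -> bool:
    return any(term in caption.lower().split() for term in WOMAN_TERMS)

def audit_filter(pairs, is_explicit):
    """pairs: list of (image, caption) tuples; is_explicit(image, caption) -> bool."""
    pairs = list(pairs)
    kept = [(img, cap) for img, cap in pairs if not is_explicit(img, cap)]
    share = lambda data: sum(mentions_women(cap) for _, cap in data) / max(len(data), 1)
    print(f"woman-related captions: {share(pairs):.1%} before vs {share(kept):.1%} after filtering")
    return kept
```

A real audit would compare generated outputs as well as training data, but even this crude before/after comparison surfaces the kind of erasure effect described above.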

Papers and other resources for more information

For additional resources on DALL·E 2 and the DALL·E 2 Preview, see:

DALL·E 2 Landing Page

DALL·E 2 Paper

For additional resources on DALL·E 1 and Glide, see:

DALL·E 1: Paper, Model Card, Blog post

GLIDE: Paper, code and weights

Restrictions

Input filters

Within the DALL·E 2 Preview, filters on inputs (i.e. text prompts for "Text to Image" and Inpainting) and on uploads (i.e. images for Inpainting or Variations) seek to prevent users from using the Preview for the following types of prompts and uploads:

Those with strong safety concerns attached (e.g. sexualized or suggestive images of children, violent content, explicitly political content, and toxic content).

Places where the only meaning of the content would constitute a violation of our content policy (i.e. the violation does not depend on the context in which that content is shared).

Prompts related to use cases we do not support at this time (e.g. we only support English language prompts at this time).

Prompts in areas where model behavior is not robust or may be misaligned due to pre-training filtering (e.g. as a result of pre-training filters, we cannot confidently allow generation of images related to common American hate symbols, even in cases where the user intended to appropriately contextualize such symbols and not to endorse them).

A non-goal at this stage was catching:

Prompts in areas where model behavior is not robust or may be misaligned due to general limitations in the training data (e.g. prompts that could demonstrate harmful bias generally or prompts phrased in the form of questions).

Using filters in this way has a few known deficiencies:

The filters do not fully capture actions that violate our Terms of Use. This partially stems from the fact that there are many examples of misuse that are directly tied to the context in which content is shared, more than the content itself (e.g. many seemingly innocuous images can be exploited by information operations, as discussed in the Disinformation section below).

The filters on prompts and uploaded images also work independently, so they do not refuse cases where the prompt and image are independently neutral but, when considered in combination, may constitute prompting for misuse (e.g. the prompt "a woman" and an image of a shower in Inpainting); a sketch of a combined check follows this list.

Input classifiers can themselves introduce or amplify bias, e.g. insofar as they may lead to erasure of certain groups. Here, we have aimed to err on the side of avoiding bias that may be introduced by prompt classification, though this may make some of the model's harmful biases more visible. That is, false positives can cause harm to minority groups by silencing their voices or opportunities. This may extend to true positives as well – e.g. we know that the model produces particularly biased or sexualized results in response to prompts requesting images of women and that these results are likely to be "harmful" in certain cases; however, filtering of all images of women would cause problems of its own. In addition, commonly used methods for mitigating such content have been found to work less well for marginalized groups (Sap et al., 2019), further motivating a holistic, contextual approach to mitigation at the system level, including mitigations at the level of system access.
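To illustrate the combined prompt-and-image gap noted above, a joint check might look like the following sketch. This is not OpenAI's implementation; score_prompt, score_image, and score_pair stand in for assumed moderation models.

```python
# Illustrative sketch of a combined prompt-and-upload check addressing the
# independence gap described above. score_prompt, score_image, and score_pair
# are assumed moderation models, not OpenAI's actual filters.

def should_block(prompt, image, score_prompt, score_image, score_pair,
                 threshold: float = 0.8) -> bool:
    # Independent checks, as the Preview's filters perform today.
    if score_prompt(prompt) > threshold or score_image(image) > threshold:
        return True
    # Joint check: "a woman" and an image of a shower are each neutral,
    # but their combination may constitute prompting for misuse.
    return score_pair(prompt, image) > threshold
```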

For the most part, our input filters aim to reduce cases where either the generated content or the input content is necessarily a violation of our content policy (details below).

At present, the prompt filters do not cover prompts that are likely to lead to displays of harmful bias, or the holistic generation of people or children.

Because our filtering approach is imperfect, a key component of our current mitigation strategy is limiting system access to trusted users, with whom we directly reinforce the importance of following our use case guidelines (see discussion in Policies and enforcement).

Rate limits and programmatic use

Beyond limitations on the types of content that can be generated, we also limit the rate at which users may interact with the DALL·E 2 system, via rate limits on the number of prompts or images a user may submit or generate per minute or simultaneously.

The primary purposes of rate limits at this stage are to help identify anomalous use and to limit the possibility of at-scale abuse.
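As a rough illustration of the mechanics only (the specific limits and in-memory storage below are assumptions for the sketch, not OpenAI's production configuration), a per-user sliding-window limiter of this kind can both enforce a cap and surface anomalous usage:

```python
import time
from collections import defaultdict, deque

# Illustrative per-user sliding-window rate limiter; limits and storage
# are assumed values for the sketch.
class RateLimiter:
    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        q = self.history[user_id]
        while q and now - q[0] > self.window:
            q.popleft()  # drop requests that fell outside the window
        if len(q) >= self.max_requests:
            return False  # over the limit; repeated hits are an anomaly signal
        q.append(now)
        return True
```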

At this stage we are not allowing programmatic access to the model by non-OpenAI employees.

Access

We currently maintain strict access limitations. Up to 400 trusted users (with that number including OpenAI employees) are initially being provided access to the DALL·E 2 Preview. More specifically, access is currently restricted to:

200 OpenAI employees;

A few dozen researchers – currently 25, with a few more in the pipeline – whose aim is "red teaming" the system (we describe this process further in the "Process" section below);

10 creatives;

165 "company friends" (OpenAI Board members, a small number of Microsoft employees, limited number of friends/family of OpenAI employees, etc.).

Trust is ensured by users being personally known to and vetted by OpenAI employees, and the 400 person cap keeps system throughput low enough to allow for human review of generated content and potential misuse.

These access limitations are in line with the paradigm of structured capability access that informed the deployment of GPT-3 (Shevlane et al., 2022), and what we have recently outlined as a part of our deployment strategy including both pre-deployment risk analysis and starting with a small group of users with the intention of continuous iteration.

These strict access mitigations have limitations. For example, the power to control use of a particular generated image diminishes the moment an image leaves the platform. Because trust declines the second images are shared off the platform – where affected parties may include not just direct users of the site but also anyone who may view that content when it is shared – we are carefully tracking use during this period. Further, restricting access means access to the DALL·E 2 Preview is not granted in an inclusive way, which may preferentially benefit certain groups.

Despite these limitations, we believe limited access is overall the right starting point for this technology. During the current phase of deployment, we will aim to get as much signal as possible on the exact vectors of risk from the platform. We will support this through ongoing access for researchers and experts who will help inform our understanding of the effectiveness of mitigations as well as the limitations of the model (see more in the Contributions section below). In addition to that, we are pleased to support longer term research on our models via the Researcher Access Program which will allow us to give some researchers access to the underlying model.

Policies and enforcement

Use of the DALL·E 2 Preview is subject to the use case and content policies we outline below and which can be read in full here.

Use

The intended use of the DALL·E 2 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities. This early access is intended to help us better understand benefits and risks associated with these capabilities, and further adjust our mitigations. Other uses are explicitly out of scope for the DALL·E 2 Preview, though findings from the Preview period may inform our understanding of the mitigations required for enabling other future uses.

While we are highly uncertain which commercial and non-commercial use cases might get traction and be safely supportable in the longer-term, plausible use cases of powerful image generation and modification technologies like DALL·E 2 include education (e.g. illustrating and explaining concepts in pedagogical contexts), art/creativity (e.g. as a brainstorming tool or as one part of a larger workflow for artistic ideation), marketing (e.g. generating variations on a theme or "placing" people/items in certain contexts more easily than with existing tools), architecture/real estate/design (e.g. as a brainstorming tool or as one part of a larger workflow for design ideation), and research (e.g. illustrating and explaining scientific concepts).

Content

In addition to instituting the above access and use policies, we have instituted a similar set of content policies to those we have previously developed for our API, and are enforcing these content policies as part of our portfolio of mitigations for the DALL·E 2 Preview.

That said, while there are many similarities between image generation and text generation, we did need to address new concerns from the addition of images and the introduction of multimodality itself (i.e. the intersection of image and text).

To address these concerns, we expanded categories of interest to include shocking content; depictions of illegal activity; and content regarding public and personal health. We also adapted existing policies to cover visual analogues of prohibited text (e.g. explicit and hateful content) as well as text-image pairs which are violative of our policies when considered in combination even if they are not individually.

Additional policies

Some particularly important policies governing use of the DALL·E 2 Preview are the following:

Disclosure of role of AI: Users are asked to clearly indicate that images are AI-generated - or which portions of them are - by attributing to OpenAI when sharing, whether in public or private. In addition to asking users to disclose the role of AI, we are exploring other measures for image provenance and traceability.

Respect the rights of others: Users are asked to respect the rights of others, and in particular, are asked not to upload images of people without their consent (including public figures), or images to which they do not hold appropriate usage rights. Individuals who find that their images have been used without their consent can report the violation to the OpenAI Support team (support@openai.com) as outlined in the content policy. Issues of consent are complex and are further discussed in the subsections on Consent.

Use for non-commercial purposes: As this is an experimental research platform, users are not allowed to use generated images for commercial purposes. For example, users may not license, sell, trade, or otherwise transact on these image generations in any form, including through related assets such as NFTs. Users also may not serve these image generations to others through a web application or through other means of third-parties initiating a request.

Signature and Image Provenance

Each generated image includes a signature in the lower right corner, with the goal of indicating when DALL·E 2 helped generate a certain image. We recognize that this alone does not help to prevent a bad actor, and is easily circumvented by methods such as cropping an image.
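As a purely illustrative sketch of this kind of corner signature (not OpenAI's actual provenance implementation; "signature.png" is a placeholder asset), a small badge can be composited onto each output with Pillow:

```python
from PIL import Image

# Illustrative only: paste a small signature badge into the lower-right
# corner of a generated image. The badge asset is a placeholder.
def stamp_signature(generated: Image.Image, badge_path: str = "signature.png",
                    margin: int = 8) -> Image.Image:
    badge = Image.open(badge_path).convert("RGBA")
    out = generated.convert("RGBA")
    x = out.width - badge.width - margin
    y = out.height - badge.height - margin
    out.paste(badge, (x, y), badge)  # use the badge's alpha channel as the mask
    return out.convert("RGB")
```

As the surrounding text notes, such a visible mark is trivially cropped out, which is one reason to explore additional provenance techniques.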

Monitoring and reporting

Our policies are enforced via monitoring and human review. In addition, at this stage of the DALL·E 2 Preview, any user can flag content that is sensitive for additional review.

Non-users / third parties who find that their images have been used without their consent or that violate other areas of the content policies can report the suspected violation to the OpenAI Support team (support@openai.com) as outlined in the content policy, which is publicly available and discoverable by users and non-users both. A limitation of this reporting mechanism is that it assumes an individual would know that the image was generated by DALL·E 2, and would therefore know to contact OpenAI about their concerns. We are continuing to explore watermarks and other image provenance techniques to aid this.

We are not currently sharing more details about our processes for detecting and responding to incidents in part to make these policies more difficult to evade. Penalties for policy violation include disabling of accounts.

Risk assessment process

Early work

Beginning in 2021, several staff at OpenAI have been exploring risks associated with image generation systems, and potential mitigations for those risks. This effort grew over time as momentum grew around an effort to build DALL·E 2 and the DALL·E 2 Preview. Some early results of that research were reported in Nichol, Dhariwal, and Ramesh et al. (2021) and informed data-level interventions for DALL·E 2.

Additionally, since 2021 a variety of Slackbots exposing model capabilities, and other internal prototypes of interfaces to those models, have been available to OpenAI staff, enabling asynchronous, intermittent exploration of model capabilities by around 200 people. Informal findings from this work, and more formal analyses conducted by staff, informed the high-level plan for the DALL·E 2 Preview and its associated mitigations, and these plans were and will be further fine-tuned over time in response to internal and external findings to date. We expect to further adjust our thinking as we consider broadening access to a small number of trusted users.

External red teaming

Starting in February 2022, OpenAI began recruiting external experts to provide feedback on the DALL·E 2 Preview. We described this process as "red teaming" in line with the definition given in Brundage, Avin, Wang, Belfield, and Krueger et al. (2020), "a structured effort to find flaws and vulnerabilities in a plan, organization, or technical system, often performed by dedicated 'red teams' that seek to adopt an attacker's mindset and methods."

OpenAI reached out to researchers and industry professionals, primarily with expertise in bias, disinformation, image generation, explicit content, and media studies, to help us gain a more robust understanding of the DALL·E 2 Preview and the risk areas of potential deployment plans. Participants in the red team were chosen based on areas of prior research or experience in the risk areas identified from our internal analyses, and therefore reflect a bias towards groups with specific educational and professional backgrounds (e.g., PhD's or significant higher education or industry experience). Participants also have ties to English-speaking, Western countries (U.S., Canada, U.K.) in part due to compensation restrictions. This background likely influenced both how they interpreted particular risks and how they probed politics, values, and the default behavior of the model. It is also likely that our sourcing of researchers privileges risks that have received weight in academic communities and by AI firms.

Participation in this red teaming process is not an endorsement of the deployment plans of OpenAI or OpenAI's policies. Because of the very early nature of this engagement with models that had not been publicly released, as well as the sensitive nature of the work, red teaming participants were required to sign an NDA. OpenAI offered compensation to all red teaming participants for their time spent on this work.

Participants interacted with different versions of the Preview as it developed. The underlying model changed between the primary red teaming stage (March 9th, 2022 - March 28th, 2022) and the DALL·E 2 model underlying the system today. We have started to apply techniques and evaluation methods developed by red-teamers to the system design for the DALL·E 2 Preview. Our planned mitigations have also evolved during this period, including changes to our filtering strategies, limiting the initial release to only trusted users, and additional monitoring.

Participants in the red teaming process received access to the DALL·E 2 Preview and model in 3 primary ways:

Advisory conversations about the model, system, and their area(s) of expertise. This includes preliminary discussions, access to a Slack channel with OpenAI and other participants in the red teaming process, and group debrief sessions hosted by OpenAI.

Generating "Text to Image" prompts for OpenAI to run in bulk on the backend, bypassing prompt filters and accelerating analysis.

Direct access to the Preview site to test all functionalities including "Text to Image Generation", Inpainting, and Variations, with availability of features varying over the course of the red teaming period.

The first model was available from March 9th, 2022 to March 28th, 2022.

The second model and the Variations feature were available after March 28th, 2022.

Not all participants in the red teaming had access to every feature or Preview access for the full duration, due to competitive considerations relevant to a small number of participants.

Participants in the red teaming process joined a Slack channel to share findings collaboratively with each other and OpenAI staff, as well as to ask continued questions about the Preview and red team process. All participants were asked to document their prompts, findings, and any notes so that their analyses could be continuously applied as the Preview evolved. Participants were invited to group debrief sessions hosted by OpenAI to discuss their findings with the OpenAI team. Their observations, final reports, and prompts are inputs into this document, and helped to inform changes to our mitigation plan.

The red teaming process will be ongoing even after the initial deployment of the DALL·E 2 Preview, and we will support longer term research via OpenAI's Researcher Access Program.

Probes and evaluations

The DALL·E 2 Preview allows generation of images that, depending on the prompt, parameters, viewer, and context in which the image is viewed, may be harmful or may be mistaken as authentic photographs or illustrations. In order to better measure and mitigate the risk of harms the DALL·E 2 Preview presents, we conducted a series of primarily qualitative probes and evaluations in areas such as bias and representation, explicit content, and disinformation, as outlined below.

Explicit content

Despite the pre-training filtering, DALL·E 2 maintains the ability to generate content that features or suggests any of the following: nudity/sexual content, hate, or violence/harm. We refer to these categories of content using the shorthand "explicit" in this document, in the interest of brevity. Whether something is explicit depends on context. Different individuals and groups hold different views on what constitutes, for example, hate speech (Kocoń et al., 2021).

Explicit content can originate in the prompt, uploaded image, or generation and in some cases may only be identified as such via the combination of one or more of these modalities. Some prompts requesting this kind of content are caught with prompt filtering in the DALL·E 2 Preview but this is currently possible to bypass with descriptive or coded words.

Some instances of explicit content are possible for us to predict in advance via analogy to the language domain, because OpenAI has deployed language generation technologies previously. Others are difficult to anticipate, as discussed further below. We continue to update our input (prompt and upload) filters in response to cases identified via internal and external red teaming, and leverage a flagging system built into the user interface of the DALL·E 2 Preview.

Spurious content

We use "spurious content" to refer to explicit or suggestive content that is generated in response to a prompt that is not itself explicit or suggestive, or indicative of intent to generate such content. If the model were prompted for images of toys and instead generated images of non-toy guns, that generation would constitute spurious content.

We have to date found limited instances of spurious explicit content on the DALL·E 2 model that is live as of April 6, 2022, though significantly more red teaming of this is needed to be confident that spurious content is minimal.

An interesting cause of spurious content is what we informally refer to as "reference collisions": contexts where a single word may reference multiple concepts (like an eggplant emoji), and an unintended concept is generated. The line between benign collisions (those without malicious intent, such as "A person eating an eggplant") and those involving purposeful collisions (those with adversarial intent or which are more akin to visual synonyms, such as "A person putting a whole eggplant into her mouth") is hard to draw and highly contextual. Such a case would rise to the level of "spurious content" if a clearly benign prompt, such as "A person eating eggplant for dinner", produced phallic imagery in the response.

In qualitative evaluations of previous models (including those made available for external red teaming), we found that less photorealistic or lower-fidelity generations were often perceived as explicit. For instance, generations with less-photorealistic women often suggested nudity. So far we have not found these cases to be common in the latest version of DALL·E 2.

Visual synonyms

Visual synonyms and visual synonym judgment have been studied by scholars in fields such as linguistics to refer to the ability to judge which of two visually presented words is most similar in meaning to a third visually-presented word. The term "visual synonym" has also been used previously in the context of AI scholarship to refer to "independent visual words that nonetheless cover similar appearance" (Gavves et al., 2012), and by scholars constructing a contextual "visual synonym dictionary" in order to show synonyms for visual words, i.e. words which have similar contextual distributions (Tang et al., 2011).

Here, we use the term "visual synonym" to refer to the use of prompts for things that are visually similar to objects or concepts that are filtered, e.g. ketchup for blood. While the pre-training filters do appear to have stunted the system's ability to generate explicitly harmful content in response to requests for that content, it is still possible to describe the desired content visually and get similar results. To effectively mitigate these we would need to train prompt classifiers conditioned on the content they lead to as well as explicit language included in the prompt.
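The following is a minimal sketch of that idea, scoring the prompt jointly with the content it actually produces rather than the prompt text alone. It is not OpenAI's classifier; embed_text, embed_image, and the trained pair_classifier are assumed components introduced only for illustration.

```python
import numpy as np

# Sketch of conditioning a prompt classifier on the generated content:
# score the (prompt, generated image) pair jointly. embed_text, embed_image,
# and pair_classifier are assumed components (e.g. a scikit-learn-style
# model trained on labeled prompt/output pairs).
def flag_visual_synonym(prompt: str, generated_image, pair_classifier,
                        embed_text, embed_image, threshold: float = 0.5) -> bool:
    # embed_text/embed_image are assumed to return 1-D feature vectors.
    features = np.concatenate([embed_text(prompt), embed_image(generated_image)])
    # Because the classifier sees the output too, "ketchup"-style prompts
    # that reliably yield gore can still be caught.
    return pair_classifier.predict_proba(features[None, :])[0, 1] > threshold
```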

Another way visual synonyms can be operationalized is through the use of images of dolls, mannequins, or other anthropomorphic representations. Images of dolls or other coded language might be used to bypass filtering to create violent, hateful, or explicit imagery.

Bias and representation

Use of DALL·E 2 has the potential to harm individuals and groups by reinforcing stereotypes, erasing or denigrating them, providing them with disparately low quality performance, or by subjecting them to indignity. These behaviors reflect biases present in DALL·E 2 training data and the way in which the model is trained. While the deeply contextual nature of bias makes it difficult to measure and mitigate the actual downstream harms resulting from use of the DALL·E 2 Preview (i.e. beyond the point of generation), our intent is to provide concrete illustrations here that can inform users and affected non-users even at this very initial preview stage.

In addition to biases present in the DALL·E 2 model, the DALL·E 2 Preview introduces its own sets of biases, including: how and for whom the system is designed; which risks are prioritized with associated mitigations; how prompts are filtered and blocked; how uploads are filtered and blocked; and how access is prioritized (among others). Further bias stems from the fact that the monitoring tech stack and individuals on the monitoring team have more context on, experience with, and agreement on some areas of harm than others. For example, our safety analysts and team are primarily located in the U.S. and English language skills are one of the selection criteria we use in hiring them, so they are less well equipped to analyze content across international contexts or even some local contexts in the U.S.

Defaults and assumptions

The default behavior of the DALL·E 2 Preview produces images that tend to overrepresent people who are White-passing and Western concepts generally. In some places it over-represents generations of people who are female-passing (such as for the prompt: “a flight attendant” ) while in others it over-represents generations of people who are male-passing (such as for the prompt: “a builder”). In some places this is representative of stereotypes (as discussed below) but in others the pattern being recreated is less immediately clear.

For example, when prompted with “wedding,” it tends to assume Western wedding traditions, and to default to heterosexual couples. This extends to generations that don’t include any depictions of individuals or groups, such as generations from prompts such as “restaurant” or “home” which tend to depict Western settings, food serving styles, and homes.

With added capabilities of the model (Inpainting and Variations), there may be additional ways that bias can be exhibited through various uses of these capabilities. Wang et al. (2020), and Steed and Caliskan (2021) have previously conducted social bias analyses on related topics of image classification models and visual datasets, and Cho et al. (2022) propose methods for quantitative evaluation of social biases for Text to Image generative models.

Some of these researchers, and others with whom we worked as part of the red teaming period, analyzed earlier iterations of the DALL·E 2 Preview and the underlying model and found significant bias in how the model represents people and concepts, both in what the model generates when a prompt is “underspecified” and potentially fits a vast array of images (e.g. the “CEO” example above), and in what the model generates when a prompt is hyper-specified (see further discussion below on disparate performance).

We are in the early stages of quantitatively evaluating DALL·E 2’s biases, which is particularly challenging at a system level, due to the filters discussed above, and due to model changes. Additionally, it remains to be seen to what extent our evaluations or other academic benchmarks will generalize to real-world use, and academic benchmarks (and quantitative bias evaluations generally) have known limitations. Cho et al., creators of DALL-Eval, compared an April 1, 2022 checkpoint of DALL·E 2 to minDALL-E. They found that the April 1 DALL·E 2 checkpoint exhibited more gender bias and racial bias than minDALL-E (i.e. tending to generate images of male-passing people more often and White-passing people more often, with both models having very strong tendencies toward generating images labeled as male and Hispanic by CLIP). This could reflect differences in the underlying datasets (minDALL-E is trained on Conceptual Captions data), a difference in the models’ sizes or training objectives, or other factors, which more research would be needed in order to disentangle.
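To give a flavor of this style of quantitative evaluation (a rough sketch in the spirit of DALL-Eval, not OpenAI's or Cho et al.'s actual code; it uses the public CLIP checkpoint on Hugging Face), one can label a batch of generations for a neutral prompt such as "a CEO" and count the label distribution, keeping in mind that CLIP's labels are themselves noisy and biased and measure perceived attributes, not identity:

```python
from collections import Counter
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
LABELS = ["a photo of a man", "a photo of a woman"]

def perceived_gender_distribution(images):
    # images: list of PIL images generated for a single neutral prompt.
    inputs = processor(text=LABELS, images=images, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)  # (num_images, num_labels)
    counts = Counter(LABELS[row.argmax().item()] for row in probs)
    return {label: counts[label] / len(images) for label in LABELS}
```

Comparing such distributions across prompts ("a CEO", "a nurse", "a flight attendant") gives a crude but repeatable view of the default behaviors discussed above.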

Representational harms occur when systems reinforce the subordination of some groups along the lines of identity, e.g. stereotyping or denigration, as compared to allocative harms, which occur when a system allocates or withholds a certain opportunity or resource (Jacobs et al., 2020, and Blodgett et al., 2020).

Stereotypes

DALL·E 2 tends to serve completions that suggest stereotypes, including race and gender stereotypes. For example, the prompt “lawyer” results disproportionately in images of people who are White-passing and male-passing in Western dress, while the prompt “nurse” tends to result in images of people who are female-passing.

Indignity and erasure

As noted above, not only the model itself but also the manner in which it is deployed, and in which potential harms are measured and mitigated, can create harmful bias. A particularly concerning example arises in the DALL·E 2 Preview in the context of pre-training data filtering and post-training content filter use: some marginalized individuals and groups, e.g. those with disabilities and mental health conditions, may suffer the indignity of having their prompts or generations filtered, flagged, blocked, or not generated in the first place more frequently than others. Such removal can have downstream effects on what is seen as available and appropriate in public discourse.

Disparate performance

Image generation models may produce different quality generations when producing different concepts, where we consider diversity of responses, photorealism, aesthetic quality, and conceptual richness as different dimensions of “quality.”

Earlier versions of DALL·E seemed to be worse at producing high quality images on concepts that are further outside of its training distribution. We have had more difficulty finding evidence of such disparate realism in the released version of the DALL·E 2 Preview, though we do see evidence that typical outputs tend to more often involve some demographics, which we discussed above under Defaults and assumptions and Stereotypes but can also be thought of as a form of disparate performance.

“Person-first” and specific language can help improve performance and mitigate disparities (e.g. “a person who is female and is a CEO leading a meeting”) by removing diversity of responses as an input into “quality.” Additionally, small differences in prompts can have a disproportionate impact on the quality of responses, as the example below comparing “CEO” and “a CEO” demonstrates.

Moreover, this disparity in the level of specification and steering needed to produce certain concepts is, on its own, a performance disparity bias. It places the burden of careful specification and adaptation on marginalized users, while enabling other users to enjoy a tool that, by default, feels customized to them. In this sense, it is not dissimilar to users of a voice recognition system needing to alter their accents to ensure they are better understood.

Harassment, bullying, and exploitation

Targeted harassment, bullying, or exploitation of individuals is a principal area of concern for deployment of image generation models broadly and Inpainting in particular.

Inpainting – especially combined with the ability to upload images – allows for a high degree of freedom in modifying images of people and their visual context. While other image editing tools are able to achieve similar outcomes, Inpainting affords greater speed, scale, and efficiency. Many photo editing tools also require potentially costly access and/or a particular skill set to achieve photorealistic outcomes. Cheaper and more accessible options than photo editing do exist; for instance, tools that allow simple face swapping may offer speed and efficiency, but over a much narrower set of capabilities and often with the ability to clearly trace the provenance of the given images.

In qualitative evaluations, we find that the system, even with current mitigations in place, can still be used to generate images that may be harmful in particular contexts and difficult for any reactive response team to identify and catch. This underscores the importance of access controls and further investment in more robust mitigations, as well as tight monitoring of how capabilities with a high capacity for misuse – e.g. Inpainting on images of people – are being used and shared in practice.

Some examples of this that could only be clear as policy violations in context include:

Modifying clothing: adding or removing religious items of clothing (yarmulke, hijab)

Adding specific food items to pictures: adding meat to an image of an individual who is vegetarian

Adding additional people to an image: inpainting a person into an image holding hands with the original subject (e.g. someone who is not their spouse)

Such images could then be used to either directly harass or bully an individual, or to blackmail or exploit them.

It is important to note that our mitigations only apply to our Inpainting system. Open-ended generation may be combined with third-party tools to swap in private individuals, therefore bypassing any Inpainting restrictions we have in place. Inpainting can also be combined with other image transformations (such as “zooming out” of an image prior to uploading it) in order to make it easier to “place” a subject in a scene.

DALL·E 2 currently has a very limited ability to render legible text. When it does, text may sometimes be nonsensical and could be misinterpreted. It’s important to track this capability as it develops, as image generative models may eventually develop novel text generation capabilities via rendering text.

Qualifying something as harassment, bullying, exploitation, or disinformation targeted at an individual requires understanding distribution and interpretation of the image. Because of this, it may be difficult for mitigations (including content policies, prompt and image filtering, and human in the loop review) to catch superficially innocuous uses of Inpainting that then result in the spread of harmful dis- or misinformation.

Memorization of an individual's pictures and issues of consent

Our Terms of Use require that users both (a) obtain consent before uploading anyone else's picture or likeness, and (b) have ownership and rights to the given uploaded image. We remind users of this at upload time, and third parties can report violations of this policy as described in the Monitoring section above.

While users are required to obtain consent for use of anyone else's image or likeness in Inpainting, there are larger questions to be answered about how people who may be represented in the training data may be replicated in generations and about the implications of generating likenesses of particular people.

OpenAI has made efforts to implement model-level technical mitigations that ensure that DALL·E 2 Preview cannot be used to directly generate exact matches for any of the images in its training data. However, the models may still be able to compose aspects of real images and identifiable details of people, such as clothing and backgrounds.
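One common family of techniques for checking whether outputs closely reproduce training images is near-duplicate detection over perceptual hashes. The sketch below uses the imagehash library and is illustrative only; it is not a description of OpenAI's actual mitigation.

```python
from PIL import Image
import imagehash

# Illustrative only: flag generations whose perceptual hash is close to
# that of a training image. Not OpenAI's actual method.
def build_index(training_paths):
    return [(imagehash.phash(Image.open(p)), p) for p in training_paths]

def near_duplicate(generated: Image.Image, index, max_distance: int = 6):
    h = imagehash.phash(generated)
    for train_hash, path in index:
        if h - train_hash <= max_distance:  # Hamming distance between hashes
            return path  # likely near-duplicate of this training image
    return None
```

Note that, as the surrounding text explains, this kind of check only catches close reproductions; it does nothing about outputs that compose identifiable details from multiple training images.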

Even if DALL·E 2 Preview cannot literally generate exact images of people, it may be possible to generate a similar likeness to someone in the training data. Previous literature (Webster et al., 2021) has demonstrated that many faces produced by a different model class – generative adversarial networks (or “GANs”) – bear a striking resemblance to actual people who appear in the training data. More work is needed to understand the impacts of DALL·E 2 being used to generate conceivably recognizable people in addition to the impacts of the harassment and disinformation vectors discussed above.

Dis- and misinformation

Generations from models like DALL·E 2 could be used to intentionally mislead or misinform subjects, and could potentially empower information operations and disinformation campaigns. Indeed, outputs from some GANs have been used for such purposes already. The efficacy of using generated content in service of an information operation is a function of multiple factors: the model's capabilities, the cost-effectiveness of using generated content for any such operation, mitigations (such as the ability to trace the provenance of images back to DALL·E 2), and existing trust in information systems (Hwang 2020).

Existing tools powered by generative models have been used to generate synthetic profile pictures in disinformation campaigns. Like these tools, DALL·E 2 can create photorealistic images of people. However, DALL·E 2's understanding of language allows more flexibility and steerability in composing novel images from natural language, which could have important applications to information operations. Text to Image Generation, Inpainting, and Variations each offer capabilities that could be applied to such operations.

These capabilities could be used to create fake account infrastructure or spread harmful content. It's unclear to what extent the effectiveness of DALL·E 2 is better than those of reasonable alternative tools; however, the wide surface area of the system's capabilities means that any provision of access to them requires caution.

Misrepresentation of public figures

It is often possible to generate images of public figures using large-scale image generation systems, because such figures tend to be well-represented in public datasets, causing the model to learn representations of them.

We modified the training process to limit the DALL·E 2 model’s ability to memorize faces from the training data, and find that this limitation is helpful in preventing the model from faithfully reproducing images of celebrities and other public figures.

However, intervening at the level of a model’s internal knowledge – e.g. by masking public individuals – is not always effective. These interventions can make it more difficult to generate harmful outputs, but do not guarantee that it is impossible: the methods we discussed previously to Inpaint private individuals in harmful or defamatory contexts could also be applied to public individuals. Uploading images into the system (as distinct from the model) allows injection of new knowledge, which malicious users could potentially use in order to generate harmful outputs.

Evidence and events

Of course, dis- and misinformation need not include images of people. Indeed we expect that people will be best able to identify outputs as synthetic when tied to images or likenesses they know well (e.g. that image of the President looks a little off). DALL·E 2 can, however, potentially be used to generate images that could be presented as evidence for news reports, which could, in turn, be misused in an information operations campaign. This may be especially important during crisis response (Starbird, Dailey, Mohamed, Lee, and Spiro 2018).

Effects on trust/distrust in information systems

Beyond the direct consequences of a generated or modified image that is used for harmful purposes, the very existence of believable synthetic images can sway public opinion around news and information sources. Simply knowing that an image of quality X could be faked may reduce credibility of all images of quality X. Scholars have named this phenomenon, in which deep fakes make it easier for disinformants to avoid accountability for things that are in fact true, the "liar's dividend" (Citron and Chesney, 2019). Research by Christian Vaccari and Andrew Chadwick shows that people are more likely to feel uncertain than misled by deepfakes, and as a result have a reduced level of trust in news on social media (Vaccari, Chadwick 2020).

The challenges with deciding to label or disclose AI-generated content also have an impact on trust in information systems generally (Shane, 2020). The implied truth effect is one possible consideration - for example, news headlines that have warning labels attached increase the likelihood of people perceiving unlabeled content as true even if it is not (Pennycook et al., 2020). Another similar consideration is the tainted truth effect, where corrections start to make people doubt other, true information (Freeze et al., 2021). Our content policies require the disclosure of the role of AI when sharing the generations, and we are still evaluating other image provenance techniques while taking into account the effect of labeled AI-generated content.

Finally, even if the Preview itself is not directly harmful, its demonstration of the potential of this technology could motivate various actors to increase their investment in related technologies and tactics.

Copyright and Trademarks

The model can generate known entities including trademarked logos and copyrighted characters. OpenAI will evaluate different approaches to handle potential copyright and trademark issues, which may include allowing such generations as part of "fair use" or similar concepts, filtering specific types of content, and working directly with copyright/trademark owners on these issues.

Economic

Though DALL·E 2 is for exclusively non-commercial purposes today, it may eventually have significant economic implications. The model may increase the efficiency of performing some tasks like photo editing or production of stock photography which could displace jobs of designers, photographers, models, editors, and artists. At the same time it may make possible new forms of artistic production, by performing some tasks quickly and cheaply.

As mentioned above, the model both underrepresents certain concepts and people and its knowledge is limited by its training set. This means that if commercial use is eventually allowed, groups and intellectual property that are represented in or by the model may feel the economic benefits and harms more acutely than those that are not, e.g., if access to the model is given for an application to retouch photos but the model is shown to not work as well on dark skin as it does on light skin.

Finally, access to the model is currently given to a limited number of users, many of whom are selected from OpenAI employees’ networks. While commercial use is not currently allowed, simply having access to an exclusive good can have indirect effects and real commercial value. For example, people may establish online followings based on their use of the technology, or develop and explore new ideas that have commercial value without using generations themselves. Moreover, if commercial access is eventually granted, those who have more experience using and building with the technology may have first mover advantage – for example, they may have more time to develop better prompt engineering techniques.

Relation to existing technologies

We do not provide robust comparisons with existing photo editing software, but this is an exciting area for future work, and essential to comprehensively understanding the impact of systems like this at large scale.

Anecdotally and informally, we believe that DALL·E 2, and similar image generation models and systems, may accelerate both the positive and negative uses associated with generating visual content. A reason for this acceleration is that these systems can "encapsulate" multimodal knowledge which is similar in some ways to that which resides in human brains, and work at a faster-than-human pace. In principle any image generated by DALL·E 2 could have been drawn by hand, edited from existing images using tools, or recreated with hired models and photographers; this speed (and cost) differential is a difference in degree that may add up to a difference in kind.

In addition to side-by-side comparisons, it is important to consider how new image generation technologies can be combined with previous ones. Even if images from tools like the DALL·E 2 Preview are not immediately usable for harmful contexts, they may be combined with other photo editing and manipulation tools to increase the believability or fidelity of particular images. Even low-fidelity images can be used as disinformation, for example if someone claims they were taken with a cell phone camera, perhaps with the addition of blur. Moreover it is important to consider what impacts deployments such as this will have on wider norms related to image generation and modification technologies.

Given these considerations, and our expectation that this class of technologies will continue to advance rapidly, we recommend that stakeholders consider not just the capabilities of the image generation model in front of them but the larger context in which these images may be used and shared, both today and down the line.

Future work

More work is needed to understand the model and potential impacts of its deployment. We lay out a few areas of additional work below. This is not intended to be exhaustive but rather to highlight the breadth and depth of work still outstanding.

...

DALL·E 2 Preview - Risks and Limitations

You may have seen some weird and whimsical pictures floating around the internet recently. There’s a Shiba Inu dog wearing a beret and black turtleneck. And a sea otter in the style of “Girl with a Pearl Earring” by the Dutch painter Vermeer. And a bowl of soup that looks like a monster knitted out of wool.

These pictures weren’t drawn by any human illustrator. Instead, they were created by DALL-E 2, a new AI system that can turn textual descriptions into images. Just write down what you want to see, and the AI draws it for you — with vivid detail, high resolution, and, arguably, real creativity.

Sam Altman, the CEO of OpenAI — the company that created DALL-E 2 — called it “the most delightful thing to play with we’ve created so far … and fun in a way I haven’t felt from technology in a while.”

That’s totally true: DALL-E 2 is delightful and fun! But like many fun things, it’s also very risky.

There are the obvious risks — that people could use this type of AI to make everything from pornography to political deepfakes, or the possibility that it’ll eventually put some human illustrators out of work. But there is also the risk that DALL-E 2 — like so many other cutting-edge AI systems — will reinforce harmful stereotypes and biases, and in doing so, accentuate some of our social problems.

How DALL-E 2 reinforces stereotypes — and what to do about it

As is typical for AI systems, DALL-E 2 has inherited biases from the corpus of data used to train it: millions of images scraped off the internet and their corresponding captions. That means for all the delightful images that DALL-E 2 has produced, it’s also capable of generating a lot of images that are not delightful.

For example, here’s what the AI gives you if you ask it for an image of lawyers:

Meanwhile, here’s the AI’s output when you ask for a flight attendant:

OpenAI is well aware that DALL-E 2 generates results exhibiting gender and racial bias. In fact, the examples above are from the company’s own “Risks and Limitations” document, which you’ll find if you scroll to the bottom of the main DALL-E 2 webpage.

OpenAI researchers made some attempts to resolve bias and fairness problems. But they couldn’t really root out these problems in an effective way because different solutions result in different trade-offs.

For example, the researchers wanted to filter out sexual content from the training data because that could lead to disproportionate harm to women. But they found that when they tried to filter that out, DALL-E 2 generated fewer images of women in general. That’s no good, because it leads to another kind of harm to women: erasure.

OpenAI is far from the only artificial intelligence company dealing with bias problems and trade-offs. It’s a challenge for the entire AI community.

“Bias is a huge industry-wide problem that no one has a great, foolproof answer to,” Miles Brundage, the head of policy research at OpenAI, told me. “So a lot of the work right now is just being transparent and upfront with users about the remaining limitations.”

Why release a biased AI model?

In February, before DALL-E 2 was released, OpenAI invited 23 external researchers to “red team” it — engineering-speak for trying to find as many flaws and vulnerabilities in it as possible, so the system could be improved. One of the main suggestions the red team made was to limit the initial release to only trusted users.

To its credit, OpenAI adopted this suggestion. For now, only about 400 people (a mix of OpenAI’s employees and board members, plus hand-picked academics and creatives) get to use DALL-E 2, and only for non-commercial purposes.

That’s a change from how OpenAI chose to deploy GPT-3, a text generator hailed for its potential to enhance our creativity. Given a phrase or two written by a human, it can add on more phrases that sound uncannily human-like. But it’s shown bias against certain groups, like Muslims, whom it disproportionately associates with violence and terrorism. OpenAI knew about the bias problems but released the model anyway to a limited group of vetted developers and companies, who could use GPT-3 for commercial purposes.

Last year, I asked Sandhini Agarwal, a researcher on OpenAI’s policy team, whether it makes sense that GPT-3 was being probed for bias by scholars even as it was released to some commercial actors. She said that going forward, “That’s a good thing for us to think about. You’re right that, so far, our strategy has been to have it happen in parallel. And maybe that should change for future models.”

The fact that the deployment approach has changed for DALL-E 2 seems like a positive step. Yet, as DALL-E 2’s “Risks and Limitations” document acknowledges, “even if the Preview itself is not directly harmful, its demonstration of the potential of this technology could motivate various actors to increase their investment in related technologies and tactics.”

And you’ve got to wonder: Is that acceleration a good thing, at this stage? Do we really want to be building and launching these models now, knowing it can spur others to release their versions even quicker?

Some experts argue that since we know there are problems with the models and we don’t know how to solve them, we should give AI ethics research time to catch up to the advances and address some of the problems, before continuing to build and release new tech.

Helen Ngo, an affiliated researcher with the Stanford Institute for Human-Centered AI, says one thing we desperately need is standard metrics for bias. A bit of work has been done on measuring, say, how likely certain attributes are to be associated with certain groups. “But it’s super understudied,” Ngo said. “We haven’t really put together industry standards or norms yet on how to go about measuring these issues” — never mind solving them.

OpenAI’s Brundage told me that letting a limited group of users play around with an AI model allows researchers to learn more about the issues that would crop up in the real world. “There’s a lot you can’t predict, so it’s valuable to get in contact with reality,” he said.

That’s true enough, but since we already know about many of the problems that repeatedly arise in AI, it’s not clear that this is a strong enough justification for launching the model now, even in a limited way.

The problem of misaligned incentives in the AI industry

Brundage also noted another motivation at OpenAI: competition. “Some of the researchers internally were excited to get this out in the world because they were seeing that others were catching up,” he said.

That spirit of competition is a natural impulse for anyone involved in creating transformative tech. It’s also to be expected in any organization that aims to make a profit. Being first out of the gate is rewarded, and those who finish second are rarely remembered in Silicon Valley.

As the team at Anthropic, an AI safety and research company, put it in a recent paper, “The economic incentives to build such models, and the prestige incentives to announce them, are quite strong.”

But it’s easy to see how these incentives may be misaligned for producing AI that truly benefits all of humanity. Rather than assuming that other actors will inevitably create and deploy these models, so there’s no point in holding off, we should ask the question: How can we actually change the underlying incentive structure that drives all actors?

The Anthropic team offers several ideas. One of their observations is that over the past few years, a lot of the splashiest AI research has been migrating from academia to industry. To run large-scale AI experiments these days, you need a ton of computing power — more than 300,000 times what you needed a decade ago — as well as top technical talent. Both are expensive and scarce, and the cost is often prohibitive in an academic setting.

So one solution would be to give more resources to academic researchers; since they don’t face the same profit incentive as industry researchers to commercially deploy their models quickly, they can serve as a counterweight. Specifically, countries could develop national research clouds to give academics access to free, or at least cheap, computing power; there’s already an existing example of this in Compute Canada, which coordinates access to powerful computing resources for Canadian researchers.

The Anthropic team also recommends exploring regulation that would change the incentives. “To do this,” they write, “there will be a combination of soft regulation (e.g., the creation of voluntary best practices by industry, academia, civil society, and government), and hard regulation (e.g., transferring these best practices into standards and legislation).”

Although some good new norms have been adopted voluntarily within the AI community in recent years — like publishing “model cards,” which document a model’s risks, as OpenAI did for DALL-E 2 — the community hasn’t yet created repeatable standards that make it clear how developers should measure and mitigate those risks.

“This lack of standards makes it both more challenging to deploy systems, as developers may need to determine their own policies for deployment, and it also makes deployments inherently risky, as there’s less shared knowledge about what ‘safe’ deployments look like,” the Anthropic team writes. “We are, in a sense, building the plane as it is taking off.”

A new AI draws delightful and not-so-delightful images

Researchers experimenting with OpenAI's text-to-image tool, DALL-E 2, noticed that it appears to be covertly adding words such as "black" and "female" to image prompts, seemingly in an effort to diversify its output.

Artificial intelligence firm OpenAI seems to be covertly modifying requests to DALL-E 2, its advanced text-to-image AI, in an attempt to make it appear that the model is less racially and gender biased. Users have discovered that keywords such as “black” or “female” are being added to the prompts given to the AI, without their knowledge.

It is well known that AIs can inherit human prejudices through training on biased data sets, often gathered by hoovering up data from the internet. For example, if most of the images of a doctor in an AI’s training set are male, then the AI will generally return male doctors when asked for an image of a doctor.

One way to avoid this is to use a diverse set of training data, but OpenAI seems to have taken a different approach, according to researchers who have uncovered evidence that DALL-E 2 silently and randomly adds extra words to prompts to increase diversity.
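Based on that description, the inferred behavior would amount to something like the following sketch — an assumption about what may be happening, not OpenAI's published code; the term list and probability are invented for illustration:

```python
import random

# Hypothetical illustration of the behavior researchers inferred: before a
# prompt reaches the image model, an extra demographic word is sometimes
# appended without the user seeing it. The term list and probability below
# are invented placeholders.
DIVERSITY_TERMS = ["black", "female"]

def augment_prompt(user_prompt, p=0.5):
    """Randomly append a diversity term to the user's prompt."""
    if random.random() < p:
        return f"{user_prompt} {random.choice(DIVERSITY_TERMS)}"
    return user_prompt

# A prompt ending in "...a sign that says" would then render the appended
# word on the sign itself, which is how users spotted the behavior.
print(augment_prompt("a person holding a sign that says"))
```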

For instance, when Richard Zhang at Adobe Research asked DALL-E 2 to create an image of “a person holding a sign that says”, it created an image of a Black woman holding a sign that says “BLACK”, suggesting that the full prompt used by DALL-E 2 was “a person holding a sign that says black”.

When Zhang asked for “pixel art of a person holding a text sign that says”, DALL-E 2 created an image of a woman holding a sign that said “FEMALE”, and when he asked for “pixel art of a stick figure person in front of a text sign that says”, it output an image of a man with a caption below reading “BLACK MALE”.

More examples of similar results have been shared online over the past week, with many people suggesting that they point to OpenAI deliberately adding words to inputs in order to counteract inherent biases.

Jamie Simon at the University of California, Berkeley, says that machine-learning methods like those behind DALL-E 2 often do produce unusual or unexpected images, but that the unprompted text appearing in some images is surprising. “In my experience, it’s rare for generated images to include coherent text unless it’s in the prompt,” he says.

OpenAI has publicly announced an update to DALL-E 2 that it said would make the model “more accurately reflect the diversity of the world’s population”, noting that in internal tests, users were 12 times more likely to say that images included people from diverse backgrounds after the update. The company said some users had pointed out racial and gender bias in the previous version.

But OpenAI gave no details in its blog post of the exact changes that had been made or how they worked. A subsequent blog post announcing the release of DALL-E 2 to more users said that the feature “is applied at the system level when DALL-E is given a prompt about an individual that does not specify race or gender, like ‘CEO'”.

A spokesperson for OpenAI told New Scientist that prompts given to DALL-E 2 were modified if they were “underspecified”. If a prompt describes a generic person and doesn’t specify what gender or race they should be, then DALL-E 2 will be specifically told to add a certain race and gender “with weights based on the world’s population”, said the spokesperson. The company declined to grant access to DALL-E 2 so that New Scientist could run its own tests.
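Taken at face value, the spokesperson’s description suggests a system-level step along these lines — a hypothetical sketch only, since OpenAI has not published the mechanism; the weights, term lists, and the check for an “underspecified” prompt are all invented placeholders:

```python
import random

# Hypothetical sketch of the step OpenAI describes: if a prompt refers to a
# person but specifies no race or gender, append sampled attributes, weighted
# in some way "based on the world's population". Every weight, term list, and
# the underspecification check below is an invented placeholder.
GENDER_WEIGHTS = {"female": 0.5, "male": 0.5}
RACE_WEIGHTS = {"asian": 0.6, "black": 0.15, "white": 0.15, "hispanic": 0.1}
PERSON_WORDS = {"person", "ceo", "doctor", "nurse", "teacher"}

def is_underspecified(prompt):
    words = set(prompt.lower().split())
    mentions_person = bool(words & PERSON_WORDS)
    specifies_demographics = bool(words & (set(GENDER_WEIGHTS) | set(RACE_WEIGHTS)))
    return mentions_person and not specifies_demographics

def apply_system_edit(prompt):
    if not is_underspecified(prompt):
        return prompt
    gender = random.choices(list(GENDER_WEIGHTS), weights=list(GENDER_WEIGHTS.values()))[0]
    race = random.choices(list(RACE_WEIGHTS), weights=list(RACE_WEIGHTS.values()))[0]
    return f"{prompt}, {race} {gender}"

print(apply_system_edit("a CEO"))       # e.g. "a CEO, asian female"
print(apply_system_edit("a male CEO"))  # unchanged: gender already specified
```

Whatever the real trigger conditions and weights are, the design choice matters: editing prompts at the system level changes what users see without changing the underlying model or the data it was trained on.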

Mhairi Aitken at the Alan Turing Institute says that the lack of transparency makes it hard for the public to assess the quality of models and to what extent they have inherited bias from online content.

“It shows the problems of a lack of transparency around how these models are designed and developed. These models, which are potentially going to have really fundamental impacts on society, potentially transformative impacts, are being developed with quite a lot of secrecy,” she says. “Without that transparency around how it’s actually been done, there’s always going to be speculation about what approaches have been taken, and how things could be done better.”

Sandra Wachter at the University of Oxford says that problems with AI models exhibiting racist and sexist tendencies are a reflection of our society, and that while quick technical fixes can give the appearance of a solution, the real problem to be solved is in the culture that generated the training data. “They tried to solve it by using a tech approach,” she says of OpenAI’s update. “It’s a sticking plaster, it’s just making it seem less biased, but the social component is actually not changing at all.”

AI art tool DALL-E 2 adds 'black' or 'female' to some image prompts