Incident 12: Common Biases of Vector Embeddings
Description: Researchers from Boston University and Microsoft Research, New England demonstrated gender bias in the most common techniques used to embed words for natural language processing (NLP).
Entities
Alleged: Microsoft Research, Boston University and Google developed an AI system deployed by Microsoft Research and Boston University, which harmed Women and Minority Groups.
Incident Stats
Incident ID
12
Report Count
1
Incident Date
2016-07-21
Editors
Sean McGregor
CSETv0 Taxonomy Classifications
Full Description
A plain-language description of the incident in one paragraph or less.
The most common techniques used to embed words for natural language processing (NLP) show gender bias, according to researchers from Boston University and Microsoft Research, New England. The primary embedding studied was a 300-dimensional word2vec embedding of words from a corpus of Google News texts, chosen because it is open-source and popular in NLP applications. After demonstrating gender bias in the embedding, the researchers show that the bias is associated with several geometric features of the embedding, and that these features can be used to define a bias subspace. This finding allows them to develop several debiasing algorithms.
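The idea of a bias subspace can be illustrated with a minimal sketch. It is not the researchers' implementation: the 3-dimensional vectors below are hypothetical toy values rather than the 300-dimensional Google News embedding, the helper names `bias_direction` and `neutralize` are invented for illustration, and the subspace is assumed to be one-dimensional. The sketch estimates a direction from a few gendered word pairs using a principal component analysis, then projects that direction out of a word that should be gender-neutral.

```python
import numpy as np

# Hypothetical toy embedding; real word2vec vectors have 300 dimensions.
emb = {
    "he": np.array([1.0, 0.2, 0.1]),
    "she": np.array([-1.0, 0.2, 0.1]),
    "man": np.array([0.9, 0.5, 0.0]),
    "woman": np.array([-0.9, 0.5, 0.0]),
    "programmer": np.array([0.4, 0.1, 0.9]),
}

def bias_direction(pairs, emb):
    """Estimate a unit-length bias direction from gendered word pairs."""
    centered = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2.0
        # Keep only how each word deviates from the midpoint of its pair.
        centered.append(emb[a] - center)
        centered.append(emb[b] - center)
    # The first right-singular vector is the first principal component.
    _, _, vt = np.linalg.svd(np.array(centered), full_matrices=False)
    return vt[0]

def neutralize(v, g):
    """Remove the component of v along unit vector g, then renormalize."""
    v = v - np.dot(v, g) * g
    return v / np.linalg.norm(v)

g = bias_direction([("he", "she"), ("man", "woman")], emb)
debiased = neutralize(emb["programmer"], g)

print("projection before:", abs(np.dot(emb["programmer"], g)))
print("projection after: ", abs(np.dot(debiased, g)))
```

In this toy setup the gendered pairs differ only along the first axis, so the estimated direction lines up with that axis and the neutralized vector has essentially no component along it. With a real embedding the direction would be estimated from many more pairs and would not align with a single coordinate.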
Short Description
A one-sentence description of the incident.