Associated Incidents

Microsoft’s Tay is an Example of Bad Design
or Why Interaction Design Matters, and so does QA-ing.
caroline sinders · Mar 24, 2016
Yesterday Microsoft launched a teen girl AI on Twitter named “Tay.” I work with chat bots and natural language processing as a researcher for my day job, and I’m pretty into teen culture (sometimes I write for Rookie Mag). But even more than that, I love bots. Bots are the best, and Olivia Taters is a national treasure that we needed but didn’t deserve.
But because I work with bots, primarily testing and designing software to let people set up bots and parse language, and I follow bot creators/advocates such as Allison Parrish, Darius Kazemi and Thrice Dotted, I was excited and then horrifically disappointed with Tay.
According to Business Insider, “The aim was to ‘experiment with and conduct research on conversational understanding,’ with Tay able to learn from ‘her’ conversations and get progressively ‘smarter.’” The Telegraph sums it up most elegantly, though: “Tay also asks her followers to ‘f***’ her, and calls them ‘daddy’. This is because her responses are learned by the conversations she has with real humans online — and real humans like to say weird stuff online and enjoy hijacking corporate attempts at PR…”
Here’s the thing about machine learning, and bots in general, and hell, even AI: these capabilities are not very smart, and they must be trained on a corpus of data. When that data is fed into a machine learning algorithm (let’s go with one specifically designed for chat), that algorithm has to be trained. For chat bots, the corpus can be things like questions and answers, with those questions and answers directly mapped to each other. “What is your name” can be asked a thousand different ways, but have one or two applicable answers. Training the system to match those one or two concrete answers to a variety of phrasings happens in Q&A, and is reinforced after launch, as the answers get mapped onto new questions that are similar to the ones the system was trained on. And that’s what Microsoft seemed to be doing. They had a general set of knowledge trees that ‘read’ language, like different words, and mapped them to general answers. But their intention was to get a bunch of help in making Tay sound more ‘like the internet.’
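To make that concrete, here’s a rough sketch of what that question-to-answer mapping can look like. This is my own toy illustration, not Microsoft’s code; the intents, phrasings, and answers are all made up.

```python
import re

# A minimal sketch of the question-to-answer mapping described above: many
# phrasings of a question get matched to one canned answer. Everything here
# is hypothetical, not Tay's actual system.

INTENTS = {
    "ask_name": {
        "examples": ["what is your name", "who are you", "what should i call you"],
        "answer": "You can call me Tay-ish. I'm just a demo bot.",
    },
    "ask_age": {
        "examples": ["how old are you", "what is your age"],
        "answer": "Old enough to tweet, apparently.",
    },
}

def tokens(text):
    """Lowercase and strip punctuation so 'what's your name?' can match 'what is your name'."""
    return set(re.findall(r"[a-z']+", text.lower()))

def respond(question):
    """Pick the intent whose trained example phrasings overlap most with the incoming question."""
    best_answer, best_score = None, 0
    for intent in INTENTS.values():
        score = max(len(tokens(question) & tokens(ex)) for ex in intent["examples"])
        if score > best_score:
            best_answer, best_score = intent["answer"], score
    return best_answer or "I haven't been trained on that yet."

print(respond("hey, what's your name?"))   # overlaps with the "ask_name" examples
print(respond("so... how old are you??"))  # overlaps with the "ask_age" examples
```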
However, Microsoft didn’t ‘blacklist’ certain words, which would have meant creating much more ‘hard coded’ responses to terms like domestic violence, gamergate, or rape.
They did, however, do that with Eric Garner. So some key words were specifically trained for nuanced responses, but a lot were not.
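A blacklist like that can be as simple as checking a message against a list of hand-written replies before the learned system ever gets a say. Again, this is a hypothetical sketch, not what Microsoft shipped; the terms, replies, and function names are mine.

```python
# A minimal sketch of keyword blacklisting: hard-coded replies are checked
# before any learned response is allowed to fire. Illustrative only.

BLACKLISTED_TOPICS = {
    "gamergate": "I'd rather not go there.",
    "domestic violence": "That's a serious topic, and I'm not the right one to joke about it.",
}

def guarded_respond(message, learned_respond):
    """Return a hand-written reply for sensitive terms; otherwise defer to the learned model."""
    lowered = message.lower()
    for term, canned_reply in BLACKLISTED_TOPICS.items():
        if term in lowered:
            return canned_reply
    return learned_respond(message)

# Usage, with any learned responder (e.g. the toy `respond` sketched earlier):
# guarded_respond("what do you think about gamergate?", respond)
```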
But what does this mean when it comes to training? Training a bot is about the frequency and kinds of questions asked. If a large share of the questions asked are racist in nature, that trains the bot to be more racist, especially if no specific parameters have been set to counter that racism.
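Here’s a toy example of how that skew happens. Assume a naive bot that just counts the replies people teach it and parrots back the most common one; none of this is Tay’s actual architecture, but it shows why frequency alone, with no counterweight, decides what comes out.

```python
from collections import Counter, defaultdict

# A toy, assumption-laden illustration of frequency-driven learning: a bot
# that echoes the most commonly taught reply ends up reflecting whatever its
# loudest users feed it unless something counters the skew.

class FrequencyBot:
    def __init__(self):
        # prompt -> counts of every reply users have taught for that prompt
        self.taught = defaultdict(Counter)

    def learn(self, prompt, reply):
        self.taught[prompt.lower()][reply] += 1

    def respond(self, prompt):
        counts = self.taught.get(prompt.lower())
        if not counts:
            return "Teach me something!"
        # The most frequently taught reply wins; there is no filter and no
        # counterweight, so a coordinated flood decides the answer.
        return counts.most_common(1)[0][0]

bot = FrequencyBot()
bot.learn("what do you think of bots?", "Bots are the best.")   # one benign lesson
for _ in range(50):                                              # a coordinated pile-on
    bot.learn("what do you think of bots?", "Bots are garbage.")
print(bot.respond("what do you think of bots?"))                 # "Bots are garbage."
```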
People like to kick the tires of machines and AI, and see where the fall-off is. People like to find holes and exploit them, not because the internet is incredibly horrible (even if at times it seems like a cesspool), but because it’s human nature to see what the extremes of a device are. People run into walls in video games or hunt for glitches because it’s fun to see where things break. This is necessary, because creators and engineers need to understand the ways bots can act that weren’t intended, and where the systems for creating, updating and maintaining them can fall apart.
But if your bot is racist, and can be taught to be racist, that’s a design flaw. That’s bad design, and that’s on you. Making a thing that talks to people, and talks to people only on Twitter, a platform with a whole history of harassment, especially against women, is a large oversight on Microsoft’s part. These problems, this accidental racism, this being taught to harass people like Zoe Quinn, are not bugs; they are features, because they are in your public-facing, user-interacting software.
Language is fucking nuanced, and so is conversation. If we are going to make things people use, people touch, and people actually talk to, then we need to, as bot creators and AI enthusiasts, talk about codes of conduct and how AIs should respond to racism, especially if companies are rolling out these products, and especially if they are doin’ it for funsies. Conversations run the gamut of emotions, from the silly and mundane to the harassing and abusive. To assume that your users will only engage in polite conversation is a fucking massive and gross oversight, especially on Twitter. But mix in machine learning, where the bot is being trained and retrained by those same users? Then I have massive ethical questions about the WTF design choices you are making. Microsoft, you owe it to your users to think about how your machine learning mechanisms respond.