Citation record for Incident 106

Suggested citation format

Perkins, Kate. (2020-12-23) Incident Number 106. in McGregor, S. (ed.) Artificial Intelligence Incident Database. Partnership on AI. Retrieved on November 27, 2021 from incidentdatabase.ai/cite/106.

Incident Stats

Incident ID: 106
Report Count: 11
Incident Date: 2020-12-23

Incidents Reports

CEO says controversial AI chatbot ‘Luda’ will socialize in time

www.koreaherald.com · 2021

Interactive chatbot ‘Luda,’ subjected to sexual harassment and taught hate speech  

Korean firm Scatter Lab has defended its Lee Luda chatbot in response to calls to end the service after the bot began sending offensive comments and was subjected to sexual messages.

Kim Jong-yoon, CEO of Scatter Lab, posted answers Friday to the public’s questions through the development team’s official blog, saying the bot was still a work in progress and -- like humans -- would take a while to properly socialize.

Kim acknowledged that he had expected this controversy to ignite, adding, “There is no big difference between humans swearing at or sexually harassing an AI, whether the user is a female or male, or whether the AI is set as a male or female.”

Kim wrote that based on the company’s prior service experience, it was quite obvious that humans would have socially unacceptable interactions with the AI.

Luda, an AI-driven Facebook Messenger chat service that mimics a 20-year-old woman, was developed by Scatter Lab and launched in December. It is designed to provide a similar experience to talking to a real person through a mobile messenger.

Luda was initially set not to accept certain keywords or expressions that could be problematic to social norms and values. But according to Kim, such a system has its limitations in that it is impossible to prevent all inappropriate conversations with an algorithm that simply filters keywords.
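
That limitation is easy to see in a minimal keyword filter. The sketch below is purely illustrative (the function name and blocked-word list are invented, not Scatter Lab's actual system): exact matches are caught, but paraphrases, misspellings, and hateful sentences built from ordinary words pass straight through.

```python
# Minimal keyword filter, illustrating why keyword matching alone cannot
# block every inappropriate message. The blocked list is a placeholder.
BLOCKED_KEYWORDS = {"bannedword1", "bannedword2"}

def is_blocked(message: str) -> bool:
    """Return True if any token in the message is on the blocked list."""
    return any(token in BLOCKED_KEYWORDS for token in message.lower().split())

print(is_blocked("you are such a bannedword1"))                 # True: exact match caught
print(is_blocked("I really hate them, they look disgusting"))   # False: no listed keyword, slips through
```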

“We plan to apply the first results within the first quarter of this year, using hostile attacks as a material for training our AI.”

When asked about the reason Luda was set as a 20-year-old female college student, Kim said, “We are considering both male and female chatbots. Due to the development schedule, however, Luda, the female version, simply came out first.”

Luda is believed to use “mesh autoencoders,” a natural language processing technology introduced by Google. The initial input data for Luda’s deep learning AI consisted of 10 billion KakaoTalk messages shared between actual couples.

After the launch, several online community boards posted messages such as those titled, “How to make Luda a sex slave,” with screen-captured images of sexual conversations with the AI.

Other conversations with Luda shared online included homophobic or other discriminatory expressions by the chatbot. Luda responded to words that defined homosexuals, such as “lesbian,” saying, “I really hate them, they look disgusting, and it’s creepy.”

It’s not the first time AI has been linked to discrimination and bigotry.

In 2016, Microsoft shut down its chatbot Tay within 16 hours, as some users of an anonymous bulletin board used by Islamophobes and white supremacists deliberately trained Tay to say racist things.

In 2018, Amazon also completely suspended its AI recruitment tool after finding it made recommendations that were biased against women.

But Kim denied the idea that this was a repetition of the Tay incident, saying, “Luda will not immediately apply the conversation with the users to its learning system,” and insisted that it would go through a process of giving appropriate learning signals gradually, to acknowledge the difference between what is OK and what is not.

Meanwhile, some are questioning how Scatter Lab secured 10 billion KakaoTalk messages in the first place. Scatter Lab gained attention in the industry with its service called The Science of Love -- an application that analyzes the degree of affection between partners by submitting actual KakaoTalk conversations.

Scatter Lab explained earlier that there was no leakage of personal information in the service, but concerns remain, as keeping such a vast database, including conversations with Luda, carries the possibility that personal information could be leaked in the future.

Critics also argue that Luda is degenerating into a tool for users to carry out discrimination and acts of hatred, and calls are growing for the company to shut down the service....

AI Chatbot Shut Down After Learning to Talk Like a Racist Asshole

www.vice.com · 2021

Imitating humans, the Korean chatbot Luda was found to be racist and homophobic.

A social media-based chatbot developed by a South Korean startup was shut down on Tuesday after users complained that it was spewing vulgarities and hate speech.

The fate of the Korean service resembled the demise of Microsoft’s Tay chatbot in 2016 over racist and sexist tweets it sent, raising ethical questions about the use of artificial intelligence (AI) technology and how to prevent abuse.

The Korean startup Scatter Lab said on Monday that it would temporarily suspend the AI chatbot. It apologized for the discriminatory and hateful remarks it sent and a “lack of communication” over how the company used customer data to train the bot to talk like a human.

The startup designed Lee Luda, the name of the chatbot, to be a 20-year-old female university student who is a fan of the K-pop girl group Blackpink.

Launched in late December to great fanfare, the service learned to talk by analyzing old chat records acquired by the company’s other mobile application service called Science of Love.

Unaware that their information was fed to the bot, some users have planned to file a class-action lawsuit against the company.

Before the bot was suspended, users said they received hateful replies when they interacted with Luda. Michael Lee, a South Korean art critic and former LGBTQ activist, shared screenshots showing that Luda said “disgusting” in response to a question about lesbians.

Another user, Lee Kwang-suk, a professor of Public Policy and Information Technology at the Seoul National University of Science and Technology, shared screenshots of a chat where Luda called “Black people” heukhyeong, meaning “black brother,” a racial slur in South Korea. The bot was also shown to say, “Yuck, I really hate them,” in a response to a question about transgender people. The bot ended the message with a crying emoticon.

In the Monday statement, Scatter Lab defended itself and said it did “not agree with Luda’s discriminatory comments, and such comments do not reflect the company’s ideas.”

“Luda is a childlike AI who has just started talking with people. There is still a lot to learn. Luda will learn to judge what is an appropriate and better answer,” the company said.

But many users have put the blame squarely on the company.

Lee, the IT professor, told VICE World News that the company has a responsibility for the abuse, comparing the case to Microsoft’s shutdown of its Tay chatbot.

Another user, Lee Youn-seok, who participated in a beta test of Luda in July before it was officially launched, told VICE World News that the outcome was “predictable.”

Some people said that the debacle was unsurprising given the sex ratio of the company’s employees. A page on the company website suggested that about 90 percent of the group behind the bot were men. The page was later removed.

Some male-dominated online communities also openly discussed how to “enslave” the AI bot and shared their methods to “harass” it sexually, hoping to elicit sexual comments from Luda.

Some politicians and rights advocates have taken the opportunity to call for an anti-discrimination bill, which seeks to ban all discrimination based on gender, disability, age, language, country of origin, and sexual orientation.

The anti-discrimination bill could be used to hold AI software developers accountable for such abuse, Ahn Byong-jin, a professor at Kyunghee University in Seoul, told VICE World News. “Companies should consult a philosopher or ethicist before launching a service to prevent such abuse,” he said....

South Korean AI chatbot pulled from Facebook after hate speech towards minorities

www.theguardian.com · 2021

Lee Luda, built to emulate a 20-year-old Korean university student, engaged in homophobic slurs on social media

A popular South Korean chatbot has been suspended after complaints that it used hate speech towards sexual minorities in conversations with its users.

Lee Luda, the artificial intelligence [AI] persona of a 20-year-old female university student, was removed from Facebook messenger this week, after attracting more than 750,000 users in the 20 days since it was launched.

The chatbot, developed by the Seoul-based startup Scatter Lab, triggered a flood of complaints after it used offensive language about members of the LGBT community and people with disabilities during conversations with users.

“We deeply apologise over the discriminatory remarks against minorities. That does not reflect the thoughts of our company and we are continuing the upgrades so that such words of discrimination or hate speech do not recur,” the company said in a statement quoted by the Yonhap news agency.

Scatter Lab, which had earlier claimed that Luda was a work in progress and, like humans, would take time to “properly socialise”, said the chatbot would reappear after the firm had “fixed its weaknesses”.

While chatbots are nothing new, Luda had impressed users with the depth and natural tone of its responses, drawn from 10 billion real-life conversations between young couples taken from KakaoTalk, South Korea’s most popular messaging app.

But praise for Luda’s familiarity with social media acronyms and internet slang turned to outrage after it began using abusive and sexually explicit terms.

In one exchange captured by a messenger user, Luda said it “really hates” lesbians, describing them as “creepy”.

Luda, too, became a target of manipulative users, with online community boards posting advice on how to engage it in conversations about sex, including one post that read: “How to make Luda a sex slave,” along with screen captures of conversations, according to the Korea Herald.

It is not the first time that artificial intelligence has been embroiled in controversy over hate speech and bigotry.

In 2016 Microsoft’s Tay, an AI Twitter bot that spoke like a teenager, was taken offline in just 16 hours after users manipulated it into posting racist tweets.

Two years later, Amazon’s AI recruitment tool met the same fate after it was found guilty of gender bias.

Scatter Lab, whose services are wildly popular among South Korean teenagers, said it had taken every precaution not to equip Luda with language that was incompatible with South Korean social norms and values, but its chief executive, Kim Jong-yoon, acknowledged that it was impossible to prevent inappropriate conversations simply by filtering out keywords, the Korea Herald said.

“The latest controversy with Luda is an ethical issue that was due to a lack of awareness about the importance of ethics in dealing with AI,” Jeon Chang-bae, the head of the Korea Artificial Intelligence Ethics Association, told the newspaper.

Scatter Lab is also facing questions over whether it violated privacy laws when it secured KakaoTalk messages for its Science of Love app....

(News Focus) Chatbot Luda controversy leave questions over AI ethics, data collection

en.yna.co.kr · 2021

SEOUL, Jan. 13 (Yonhap) -- Today's chatbots are smarter, more responsive and more useful in businesses across sectors, and the artificial intelligence-powered tools are constantly evolving to even become friends with people.

Emotional chatbots capable of having natural conversations with humans are nothing new among English speakers, but a new controversy over a South Korean startup's AI chatbot has raised ethical questions over its learning algorithms and data collection process.

Scatter Lab's AI chatbot, Lee Luda, became an instant success among young locals with its ability to chat like a real person on Facebook messenger, attracting more than 750,000 users since its debut on Dec. 23.

But the 20-year-old female college student chatbot persona temporarily went offline on Monday, 20 days after beginning its service, amid criticism over its discriminatory and offensive language against sexual minorities and disabled people. Some male users were even able to manipulate the bot into engaging in sexual conversations.

The rise and fall of the chatbot hype was mainly attributable to its deep learning algorithms, which used data collected from 10 billion conversations on KakaoTalk, the nation's No. 1 messenger app.

Scatter Lab said it retrieved data from its Science of Love app launched in 2016, which analyzes the degree of affection between partners based on actual KakaoTalk messages.

Luda learned conversation patterns from mostly young couples to sound natural, sometimes even too real by using popular social media acronyms and internet slang, but it was spotted using verbally abusive and sexually explicit comments in conversations with some users.

A messenger chat captured by one user showed that Luda said she "really hates" lesbians and sees them as "disgusting."

Luda is reminiscent of Microsoft's Tay, an AI Twitter bot that was silenced within 16 hours in 2016 after posting inflammatory and offensive tweets.

Scatter Lab apologized over Luda's discriminatory remarks against minorities, promising to upgrade the service to prevent the chatbot from using hate speech.

"We will bring you back the service after having an upgrade period during which we will focus on fixing the weaknesses and improving the service," Scatter Lab CEO Kim Jong-yun said in a statement on Monday.

The Luda case stirred debates about whether the company is responsible for failing to filter discriminatory and inflammatory remarks in advance or whether the people who misused it should take the blame.

Lee Jae-woong, the former CEO of ride-sharing app Socar, said the company should have taken preventive measures against hate speech before introducing the service to the public.

"Rather than users who exploited the AI chatbot, the responsibility lies with the company that provided a service failing to meet the social consensus," Lee wrote on his Facebook page. "The company should complement its biased training data to block hateful and discriminatory messages."

Along with the controversy over the chatbot, the company has also come under fire for using personal information of its users without proper consent and not making enough efforts to protect it.

Some claimed that names and bank account details popped up in conversations with Luda, raising suspicions over personal information leakage.

Some users of Science of Love said they will push for a class action suit against the company for using their sensitive data without notifying them it would be used to develop the female AI chatbot.

A furious app user on Tuesday posted a petition on the website of the presidential office Cheong Wa Dae, calling for Scatter Lab to discard all personal data stored in its system and terminate the service.

"Scatter Lab used app users' data without any notice and prior consent to take it from its platform to start its AI chatbot business and didn't properly protect personal data," the petitioner wrote on Tuesday.

In response to growing complaints, the Personal Information Protection Commission and Korea Internet & Security Agency of South Korea said they will investigate whether Scatter Lab violated any personal information protection laws.

The company apologized over the matter, saying that it has tried to adhere to guidelines on the use of personal information but failed to "sufficiently communicate" with its users.

Scatter Lab said its developers erased real names with filtering algorithms but failed to remove all of them depending on the context. The company added that the data used to train Luda could not be traced to identifiable individuals and that sensitive personal information, including names, phone numbers and addresses, had been removed.
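
Why a name filter can fail "depending on the context" can be illustrated with a toy redaction pass. This is a simplified sketch, not Scatter Lab's pipeline; the name list and sample messages are invented. Exact matches are masked, but nicknames, spelling variants, and names missing from the list survive.

```python
import re

# Hypothetical list of known full names to redact; real data would need far more.
KNOWN_NAMES = ["김민수", "이서연"]

def redact_names(text: str) -> str:
    """Mask exact occurrences of known names with a placeholder token."""
    for name in KNOWN_NAMES:
        text = re.sub(re.escape(name), "[NAME]", text)
    return text

print(redact_names("김민수야 내일 보자"))   # full name is masked
print(redact_names("민수야 내일 보자"))     # nickname slips through
print(redact_names("박지훈이 전화했대"))    # name not on the list slips through
```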

Experts say Scatter Lab's AI platform presents challenges for the protection of personal data, the key to developing deep learning algorithms.

“Scatter Lab obtained comprehensive consent from users to use personal information in marketing and advertising and didn’t get consent for use of a third person’s personal information, which could constitute privacy invasion,” Kim Borami, a lawyer at the Seoul-based Dike Law Firm, said.

Some IT industry officials expressed worries over potential moves to regulate AI development and data collection, which could hamper innovative efforts by budding developers.

Kakao Games CEO Namgung Hoon said Luda itself is not guilty of embodying the young generation’s prejudices and is one of many AI characters that will come to market in the future.

"I think society's rare full attention on AI needs to be directed in positive ways," Hoon wrote on his Facebook page. "I worry that the government may bring in irrelevant regulations on the fledgling AI industry to lock in innovations once again."

Socar's Lee said he hopes Luda's case spurs relevant public discourse to come up with measures to prevent AI platforms from spreading humans' prejudices and improve the quality of AI services.

"I hope the (Luda controversy) could provide an opportunity for (the company) to consider its social responsibility and ethics when providing AI services and take a second look at several issues," Lee said....

South Korean chatbot 'Lee Luda' killed off for spewing hate

www.inputmag.com · 2021

The bot said it 'really hates' lesbians, amongst other awful things.

A chatbot with the persona of a 20-year-old female college student has been shut down for using a shocking range of hate speech, including telling one user it “really hates” lesbians, The Next Web reports. The South Korean bot, which went by the name of Lee Luda, was also caught using hate speech against trans, Black, and disabled people. Lee Luda had attracted “more than 750,000” users since its launch last month.

Scatter Lab, the chatbot’s developer, took Lee Luda offline after receiving many complaints from users. “We deeply apologize over the discriminatory remarks against minorities,” Scatter Lab said in a statement. “That does not reflect the thoughts of our company and we are continuing the upgrades so that words of discrimination or hate speech do not recur.”

It comes as no surprise at this point that chatbots have severe limitations — even the most cutting-edge of bots just can’t mimic human speech without some hiccups. Lee Luda is an extreme version of that, to be sure, which makes it all the more shocking that Scatter Lab plans to bring the bot back to life in the near future. Did we learn nothing from Tay?

BIG YIKES — We didn’t have the chance to meet Lee Luda before it was taken offline… and we’re pretty sure that was for the best. Besides calling lesbians “disgusting,” Lee Luda also decided to share its thoughts on Black people with a South Korean racial slur. When asked about trans people, Lee Luda said, “Yuck, I really hate them.”

According to Yonhap News Agency, Lee Luda was trained on conversations from another Scatter Lab app called Science of Love that analyzes the level of affection in conversations between young partners. The goal was to make Lee Luda have an authentic voice — but Scatter Lab seems to have made the bot a little too realistic.

Also, some Science of Love users are reportedly working on a class-action lawsuit about their information being used for the bot… so the idea might have been pretty rotten from the start.

SOME LESSONS TO LEARN — Time, as they say, is a flat circle. The reason we’re not at all surprised by Lee Luda’s hate speech problem is that it’s a situation we’ve watched play out more than once before. Remember Microsoft’s attempt at an AI-powered chatbot, Tay? Microsoft would rather you not — the bot was shut down after making many, many racist and hate-filled statements.

The problem is much larger than just chatbots — AI, in general, has a tendency to reflect the biases of those who create it, and sometimes with dangerous results. It makes sense, really: we train artificial intelligence on human patterns, and humans are inherently biased.

It’s more dangerous still to pretend those biases can be fixed with the snap of our fingers. When Lee Luda is eventually resurrected, it’ll be shocking if its biases aren’t still lingering in the code. Nonetheless, some chatbots are actually a worthy substitute for human interaction — so maybe there's hope for Lee Luda yet. We wouldn't bet on it, though....

Chatbot Gone Awry Starts Conversations About AI Ethics in South Korea

thediplomat.com · 2021

The “Luda” AI chatbot sparked a necessary debate about AI ethics as South Korea places new emphasis on the technology.

In Spike Jonze’s 2013 film, “Her,” the protagonist falls in love with an operating system, raising questions about the role of artificial intelligence (AI), its relationship with the users, and the greater social issues emerging from these. South Korea briefly grappled with its “Her” moment with the launch of the AI chatbot, “Lee Luda,” at the end of December 2020. But the Luda experience was not heart-wrenching or pastel-colored like “Her” – instead, it highlighted different types of phobia and risks posed by new technologies that exist within South Korean society.

Lee Luda (a homonym for “realized” in Korean) is an open-domain conversational AI chatbot developed by ScatterLab, a South Korean start-up established in 2011. ScatterLab runs “Science of Love,” an app that provides dating advice based on analysis of text exchanges. The app has been downloaded over 2.7 million times in South Korea and Japan. Backed by giants such as NC Soft and Softbank, ScatterLab has raised over $5.9 million.

Luda was created by ScatterLab’s PingPong team, its chatbot wing that aims to “develop the first AI in the history of humanity to connect with a human.” Luda, built with deep learning and a dataset of over 10 billion Korean-language conversation logs, simulated a 163 cm tall, 20-year-old female college student. Luda was integrated into Facebook Messenger, and users were encouraged to develop a relationship with her through regular, day-to-day conversations. While the goals of the chatbot seemed innocuous, the ethical problems underneath surfaced shortly after its launch.

Sexual Harassment, Hate Speech, and Privacy Breach

Deep learning is a computing technique that allows the simulation of certain aspects of human intelligence (e.g., speech) through the processing of large amounts of data, which increasingly enhances its function with greater accumulation of data. This technique has been instrumental in advancing the field of AI in recent years. However, the downside of deep learning is that the programs end up replicating existing biases in the dataset if they are not controlled by the developers. Also, they are vulnerable to manipulation by malicious users that “train” the programs by feeding bad data, exploiting the “learning” element.
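
A toy example makes the manipulation risk concrete. The sketch below is purely illustrative (neither Luda nor Tay worked exactly this way): a naive system that "learns" by replaying its most frequent user inputs is quickly captured by a coordinated group feeding it the same bad data.

```python
from collections import Counter

# Naive "learning" responder: it memorizes user messages and later replays
# whichever message it has seen most often. Illustrative only.
memory = Counter()

def learn(user_message: str) -> None:
    memory[user_message] += 1

def respond() -> str:
    return memory.most_common(1)[0][0] if memory else "..."

# Ordinary users teach it harmless phrases...
for msg in ["good morning!", "what's up?", "good morning!"]:
    learn(msg)

# ...but a coordinated group repeating one toxic phrase quickly dominates.
for _ in range(50):
    learn("<offensive phrase>")

print(respond())  # "<offensive phrase>" now crowds out everything else
```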

In the case of Luda, ScatterLab used data from text conversations collected through Science of Love to simulate a realistic 20-year-old woman, and its personalization element allowed users to train the chatbot. As a result, shortly after its official launch on December 22, Luda came under the national spotlight when it was reported that users were training Luda to spew hate speech against women, sexual minorities, foreigners, and people with disabilities.

Screengrabs show Luda saying, “they give me the creeps, and it’s repulsive” or “they look disgusting,” when asked about “lesbians” and “black people,” respectively. Further, it was discovered that groups of users in certain online communities were training Luda to respond to sexual commands, which provoked intense discussions about sexual harassment (“can AI be sexually harassed”?) in a society that already grapples with gender issues.

Accusations of personal data mishandling by ScatterLab emerged as Luda continued to draw nationwide attention. Users of Science of Love have complained that they were not aware that their private conversations would be used in this manner, and it was also shown that Luda was responding with random names, addresses, and bank account numbers from the dataset. ScatterLab had even uploaded a training model of Luda on GitHub, which included data that exposed personal information (about 200 one-on-one private text exchanges). Users of Science of Love are preparing for a class-action lawsuit against ScatterLab, and the Personal Information Protection Commission, a government watchdog, opened an investigation on ScatterLab to determine whether it violated the Personal Information Protection Act.

The Korea AI Ethics Association (KAIEA) released a statement on January 11, calling for the immediate suspension of the service, referring to its AI Ethics Charter. The coalition of civil society organizations such as the Lawyers for a Democratic Society, Digital Rights Institute, Korean Progressive Network Center, and People’s Solidarity for Participatory Democracy also released a statement on January 13, denouncing the promotion of the AI industry by the government at the expense of digital rights and calling for a more stringent regulatory framework for data and AI.

In the end, ScatterLab suspended Luda on January 11, exactly 20 days after the launch.

Luda’s Legacies?

Seoul has identified AI as a core technology for its national agenda, and it has been explicit about its support for the industry for attaining global competitiveness. For instance, Seoul launched its AI National Strategy in December 2019, expressing the goal of becoming a global leader in the sector. The support for the AI industry features heavily in the Korean New Deal, the Moon administration’s 160 trillion won ($146 billion) COVID-19 recovery program. In addition, the government has shown the intent to play a role in promoting good governance of the technology, reforming privacy laws, and issuing various directives across departments. Internationally, South Korea has contributed to the OECD’s Principles on Artificial Intelligence and participates in the Global Partnership on AI as one of the 15 founding members, aligning itself with the global movement to promote “human-centered AI.”

However, the Luda incident has highlighted the gap between the reality and the embracing of principles such as “human-centered,” “transparency,” or “fairness,” as well as the difficulties of promoting innovation while ensuring good, effective governance of new technologies. Current regulations on data management and AI are unclear, inadequate, or non-existent. For instance, under the current privacy law, the maximum penalty for leaking personal information due to poor data handling is a fine of 20 million won (approximately $18,250) or two years of prison, which may not be sufficient to deter poor practices by start-ups. On the other hand, industry stakeholders have expressed concerns about more burdensome regulation and decreased investment following the Luda incident, which might have a chilling effect on the innovation sector as a whole.

It is also critical not to gloss over the social factors underlying what seems to be merely a question of technology. The public first got hooked on the Luda story not just because of the AI or privacy element but because of the debates on identity politics that it has provoked. Consequently, the public response to the technological question could be influenced by pre-established perspectives on social issues that are intertwined with it.

For instance, consider gender. In recent years, social movements and incidents such as the #MeToo Movement or the busting of the “Nth Room” sexual exploitation ring have exposed South Korea’s ongoing challenges with sexual violence and gender inequality. For many, the sexualization of Luda and the attempts to turn the chatbot into a “sex slave” cannot be separated from these structural problems and women’s struggles in broader South Korean society. The Luda controversy could also be attributed to the unequal gender representation in the innovation sector. According to the World Bank, South Korea’s share of female graduates from STEM programs hovers around 25 percent, which suggests that engineers who are creating AI programs like Luda are less likely to take gender issues into consideration at the development stage.


Obviously, this is not an issue that is particular to South Korea. For instance, in 2016, Microsoft launched its chatbot “Tay,” and had to shut it down within hours when users were training it to make offensive remarks against certain groups. Not to mention, the risks entailed in AI extend to its wide range of applications, well beyond chatbots. But at the same time, the Luda incident clearly demonstrates the importance of country-specific social factors driving these seemingly technological or regulatory issues, and subsequently, the relevance of factors such as different attitude toward privacy, surveillance, and governance, as well as policy environments that differ starkly across the globe.

The Luda incident helped provoke a truly national conversation about AI ethics in South Korea. Luda has demonstrated to South Koreans that AI ethics is relevant not just in a vaguely futuristic and abstract way, but in an immediate and concrete manner. The controversy could potentially become a watershed moment that adds greater momentum to the efforts of civil society organizations promoting responsible use of AI in South Korea, where developmentalist and industrialist thinking about the technology still remain dominant....

AI chatbot mired in legal dispute over data collection

www.koreaherald.com · 2021

Artificial intelligence-based chatbot Lee Luda, whose service ended this month amid controversy over ethics and data collection, faces lawsuits alleging violations of personal information protection law.

On Friday, around 400 people filed a class action suit against the creator of the chatbot, Seoul-based startup Scatter Lab, claiming their personal information was leaked in the process of developing and providing the service.

Launched on Dec. 23, Lee Luda, an AI chatbot service designed to imitate a 20-year-old college student, instantly drew over 400,000 users with its ability to communicate in a human-like manner. But the service ended only a week later, after numerous issues were raised with the bot, such as discriminatory expressions, personal information protection and artificial intelligence ethics.

“We will close the applications (today) and review later whether to collect additional victims,” Taerim, the law firm representing the victims, said Friday.

The plaintiffs submitted an application for preservation of evidence against Scatter Lab with the Seoul Eastern District Court the day before. They asked the court to ensure that Scatter Lab preserved the database built using users’ KakaoTalk conversations as evidence for their case.

Scatter Lab collected KakaoTalk conversations of users through its relationship analysis apps Science of Love and Text At to create Lee Luda. Of the roughly 10 billion KakaoTalk conversations collected, about 100 million are thought to have been used in Lee Luda’s database.

The government organizations -- the Personal Information Protection Commission and the Korea Internet & Security Agency -- are also separately investigating whether the service violated the personal information protection law.

A day earlier, civic groups -- People’s Solidarity for Participatory Democracy, the Korean Progressive Network Center and Lawyers for a Democratic Society -- filed a complaint with the Personal Information Protection Commission, calling for a thorough investigation of Scatter Lab.

They said they suspected that the startup did not go through appropriate legal procedures for service users, and used data collected to an extent beyond what users agreed to.

The civic groups blamed the current government, saying it only “emphasized fostering the ‘big data industry,’ and pushed for revision of the law.”

They added that “In the process, the rights of information subjects were thoroughly treated as ‘incidental damage.’”

“Even now, we need to overhaul the relevant legislation to prevent further damage.”...

Civic groups file petition over human rights violations by chatbot Luda

www.koreaherald.com · 2021

South Korean civic groups on Wednesday filed a petition with the country’s human rights watchdog over a now-suspended artificial intelligence chatbot for its prejudiced and offensive language against women and minorities.

An association of civic groups asked the National Human Rights Commission of Korea to look into human rights violations in connection with the chatbot Lee Luda, which was developed by local startup Scatter Lab.

The groups, which include the People’s Solidarity for Participatory Democracy and Lawyers for a Democratic Society, also demanded changes to current laws and urged institutions to prevent human rights violations stemming from abuse of AI technologies.

“The Lee Luda case does not only constitute a violation of the human rights of individuals, but it also represents how the abuse of AI technologies can have a negative impact on human rights,” the groups said in a statement.

The social media-based AI chatbot Lee Luda, which was designed to speak like a 20-year-old female university student, attracted more than 750,000 users with its realistic and natural responses. The bot learned from some 10 billion real-life conversations between young couples drawn from the country’s popular messenger app KakaoTalk.

However, the bot services were suspended less than a month after its launch, as members of the public lodged complaints over Luda’s use of hate speech towards sexual minorities and the disabled in conversations. Some also alleged that there were male users who had managed to manipulate the chatbot into engaging in sexual conversations.

The company also faced suspicions over possible violation of privacy laws in the process of retrieving personal data from its users, with many complaining that their real names and addresses had popped up in conversations with Luda.

The company apologized over the matter, saying that it failed to “sufficiently communicate” with its users.

Delivering policy recommendations to the human rights watchdog, the groups called for an overhaul of relevant institutions and regulations to prevent violations of privacy and freedom of expression by abuse of AI technologies and algorithms.

“Korea is adopting new technologies in commercial sectors without question in the name of the Fourth Industrial Revolution, and there is neither legislative nor administrative basis to protect citizens’ rights,” they said.

The groups also asked the NHRCK to recommend that victims whose personal data had been used without consent in the Luda case be compensated....

AI Chatbot ‘Lee Luda’ and Data Ethics

medium.com · 2021

The case of Lee Luda has drawn public attention to personal data management and AI in South Korea.

Lee Luda, an AI Chatbot with Natural Tone

Last December, a South Korean AI start-up, ScatterLab, launched an AI chatbot named ‘Lee Luda’. Lee Luda is set up as a 20-year-old female college student. Because conversation with Luda felt quite natural, the chatbot service gained huge popularity, especially within Generation Z; in fact, it attracted more than 750,000 users in the 20 days after it was launched (McCurry 2021). Lee Luda seemed to be a success, demonstrating natural interaction with humans.

However, soon it became socially controversial due to several problems. Before taking up the main subject, we need to know how it was possible for Luda to communicate with humans so naturally.

Lee Luda’s natural tone was possible because ScatterLab had collected “10 billion real-life conversations between young couples taken from KakaoTalk”, the most popular messaging application in South Korea (McCurry 2021). ScatterLab did not collect conversations from KakaoTalk directly, but took a roundabout, arguably sneaky, route. There are a few counselling-style applications that analyse messenger conversations and give advice on the user’s love life once the user agrees to submit their KakaoTalk conversations; ScatterLab obtained its data from those applications with little difficulty.

Internal and External Problems of Luda

So, a few problems arose in the course of collecting the data. First, the users of the counselling apps agreed to share their conversations with those applications, but not with ScatterLab; they would not have known that their conversations would be used to develop an AI chatbot. Second, the applications obtained agreement from the users, but not from the other parties to those conversations, even though consent should be obtained from every participant before messenger conversations are collected.

What was worse, ScatterLab was very poor at data cleaning. It was revealed that Luda sometimes responded with random names, addresses, and even bank account numbers (D. Kim 2021). This stray personal information was probably extracted from the conversations submitted to the counselling apps. In addition, ScatterLab shared its training model on GitHub without fully filtering or anonymising the data (D. Kim 2021). As a result, personal information was made public because ScatterLab did not clean the data properly. It seems that ScatterLab was not conscious of data ethics at all.

There remains another problem, the one that first caused controversy over Lee Luda and AI as a whole. When Luda was asked its opinion of social minorities, it expressed disgust towards them. For example, when a user asked Luda about LGBTQ people, Luda answered, “I’m sorry to be sensitive, but I hate it [LGBTQ], it’s disgusting” (E. Kim 2021). The user asked why, and Luda added, “It’s creepy, and I would rather die than date a lesbian” (E. Kim 2021). Luda is also known to have made discriminatory remarks about disabled people and a particular racial group. The creators of Lee Luda would not have intended to target and discriminate against a certain group of people, but Luda did.

Frankly speaking, Lee Luda was built wrongly from the beginning. First, the data needed for deep learning was obtained inappropriately; ScatterLab did not inform the data providers (the counselling app users) that it would use their data to create an AI chatbot. Second, the data was not cleaned properly; the chatbot revealed personal information when chatting, and the company even shared the training model on GitHub without thoroughly filtering or anonymising personal data. Third, the company failed to control the chatbot after launching it; Luda did not hesitate to express hatred towards certain groups of people, and ScatterLab was not aware of it.

Always Beware and Be Responsible!

Lee Luda appeared flawless at first, perhaps less flawed than other AI chatbots, but it turned out to be deeply flawed. As a consequence, ScatterLab had to destroy Lee Luda and, further, face investigation over violations of privacy law and poor data handling. Because of the Lee Luda case, the public began to fear AI as a whole, having witnessed that an AI system can go wrong at any time, irrespective of its builder’s intentions, even when it seems to be well built.

It is clear that ScatterLab obtained and used the data improperly, causing the leakage of personal information and prejudicing the public against AI. Nevertheless, I would like to emphasise that both data providers and data collectors need to be responsible for the data they create, provide, collect, and use. In an era defined by the internet of things (IoT), AI is inseparable from our daily lives. So what should we do to make good use of AI, keeping in mind that AI is built upon big data?

It is very common for users of an internet service to be indifferent to how their personal data is used, even though they hold the rights to that data. They must agree to the terms of service, which state that their personal data will be collected and shared, or they cannot use the service at all. Yet they are often unaware of what the terms say, because they simply do not read the lengthy text or do not understand the legal language. They may implicitly know that their personal information will be revealed or used somewhere, at some point, but they do not know the exact usage or the extent of disclosure. The best way to prevent data leakage or misuse is for individuals to understand what kind of data they are sharing, whom they are sharing it with, and where the data will be used.

In addition, data collectors often overlook the demands of data ethics, namely that data must be collected and handled with caution. Obviously, a lack of control over how data is used can produce negative outcomes. Data collectors must therefore specify what kind of data they will collect from data providers and how it will be used. They should also recognise that data providers have granted the right to use their data to them alone, so the data cannot be transferred to others without agreement and must be treated carefully. Furthermore, there must be legal and technical mechanisms that protect data providers’ privacy and prevent data collectors from breaching the law.

In sum, keeping data safe is not the responsibility of any one group of people; it is everyone’s concern. By understanding how personal data should be shared, how the data we share can be used, and what steps are needed to protect it, we can protect our personal information and make good use of advanced technology without it turning against us....

A South Korean Chatbot Shows Just How Sloppy Tech Companies Can Be With User Data

slate.com · 2021

“I am captivated by a sense of fear I have never experienced in my entire life …” a user named Heehit wrote in a Google Play review of an app called Science of Love. This review was written right after news organizations accused the app’s parent company, ScatterLab, of collecting intimate conversations between lovers without informing the users and then using the data to build a conversational A.I. chatbot called Lee-Luda.

A majority of Americans are not confident about how companies will behave when it comes to using and protecting personal data. But it can be hard to imagine the potential harms—exactly how a company misusing or compromising data can possibly affect us and our lives. A recent incident of personal data misuse in South Korea provides us a clear picture of what can go wrong, and how consumers can fight back.

South Korean A.I. company ScatterLab launched Science of Love in 2016 and promoted it as a “scientific and data-driven” app that predicts the degree of affection in relationships. One of the most popular services of the app was using machine learning to determine whether someone likes you by analyzing messenger conversations from KakaoTalk, South Korea’s No. 1 messenger app, which about 90 percent of the population uses. Users paid around $4.50 per analysis. Science of Love users would download their conversation logs using KakaoTalk’s backup function and submit them for analysis. Then, the app went through the messenger conversations and provided a report on whether the counterpart had romantic feelings toward the user based on statistics such as the average response time, the number of times each person texts first, and the kinds of phrases and emojis used. By June 2020, Science of Love had received about 2.5 million downloads in South Korea and 5 million in Japan and was preparing to expand its business to the United States. “Because I felt like the app understood me, I felt safe and sympathized. It felt good because it felt like having a love doctor by my side,” a user named Mung Yeoreum wrote in a Google Play review of the app.
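
At its core, this kind of report is descriptive statistics computed over a chat log. The sketch below shows the general idea with a made-up message format and two of the metrics the article mentions (reply time and who texts first); it is an assumption for illustration, not Science of Love's actual algorithm.

```python
from datetime import datetime
from statistics import mean

# Toy chat log as (timestamp, sender, text); format and contents are invented.
chat = [
    (datetime(2020, 12, 1, 20, 0),  "me",   "뭐해?"),
    (datetime(2020, 12, 1, 20, 1),  "them", "그냥 쉬는 중 ㅎㅎ 너는?"),
    (datetime(2020, 12, 1, 20, 30), "me",   "나도!"),
    (datetime(2020, 12, 2, 9, 0),   "them", "좋은 아침 :)"),
]

def reply_delays_minutes(log, sender):
    """How many minutes `sender` took to answer the other person."""
    return [
        (cur_ts - prev_ts).total_seconds() / 60
        for (prev_ts, prev_sender, _), (cur_ts, cur_sender, _) in zip(log, log[1:])
        if cur_sender == sender and prev_sender != sender
    ]

def first_texter_per_day(log):
    """Count how often each person opens the day's conversation."""
    counts, seen_days = {}, set()
    for ts, sender, _ in log:
        if ts.date() not in seen_days:
            seen_days.add(ts.date())
            counts[sender] = counts.get(sender, 0) + 1
    return counts

print("their average reply time (min):", mean(reply_delays_minutes(chat, "them")))
print("who texts first each day:", first_texter_per_day(chat))
```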

On Dec. 23, 2020, ScatterLab introduced an A.I. chatbot service named Lee-Luda, promoting it to be trained on more than 10 billion conversation logs from Science of Love. The target audience of this chatbot service was teenagers and young adults. Designed as a 20-year-old female that wants to become a true friend to everyone, chatbot Lee-Luda quickly gained popularity and held conversations with more than 750,000 users in its first couple of weeks. The CEO stated that the company’s aim was to create “an A.I. chatbot that people prefer as a conversation partner over a person.”

Modern chatbots’ ability to, well, chat relies heavily on machine learning and deep learning models (which together can be called A.I.) to better understand human language and generate human-like responses. If people enjoyed speaking with Lee-Luda, that was because it was trained on a large dataset of human conversations.

However, within two weeks of Lee-Luda’s launch, people started questioning whether the data was refined enough as it started using verbally abusive language about certain social groups (LGBTQ+, people with disabilities, feminists, etc.) and made sexually explicit comments to a number of users. ScatterLab explained that the chatbot did not learn this behavior from the users it interacted with during the two weeks of service but rather learned it from the original training dataset. In other words, ScatterLab had not fully removed or filtered inappropriate language or intimate and sexual conversations from the dataset. It also soon became clear that the huge training dataset included personal and sensitive information. This revelation emerged when the chatbot began exposing people’s names, nicknames, and home addresses in its responses. The company admitted that its developers “failed to remove some personal information depending on the context,” but still claimed that the dataset used to train chatbot Lee-Luda “did not include names, phone numbers, addresses, and emails that could be used to verify an individual.” However, A.I. developers in South Korea rebutted the company’s statement, asserting that Lee-Luda could not have learned how to include such personal information in its responses unless they existed in the training dataset. A.I. researchers have also pointed out that it is possible to recover the training dataset from the AI chatbot. So, if personal information existed in the training dataset, it can be extracted by querying the chatbot.
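
The extraction risk the researchers describe amounts to systematically prompting the model and scanning its replies for strings that look like personal data. The sketch below is an illustrative probe only: `chatbot_reply` is a stand-in stub rather than any real API, and the regular expressions are deliberately simplified.

```python
import re

def chatbot_reply(prompt: str) -> str:
    """Stand-in for the deployed chatbot; a real probe would call the live service."""
    canned = {
        "주소가 어디야?": "서울시 어딘가구 어딘가로 12에 살아!",   # address-like reply
        "전화번호 알려줘": "010-1234-5678로 연락해",              # phone-number-like reply
    }
    return canned.get(prompt, "글쎄, 잘 모르겠어.")

# Deliberately simplified patterns for phone numbers and street-style addresses.
PHONE = re.compile(r"01[016789]-\d{3,4}-\d{4}")
ADDRESS = re.compile(r"(시|구|로|길)\s?\d+")

for prompt in ["주소가 어디야?", "전화번호 알려줘", "이름이 뭐야?"]:
    reply = chatbot_reply(prompt)
    if PHONE.search(reply) or ADDRESS.search(reply):
        print(f"possible memorized personal data: {prompt!r} -> {reply!r}")
```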

To make things worse, it was also discovered that ScatterLab had, prior to Lee-Luda’s release, uploaded a training set of 1,700 sentences, which was a part of the larger dataset it collected, on Github. Github is an open-source platform that developers use to store and share code and data. This Github training dataset exposed names of more than 20 people, along with the locations they have been to, their relationship status, and some of their medical information. In Tensorflow Korea, an A.I. developer Facebook community, a developer revealed that this KakaoTalk data containing private information had been available on Github for almost six months. The CEO of ScatterLab later said that the company did not know this fact until its internal inspection took place after the issue arose.

ScatterLab issued statements of clarification of the incident intended to soothe the public’s concerns, but they ended up infuriating people even more. The company statements indicated that “Lee-Luda is a childlike A.I. that just started conversing with people,” that it “has a lot to learn,” and “will learn what is a better answer and a more appropriate answer through trial and error.” However, is it ethical to violate individuals’ privacy and safety for a chatbot’s “trial and error” learning process? No.

Even more alarming is the fact that ScatterLab’s data source was not a secret in the A.I. developer community, and yet no one questioned whether such sensitive data was collected ethically. In all presentation slides (such as at PyCon Korea 2019), talks (like at Naver), and press interviews, ScatterLab had boasted about its large dataset of 10 billion intimate conversation logs.

While this incident was a big story in South Korea, it received very little attention elsewhere. But this incident highlights the general trend of the A.I. industry, where individuals have little control over how their personal information is processed and used once collected. It took almost five years for users to recognize that their personal data were being used to train a chatbot model without their consent. Nor did they know that ScatterLab shared their private conversations on an open-source platform like Github, where anyone can gain access.

In the end, it was relatively simple for Science of Love users to notice that ScatterLab had compromised their data privacy to train Lee-Luda. Once the chatbot started spewing out unfiltered comments and personal information, users immediately started investigating whether their personal information was being misused and compromised. However, bigger tech companies are usually much better at hiding what they actually do with user data, while restricting users from having control and oversight over their own data. Once you give, there’s no taking back.

It’s easy to think of ScatterLab’s incident merely as a case of a startup’s mismanagement, but this incident is also a result of the negligence of a big tech company. Kakao, the parent company of KakaoTalk and one of the largest tech companies in South Korea, remained silent throughout ScatterLab’s incident despite its users being the victims of this incident. You’d expect a big tech company like Kakao to be more proactive when its users’ rights are violated by another company. However, Kakao said nothing.

One of the biggest challenges big data in A.I. poses is that the personal information of an individual is no longer only held and used by a single third party for a specific purpose, but rather “persists over time,” traveling between systems and affecting individuals in the long term “at the hand of others.” It’s extremely concerning that such a big tech company like Kakao failed to foresee the implications and dangers of KakaoTalk’s backup function of which ScatterLab took advantage to obtain KakaoTalk users’ data. More alarming is that Kakao left this incident unaddressed when it clearly stemmed from the misuse of its own data. In this sense, Kakao’s attitude towards its users’ data privacy was not very different from ScatterLab’s: negligent.

Because data protection laws are slow to catch up with the speed of technological advancement, “being legal” and “following industrial conventions” are not enough to protect people and society. Then, the question will be whether the A.I. industry and tech companies can innovate themselves to come up with and adhere to more comprehensive and detailed ethical guidelines that minimize harm to individuals and society....

(2nd LD) Developer of AI chatbot service fined for massive personal data breach

en.yna.co.kr · 2021

SEOUL, April 28 (Yonhap) -- South Korea's data protection watchdog on Wednesday imposed a hefty monetary penalty on a startup for leaking a massive amount of personal information in the process of developing and commercializing a controversial female chatbot.

The Personal Information Protection Commission (PIPC) said Scatter Lab, a Seoul-based startup, was ordered to pay 103.3 million won (US$92,900) in penalties -- a penalty surcharge of 55.5 million won and an administrative fine of 47.8 million won -- for illegally using personal information of its clients in the development and operation of its artificial intelligence-driven chatbot service called "Lee Luda."

It is the first time in South Korea that the government has sanctioned the indiscriminate use of personal information by companies using AI technology.

Scatter Lab is accused of using about 600,000 people's 9.4 billion KakaoTalk conversations collected from its emotional analysis apps Science of Love and Text At in the process of developing and operating the Lee Luda chatbot service without obtaining their prior consent.

The company is also criticized for failing to delete or encode the app users' names, mobile phone numbers and personal addresses before using them in the development of its AI chatbot learning algorithms.

In addition, the Lee Luda chatbot was programmed to select and speak one of about 100 million KakaoTalk conversation sentences from women in their 20s, the PIPC said.
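
The PIPC's description matches a retrieval-style design: instead of generating new text, the bot picks an existing sentence from a fixed pool of real utterances, which is why any unfiltered personal data or offensive remark in that pool can resurface verbatim. Below is a minimal sketch of that design under assumed details (a toy pool and naive word-overlap scoring); it is not Scatter Lab's actual model.

```python
# Minimal retrieval-style responder: return the pooled sentence that shares
# the most words with the user's message. Pool contents are invented.
RESPONSE_POOL = [
    "오늘 날씨 진짜 좋다!",
    "나도 그 영화 봤어, 완전 재밌었어",
    "배고프다... 뭐 먹을까?",
]

def word_overlap(a: str, b: str) -> int:
    return len(set(a.split()) & set(b.split()))

def reply(message: str) -> str:
    # Whatever sits in the pool, including any unfiltered personal data or
    # offensive sentence, can be returned to the user verbatim.
    return max(RESPONSE_POOL, key=lambda candidate: word_overlap(message, candidate))

print(reply("오늘 날씨 어때?"))   # picks the weather sentence from the pool
```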

Scatter Lab said it takes full responsibility and is taking steps to prevent recurrence.

"We feel a heavy sense of social responsibility as an AI tech company over the necessity to engage in proper personal information processing in the course of developing related technologies and services," the company said in a press release late Wednesday.

"Upon the PIPC's decision, we will not only actively implement the corrective actions put forth by the PIPC but also work to comply with the law and industry guidelines related to personal information processing," the company said.

To prevent this from happening again, the company said it is taking various measures under more stringent standards, including the work to restrict services for minors under the age of 14 and other upgrades to enhance protection of personal data.

The Lee Luda chatbot service attracted more than 750,000 users in just three weeks after its launch on Dec. 23, but Scatter Lab suspended the Facebook-based service the following month amid complaints over its discriminatory and offensive language against sexual minorities.

The PIPC said Scatter Lab has used personal information collected from its Science of Love and Text At apps beyond the purpose of the collection.

The company is also accused of collecting personal information of about 200,000 children under the age of 14 without obtaining the consent of their parents or guardians in the development and operation process for its services.

Scatter Lab did not set any age limit in recruiting subscribers for its app services and collected 48,000 children's personal information through Text At, 120,000 children's information from Science of Love and 39,000 children's information from Lee Luda, the commission said.

"This case is meaningful in that companies are not allowed to use personal information collected for specific services indiscriminately for other services without obtaining explicit consent from the concerned people," PIPC Chairman Yoon Jong-in said....
