Incident 143: Facebook’s and Twitter's Automated Content Moderation Reportedly Failed to Effectively Enforce Violation Rules for Small Language Groups
According to the responses to BIRN’s questionnaire, some 57 per cent of those who reported hate speech said they were notified that the reported post/account violated the rules.
On the other hand, some 28 per cent said they had received notification that the content they reported did not violate the rules, while 14 per cent received only confirmation that their report was filed.
In terms of reports of targeted harassment, half of people said they received confirmation that the content violated the rules; 16 per cent were told the content did not violate rules. A third of those who reported targeted harassment only received confirmation their report was received.
As for threatening violence, 40 per cent of people received confirmation that the reported post/account violated the rules while 60 per cent received only confirmation their complaint had been received.
One of the respondents told BIRN they had reported at least seven accounts for spreading hatred and violent content.
“I do not engage actively on such reports nor do I keep looking and searching them. However, when I do come across one of these hateful, genocide deniers and genocide supporters, it feels the right thing to do, to stop such content from going further,” the respondent said, speaking on condition of anonymity. “Maybe one of all the reported individuals stops and asks themselves what led to this and simply opens up discussions, with themselves or their circles.”
Although Twitter confirmed that all seven accounts had violated some of its rules, six of them are still available online.
BIRN methodology
BIRN conducted its questionnaire via the network’s tool for engaging citizens in reporting, developed in cooperation with the British Council. The anonymous questionnaire had the aim of collecting information on what type of violations people reported, who was the target and how successful the report was. The questions were available in English, Macedonian, Albanian and Bosnian/Serbian/Montenegrin. BIRN focused on Facebook and Twitter given their popularity in the Balkans and the sensitivity of shared content, which is mostly textual and harder to assess compared to videos and photos.
Another issue that emerged is the unclear criteria for reporting violations. Basic knowledge of English is also required.
Sanjana Hattotuwa, special advisor at ICT4Peace Foundation, agreed that the in-app or web-based reporting process is confusing.
“Moreover, it is often in English even though the rest of the UI/UX [User Interface/User Experience] could be in the local language. Furthermore, the laborious selection of categories is, for a victim, not easy – especially under duress.”
Facebook told BIRN that the vast majority of reports are reviewed within 24 hours and that the company uses community reporting, human review and automation.
It refused, however, to give any specifics on those it employs to review content or reports in Balkan languages, saying “it isn’t accurate to only give the number of content reviewers”.
“That alone doesn’t reflect the number of people working on a content review for a particular country at any given time,” the spokesperson said.
Social networks often remove content themselves, in what they call a ‘proactive approach’.
According to data provided by Facebook, in the last quarter of 2017 their proactive detection rate was 23.6 per cent.
“This means that of the hate speech we removed, 23.6 per cent of it was found before a user reported it to us,” the spokesperson said. “The remaining majority of it was removed after a user reported it. Today we proactively detect about 95 per cent of hate speech content we remove.”
“Whether content is proactively detected or reported by users, we often use AI to take action on the straightforward cases and prioritise the more nuanced cases, where context needs to be considered, for our reviewers.”
There is no available data, however, when it comes to content in a specific language or country.
Facebook publishes a Community Standards Enforcement Report on a quarterly basis, but, according to the spokesperson, the company does not “disclose data regarding content moderation in specific countries.”
Whatever the tools, the results are sometimes highly questionable.
In May 2018, Facebook blocked the profile of Bosnian journalist Dragan Bursac for 24 hours after he posted a photo of a detention camp for Bosniaks in Serbia during the collapse of federal Yugoslavia in the 1990s.
Facebook determined that Bursac’s post had violated “community standards,” local media reported.
Bojan Kordalov, Skopje-based public relations and new media specialist, said that, “when evaluating efficiency in this area, it is important to emphasise that the traffic in the Internet space is very dense and is increasing every second, which unequivocally makes it a field where everyone needs to contribute”.
“This means that social media managements are undeniably responsible for meeting the standards and compliance with regulations within their platforms, but this does not absolve legislators, governments and institutions of responsibility in adapting to the needs of the new digital age, nor does it give anyone the right to redefine and narrow down the notion and the benefits that democracy brings.”
Lack of language sensibility
SHARE Foundation, a Belgrade-based NGO working on digital rights, said the question was crucial given the huge volume of content flowing through the likes of Facebook and Twitter in all languages.
“When it comes to relatively small language groups in absolute numbers of users, such as languages in the former Yugoslavia or even in the Balkans, there is simply no incentive or sufficient pressure from the public and political leaders to invest in human moderation,” SHARE told BIRN.
Berthelemy of EDRi said the Balkans were not a standalone example, and that the content moderation practices and policies of Facebook and Twitter are “doomed to fail.”
“Many of these corporations operate on a massive scale, some of them serving up to a quarter of the world’s population with a single service,” Berthelemy told BIRN. “It is impossible for such monolithic architecture, and speech regulation process and policy to accommodate and satisfy the specific cultural and social needs of individuals and groups.”
The European Parliament has also stressed the importance of a combined assessment.
“The expressions of hatred can be conveyed in many ways, and the same words typically used to convey such expressions can also be used for different purposes,” according to a 2020 study – ‘The impact of algorithms for online content filtering or moderation’ – commissioned by the Parliament’s Policy Department for Citizens’ Rights and Constitutional Affairs.
“For instance, such words can be used for condemning violence, injustice or discrimination against the targeted groups, or just for describing their social circumstances. Thus, to identify hateful content in textual messages, an attempt must be made at grasping the meaning of such messages, using the resources provided by natural language processing.”
Hattotuwa said that, in general, “non-English language markets with non-Romanic (i.e. not English letter based) scripts are that much harder to design AI/ML solutions around”.
“And in many cases, these markets are out of sight and out of mind, unless the violence, abuse or platform harms are so significant they hit the New York Times front-page,” Hattotuwa told BIRN.
“Humans are necessary for evaluations, but as you know, there are serious emotional / PTSD issues related to the oversight of violent content, that companies like Facebook have been sued for (and lost, having to pay damages).”
Failing in non-English
Dragan Vujanovic of the Sarajevo-based NGO Vasa prava [Your Rights] criticised what he said was a “certain level of tolerance with regards to violations which support certain social narratives.”
“This is particularly evident in the inconsistent behavior of social media moderators where accounts with fairly innocuous comments are banned or suspended while other accounts, with overt abuse and clear negative social impact, are tolerated.”
For Chloe Berthelemy, trying to apply a uniform set of rules to the very diverse range of norms, values and opinions on all available topics that exist in the world is “meant to fail.”
“For instance, where nudity is considered to be sensitive in the United States, other cultures take a more liberal approach,” she said.
The example of Myanmar, when Facebook effectively blocked an entire language by refusing all messages written in Jinghpaw, a language spoken by Myanmar’s ethnic Kachin and written with a Roman alphabet, shows the scale of the issue.
“The platform performs very poorly at detecting hate speech in non-English languages,” Berthelemy told BIRN.
The techniques used to filter content differ depending on the media analysed, according to the 2020 study for the European Parliament.
“A filter can work at different levels of complexity, spanning from simply comparing contents against a blacklist, to more sophisticated techniques employing complex AI techniques,” it said.
“In machine learning approaches, the system, rather than being provided with a logical definition of the criteria to be used to find and classify content (e.g., to determine what counts as hate speech, defamation, etc.) is provided with a vast set of data, from which it must learn on its own the criteria for making such a classification.”
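The simplest end of that spectrum, comparing content against a blacklist, can be illustrated with a minimal sketch. The word list, function name and sample messages below are hypothetical and are not taken from the study or from any platform's actual system. The sketch only shows why a bare word match cannot separate a hateful message from one that condemns the same language, the problem the study describes.

```python
import re

# Hypothetical blacklist for illustration only; not any platform's real list.
BLACKLIST = frozenset({"vermin", "exterminate"})


def blacklist_flag(message: str, blacklist: frozenset = BLACKLIST) -> bool:
    """Return True if any blacklisted word appears in the message."""
    # Lowercase the text and split it into word tokens before matching.
    words = re.findall(r"\w+", message.lower())
    return any(word in blacklist for word in words)


# Hypothetical sample messages.
hateful = "They are vermin and we should exterminate them"
condemning = "Calling refugees vermin is dehumanising and must stop"
neutral = "The report was filed yesterday"

for text in (hateful, condemning, neutral):
    print(blacklist_flag(text), "-", text)
```

Both the hateful message and the one condemning it contain a listed word, so the filter flags both. A list written in one language would also miss the same insult written in another.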
Users of both Twitter and Facebook can appeal in the event their accounts are suspended or blocked.
“Unfortunately, the process lacks transparency, as the number of filed appeals is not mentioned in the transparency report, nor is the number of processed or reinstated accounts or tweets,” the study noted.
Between January and October 2020, Facebook restored some 50,000 items of content without an appeal and 613,000 after appeal.
According to the Twitter Transparency report, in the first six months of 2020, 12.4 million accounts were reported to the company, just over six million of which were reported for hateful conduct and some 5.1 million for “abuse/harassment”.
In the same period, Twitter suspended 925,744 accounts, of which 127,954 were flagged for hateful conduct and 72,139 for abuse/harassment. The company removed such content in a little over 1.9 million cases: 955,212 in the hateful conduct category and 609,253 in the abuse/harassment category.
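To put these figures in proportion, the short calculation below compares suspensions with reports in each category. It is a rough sketch under stated assumptions: “just over six million” is rounded down to 6,000,000, the variable names are illustrative, and reports and suspensions do not necessarily refer to the same accounts, so the ratios are only indicative.

```python
# Figures as quoted from the Twitter Transparency report for January-June 2020.
# Assumption: "just over six million" is rounded down to 6,000,000.
reported = {"hateful conduct": 6_000_000, "abuse/harassment": 5_100_000}
suspended = {"hateful conduct": 127_954, "abuse/harassment": 72_139}
removed = {"hateful conduct": 955_212, "abuse/harassment": 609_253}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Suspensions per 100 reported accounts in each category (indicative only).
suspension_rate = {
    category: percent(suspended[category], reported[category])
    for category in reported
}

# Content removals in the two categories combined.
removed_total = sum(removed.values())

for category, rate in suspension_rate.items():
    print(f"{category}: {rate:.1f} suspensions per 100 reported accounts")
print(f"removals in both categories: {removed_total:,}")
```

On these assumptions, roughly two accounts were suspended for every 100 reported in either category, and the two categories together account for 1,564,465 of the little over 1.9 million removals.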
Toskic Cvetinovic said the rules needed to be clearer and better communicated to users by “living people.”
“Often, the content removal doesn’t have a corrective function, but amounts to censorship,” she said.
Berthelemy said that, “because the dominant social media platforms reproduce the social systems of oppression, they are also often unsafe for many groups at the margins.”
“They are unable to understand the discriminatory and violent online behaviours, including certain forms of harassment and violent threats and therefore, cannot address the needs of victims,” Berthelemy told BIRN.
“Furthermore,” she said, “those social media networks are also advertisement companies. They rely on inflammatory content to generate profiling data and thus advertisement profits. There will be no effective, systematic response without addressing the business models of accumulating and trading personal data.”