
When Netflix gets a movie recommendation wrong, it’s probably not a big deal. Likewise, when your favourite sneakers don’t make it into Amazon’s list of recommended products, that’s hardly the end of the world. But when an algorithm assigns you a threat score from 1 to 500 that is used to rule on jail time, you might have some concerns about this use of predictive analytics.
To the general audience, predictive policing methods are probably best known from the 2002 science-fiction movie Minority Report starring Tom Cruise. Based on a short story by Philip K. Dick, the movie presents a vision of the future in which crimes can be predicted and prevented. This may sound like far-fetched science fiction. However, predictive justice already exists today. A wave of new companies, building on advanced machine learning systems, now provides predictive services to courts; for example, risk-assessment algorithms that estimate the likelihood of recidivism for criminals.
Can machines identify future criminals?
After his arrest in 2013, Eric Loomis was sentenced to six years in prison based in part on an opaque algorithmic prediction that he would commit more crimes. Equivant (formerly Northpointe), the company behind the proprietary software used in Eric Loomis’ case, claims that its software offers a 360-degree view of the defendant, giving judges detailed algorithmic assistance in their decision-making.
This company is one of many players in the predictive justice field in the US. A recent report by the Electronic Privacy Information Center finds that algorithms are increasingly used in court to “set bail, determine sentences, and even contribute to determinations about guilt or innocence”. This shift towards more machine intelligence in courts, allowing AI to augment human judgement, could be extremely beneficial for the judicial system as a whole.
However, an investigative report by ProPublica found that these algorithms tend to reinforce the racial bias present in law enforcement data. Algorithmic assessments falsely flag black defendants as future criminals at almost twice the rate of white defendants. What is more, the judges who relied on these risk assessments typically did not understand how the scores were computed.
This is problematic, because machine learning models are only as reliable as the data they’re trained on. If the underlying data is biased in any form, there is a risk that structural inequalities and unfair biases are not just replicated, but also amplified. In this regard, AI engineers must be especially wary of their blind spots and implicit assumptions; it is not just the choice of machine learning techniques that matters, but also all the small decisions about finding, organising and labelling training data for AI models.
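The disparity ProPublica measured is, at its core, a gap in false positive rates between groups, and that gap is straightforward to audit when group labels are available. The following minimal sketch in Python uses entirely made-up predictions and outcomes, plus a hand-rolled false_positive_rate helper, purely to show what such an audit looks like.

```python
# A minimal audit sketch: the disparity ProPublica reported is a gap in
# false positive rates, which is easy to measure if group labels exist.
# All data below is synthetic and purely illustrative.
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Share of people who did NOT reoffend but were flagged as high risk."""
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

# Hypothetical audit data: true outcomes, model flags, and group membership.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
y_pred = (rng.random(1000) < np.where(group == 1, 0.45, 0.25)).astype(int)

for g in (0, 1):
    m = group == g
    print(f"group {g}: FPR = {false_positive_rate(y_true[m], y_pred[m]):.2f}")
```

An audit like this does not explain where the gap comes from, but it makes the disparity visible before a system is put in front of a judge.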
Biased data feeds biased algorithms
Even small irregularities and biases in the data can produce a measurable difference in the final risk assessment. The critical issue is that problems like racial bias and structural discrimination are baked into the world around us.
For instance, there is evidence that, despite similar rates of drug use, black Americans are arrested on drug-related charges at four times the rate of white Americans. Even if engineers were to faithfully collect this data and train a machine learning model on it, the AI would still pick up the embedded bias as part of the model, as the sketch below illustrates.
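The following toy simulation, with invented numbers, shows the mechanism: both groups use drugs at the same underlying rate, one group’s offences are recorded four times as often, and a model trained on those arrest records reproduces the disparity even though it never sees race, because a correlated proxy (here, a hypothetical postcode flag) carries the signal.

```python
# Sketch with synthetic numbers: a model trained on biased arrest records
# picks up the bias through a correlated proxy, even without seeing race.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000
race = rng.integers(0, 2, n)            # 1 = over-policed group (toy labels)
drug_use = rng.random(n) < 0.15         # identical true rate in both groups

# Arrests are the biased training label: ~4x the recording rate
# for the same underlying behaviour.
arrested = drug_use & (rng.random(n) < np.where(race == 1, 0.40, 0.10))

# The model never sees race, only a proxy: a heavily policed postcode.
postcode = (rng.random(n) < np.where(race == 1, 0.9, 0.2)).astype(float)
X = postcode.reshape(-1, 1)
model = LogisticRegression().fit(X, arrested)

for g, name in [(0, "group A"), (1, "group B")]:
    risk = model.predict_proba(X[race == g])[:, 1].mean()
    print(f"{name}: drug use {drug_use[race == g].mean():.2f}, "
          f"mean predicted risk {risk:.2f}")
```

Despite identical drug-use rates, the model assigns a markedly higher average risk to the over-policed group, because the biased label leaks through the proxy feature.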
Systematic patterns of inequality are everywhere. If you look at the top-grossing movies of 2014 and 2015, you can see that female characters are vastly underrepresented in terms of both screen time and speaking time. New machine learning models can quantify these inequalities, but there are many open questions about how engineers can proactively mitigate them.
Google’s recent “Quick, Draw!” experiment vividly demonstrates why addressing bias matters. The experiment invited internet users worldwide to take part in a simple drawing game: in each round, users had less than 20 seconds to draw an object, and the AI system would then try to guess what the drawing depicted. More than 20 million people from 100 nations participated in the game, resulting in over 2 billion diverse drawings of all sorts of objects, including cats, chairs, postcards, butterflies and skylines.
But when the researchers examined the drawings of shoes in the dataset, they realised that they were dealing with a strong cultural bias. A large share of early users drew shoes that looked like Converse sneakers, so the model picked up the typical visual attributes of sneakers as the prototypical example of what a “shoe” should look like. Consequently, shoes that did not look like sneakers, such as high heels, ballet flats or clogs, were not recognised as shoes.
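One practical takeaway is to audit category balance before training. The public Quick, Draw! dataset does not label shoe subtypes (which is partly why the skew was easy to miss), but if a sample were tagged by hand or by a secondary model, a check like the hypothetical sketch below would surface a dominant prototype. The subtype labels here are invented for illustration.

```python
# Hypothetical audit sketch: if a sample of "shoe" drawings were tagged
# with subtypes, a simple frequency check would reveal a dominant prototype.
from collections import Counter

drawings = [
    {"label": "shoe", "subtype": "sneaker"},
    {"label": "shoe", "subtype": "sneaker"},
    {"label": "shoe", "subtype": "high heel"},
    # ... in a real audit, a much larger tagged sample
]

counts = Counter(d["subtype"] for d in drawings if d["label"] == "shoe")
total = sum(counts.values())
for subtype, n in counts.most_common():
    share = n / total
    flag = "  <-- dominant prototype" if share > 0.5 else ""
    print(f"{subtype:>10}: {share:.0%}{flag}")
```

A skewed distribution like this is a warning sign that the model will treat the majority style as the definition of the category.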
Recent studies show that, if left unchecked, machine learning models will learn outdated gender stereotypes, such as “doctors” being male and “receptionists” being female. In a similar fashion, AI models