Report 701

It was a striking story. “Machine Bias,” the headline read, and the teaser proclaimed: “There’s software used across the country to predict future criminals. And it’s biased against blacks.”

ProPublica, a Pulitzer Prize–winning nonprofit news organization, had analyzed risk assessment software known as COMPAS. It is being used to forecast which criminals are most likely to reoffend. Guided by such forecasts, judges in courtrooms throughout the United States make decisions about the future of defendants and convicts, determining everything from bail amounts to sentences. When ProPublica compared COMPAS’s risk assessments for more than 10,000 people arrested in one Florida county with how often those people actually went on to reoffend, it discovered that the algorithm “correctly predicted recidivism for black and white defendants at roughly the same rate.” But when the algorithm was wrong, it was wrong in different ways for blacks and whites. Specifically, “blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend.” And COMPAS tended to make the opposite mistake with whites: “They are much more likely than blacks to be labeled lower risk but go on to commit other crimes.”

Whether it’s appropriate to use systems like COMPAS is a question that goes beyond racial bias. The U.S. Supreme Court might soon take up the case of a Wisconsin convict who says his right to due process was violated when the judge who sentenced him consulted COMPAS, because the workings of the system were opaque to the defendant. Potential problems with other automated decision-making (ADM) systems exist outside the justice system, too. On the basis of online personality tests, ADMs are helping to determine whether someone is the right person for a job. Credit-scoring algorithms play an enormous role in whether you get a mortgage, a credit card, or even the most cost-effective cell-phone deals.

It’s not necessarily a bad idea to use risk assessment systems like COMPAS. In many cases, ADM systems can increase fairness. Human decision making is at times so incoherent that it needs oversight to bring it in line with our standards of justice. As one particularly unsettling study showed, parole boards were more likely to free convicts if the judges had just had a meal break. This probably had never occurred to the judges. An ADM system could discover such inconsistencies and improve the process.

But often we don’t know enough about how ADM systems work to know whether they are fairer than humans would be on their own. In part because the systems make choices on the basis of underlying assumptions that are not clear even to the systems’ designers, it’s not necessarily possible to determine which algorithms are biased and which ones are not. And even when the answer seems clear, as in ProPublica’s findings on COMPAS, the truth is sometimes more complicated.

Lawmakers, the courts, and an informed public should decide what we want algorithms to prioritize.

What should we do to get a better handle on ADMs? Democratic societies need more oversight over such systems than they have now. AlgorithmWatch, a Berlin-based nonprofit advocacy organization that I cofounded with a computer scientist, a legal philosopher, and a fellow journalist, aims to help people understand the effects of such systems. “The fact that most ADM procedures are black boxes to the people affected by them is not a law of nature. It must end,” we assert in our manifesto. Still, our take on the issue is different from many critics’—because our fear is that the technology could be demonized undeservedly. What’s important is that societies, and not only algorithm makers, make the value judgments that go into ADMs.

Measures of fairness

COMPAS determines its risk scores from answers to a questionnaire that explores a defendant’s criminal history and attitudes about crime. Does this produce biased results?

After ProPublica’s investigation, Northpointe, the company that developed COMPAS, disputed the story, arguing that the journalists misinterpreted the data. So did three criminal-justice researchers, including one from a justice-reform organization. Who’s right—the reporters or the researchers? Krishna Gummadi, head of the Networked Systems Research Group at the Max Planck Institute for Software Systems in Saarbrücken, Germany, offers a surprising answer: they all are.

Gummadi, who has extensively researched fairness in algorithms, says ProPublica’s and Northpointe’s results don’t contradict each other. They differ because they use different measures of fairness.
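To see how two fairness measures can disagree on the same predictions, consider a minimal sketch in Python. The counts below are invented for illustration and are not COMPAS data; the class name `Confusion` and the groups `group_a` and `group_b` are hypothetical. The sketch compares one measure (how often a "high risk" label turns out to be right) with another (how often each kind of mistake is made), assuming the two groups reoffend at different underlying rates.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Outcome counts for one group (hypothetical numbers, not real data)."""
    tp: int  # labeled high risk, did reoffend
    fp: int  # labeled high risk, did not reoffend
    fn: int  # labeled low risk, did reoffend
    tn: int  # labeled low risk, did not reoffend

    def ppv(self) -> float:
        # Share of "high risk" labels that proved correct
        return self.tp / (self.tp + self.fp)

    def fpr(self) -> float:
        # Share of non-reoffenders who were wrongly labeled high risk
        return self.fp / (self.fp + self.tn)

    def fnr(self) -> float:
        # Share of reoffenders who were wrongly labeled low risk
        return self.fn / (self.fn + self.tp)


# Assumption: 1,000 people per group; 50% of group A and 30% of group B reoffend.
group_a = Confusion(tp=300, fp=200, fn=200, tn=300)
group_b = Confusion(tp=150, fp=100, fn=150, tn=600)

for name, g in (("A", group_a), ("B", group_b)):
    print(f"group {name}: PPV={g.ppv():.2f}  FPR={g.fpr():.2f}  FNR={g.fnr():.2f}")
```

In this toy data a "high risk" label is equally reliable for both groups, yet people in group A who do not reoffend are wrongly labeled high risk far more often, while reoffenders in group B are more often wrongly labeled low risk. An analyst who checks only the first measure and one who checks only the error rates would reach opposite conclusions about the same predictions.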

If used properly, criminal-justice algorithms offer “the chance of a generation, and perhaps a lifetime, to reform sentencing and unwind mass incarceration in a scientific way.”

Imagine you are designing a system to predict which criminals will reoffend. One option is to optimize for “true positives,” meaning that you will identify as many people as possible who are at high risk of comm

Associated Incidents

Incident 4021 Report
COMPAS Algorithm Reportedly Performs Poorly in Crime Recidivism Prediction

Inspecting Algorithms for Bias