Citation record for Incident 54

Suggested citation format

Yampolskiy, Roman. (2015-11-18) Incident Number 54. in McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID: 54
Report Count: 19
Incident Date: 2015-11-18
Editors: Sean McGregor

CSET Taxonomy Classifications

Taxonomy Details

Full Description

Predictive policing algorithms meant to aid law enforcement by predicting future crime show signs of biased output. PredPol, used by the Oakland (California) Police Department, and the Strategic Subject List, used by Chicago PD, were subjects of studies in 2015 and 2016 showing their bias against "low-income, minority neighborhoods." These neighborhoods would receive added attention from police departments expecting crimes to be more prevalent in the area. Notably, Oakland Police Department used 2010's record of drug crime as their baseline to train the system.

Short Description

Predictive policing algorithms meant to aid law enforcement by predicting future crime show signs of biased output.

Severity

Minor

Harm Distribution Basis

Race, National origin or immigrant status, Financial means

Harm Type

Harm to civil liberties

AI System Description

Predictive policing algorithms meant to aid police in predicting future crime.

System Developer

PredPol, Chicago Police Department

Sector of Deployment

Public administration and defence

Relevant AI functions

Cognition

AI Techniques

machine learning

AI Applications

Predictive policing

Named Entities

Oakland Police Department, Chicago Police Department, PredPol, Human Rights Data Analysis Group, Strategic Subject List

Technology Purveyor

PredPol, Chicago Police Department, Oakland Police Department

Beginning Date

2015-01-01T00:00:00.000Z

Ending Date

2017-01-01T00:00:00.000Z

Near Miss

Unclear/unknown

Intent

Unclear

Lives Lost

No

Laws Implicated

Fourth Amendment of the US Constitution

Data Inputs

Crime statistics

Incident Reports

Faiza Patel is the co-director of the Liberty and National Security Program at the Brennan Center for Justice at New York University Law School.

In every age, police forces gain access to new tools that may advance their mission to prevent and combat crime. Predictive technologies — the use of data from a variety of sources to create an assessment model for the probability of future crimes — have been touted as a means to forecast where crime will likely take place and sometimes who is likely to commit a crime.

Technology that purports to zero in on an individual who is likely to commit a crime is particularly suspect.

Given the far-reaching implications of acting on such projections, any police department considering a predictive analytical tool must thoroughly test the reliability of its claims. So far, research is in its infancy.

A handful of studies have shown short-term decreases in crime when police allocate resources to predicted “hotspots,” but other assessments have shown no statistically significant correlation or diminishing returns.

At a time of rising concern about over-policing in minority communities, surging police to particular locations may have its own compounding negative consequences. Technology that purports to zero in on categories of people likely to commit crimes is even more suspect. It undermines the constitutional requirement that police should target people based upon an individual suspicion of wrongdoing, not statistical probability.

Of course, even algorithms used to predict the location of crime will only be as good as the information that is fed into them. If an algorithm is populated primarily with crimes committed by black people, it will spit out results that send police to black neighborhoods. This is a serious and perhaps insurmountable problem, considering that people of color are stopped, detained, arrested and incarcerated at higher levels than whites regardless of crime rates. The New York Police Department’s infamous stop-and-frisk program overwhelmingly targeted black and Latino men, even though the number of arrests or summons resulting from stops was actually lower for minority targets. The risks of “driving while black” are also well-documented.

These realities mean we, as a society, should proceed with extreme caution despite the hype about the promise of data. Predictive policing shouldn’t just become racial profiling by another name.

Be Cautious About Data-Driven Policing

themarshallproject.org · 2016

Just over a year after Michael Brown’s death became a focal point for a national debate about policing and race, Ferguson and nearby St. Louis suburbs have returned to what looks, from the outside, like a kind of normalcy. Near the Canfield Green apartments, where Brown was shot by police officer Darren Wilson, a sign reading “Hands Up Don’t Shoot” and a mountain of teddy bears have been cleared away. The McDonald’s on West Florissant Avenue, where protesters nursed rubber bullet wounds and escaped tear gas, is now just another McDonald’s.

Half a mile down the road in the city of Jennings, between the China King restaurant and a Cricket cell phone outlet, sits an empty room that the St. Louis County Police Department keeps as a substation. During the protests, it was a war room, where law enforcement leaders planned their responses to the chaos outside.

One day last December, a few Jennings police officers flicked on the substation’s fluorescent lights and gathered around a big table to eat sandwiches. The conversation drifted between the afternoon shift’s mundane roll of stops, searches, and arrests, and the day’s main excitement: the officers were trying out a new software program called HunchLab, which crunches vast amounts of data to help predict where crime will happen next.

The conversation also turned to the grand anxieties of post-Ferguson policing. “Nobody wants to be the next Darren Wilson,” Officer Trevor Voss told me. They didn’t personally know Wilson. Police jurisdiction in St. Louis is notoriously labyrinthine and includes dozens of small, local municipal agencies like the Ferguson Police Department, where Wilson worked — munis, the officers call them — and the St. Louis County Police Department, which patrols areas not covered by the munis and helps with “resource intense events,” like the protests in Ferguson. The munis have been the targets of severe criticism; in the aftermath of the 2014 protests, Ferguson police were accused by the federal Department of Justice of being racially discriminatory and poorly trained, more concerned with handing out tickets to fund municipal coffers than with public safety.

The officers in Jennings work for the St. Louis County Police Department; in 2014, their colleagues appeared on national TV, pointing sniper rifles at protesters from armored trucks. Since then, the agency has also been called out by the Justice Department for, among other things, its lack of engagement with the community.

[Map: HunchLab crime map of the St. Louis County Police Department’s jurisdiction, covering St. Charles, Florissant, Ferguson, Jennings, Chesterfield, Ballwin, Kirkwood, Fenton and Mehlville along the Mississippi River.]

Still, the county police enjoy a better local reputation than the munis. Over the last five years, Jennings precinct commander Jeff Fuesting has tried to improve relations between officers — nearly all white — and residents — nearly all black — by going door to door for “Walk and Talks.” Fuesting had expressed interest in predictive policing years before, so when the department heads brought in HunchLab, they asked his precinct to roll it out first. They believed that data could help their officers police better and more objectively. By identifying and aggressively patrolling “hot spots,” as determined by the software, the police wanted to deter crime before it ever happened.

HunchLab, produced by Philadelphia-based startup Azavea, represents the newest iteration of predictive policing, a method of analyzing crime data and identifying patterns that may repeat into the future. HunchLab primarily surveys past crimes, but also digs into doze

Policing the Future

A police officer stands at the corner of a busy intersection, scanning the crowd with her body camera. The feed is live-streamed into the Real Time Crime Center at department headquarters, where specialized software uses biometric recognition to determine if there are any persons of interest on the street.

Data analysts alert the officer that a man with an abnormally high threat score is among the crowd; the officer approaches him to deliver a “custom notification”, warning him that the police will not tolerate criminal behavior. He is now registered in a database of potential offenders.

Overhead, a light aircraft outfitted with an array of surveillance cameras flies around the city, persistently keeping watch over entire sectors, recording everything that happens and allowing police to stop, rewind and zoom in on specific people or vehicles …

None of this is techno-paranoia from the mind of Philip K Dick or Black Mirror, but rather existing technologies that are already becoming standard parts of policing.

The California city of Fresno is just one of the police departments in the US already using a software program called “Beware” to generate “threat scores” about an individual, address or area. As reported by the Washington Post in January, the software works by processing “billions of data points, including arrest reports, property records, commercial databases, deep web searches and the [person’s] social media postings”.

A brochure for Beware uses a hypothetical example of a veteran diagnosed with PTSD, indicating that the software also takes health-related data into account. Scores are colour-coded so officers can know at a glance what level the threat is: green, yellow or red.

Chicago's computer-generated heat list profiled potential criminals – essentially suspects for crimes not yet committed

This is just one of many new technologies facilitating “data-driven policing”. The collection of vast amounts of data for use with analytics programs allows police to gather data from just about any source and for just about any reason.

The holy grail is ‘predictive policing’

“Soon it will be feasible and affordable for the government to record, store and analyze nearly everything we do,” writes law professor Elizabeth Joh in Harvard Law & Policy Review. “The police will rely on alerts generated by computer programs that sift through the massive quantities of available information for patterns of suspicious activity.”

The holy grail of data-fueled analytics is called “predictive policing”, which uses statistical models to tell officers where crime is likely to happen and who is likely to commit it.

In February 2014, the Chicago police department (CPD) attracted attention when officers pre-emptively visited residents on a computer-generated “heat list”, which marked them as likely to be involved in a future violent crime. These people had done nothing wrong, but the CPD wanted to let them know that officers would be keeping tabs on them.

Essentially, they were already considered suspects for crimes not yet committed.

From Fresno to New York, and Rio to Singapore, data analysts sit at the helm of futuristic control rooms, powered by systems such as IBM’s Intelligent Operations Center and Siemens’ City Cockpit. Monitors pipe in feeds from hundreds or even thousands of surveillance cameras in the city.

Is this really the promise of a ‘smart city’?

These analysts have access to massive databases of citizens’ records. Sensors installed around the city detect pedestrian traffic and suspicious activities. Software programs run analytics on all this data in order to generate alerts and “actionable insights”.

Neither IBM nor the Chicago Police Department responded to a request for comment. But is this the new model of how policing should be done in an era of “smart cities”?

These analytics can be used to great effect, potentially enhancing the ability of police to make more informed, less biased decisions about law enforcement. But they are often used in dubious ways and for repressive purposes. It is unclear – especially for analytics tools developed and sold by private tech companies – how exactly they even work.

What data are they using? How are they weighing variables? What values and biases are coded into them? Even the companies that develop them can’t answer all those questions, and what they do know can’t be divulged because of trade secrets.

So when police say they are using data-driven techniques to make smarter decisions, what they really mean is they are relying on software that spits out scores and models, without any real understanding of how. This demonstrates a tremendous faith in the veracity of analytics.

Those affected lose their right to individualized treatment, as systems treat them as a mere collection of data points

It is absurd for police not to know what decisions, weights, values and biases are baked into the analytics they use. It obscures the factors and decisions that influence how police operate – eve

Police data could be labelling 'suspects' for crimes they have not committed

This is Episode 12 of Real Future, Fusion’s documentary series about technology and society. More episodes available at realfuture.tv.

There's a new kind of software that claims to help law enforcement agencies reduce crime, by using algorithms to predict where crimes will happen and directing more officers to those areas. It's called "predictive policing," and it's already being used by dozens of police departments all over the country, including the Los Angeles, Chicago, and Atlanta Police Departments.

Aside from the obvious "Minority Report" pre-crime allusions, there has been a tremendous amount of speculation about what the future of predictive policing might hold. Could people be locked up just because a computer model says that they are likely to commit a crime? Could all crime end altogether, because an artificial intelligence gets so good at predicting when crimes will occur?

Some skeptics doubt that predictive policing software actually works as advertised. After all, most crimes occur only in semi-regular patterns, while big, low-frequency crimes like terrorist attacks aren't typically governed by patterns at all, making them much harder for an algorithm to predict.

There is also the question of what happens to communities of color under a predictive policing regime. Brown and black people are already the disproportionate targets of police action, and with predictive policing software, some worry that police could feel even more empowered to spend time looking for crime in neighborhoods populated by minorities.

Although big companies like IBM also make predictive policing tools, one of the most widely deployed products comes from a small Santa Cruz, California firm called PredPol.

The way PredPol works is actually quite simple. It takes in past crime data—only when it happened and what type of crime—and spits out predictions about where future crimes are more likely to occur. It turns those predictions into 500 foot by 500 foot red boxes on a Google map, indicating areas that police officers should patrol when they’re not actively responding to a call. The idea is that if officers focus their attention on an area that’s slightly more likely to see a crime committed than other places, they will reduce the amount of crime in that location.
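
PredPol’s actual model is proprietary, so the sketch below is only a toy illustration of the idea this paragraph describes: score each 500-foot grid cell from the times and places of past crimes, weight recent incidents more heavily, and flag the top-scoring cells as the boxes to patrol. The decay function, parameter values, and helper names are all assumptions, not PredPol’s algorithm.

```python
# Illustrative sketch only: a toy grid-based hotspot scorer, NOT PredPol's
# proprietary model. It scores 500 ft x 500 ft cells by recency-weighted
# counts of past incidents and flags the highest-scoring cells for patrol.
from collections import defaultdict
from datetime import datetime

CELL_FT = 500          # side length of each grid cell, as described above
HALF_LIFE_DAYS = 30.0  # assumed decay rate; purely a placeholder value

def cell_of(x_ft, y_ft):
    """Map a location (in feet, on some local projection) to a grid cell."""
    return (int(x_ft // CELL_FT), int(y_ft // CELL_FT))

def hotspot_scores(incidents, now):
    """incidents: list of (x_ft, y_ft, timestamp) for past crimes."""
    scores = defaultdict(float)
    for x, y, when in incidents:
        age_days = (now - when).days
        # Recent incidents count more; older ones decay exponentially.
        scores[cell_of(x, y)] += 0.5 ** (age_days / HALF_LIFE_DAYS)
    return scores

def boxes_to_patrol(incidents, now, k=10):
    """Return the k highest-scoring cells -- the 'red boxes' on the map."""
    scores = hotspot_scores(incidents, now)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: three burglaries near the same block push that cell to the top.
demo = [(1200, 800, datetime(2016, 3, 1)),
        (1300, 900, datetime(2016, 3, 5)),
        (1250, 850, datetime(2016, 3, 7))]
print(boxes_to_patrol(demo, now=datetime(2016, 3, 10), k=3))
```

Real deployments reportedly use far more sophisticated statistical models; the point here is only the input/output shape the paragraph describes.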

Police chiefs who have tried PredPol and similar systems swear that it works. For example, the Norcross (GA) Police Department claims it saw a 15-30% reduction in burglaries and robberies after deploying the software.

But I wanted to ask tougher questions about predictive policing—not just whether it helps reduce crime, but how it helps reduce crime, and whether the system could serve as an algorithmic justification for old-school racial profiling by placing more police in minority-populated neighborhoods.

So I went to Santa Cruz, California, where the local police department is using PredPol to patrol the city. I went on a ride-along with Deputy Police Chief Steve Clark, and spoke to local activists who fear that predictive policing software could invite harm, rather than preventing it.

Here's the video of my trip to see the real effects of predictive policing:

Predictive Policing: the future of crime-fighting, or the future of racial profiling?

propublica.org · 2016

ON A SPRING AFTERNOON IN 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid’s blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs.

Just as the 18-year-old girls were realizing they were too big for the tiny conveyances — which belonged to a 6-year-old boy — a woman came running after them saying, “That’s my kid’s stuff.” Borden and her friend immediately dropped the bike and scooter and walked away.

But it was too late — a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80.

Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store.

Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.

Yet something odd happened when Borden and Prater were booked into jail: A computer program spat out a score predicting the likelihood of each committing a future crime. Borden — who is black — was rated a high risk. Prater — who is white — was rated a low risk.

Two years later, we know the computer algorithm got it exactly backward. Borden has not been charged with any new crimes. Prater is serving an eight-year prison term for subsequently breaking into a warehouse and stealing thousands of dollars’ worth of electronics.

Scores like this — known as risk assessments — are increasingly common in courtrooms across the nation. They are used to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts — as is the case in Fort Lauderdale — to even more fundamental decisions about defendants’ freedom. In Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington and Wisconsin, the results of such assessments are given to judges during criminal sentencing.

Rating a defendant’s risk of future crime is often done in conjunction with an evaluation of a defendant’s rehabilitation needs. The Justice Department’s National Institute of Corrections now encourages the use of such combined assessments at every stage of the criminal justice process. And a landmark sentencing reform bill currently pending in Congress would mandate the use of such assessments in federal prisons.

Two Petty Theft Arrests

VERNON PRATER: RISK 3
BRISHA BORDEN: RISK 8

Borden was rated high risk for future crime after she and a friend took a kid’s bike and scooter that were sitting outside. She did not reoffend.

In 2014, then U.S. Attorney General Eric Holder warned that the risk scores might be injecting bias into the courts. He called for the U.S. Sentencing Commission to study their use. “Although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice,” he said, adding, “they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

The sentencing commission did not, however, launch a study of risk scores. So ProPublica did, as part of a larger examination of the powerful, largely hidden effect of algorithms in American life.

We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.

The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.

When a full range of crimes was taken into account — including misdemeanors such as driving with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.

We also turned up significant racial disparities, just as Holder feared. In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.

The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.

White defendants were mislabeled as low risk more often than black defendants.

Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender. Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind. (Read our analysis.)
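
A hedged sketch of the kind of statistical test described here: a logistic regression that asks whether race still predicts being labeled high risk after controlling for criminal history, recidivism, age and gender. The file and column names are hypothetical stand-ins, not ProPublica’s actual data schema; an exponentiated coefficient of roughly 1.77 for the race indicator is the odds-ratio sense in which a figure like “77 percent more likely” is usually read.

```python
# Hedged sketch: a logistic regression controlling for covariates.
# Column names below are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("broward_scores.csv")   # hypothetical file name

model = smf.logit(
    "high_risk ~ C(race, Treatment('Caucasian')) + priors_count"
    " + two_year_recid + age + C(sex)",
    data=df,
).fit()

# exp(coefficient) is an odds ratio: holding the other variables fixed,
# a value near 1.77 on the race indicator would correspond to being
# "77 percent more likely" to be pegged as higher risk.
print(np.exp(model.params))
```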

The algorithm used to create the Florida risk scores is a product of a for-profit company, Northpointe. The company disputes our analysis.

In a letter, it criticized ProPublica’s methodology and defended the accuracy of its test: “Northpointe does not agree that the results of your analysis, or the claims being made based upon that analysis, are correct or that they accurately reflect the outcomes from the application of the model.”

Northpointe’s software is among the most widely used assessment tools in the country. The company does not publicly disclose the calculations used to arrive at defendants’ risk scores, so it is not possible for either defendants or the public to see what might be driving the disparity. (On Sunday, Northpointe gave ProPublica the basics of its future-crime formula — which includes factors such as education levels, and whether a defendant has a job. It did not share the specific calculations, which it said are proprietary.)

Northpointe’s core product is a set of scores derived from 137 questions that are either answered by defendants or pulled from criminal records. Race is not one of the questions. The survey asks defendants such things as: “Was one of your parents ever sent to jail or prison?” “How many of your friends/acquaintances are taking drugs illegally?” and “How often did you get in fights while at school?” The questionnaire also asks people to agree or disagree with statements such as “A hungry person has a right to steal” and “If people make me angry or lose my temper, I can be dangerous.”
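
Northpointe has not disclosed how COMPAS turns those 137 answers into a score, so the following is emphatically not its formula. It is only a generic sketch of how questionnaire-based risk instruments commonly work: code the answers numerically, combine them into a raw score with weights, then convert the raw score to a 1-to-10 scale against a reference sample. Every item, weight and cut point below is invented.

```python
# Generic illustration only: Northpointe's COMPAS calculation is proprietary
# and is NOT shown here. This sketches how questionnaire-style risk
# instruments in general often work: answers are coded numerically, combined
# into a raw score with weights, then converted to a 1-10 decile against a
# reference sample. All items, weights and cut points below are invented.
from bisect import bisect_right

# Hypothetical coded answers for one defendant (invented items).
answers = {"prior_arrests": 2, "age_at_first_arrest": 19,
           "friends_using_drugs": 1, "school_fights": 0}

weights = {"prior_arrests": 0.8, "age_at_first_arrest": -0.05,
           "friends_using_drugs": 0.6, "school_fights": 0.4}

raw = sum(weights[k] * v for k, v in answers.items())

# Decile cut points from a reference ("norm") sample -- also invented.
norm_cutpoints = [-0.5, 0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.5, 3.1]
decile_score = bisect_right(norm_cutpoints, raw) + 1   # 1 (low) .. 10 (high)

print(raw, decile_score)
```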

The appeal of risk scores is obvious: The United States locks up far more people than any other country, a disproportionate number of them black. For more than two centuries, the key decisions in the legal process, from pretrial release to sentencing to parole, have been in the hands of human beings guided by their instincts and personal biases.

If computers could accurately predict which defendants were likely to commit new crimes, the criminal justice system could be fairer and more selective about who is incarcerated and for how long. The trick, of course, is to make sure the computer gets it right. If it’s wrong in one direction, a dangerous criminal could go free. If it’s wrong in another direction, it could result in someone unfairly receiving a harsher sentence or waiting longer for parole than is appropriate.

The first time Paul Zilly heard of his score — and realized how much was riding on it — was during his sentencing hearing on Feb. 15, 2013, in court in Barron County, Wisconsin. Zilly had been convicted of stealing a push lawnmower and some tools. The prosecutor recommended a year in county jail and follow-up supervision that could help Zilly with “staying on the right path.” His lawyer agreed to a plea deal.

But Judge James Babler had seen Zilly’s scores. Northpointe’s software had rated Zilly as a high risk for future violent crime and a medium risk for general recidivism. “When I look at the risk assessment,” Babler said in court, “it is about as bad as it could be.”

Then Babler overturned the plea deal that had been agreed on by the prosecution and defense and imposed two years in state prison and three years of supervision.

CRIMINOLOGISTS HAVE LONG TRIED to predict which criminals are more dangerous before deciding whether they should be released. Race, nationality and skin color were often used in making such predictions until about the 1970s, when it became politically unacceptable, according to a survey of risk assessment tools by Columbia University law professor Bernard Harcourt.

In the 1980s, as a crime wave engulfed the nation, lawmakers made it much harder for judges and parole boards to exercise discretion in making such decisions. States and the federal government began instituting mandatory sentences and, in some cases, abolished parole, making it less important to evaluate individual offenders.

But as states struggle to pay for swelling prison and jail populations, forecasting criminal risk has made a comeback.

Two Drug Possession Arrests

DYLAN FUGETT: RISK 3
BERNARD PARKER: RISK 10

Fugett was rated low risk after being arrested with cocaine and marijuana. He was arrested three times on drug charges after that.

Dozens of risk assessments are being used across the nation — some created by for-profit companies such as Northpointe and others by nonprofit organizations. (One tool being used in states including Kentucky and Arizona, called the Public Safety Assessment, was developed by the Laura and John Arnold Foundation, which also is a funder of ProPublica.)

There have been few independent studies of these criminal risk assessments. In 2013, researchers Sarah Desmarais and Jay Singh examined 19 different risk methodologies used in the United States and found that “in most cases, validity had only been examined in one or two studies” and that “frequently, those investigations were completed by the same people who developed the instrument.”

Their analysis of the research through 2012 found that the tools “were moderate at best in terms of predictive validity,” Desmarais said in an interview. And she could not find any substantial set of studies conducted in the United States that examined whether risk scores were racially biased. “The data do not exist,” she said.

Since then, there have been some attempts to explore racial disparities in risk scores. One 2016 study examined the validity of a risk assessment tool, not Northpointe’s, used to make probation decisions for about 35,000 federal convicts. The researchers, Jennifer Skeem at University of California, Berkeley, and Christopher T. Lowenkamp from the Administrative Office of the U.S. Courts, found that blacks did get a higher average score but concluded the differences were not attributable to bias.

The increasing use of risk scores is controversial and has garnered media coverage, including articles by the Associated Press, and the Marshall Project and FiveThirtyEight last year.

Most modern risk tools were originally designed to provide judges with insight into the types of treatment that an individual might need — from drug treatment to mental health counseling.

“What it tells the judge is that if I put you on probation, I’m going to need to give you a lot of services or you’re probably going to fail,” said Edward Latessa, a University of Cincinnati professor who is the author of a risk assessment tool that is used in Ohio and several other states.

But being judged ineligible for alternative treatment — particularly during a sentencing hearing — can translate into incarceration. Defendants rarely have an opportunity to challenge their assessments. The results are usually shared with the defendant’s attorney, but the calculations that transformed the underlying data into a score are rarely revealed.

“Risk assessments should be impermissible unless both parties get to see all the data that go into them,” said Christopher Slobogin, director of the criminal justice program at Vanderbilt Law School. “It should be an open, full-court adversarial proceeding.”

[Charts: Black Defendants’ Risk Scores; White Defendants’ Risk Scores]

These charts show that scores for white defendants were skewed toward lower-risk categories. Scores for black defendants were not. (Source: ProPublica analysis of data from Broward County, Fla.)

Proponents of risk scores argue they can be used to reduce the rate of incarceration. In 2002, Virginia became one of the first states to begin using a risk assessment tool in the sentencing of nonviolent felony offenders statewide. In 2014, Virginia judges using the tool sent nearly half of those defendants to alternatives to prison, according to a state sentencing commission report. Since 2005, the state’s prison population growth has slowed to 5 percent from a rate of 31 percent the previous decade.

In some jurisdictions, such as Napa County, California, the probation department uses risk assessments to suggest to the judge an appropriate probation or treatment plan for individuals being sentenced. Napa County Superior Court Judge Mark Boessenecker said he finds the recommendations helpful. “We have a dearth of good treatment programs, so filling a slot in a program with someone who doesn’t need it is foolish,” he said.

However, Boessenecker, who trains other judges around the state in evidence-based sentencing, cautions his colleagues that the score doesn’t necessarily reveal whether a person is dangerous or if they should go to prison.

“A guy who has molested a small child every day for a year could still come out as a low risk because he probably has a job,” Boessenecker said. “Meanwhile, a drunk guy will look high risk because he’s homeless. These risk factors don’t tell you whether the guy ought to go to prison or not; the risk factors tell you more about what the probation conditions ought to be.”

“I’m surprised [my risk score] is so low. I spent five years in state prison in Massachusetts.” (Josh Ritchie for ProPublica)

Sometimes, the scores make little sense even to defendants.

James Rivelli, a 54-year old Hollywood, Florida, man, was arrested two years ago for shoplifting seven boxes of Crest Whitestrips from a CVS drugstore. Despite a criminal record that included aggravated assault, multiple thefts and felony drug trafficking, the Northpointe algorithm classified him as being at a low risk of reoffending.

“I am surprised it is so low,” Rivelli said when told by a reporter he had been rated a 3 out of a possible 10. “I spent five years in state prison in Massachusetts. But I guess they don’t count that here in Broward County.” In fact, criminal records from across the nation are supposed to be included in risk assessments.

Less than a year later, he was charged with two felony counts for shoplifting about $1,000 worth of tools from Home Depot. He said his crimes were fueled by drug addiction and that he is now sober.

NORTHPOINTE WAS FOUNDED in 1989 by Tim Brennan, then a professor of statistics at the University of Colorado, and Dave Wells, who was running a corrections program in Traverse City, Michigan.

Wells had built a prisoner classification system for his jail. “It was a beautiful piece of work,” Brennan said in an interview conducted before ProPublica had completed its analysis. Brennan and Wells shared a love for what Brennan called “quantitative taxonomy” — the measurement of personality traits such as intelligence, extroversion and introversion. The two decided to build a risk assessment score for the corrections industry.

Brennan wanted to improve on a leading risk assessment score, the LSI, or Level of Service Inventory, which had been developed in Canada. “I found a fair amount of weakness in the LSI,” Brennan said. He wanted a tool that addressed the major theories about the causes of crime.

Brennan and Wells named their product the Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS. It assesses not just risk but also nearly two dozen so-called “criminogenic needs” that relate to the major theories of criminality, including “criminal personality,” “social isolation,” “substance abuse” and “residence/stability.” Defendants are ranked low, medium or high risk in each category.

Two DUI Arrests

GREGORY LUGO: RISK 1
MALLORY WILLIAMS: RISK 6

Lugo crashed his Lincoln Navigator into a Toyota Camry while drunk. He was rated as a low risk of reoffending despite the fact that it was at least his fourth DUI.

As often happens with risk assessment tools, many jurisdictions have adopted Northpointe’s software before rigorously testing whether it works. New York State, for instance, started using the tool to assess people on probation in a pilot project in 2001 and rolled it out to the rest of the state’s probation departments — except New York City — by 2010. The state didn’t publish a comprehensive statistical evaluation of the tool until 2012. The study of more than 16,000 probationers found the tool was 71 percent accurate, but it did not evaluate racial differences.

A spokeswoman for the New York state division of criminal justice services said the study did not examine race because it only sought to test whether the tool had been properly calibrated to fit New York’s probation population. She also said judges in nearly all New York counties are given defendants’ Northpointe assessments during sentencing.

In 2009, Brennan and two colleagues published a validation study that found that Northpointe’s risk of recidivism score had an accuracy rate of 68 percent in a sample of 2,328 people. Their study also found that the score was slightly less predictive for black men than white men — 67 percent versus 69 percent. It did not examine racial disparities beyond that, including whether some groups were more likely to be wrongly labeled higher risk.

Brennan said it is difficult to construct a score that doesn’t include items that can be correlated with race — such as poverty, joblessness and social marginalization. “If those are omitted from your risk assessment, accuracy goes down,” he said.

In 2011, Brennan and Wells sold Northpointe to Toronto-based conglomerate Constellation Software for an undisclosed sum.

Wisconsin has been among the most eager and expansive users of Northpointe’s risk assessment tool in sentencing decisions. In 2012, the Wisconsin Department of Corrections launched the use of the software throughout the state. It is used at each step in the prison system, from sentencing to parole.

In a 2012 presentation, corrections official Jared Hoy described the system as a “giant correctional pinball machine” in which correctional officers could use the scores at every “decision point.”

Wisconsin has not yet completed a statistical validation study of the tool and has not said when one might be released. State corrections officials declined repeated requests to comment for this article.

Some Wisconsin counties use other risk assessment tools at arrest to determine if a defendant is too risky for pretrial release. Once a defendant is convicted of a felony anywhere in the state, the Department of Corrections attaches Northpointe’s assessment to the confidential presentence report given to judges, according to Hoy’s presentation.

In theory, judges are not supposed to give longer sentences to defendants with higher risk scores. Rather, they are supposed to use the tests primarily to determine which defendants are eligible for probation or treatment programs.

Prediction Fails Differently for Black Defendants

                                            White    African American
Labeled Higher Risk, But Didn’t Re-Offend   23.5%    44.9%
Labeled Lower Risk, Yet Did Re-Offend       47.7%    28.0%

Overall, Northpointe’s assessment tool correctly predicts recidivism 61 percent of the time. But blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites: They are much more likely than blacks to be labeled lower risk but go on to commit other crimes. (Source: ProPublica analysis of data from Broward County, Fla.)
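
The two rows in that table are ordinary confusion-matrix error rates computed separately for each group: the share of defendants who did not re-offend but were labeled higher risk (a false positive rate), and the share who did re-offend but were labeled lower risk (a false negative rate). A minimal sketch with placeholder data:

```python
# Sketch of how the two error rates in the table above are defined,
# computed separately for each racial group. Data values are placeholders.
def error_rates(labeled_high_risk, reoffended):
    """Both arguments are parallel lists of booleans for one group."""
    fp = sum(h and not r for h, r in zip(labeled_high_risk, reoffended))
    tn = sum(not h and not r for h, r in zip(labeled_high_risk, reoffended))
    fn = sum(not h and r for h, r in zip(labeled_high_risk, reoffended))
    tp = sum(h and r for h, r in zip(labeled_high_risk, reoffended))
    false_positive_rate = fp / (fp + tn)   # labeled higher risk, didn't re-offend
    false_negative_rate = fn / (fn + tp)   # labeled lower risk, yet did re-offend
    return false_positive_rate, false_negative_rate

# Toy example: 10 defendants in one group.
labeled = [True, True, False, False, True, False, True, False, False, True]
reoffended = [True, False, False, True, False, False, True, True, False, False]
print(error_rates(labeled, reoffended))
```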

But judges have cited scores in their sentencing decisions. In August 2013, Judge Scott Horne in La Crosse County, Wisconsin, declared that defendant Eric Loomis had been “identified, through the COMPAS assessment, as an individual who is at high risk to the community.” The judge then imposed a sentence of eight years and six months in prison.

Loomis, who was charged with driving a stolen vehicle and fleeing from police, is challenging the use of the score at sentencing as a violation of his due process rights. The state has defended Horne’s use of the score with the argument that judges can consider the score in addition to other factors. It has also stopped including scores in presentencing reports until the state Supreme Court decides the case.

“The risk score alone should not determine the sentence of an offender,” Wisconsin Assistant Attorney General Christine Remington said last month during state Supreme Court arguments in the Loomis case. “We don’t want courts to say, this person in front of me is a 10 on COMPAS as far as risk, and therefore I’m going to give him the maximum sentence.”

That is almost exactly what happened to Zilly, the 48-year-old construction worker sent to prison for stealing a push lawnmower and some tools he intended to sell for parts. Zilly has long struggled with a meth habit. In 2012, he had been working toward recovery with the help of a Christian pastor when he relapsed and committed the thefts.

After Zilly was scored as a high risk for violent recidivism and sent to prison, a public defender appealed the sentence and called the score’s creator, Brennan, as a witness.

Brennan testified that he didn’t design his software to be used in sentencing. “I wanted to stay away from the courts,” Brennan said, explaining that his focus was on reducing crime rather than punishment. “But as time went on I started realizing that so many decisions are made, you know, in the courts. So I gradually softened on whether this could be used in the courts or not.”

“Not that I’m innocent, but I just believe people do change.” (Stephen Maturen for ProPublica)

Still, Brennan testified, “I don’t like the idea myself of COMPAS being the sole evidence that a decision would be based upon.”

After Brennan’s testimony, Judge Babler reduced Zilly’s sentence, from two years in prison to 18 months. “Had I not had the COMPAS, I believe it would likely be that I would have given one year, six months,” the judge said at an appeals hearing on Nov. 14, 2013.

Zilly said the score didn’t take into account all the changes he was making in his life — his conversion to Christianity, his struggle to quit using drugs and his efforts to be more available for his son. “Not that I’m innocent, but I just believe people do change.”

FLORIDA’S BROWARD COUNTY, where Brisha Borden stole the Huffy bike and was scored as high risk, does not use risk assessments in sentencing. “We don’t think the [risk assessment] factors have any bearing on a sentence,” said David Scharf, executive director of community programs for the Broward County Sheriff’s Office in Fort Lauderdale.

Broward County has, however, adopted the score in pretrial hearings, in the hope of addressing jail overcrowding. A court-appointed monitor has overseen Broward County’s jails since 1994 as a result of the settlement of a lawsuit brought by inmates in the 1970s. Even now, years later, the Broward County jail system is often more than 85 percent full, Scharf said.

In 2008, the sheriff’s office decided that instead of building another jail, it would begin using Northpointe’s risk scores to help identify which defendants were low risk enough to be released on bail pending trial. Since then, nearly everyone arrested in Broward has been scored soon after being booked. (People charged with murder and other capital crimes are not scored because they are not eligible for pretrial release.)

The scores are provided to the judges who decide which defendants can be released from jail. “My feeling is that if they don’t need them to be in jail, let’s get them out of there,” Scharf said.

Two Shoplifting Arrests

JAMES RIVELLI: RISK 3
ROBERT CANNON: RISK 6

After Rivelli stole from a CVS and was caught with heroin in his car, he was rated a low risk. He later shoplifted $1,000 worth of tools from a Home Depot.

Scharf said the county chose Northpointe’s software over other tools because it was easy to use and produced “simple yet effective charts and graphs for judicial review.” He said the system costs about $22,000 a year.

In 2010, researchers at Florida State University examined the use of Northpointe’s system in Broward County over a 12-month period and concluded that its predictive accuracy was “equivalent” in assessing defendants of different races. Like others, they did not examine whether different races were classified differently as low or high risk.

Scharf said the county would review ProPublica’s findings. “We’ll really look at them up close,” he said.

Broward County Judge John Hurley, who oversees most of the pretrial release hearings, said the scores were helpful when he was a new judge, but now that he has experience he prefers to rely on his own judgment. “I haven’t relied on COMPAS in a couple years,” he said.

Hurley said he relies on factors including a person’s prior criminal record, the type of crime committed, ties to the community, and their history of failing to appear at court proceedings.

ProPublica’s analysis reveals that higher Northpointe scores are slightly correlated with longer pretrial incarceration in Broward County. But there are many reasons that could be true other than judges being swayed by the scores — people with higher risk scores may also be poorer and have difficulty paying bond, for example.

Most crimes are presented to the judge with a recommended bond amount, but he or she can adjust the amount. Hurley said he often releases first-time or low-level offenders without any bond at all.

However, in the case of Borden and her friend Sade Jones, the teenage girls who stole a kid’s bike and scooter, Hurley raised each girl’s bond from the recommended $0 to $1,000.

Hurley said he has no recollection of the case and cannot recall if the scores influenced his decision.

Sade Jones, who had never been arrested before, was rated a medium risk. (Josh Ritchie for ProPublica)

The girls spent two nights in jail before being released on bond.

“We literally sat there and cried” the whole time they were in jail, Jones recalled. The girls were kept in the same cell. Otherwise, Jones said, “I would have gone crazy.” Borden declined repeated requests to comment for this article.

Jones, who had never been arrested before, was rated a medium risk. She completed probation and got the felony burglary charge reduced to misdemeanor trespassing, but she has still struggled to find work.

“I went to McDonald’s and a dollar store, and they all said no because of my background,” she said. “It’s all kind of difficult and unnecessary.”

Machine Bias

For the last four years, the Chicago Police Department has kept a list of people they believe are most likely to be involved in a shooting. The list—known as the “heat list” or the Strategic Subject List—was developed using a secret algorithm and contains the names of over a thousand people at any given time.

In a record-breaking year for gun violence, Superintendent Eddie Johnson has praised the list and the department’s use of big data in predicting crimes. In May, the CPD reported three out of four shooting victims in 2016 were on the Strategic Subject List. The number of people arrested in connection with shootings was even more impressive: 80 percent were on the SSL, they say.

Though the department has been quick to tout the list’s accuracy, there is no way to independently verify their claims. The names on the list are private and the department won’t even explain what variables are used to determine a person’s ranking on the list. In other words, we’ve had to take the department at their word that their big data program works.

That changed this week with the release of a report by RAND Corporation in the Journal of Experimental Criminology. In the first independent audit of the department’s SSL, researchers found a 2013 version of the list to be not nearly as valuable as the department claims.

“Individuals on the SSL are not more or less likely to become a victim of a homicide or shooting than the [control] group,” the authors write. Police use of the list also had no effect on citywide violence levels in Chicago.

While the study’s authors found that individuals on the SSL were indeed more likely to be arrested for a shooting, the researchers guessed that this was happening because officers were using the list as leads to close cases. Superintendent Johnson has said as recently as last month, however, that the list is not being used to target people for arrest.

“One of the major findings [of the study] was that the police on the ground, the people in the field, do not get a lot of training about how to use this list and what it means,” says lead author Jessica Saunders.

When asked for a comment yesterday afternoon, CPD spokesman Anthony J. Guglielmi said he was unaware of the report. In a statement released today, which includes a point-by-point response, police emphasize that the SSL has changed significantly since the 2013 version that is the subject of RAND’s analysis.

“The evaluation was conducted on an early model of the algorithm that is no longer in use today… We are currently using SSL Version 5, which is more than 3 times as accurate as the version reviewed by RAND,” the statement says.

But Saunders says that her findings can still apply to the tool CPD is using today.

“The findings of this study are probably not changed by making the list better,” says Saunders. “What we really found was that they didn’t know what to do with the list and there was no intervention tied to the list. So in my opinion, it almost doesn’t matter how good the list is, if you don’t know what to do with it.”

Saunders says that the CPD must carefully consider what interventions it uses on people on the list in order to prevent crime. Tactics such as call-ins and home visits, which the CPD sometimes uses in conjunction with the list, cannot be effective if they are not done across the board.

In its official statement, CPD says this intervention strategy has likewise evolved along with the SSL since 2013: they are now used in every police district, and metrics on the interventions are now “fully integrated within our CompStat accountability framework and weekly Compstat meetings.”

Still, those who study big data policing say this week’s report from RAND is troubling.

“I think there’s a real question now after [the] RAND [report],” says Andrew Ferguson, a law professor at the University of the District of Columbia in Washington. “We don’t know how effective these lists are except for what the police tell us. This is one of the first analyses of the risk factors.”

Police departments and criminal justice organizations across the country are increasingly using algorithms like Chicago’s to predict the locations and perpetrators of future crimes. And in an era marked by the police shootings of young black men, big data has been held up as a way to avoid racial profiling and reduce violence.

But few cities make their algorithms available to the public or to organizations that work with communities most at risk for violence. This week’s RAND study is one of only two independent evaluations of predictive policing programs that have been done nationwide.

Given the shroud of secrecy that covers big data policing, many have questioned the algorithms’ accuracy and fairness. A ProPublica investigation earlier this year found a risk assessment algorithm used in Florida to have significant racial disparities and to be only slightly more accurate than a coin flip.

The Electronic Frontier Foundation and the American Civil Liberties Union of Illinois have both voiced concerns about how Chicago’s Strategic Subject List handles the issue of race. The Chicago Police Department has said that race is not one of the 11 weighted variables it uses to determine a person’s ranking on the list, but other variables they are using may code for race in less explicit ways. For example, a person’s address in a highly segregated neighborhood in Chicago could indicate a person’s wealth and race.

“The RAND analysis should be the beginning, not the end of determining whether or not these systems work,” says Ferguson. “The underlying idea of prioritizing police resources on those most at risk makes a ton of sense. The downside of getting that prediction wrong means a lot of wasted resources. So I think we need to figure out whether it’s possible to prioritize risk and then really decide whether police are really the right remedy once we’ve identified risks through big data policing.”

Study Casts Doubt on Chicago Police’s Secretive “Heat List”

“Predictive policing” is happening now — and police could learn a lesson from Minority Report.

David Robinson · Aug 31, 2016

In the movie Minority Report, mutants in a vat look into the future, and tell Tom Cruise who is about to commit a crime, so he can arrest the offender before the crime happens. Spoiler alert: Those mutant fortune tellers turn out not to be infallible, but the cops treat them as though they were. Law enforcement’s blind faith in a tool that doesn’t always work — a tool that can easily finger the wrong person, with terrible results — provides the central tension for that blockbuster film.

Real police are now at risk of making a similar mistake. But this time the results are in the street, not at the box office. Today’s cops don’t rely on soothsayers, but they do increasingly use software to forecast where future crimes may happen, or who may be involved. And they are nowhere near skeptical enough of the forecasts those computers are making.

Today, a national coalition of 17 advocacy groups is raising the alarm about this, with a shared statement highlighting six ways that this trend threatens civil rights.

These groups all agree — the current rush toward predictive policing is wrong.

Upturn, where I work, helped draft the new statement, and today we’re releasing a report designed to empower you to get beyond the hype and make up your own mind about what the industry calls “predictive policing.” As lead author of that report, here’s what I’d like you to know.

Police can easily trust these tools too much.

People often overestimate the accuracy, objectivity, and reliability of information that comes from a computer, including from a predictive policing system. The RAND Corporation, which has done the best studies to date, is a famously buttoned-down kind of place. But they’re just as bothered by this problem as I am. They write: “[p]redictive policing has been so hyped that the reality cannot live up to the hyperbole. There is an underlying, erroneous assumption that advanced mathematical and computational power is both necessary and sufficient to reduce crime [but in fact] the predictions are only as good as the data used to make them.”

Police data about crime paints a distorted picture, which can easily lead to discriminatory policing patterns.

These systems only predict which crimes police will detect in the future — and that creates a distorted picture. As an eminent criminologist once said, “[i]t has been known for more than 30 years that, in general, police statistics are poor measures of true levels of crime.”

In the context of predictive policing, statistics generated by the policing process are often treated as though they are records of underlying criminal behavior. But these numbers are a direct record of how law enforcement responds to particular crimes, and they are only indirect evidence about what is actually happening in the world. Criminologists argue that “[a]rrest, conviction, and incarceration data are most appropriately viewed as measures of official response to criminal behavior.”

Of course, it makes sense for police to be responsive to community needs. Different communities served by the same police department often do have different needs, and different levels of need. That means officers will see more of what goes on in some neighborhoods than they will in others. But it is dangerous to treat the results of that process as though they were a completely neutral reflection of the world.

As data scientist Cathy O’Neil explains, “people have too much trust [that] numbers [will] be intrinsically objective.”

Rather than changing their tactics, police who use predictive tools have tended to focus on generating more citations and arrests. I read everything I could find about how police are actually using predictive policing tools. The consistent answer was that police aren’t being guided toward different or more humane tactics. Instead, where the computer says to focus, the police do more enforcement. Which worsens the data problem.

Data could be used in ways that strengthen civil rights, but we’re missing that opportunity.

This, to me, is one of the most exciting things we found in our research. To quote from our report:

In most of the nation, police currently measure outcomes and assess performance based on only some of the activities, costs, and benefits that matter in policing…

Serious violent crimes will always be important. But violent crime doesn’t reflect the full scope of community concerns…

[E]xperts on police performance measurement have long argued that police should track all uses of coercive authority so they can better promote public safety with minimum coercion…. And research on police performance measurement consistently calls for surveying victims to gather their feedback on the police officers with whom they interact.

Beyond the basic goal of constitutional, lawful policing, measuring factors like these could allow the poli

“Predictive policing” is happening now - and police could learn a lesson from Minority Report.

rss.onlinelibrary.wiley.com · 2016

In late 2013, Robert McDaniel – a 22‐year‐old black man who lives on the South Side of Chicago – received an unannounced visit by a Chicago Police Department commander to warn him not to commit any further crimes. The visit took McDaniel by surprise. He had not committed a crime, did not have a violent criminal record, and had had no recent contact with law enforcement. So why did the police come knocking?

It turns out that McDaniel was one of approximately 400 people to have been placed on Chicago Police Department's “heat list”. These individuals had all been forecast to be potentially involved in violent crime, based on an analysis of geographic location and arrest data. The heat list is one of a growing suite of predictive “Big Data” systems used in police departments across the USA and in Europe to attempt what was previously thought impossible: to stop crime before it occurs.1

This seems like the sort of thing citizens would want their police to be doing. But predictive policing software – and the policing tactics based on it – has raised serious concerns among community activists, legal scholars, and sceptical police chiefs. These concerns include: the apparent conflict with protections against unlawful search and seizure and the concept of reasonable suspicion; the lack of transparency from both police departments and private firms regarding how predictive policing models are built; how departments utilise their data; and whether the programs unnecessarily target specific groups more than others.

But there is also the concern that police‐recorded data sets are rife with systematic bias. Predictive policing software is designed to learn and reproduce patterns in data, but if biased data is used to train these predictive models, the models will reproduce and in some cases amplify those same biases. At best, this renders the predictive models ineffective. At worst, it results in discriminatory policing.

Bias in police‐recorded data

Decades of criminological research, dating to at least the nineteenth century, have shown that police databases are not a complete census of all criminal offences, nor do they constitute a representative random sample.2-5 Empirical evidence suggests that police officers – either implicitly or explicitly – consider race and ethnicity in their determination of which persons to detain and search and which neighbourhoods to patrol.6, 7

If police focus attention on certain ethnic groups and certain neighbourhoods, it is likely that police records will systematically over‐represent those groups and neighbourhoods. That is, crimes that occur in locations frequented by police are more likely to appear in the database simply because that is where the police are patrolling.

Bias in police records can also be attributed to levels of community trust in police, and the desired amount of local policing – both of which can be expected to vary according to geographic location and the demographic make‐up of communities. These effects manifest as unequal crime reporting rates throughout a precinct. With many of the crimes in police databases being citizen‐reported, a major source of bias may actually be community‐driven rather than police‐driven. How these two factors balance each other is unknown and is likely to vary with the type of crime. Nevertheless, it is clear that police records do not measure crime. They measure some complex interaction between criminality, policing strategy, and community–police relations.

What is predictive policing?

According to the RAND Corporation, predictive policing is defined as “the application of analytical techniques – particularly quantitative techniques – to identify likely targets for police intervention and prevent crime or solve past crimes by making statistical predictions”.13 Much like how Amazon and Facebook use consumer data to serve up relevant ads or products to consumers, police departments across the United States and Europe increasingly utilise software from technology companies, such as PredPol, Palantir, HunchLabs, and IBM to identify future offenders, highlight trends in criminal activity, and even forecast the locations of future crimes.

What is a synthetic population?

A synthetic population is a demographically accurate individual‐level representation of a real population – in this case, the residents of the city of Oakland. Here, individuals in the synthetic population are labelled with their sex, household income, age, race, and the geo‐coordinates of their home. These characteristics are assigned so that the demographic characteristics in the synthetic population match data from the US Census at the highest geographic resolution possible.
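
Reconstructing such a population is straightforward to sketch in code. The snippet below is a minimal illustration, not the authors' actual procedure: it samples individual attributes independently from per-tract marginals (a real construction would match joint distributions, for example via iterative proportional fitting), and the tract names, category levels, and probabilities are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative census-style marginals for two hypothetical Oakland tracts.
# In a real study these would come from US Census tables at the finest
# available geography, with many more attributes and categories.
tracts = {
    "tract_A": {"population": 3000,
                "p_race": {"white": 0.35, "black": 0.40, "other": 0.25},
                "p_income": {"low": 0.55, "mid": 0.30, "high": 0.15},
                "centroid": (37.805, -122.285)},
    "tract_B": {"population": 4500,
                "p_race": {"white": 0.60, "black": 0.10, "other": 0.30},
                "p_income": {"low": 0.25, "mid": 0.45, "high": 0.30},
                "centroid": (37.840, -122.250)},
}

def synthesize(tracts):
    rows = []
    for name, t in tracts.items():
        n = t["population"]
        race = rng.choice(list(t["p_race"]), size=n, p=list(t["p_race"].values()))
        income = rng.choice(list(t["p_income"]), size=n, p=list(t["p_income"].values()))
        age = rng.integers(12, 80, size=n)          # crude placeholder age distribution
        sex = rng.choice(["female", "male"], size=n)
        lat, lon = t["centroid"]
        # jitter home locations around the tract centroid (illustrative only)
        rows.append(pd.DataFrame({
            "tract": name, "race": race, "income": income, "age": age, "sex": sex,
            "lat": lat + rng.normal(0, 0.002, n),
            "lon": lon + rng.normal(0, 0.002, n),
        }))
    return pd.concat(rows, ignore_index=True)

synthetic_pop = synthesize(tracts)
print(synthetic_pop.groupby(["tract", "race"]).size())
```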

How do we estimate the number of drug users?

In order to combine the NSDUH survey with our synthetic population, we first fit a model to the NSDUH data that predicts an individual's probability of drug use within the past month based on their demographic characteristics (i.e. sex, household income, age, and race). Then, we apply this model to each individual in the synthetic population to obtain an estimated probability of drug use for every synthetic person in Oakland. These estimates are based on the assumption that the relationship between drug use and demographic characteristics is the same at the national level as it is in Oakland. While this is probably not completely true, contextual knowledge about the local culture in Oakland leads us to believe that, if anything, drug use is even more widely and evenly spread than indicated by national‐level data. While some highly localised “hotspots” of drug use may be missed by this approach, we have no reason to believe the location of those should correlate with the locations indicated by police data.
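
A minimal sketch of that two-step procedure follows, assuming survey-style microdata with the listed demographic columns and a past-month-use indicator. The column names are illustrative rather than the NSDUH codebook's, and a faithful analysis would also account for the survey's sampling weights.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# nsduh: survey-style microdata with demographics and a past-month-use flag.
# synthetic_pop: the synthetic population from the previous sketch.
# Variable names and categories here are illustrative, not the NSDUH codebook's.
features = ["sex", "race", "income", "age"]

def fit_use_model(nsduh: pd.DataFrame):
    pre = make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["sex", "race", "income"]),
        remainder="passthrough",   # age passes through as a numeric feature
    )
    model = make_pipeline(pre, LogisticRegression(max_iter=1000))
    # A real analysis would incorporate the NSDUH survey weights when fitting.
    model.fit(nsduh[features], nsduh["used_past_month"])
    return model

def estimate_local_use(model, synthetic_pop: pd.DataFrame) -> pd.DataFrame:
    out = synthetic_pop.copy()
    # Key assumption from the article: the demographics-to-use relationship
    # estimated nationally also holds locally in Oakland.
    out["p_use"] = model.predict_proba(out[features])[:, 1]
    return out

# The expected number of drug users in a grid square is then the sum of p_use
# over the synthetic people whose home falls in that square.
```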

Machine learning algorithms of the kind predictive policing software relies upon are designed to learn and reproduce patterns in the data they are given, regardless of whether the data represents what the model's creators believe or intend. One recent example of intentionally induced machine learning bias is Tay, Microsoft's automated chatbot launched earlier this year. A coordinated effort by the users of 4chan – an online message board with a reputation for crass digital pranks – flooded Tay with misogynistic and otherwise offensive tweets, which then became part of the data corpus used to train Tay's algorithms. Tay's training data quickly became unrepresentative of the type of speech its creators had intended. Within a day, Tay's Twitter account was put on hold because it was generating similarly unsavoury tweets.

A prominent case of unintentionally unrepresentative data can be seen in Google Flu Trends – a near real‐time service that purported to infer the intensity and location of influenza outbreaks by applying machine learning models to search volume data. Despite some initial success, the models completely missed the 2009 influenza A–H1N1 pandemic and consistently over‐predicted flu cases from 2011 to 2014. Many attribute the failure of Google Flu Trends to internal changes to Google's recommendation systems, which began suggesting flu‐related queries to people who did not have flu.8 In this case, the cause of the biased data was self‐induced rather than internet hooliganism. Google's own system had seeded the data with excess flu‐related queries, and as a result Google Flu Trends began inferring flu cases where there were none.

In both examples the problem resides with the data, not the algorithm. The algorithms were behaving exactly as expected – they reproduced the patterns in the data used to train them. Much in the same way, even the best machine learning algorithms trained on police data will reproduce the patterns and unknown biases in police data. Because this data is collected as a by‐product of police activity, predictions made on the basis of patterns learned from this data do not pertain to future instances of crime on the whole. They pertain to future instances of crime that becomes known to police. In this sense, predictive policing (see “What is predictive policing?”) is aptly named: it is predicting future policing, not future crime.

To make matters worse, the presence of bias in the initial training data can be further compounded as police departments use biased predictions to make tactical policing decisions. Because these predictions are likely to over‐represent areas that were already known to police, officers become increasingly likely to patrol these same areas and observe new criminal acts that confirm their prior beliefs regarding the distributions of criminal activity. The newly observed criminal acts that police document as a result of these targeted patrols then feed into the predictive policing algorithm on subsequent days, generating increasingly biased predictions. This creates a feedback loop where the model becomes increasingly confident that the locations most likely to experience further criminal activity are exactly the locations they had previously believed to be high in crime: selection bias meets confirmation bias.

Predictive policing case study

How biased are police data sets? To answer this, we would need to compare the crimes recorded by police to a complete record of all crimes that occur, whether reported or not. Efforts such as the National Crime Victimization Survey provide national estimates of crimes of various sorts, including unreported crime. But while these surveys offer some insight into how much crime goes unrecorded nationally, it is still difficult to gauge any bias in police data at the local level because there is no “ground truth” data set containing a representative sample of local crimes to which we can compare the police databases.

We needed to overcome this particular hurdle to assess whether our claims about the effects of data bias and feedback in predictive policing were grounded in reality. Our solution was to combine a demographically representative synthetic population of Oakland, California (see “What is a synthetic population?”) with survey data from the 2011 National Survey on Drug Use and Health (NSDUH). This approach allowed us to obtain high‐resolution estimates of illicit drug use from a non‐criminal justice, population‐based data source (see “How do we estimate the number of drug users?”) which we could then compare with police records. In doing so, we find that drug crimes known to police are not a representative sample of all drug crimes.

While it is likely that estimates derived from national‐level data do not perfectly represent drug use at the local level, we still believe these estimates paint a more accurate picture of drug use in Oakland than the arrest data for several reasons. First, the US Bureau of Justice Statistics – the government body responsible for compiling and analysing criminal justice data – has used data from the NSDUH as a more representative measure of drug use than police reports.2 Second, while arrest data is collected as a by‐product of police activity, the NSDUH is a well‐funded survey designed using best practices for obtaining a statistically representative sample. And finally, although there is evidence that some drug users do conceal illegal drug use from public health surveys, we believe that any incentives for such concealment apply much more strongly to police records of drug use than to public health surveys, as public health officials are not empowered (nor inclined) to arrest those who admit to illicit drug use. For these reasons, our analysis continues under the assumption that our public health‐derived estimates of drug crimes represent a ground truth for the purpose of comparison.

Figure 1(a) shows the number of drug arrests in 2010 based on data obtained from the Oakland Police Department; Figure 1(b) shows the estimated number of drug users by grid square. From comparing these figures, it is clear that police databases and public health‐derived estimates tell dramatically different stories about the pattern of drug use in Oakland. In Figure 1(a), we see that drug arrests in the police database appear concentrated in neighbourhoods around West Oakland (1) and International Boulevard (2), two areas with largely non‐white and low‐income populations. These neighbourhoods experience about 200 times more drug‐related arrests than areas outside of these clusters. In contrast, our estimates (in Figure 1(b)) suggest that drug crimes are much more evenly distributed across the city. Variations in our estimated number of drug users are driven primarily by differences in population density, as the estimated rate of drug use is relatively uniform across the city. This suggests that while drug crimes exist everywhere, drug arrests tend to only occur in very specific locations – the police data appear to disproportionately represent crimes committed in areas with higher populations of non‐white and low‐income residents.

To investigate the effect of police‐recorded data on predictive policing models, we apply a recently published predictive policing algorithm to the drug crime records in Oakland.9 This algorithm was developed by PredPol, one of the largest vendors of predictive policing systems in the USA and one of the few companies to publicly release its algorithm in a peer‐reviewed journal. It has been described by its founders as a parsimonious race‐neutral system that uses “only three data points in making predictions: past type of crime, place of crime and time of crime. It uses no personal information about individuals or groups of individuals, eliminating any personal liberties and profiling concerns.” While we use the PredPol algorithm in the following demonstration, the broad conclusions we draw are applicable to any predictive policing algorithm that uses unadjusted police records to predict future crime.

The PredPol algorithm, originally based on models of seismographic activity, uses a sliding window approach to produce a one‐day‐ahead prediction of the crime rate across locations in a city, using only the previously recorded crimes. The areas with the highest predicted crime rates are flagged as “hotspots” and receive additional police attention on the following day. We apply this algorithm to Oakland's police database to obtain a predicted rate of drug crime for every grid square in the city for every day in 2011. We record how many times each grid square would have been flagged by PredPol for targeted policing. This is shown in Figure 2(a).
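
The published PredPol model is a self-exciting (epidemic-type aftershock) point process fit by expectation-maximization; the sketch below is a deliberately simplified stand-in that captures only the sliding-window logic described here, scoring each grid cell by an exponentially decaying count of its past recorded crimes and flagging the top-scoring cells each day. The grid, decay rate, and number of flagged cells are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def flag_hotspots(events: pd.DataFrame, all_days, n_cells: int,
                  decay: float = 0.1, k: int = 20) -> pd.DataFrame:
    """Very simplified stand-in for a sliding-window hotspot model.

    events: recorded crimes with columns ['day', 'cell'] (integer day index,
    integer grid-cell index). Each cell's score is an exponentially decaying
    count of its past recorded crimes; the k highest-scoring cells are flagged
    for the next day. This is not the published self-exciting point process.
    """
    score = np.zeros(n_cells)
    flags = []
    for day in all_days:
        # flag today's top-k cells using only crimes recorded before today
        flagged = np.argsort(score)[::-1][:k]
        flags.append({"day": day, "cells": flagged.copy()})
        # then fold in the crimes recorded today
        todays = events.loc[events["day"] == day, "cell"].value_counts()
        score *= np.exp(-decay)                 # older crimes matter less
        score[todays.index.to_numpy()] += todays.to_numpy()
    return pd.DataFrame(flags)
```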

We find that rather than correcting for the apparent biases in the police data, the model reinforces these biases. The locations that are flagged for targeted policing are those that were, by our estimates, already over‐represented in the historical police data. Figure 2(b) shows the percentage of the population experiencing targeted policing for drug crimes broken down by race. Using PredPol in Oakland, black people would be targeted by predictive policing at roughly twice the rate of whites. Individuals classified as a race other than white or black would receive targeted policing at a rate 1.5 times that of whites. This is in contrast to the estimated pattern of drug use by race, shown in Figure 2(c), where drug use is roughly equivalent across racial classifications. We find similar results when analysing the rate of targeted policing by income group, with low‐income households experiencing targeted policing at disproportionately high rates. Thus, allowing a predictive policing algorithm to allocate police resources would result in the disproportionate policing of low‐income communities and communities of colour.

The results so far rely on one implicit assumption: that the presence of additional policing in a location does not change the number of crimes that are discovered in that location. But what if police officers have incentives to increase their productivity as a result of either internal or external demands? If true, they might seek additional opportunities to make arrests during patrols. It is then plausible that the more time police spend in a location, the more crime they will find in that location.

We can investigate the consequences of this scenario through simulation. For each day of 2011, we assign targeted policing according to the PredPol algorithm. In each location where targeted policing is sent, we increase the number of crimes observed by 20%. These additional simulated crimes then become part of the data set that is fed into PredPol on subsequent days and are factored into future forecasts. We study this phenomenon by considering the ratio of the predicted daily crime rate for targeted locations to that for non‐targeted locations. This is shown in Figure 3, where large values indicate that many more crimes are predicted in the targeted locations relative to the non‐targeted locations. This is shown separately for the original data (baseline) and the described simulation. If the additional crimes that were found as a result of targeted policing did not affect future predictions, the lines for both scenarios would follow the same trajectory. Instead, we find that this process causes the PredPol algorithm to become increasingly confident that most of the crime is contained in the targeted bins. This illustrates the feedback loop we described previously.
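
The simulation can be sketched in the same simplified framework, again as an illustration rather than the authors' exact code: each day the top-scoring cells are "targeted," roughly 20% extra recorded crimes are added in those cells, and the inflated records feed back into the next day's scores.

```python
import numpy as np
import pandas as pd

def simulate_feedback(events: pd.DataFrame, all_days, n_cells: int,
                      boost: float = 0.20, decay: float = 0.1, k: int = 20,
                      seed: int = 0) -> pd.Series:
    """Sketch of the feedback-loop experiment described in the article.

    Each day the top-k cells by score are 'targeted'. Each crime recorded in a
    targeted cell that day spawns, with probability `boost`, one extra recorded
    crime there (extra enforcement finding extra crime). The extra records feed
    back into the scores, so later predictions rest on partly self-generated
    data. Returns the per-day ratio of mean score in targeted vs other cells.
    """
    rng = np.random.default_rng(seed)
    score = np.zeros(n_cells)
    ratios = []
    for day in all_days:
        targeted = np.argsort(score)[::-1][:k]
        is_targeted = np.zeros(n_cells, dtype=bool)
        is_targeted[targeted] = True

        counts = np.zeros(n_cells)
        todays = events.loc[events["day"] == day, "cell"].value_counts()
        counts[todays.index.to_numpy()] = todays.to_numpy()
        # extra recorded crimes found because police were sent to targeted cells
        counts[targeted] += rng.binomial(counts[targeted].astype(int), boost)

        score = score * np.exp(-decay) + counts
        denom = score[~is_targeted].mean() or 1e-9   # avoid division by zero early on
        ratios.append(score[is_targeted].mean() / denom)
    return pd.Series(ratios, index=list(all_days))
```

Plotting this ratio for the boosted run against a baseline run (boost = 0) reproduces the qualitative divergence described above: the targeted cells pull further and further ahead, even though no new underlying crime was added.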

Discussion

We have demonstrated that predictive policing of drug crimes results in increasingly disproportionate policing of historically over‐policed communities. Over‐policing imposes real costs on these communities. Increased police scrutiny and surveillance have been linked to worsening mental and physical health;10, 11 and, in the extreme, additional police contact will create additional opportunities for police violence in over‐policed areas.12 When the costs of policing are disproportionate to the level of crime, this amounts to discriminatory policy.

In the past, police have relied on human analysts to allocate police resources, often using the same data that would be used to train predictive policing models. In many cases, this has also resulted in unequal or discriminatory policing. Whereas before, a police chief could reasonably be expected to justify policing decisions, using a computer to allocate police attention shifts accountability from departmental decision‐makers to black‐box machinery that purports to be scientific, evidence‐based and race‐neutral. Although predictive policing is simply reproducing and magnifying the same biases the police have historically held, filtering this decision‐making process through sophisticated software that few people understand lends unwarranted legitimacy to biased policing strategies.

The impact of poor data on analysis and prediction is not a new concern. Every student who has taken a course on statistics or data analysis has heard the old adage “garbage in, garbage out”. In an era when an ever‐expanding array of statistical and machine learning algorithms are presented as panaceas to large and complex real‐world problems, we must not forget this fundamental lesson, especially when doing so can result in significant negative consequences for society.

To predict and serve?

Algorithms have taken hold over our lives whether we appreciate it or not.

When Facebook delivers us clickbait and conspiracy theories, it's an algorithm deciding what you're interested in.

When Uber ratchets up rush-hour prices, it's the service's algorithm kicking in to maximize profits.

When ads for shoes you can't afford follow you around the internet until you give in, it's an algorithm tracking your course.

Algorithms are also taking over policing. In cities like Los Angeles, Atlanta and Philadelphia, "predictive policing" algorithms comb through past crime data to tell officers which people and places are most at risk for future crimes.

The most popular is PredPol, an algorithm developed by the Los Angeles Police Department in collaboration with local universities that takes in hard data about where and when crimes happened and then makes a "hotspot" map of where crime will likely happen next.

But according to a study to be published later this month in the academic journal Significance, PredPol may merely be reinforcing bad police habits. When researchers from the Human Rights Data Analysis Group — a nonprofit dedicated to using science to analyze human-rights violations around the world — applied the tool to crime data in Oakland, the algorithm recommended that police deploy officers to neighborhoods with mostly black residents. As it happens, police in Oakland were already sending officers into these areas.

"These models are supposed to give you some unseen insight into where crime is supposed to be," William Isaac, one of the report's co-authors, said in an interview. "But it's just common-sense stuff, and we make a case that these software suites are basically used as a tool to validate police decisions."

Using a publicly-available version of PredPol's algorithm, researchers Isaac and Kristian Lum used 2010 arrest data from Oakland to predict where crimes would occur in 2011. To compare that map with what's actually going down in Oakland, researchers used data from the Census and the National Crime Victimization Survey to create a heat map showing where drug use in the city was most prevalent in 2011.

In an ideal world, the maps would be similar. But in fact, PredPol directed police to black neighborhoods like West Oakland and International Boulevard instead of zeroing in on where drug crime actually occurred. Predominantly white neighborhoods like Rockridge and Piedmont got a pass, even though white people use illicit drugs at higher rates than minorities.

To see how actual police practices in Oakland matched up with PredPol's recommendations, researchers also compared PredPol's map to a map of where Oakland Police arrested people for drug crimes. The maps were strikingly similar. Regardless of where crime is happening, predominantly black neighborhoods have about 200 times more drug arrests than other Oakland neighborhoods. In other words, police in Oakland are already doing what PredPol's map suggested — over-policing black neighborhoods — rather than zeroing in on where drug crime is happening.

"If you were to look at the data and where they're finding drug crime, it's not the same thing as where the drug crime actually is," Lum said in an interview. "Drug crime is everywhere, but police only find it where they're looking."

PredPol did not respond to Mic's request for comment.

To be clear, Oakland does not currently use PredPol — researchers merely used Oakland as an example of what happens when you apply PredPol to a major metropolitan area. Dozens of othe

Crime-prediction tool may be reinforcing discriminatory policing

by Jessica Saunders

Consider it the real-life “Minority Report”: Chicago police say they're successfully using big data to predict who will get shot — and who will do the shooting. But life is more complicated than the movies. The statistics that police tout to say the program works mask the fact that society is a long way from being able to prevent crime, even if police have a strong idea of who might be involved.

Chicago police assert that three out of four shooting victims in 2016 were on the department's secret “heat list” of more than 1,000 people. And 80 percent of those arrested in connection to shootings were on the list, they say, but there has been no independent verification. Yet if that were the case, why is 2016 on track to be the most violent year in Chicago's recorded history?

This question was put to test in a recent RAND Corporation study of the Chicago program, and the results are not encouraging.

No algorithm is likely to ever predict with absolute certainty the who-when-where of a crime. But researchers have made great progress at identifying who is at heightened risk for both criminal perpetration and victimization. By calculating how often a person has been arrested with someone who later became a homicide victim, Illinois Institute of Technology researchers have identified a small group of people who are up to 500 times more likely to be the victim of a gun-related homicide than the average Chicago resident.
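
The co-arrest idea can be illustrated with a small network sketch: link people who were arrested together, then flag anyone within a couple of links of a later homicide victim. The data layout and the hop-based rule below are assumptions for illustration only, not the Illinois Institute of Technology model or the Strategic Subject List's actual scoring.

```python
from collections import defaultdict
from itertools import combinations

def co_arrest_risk(arrests, victims, hops=2):
    """arrests: iterable of (incident_id, person_id) pairs; victims: set of
    person_ids later recorded as homicide victims. Flags everyone within
    `hops` co-arrest links of a victim. Purely illustrative of the network
    idea, not an actual risk-scoring model."""
    by_incident = defaultdict(set)
    for incident, person in arrests:
        by_incident[incident].add(person)

    graph = defaultdict(set)
    for people in by_incident.values():
        for a, b in combinations(people, 2):   # link everyone arrested together
            graph[a].add(b)
            graph[b].add(a)

    frontier, flagged = set(victims), set()
    for _ in range(hops):                      # breadth-first expansion from victims
        frontier = {n for p in frontier for n in graph[p]} - flagged - set(victims)
        flagged |= frontier
    return flagged
```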

Less is known about how to reduce gun violence for such a high-risk population. A 2009 study on gun violence in Chicago found that a popular intervention that brings offenders to “notification forums,” which relay the enhanced punishment they will receive if they commit a crime, can reduce reincarceration by as much as 30 percent. (While reducing reincarceration and preventing homicide are two different things, this strategy is the closest to what Chicago is proposing to do with their list. They propose to have the police deliver customized letters to offenders containing their criminal history and the punishments they will receive if they reoffend, along with contact information for social services.)

Given those developments, it was exciting to have the opportunity to independently evaluate Chicago Police's predictive policing program. To make a long story short: It didn't work.

The Chicago Police identified 426 people as being at the highest risk for gun violence, with the intention of providing them with prevention services. In a city of over 2.7 million, that's a manageable number of people to focus on. However, the Chicago Police failed to provide any services or programming. Instead they increased surveillance and arrests — moves that did not result in any perceptible change in gun violence during the first year of the program, according to the RAND study.

The names of only three of the 405 homicide victims murdered between March 2013 and March 2014 were on the Chicago police's list, while 99 percent of the homicide victims were not. So even if the police knew how to prevent these murders, only three people would have been saved — and the other 402 would not have been. In a recent news release, Chicago police dismissed RAND's conclusions by saying the department has more than doubled the predictive accuracy of its list and is going to start providing better intervention programming. Even if those improvements are real, the drop in crime will be almost imperceptible.

Here's why: Consider the number of homicides that would be prevented if the list's accuracy has doubled over the 2013 pilot and the police actually deliver an intervention program that is 30 percent effective. That would prevent fewer than two murders per year, a drop of less than 1 percent in the city's overall homicide rate.
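
For concreteness, here is the back-of-envelope arithmetic behind that claim, using only the figures quoted in this piece; RAND's exact inputs may differ.

```python
# Back-of-envelope version of the argument above, using figures quoted in
# this piece (RAND's exact assumptions may differ).
victims_on_list_2013 = 3                  # of 405 homicide victims in the pilot year
reachable_if_doubled = 2 * victims_on_list_2013     # ~6 victims correctly listed
intervention_effectiveness = 0.30         # from the 2009 Chicago notification-forum study
prevented_per_year = reachable_if_doubled * intervention_effectiveness
print(prevented_per_year)                 # ~1.8, i.e. "fewer than two murders per year"
print(prevented_per_year / 468)           # ~0.4% of the 468 murders reported last year
```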

To achieve even a 5 percent drop in the city's homicide rate, enormous leaps in both prediction and intervention effectiveness are necessary. In fact, the list would have to be 10 times more accurate than it was in the 2013 pilot — and prevention efforts would need to be five times more effective than current estimates. And after all that improvement — here's how many lives would be saved: 21. In a city that reported 468 murders last year, that would be tremendous progress but hardly the definitive solution.

For significant drops in citywide homicide rates, monumental — not incremental — improvements in predictive policing are needed. Preventing even one killing is laudable. But neither the police nor the public should expect predictive policing alone to have a major impact on overall homicide rates anytime soon.

Jessica Saunders is a senior criminologist at the nonprofit, nonpartisan RAND Corporation and an author of “Predictions put into practice: A quasi-experimental evaluation of Chicago's predictive policing program,” published in September in The Journal of Experimental Criminology.

This commentary originally appeared on U.S. News & World Report on October 7, 2016.

Pitfalls of Predictive Policing

Sgt. Charles Coleman popped out of his police SUV and scanned a trash-strewn street popular with the city’s homeless, responding to a crime that hadn’t yet happened.

It wasn’t a 911 call that brought the Los Angeles Police Department officer to this spot, but a whirring computer crunching years of crime data to arrive at a prediction: An auto theft or burglary would probably occur near here on this particular morning.

Hoping to head it off, Coleman inspected a line of ramshackle RVs used for shelter by the homeless, roused a man sleeping in a pickup truck and tapped on the side of a shack made of plywood and tarps.

“How things going, sweetheart?” he asked a woman who ambled out. Coleman listened sympathetically as she described how she was nearly raped at knifepoint months earlier, saying the area was “really tough” for a woman.

Soon, Coleman was back in his SUV on his way to fight the next pre-crime. Dozens of other LAPD officers were doing the same at other spots, guided by the crime prognostication system known as PredPol.

“Predictive policing” represents a paradigm shift that is sweeping police departments across the country. Law enforcement agencies are increasingly trying to forecast where and when crime will occur, or who might be a perpetrator or a victim, using software that relies on algorithms, the same math Amazon uses to recommend books.

“The hope is the holy grail of law enforcement — preventing crime before it happens,” said Andrew G. Ferguson, a University of District of Columbia law professor preparing a book on big data and policing.

Now used by 20 of the nation’s 50 largest police forces by one count, the technologies are at the center of an increasingly heated debate about their effectiveness, potential impact on poor and minority communities, and implications for civil liberties.

Some police departments have hailed PredPol and other systems as instrumental in reducing crime, focusing scarce resources on trouble spots and individuals and replacing officers’ hunches and potential biases with hard data.

But privacy and racial justice groups say there is little evidence the technologies work and note the formulas powering the systems are largely a secret. They are concerned the practice could unfairly concentrate enforcement in communities of color by relying on racially skewed policing data. And they worry that officers who expect a theft or burglary is about to happen may be more likely to treat the people they encounter as potential criminals.

The experiments are one of the most consequential tests of algorithms that are increasingly powerful forces in our lives, determining credit scores, measuring job performance and flagging children that might be abused. The White House has been studying how to balance the benefits and risks they pose.

“The technical capabilities of big data have reached a level of sophistication and pervasiveness that demands consideration of how best to balance the opportunities afforded by big data against the social and ethical questions these technologies raise,” the White House wrote in a recent report.

A seismic shift in policing

It was 6:45 a.m. on a Monday, but the sheet of paper Coleman held in his hands offered a glimpse of how Oct. 24 might go: an auto theft near the corner of Van Nuys and Glenoaks, a burglary at Laurel Canyon and Roscoe and so on.

The crime forecast is produced by PredPol at the beginning of each shift. Red boxes spread across Google maps of the San Fernando Valley, highlighting 500-foot-by-500-foot locations where PredPol concluded property crimes were likely.

The forecast is cutting edge, but it is used in the service of an old-fashioned policing philosophy: deterrence. Between calls that day, Coleman and other officers were expected to spend time and engage with people in the roughly 20 boxes PredPol identified around the Foothill Division.

Coleman sat behind the wheel of his SUV, plotting which boxes to hit the way someone consulting a weather map might weigh whether to bring an umbrella.

“It’s not always that we are going to catch someone in the box, but by being there we prevent crime,” said Capt. Elaine Morales, who oversees the Foothill Division.

Foothill is far from the glitz of Hollywood on the northern edge of L.A., but it has been at t

Police are using software to predict crime. Is it a ‘holy grail’ or biased against minorities?

System meant to alleviate police resources disproportionately targets minority communities, raises Fourth Amendment concerns

From Los Angeles to New York, there is a quiet revolution underway within police departments across the country.

Just as major tech companies and political campaigns have leveraged data to target potential customers or voters, police departments have increasingly partnered with private firms or created new divisions to develop software to predict who will commit a crime or where crime will be committed before it occurs. While it may sound eerily reminiscent of the Tom Cruise movie Minority Report, the dream of having a tool that, in theory, could more efficiently allocate policing resources would be a boon for departments struggling with shrinking budgets and increasing pressure to be more responsive to their communities.

Not surprisingly, a 2012 survey by the Police Executive Research Forum found that 70% of roughly 200 police agencies planned to implement the use of predictive policing technology in the next two to five years. The technology policy institute Upturn found that at least 20 of the 50 largest police agencies currently use predictive policing technology, and another 150 departments are testing the tools. But, in the rush to adopt these new tools, police departments have failed to address very serious Fourth Amendment concerns and questions of whether predictive policing reveals new "hot spots" unknown to the police or simply concentrates police effort in already over-policed communities.

Guilt by association, not evidence

Contrary to Hollywood’s depiction of predictive policing, police departments do not have a team of psychics who see crimes before they occur. In reality, the predictions are made by a computer program that uses historical police records to make estimates about future crime or criminals. Those predictions then become part of policing strategies that, among other things, send police commanders to people's homes to monitor their activity. In some cases, the suspects receiving additional monitoring have yet to be implicated in any crimes.

In the more commonly used place-based predictive policing, departments are provided estimates of potential crime “hot spots” or locations where a high number of recorded crimes are expected to occur. Departments allocate additional officers to patrol these areas. While these examples may seem like only minor modifications to standard policing practices, predictive policing tools bring about a fundamental shift in how police decide that certain people and communities are deserving of greater police scrutiny and who is accountable for these decisions.

The Fourth Amendment largely prohibits unreasonable searches and seizures. Many prominent civil liberties groups such as the American Civil Liberties Union have argued that the growth of predictive policing will allow officers in the field to stop suspects under the guise of “Big Data” rather than more stringent legal standards, such as reasonable suspicion. An even more egregious violation of the Fourth Amendment could come through the use of social media by law enforcement to monitor the contacts of known criminals. That could easily open the door to an even larger network of people being monitored who have actually committed no crime, but are seen by law enforcement as guilty by association. How often do we "friend" or "follow" folks whose past is largely a mystery to us?

Use of algorithms shows little crime reduction

More concerning is the lack of solid, independent evidence that predictive policing tools actually reduce crime or reveal hidden patterns in crime that were not already known to law enforcement. While one study co-written by the founders of the proprietary predictive policing software PredPol reported evidence of a small reduction in property crime through the use of their tool, a study conducted by RAND Corporation found no decline in crime rates after the deployment of a similar tool in Shreveport, La. Another peer-reviewed study assessing the Chicago Police Department’s algorithm that cranks out a suspicious subjects list, or “heat list,” also found no evidence that the program decreased the number of homicides in the city. What the study did find was that the program disproportionately targeted the city’s black residents.

In our new study, we apply a predictive policing algorithm to publicly available data of drug crimes in the city of Oakland, Calif. Predictive policing offered no insight into patterns of crime beyond what the police already knew. In fact, had the city of Oakland implemented predictive policing, the algorithm would have repeatedly sent the police back to exactly the same locations where police had discovered crime in the past. Through over-policing, these tools have the potential to cause greater deterioration of community-police relations, particularly in already over-p

Predictive policing violates more than it protects: Column

Tim Birch was six months into his new job as head of research and planning for the Oakland Police Department when he walked into his office and found a piece of easel pad paper tacked onto his wall. Scribbled across the page were the words, "I told you so!"

Paul Figueroa, then the assistant chief of Oakland Police, who sat next door to Birch, was the culprit.

A few months before, in the fall of 2014, Birch had attended a national conference for police chiefs where he was introduced to PredPol, a predictive policing software that several major cities across the US have started to use. It can forecast when and where crimes may occur based on prior crime reports, but the results of its impact on crime reduction have been mixed.

Birch, a former police officer in Daly City, thought it could help Oakland's understaffed and underfunded police force. During the January 2015 budget planning process, he convinced Mayor Libby Schaaf to earmark $150,000 in the city's budget to fund the software over two years.

But Figueroa was skeptical of the technology. An Oakland native and 25-year veteran of the force, he worried the technology could have unintended consequences—such as disproportionately scrutinizing certain neighborhoods—and erode community trust. Figueroa and Birch had spirited discussions after the January budget proposal about why it wouldn't work in a city with a sordid history of police and community relations, including several misconduct scandals.

"If we have a way to use mathematics to find out where we need to be in order to prevent crime, let's use it."

Birch finally came around to Figueroa's thinking in April 2015 after further research and a newfound understanding of Oakland. He realized the city didn't need to give its people another reason to be suspicious. It was too easy for the public to interpret predictive policing as another form of racial profiling.

He decided to rescind his funding request from Schaaf, telling her the OPD would not be using the software. That's when Figueroa put the note on his wall.

"Maybe we could reduce crime more by using predictive policing, but the unintended consequences [are] even more damaging… and it's just not worth it," Birch said. He said it could lead to even more disproportionate stops of African Americans, Hispanics and other minorities.

The Oakland police's decision runs counter to a broader nationwide trend. Departments in cities like New York, Los Angeles, Atlanta and Chicago are turning to predictive policing software like PredPol as a way to reduce crime by deploying officers and resources more effectively.

A 2013 PredPol pilot in Atlanta was one of the first key tests of the software.

According to a 2014 national survey conducted by the Police Executive Research Forum, a Washington-based think tank made up of police executives, 70 percent of police department representatives surveyed said they expected to implement the technology in the next two to five years. Thirty-eight percent said they were already using it at the time.

But Bay Area departments are raising questions about the effectiveness and dangers of relying on data to prevent crime. San Francisco currently has no plans to use predictive policing technology. Berkeley does not either. Just north of Oakland, the Richmond Police Department canceled its contract with PredPol earlier this year, and to the south, the Milpitas Police Department cut its ties with the software maker back in 2014.

These authorities say the software may be able to predict crime, but may not actually help prevent crime because knowing when a crime may occur doesn't necessarily solve the problem of stopping it. Critics of the software also argue it perpetuates racial bias inherent in crime data and the justice system, which could lead to more disproportionate stops of people of color. But police departments who support using PredPol say police presence in these predicted crime zones can potentially deter crime.

PredPol first began as a research project within UCLA's Institute for Pure and Applied Mathematics, which uses math to solve scientific and technology challenges across fields. The research team was trying to figure out if they could predict crime like scientists predict earthquake aftershocks, but they needed data to crunch. That's when they formed an informal partnership with Los Angeles Police Department Captain Sean Malinowski.

In theory, the algorithm is not too different from heat maps that law enforcement have used for years to lay out locations of past crimes. PredPol funnels data such as the location and time of property crimes and theft from crime reports into an algorithm that analyzes the areas that are at high risk for future crime. During routine police patrols, officers glance at a laptop inside their car to view "the box," a small red square highlighting a 500-foot-by-500-foot region on their patrol area map. These boxes indicate where and when a crime is most likely to occur.

Why Oakland Police Turned Down Predictive Policing

The problem of policing has always been that it’s after-the-fact. If law enforcement officers could be at the right place at the right time, crime could be prevented, lives could be saved, and society would surely be safer. In recent years, predictive policing technology has been touted as just such a panacea. References to Minority Report are apparently obligatory when writing about the topic, but disguise a critical problem: predictive policing isn’t sci-fi; it’s a more elaborate version of existing, flawed practices.

Predictive policing is an umbrella term to describe law enforcement’s use of new big data and machine learning tools. There are two types of tools: person-based and location-based systems. Person-based systems like Chicago’s Strategic Subject List use a variety of risk factors, including social media analysis, to identify likely offenders. A 2015 report stated the Chicago Police Department had assembled a list of “roughly 400 individuals identified by certain factors as likely to be involved in violent crime.” This raises a host of civil liberties questions about what degree of latitude police should be granted to perform risk analysis on people with no criminal records. In the future, these questions will become ever more pressing as revelations of threat scores, StingRays and facial recognition technology continue to grab headlines.

In the present, however, the majority of publicly-known predictive policing algorithms are location-based. Twenty of the nation’s fifty largest police departments are known to use such algorithms, all of which rely on historical crime data—things like 911 calls and police reports. Based on data trends, these algorithms direct police to locations that are likely to experience crime at a particular time. Unfortunately, the Department of Justice has estimated that less than half of violent crimes and even fewer household property crimes are reported to the police. An algorithm trying to make predictions based on historical data isn’t actually looking at crime; it’s looking at how police respond to crimes they know about.

This merely reinforces the biases of existing policing practices. In October, the Human Rights Data Analysis Group released a study that applied a predictive policing algorithm to the Oakland Police Department’s drug crime records from 2010. The study found that the algorithm would dispatch officers “almost exclusively to lower income, minority neighborhoods”—despite the fact that drug users are estimated to be widely dispersed throughout the city. The predictive algorithm essentially sent cops to areas where they had already made arrests, rather than identifying new areas where drugs might appear.

The algorithm the researchers analyzed was written by PredPol, one of the largest suppliers of predictive policing systems in the United States, and was chosen for being one of the few algorithms openly published in a scientific journal. PredPol says it uses “only three data points in making predictions: past type of crime, place of crime and time of crime. It uses no personal information about individuals or groups of individuals, eliminating any personal liberties and profiling concerns.” Ironically, these parsimonious standards ensure that the algorithm cannot improve on the historical record; it can only reinforce it.

Some systems, like IBM’s, wisely incorporate other data points like weather and proximity of liquor stores. Unlike PredPol, however, the vast majority of these algorithms are trade secrets and not subject to independent review. The secrecy around the software makes it harder for police departments and local governments to make fully informed decisions. It also bars the public from participating in the decision-making process and sows distrust.

That’s not to say that police departments shouldn’t use software to analyze their data. In fact, a 2015 study found predictive policing technology had significantly aided law enforcement in Los Angeles and Kent, England. In Norcross, Georgia, police claim that they saw a 15 percent reduction in robberies and burglaries within four months of deploying PredPol. The Atlanta Police Department was similarly enthused.

Further development of the technology is inevitable, so local governments and police departments should develop appropriate standards and practices. For starters, these algorithms should not be called ‘predictive.’ They aren’t crystal balls; they’re making forecasts based on limited data. Less obvious data points, like broken streetlights or the presence of trees, should be incorporated to refine these forecasts. As in St. Louis, they shouldn’t be used for minor crimes. Person-based algorithmic forecasts should never be accepted as meeting the reasonable suspicion requirement for detaining an individual, and only data specialists should have access to the software to red

Predictive Policing Is Not as Predictive As You Think

Sunday, the New York Times published a well-meaning op-ed about the fears of racial bias in artificial intelligence and predictive policing systems. The author, Bärí A. Williams, should be commended for engaging the debate about building “intelligent” computer systems to predict crime, and for framing these developments in racial justice terms. One thing we have learned about new technologies is that they routinely replicate deep-seated social inequalities — including racial discrimination. In just the last year, we have seen facial recognition technologies unable to accurately identify people of color, and familial DNA databases challenged as discriminatory to over-policed populations. Artificial intelligence policing systems will be no different. If you unthinkingly train A.I. models with racially-biased inputs, the outputs will reflect the underlying societal inequality.

But the issue of racial bias and predictive policing is more complicated than what is detailed in the op-ed. I should know. For several years, I have been researching predictive policing because I was concerned about the racial justice impacts of these new technologies. I am still concerned, but think we need to be clear where the real threats exist.

Take, for example, the situation in Oakland, California described in the op-ed. Ms. Williams eloquently writes:

It’s no wonder criminologists have raised red flags about the self-fulfilling nature of using historical crime data.

This hits close to home. An October 2016 study by the Human Rights Data Analysis Group concluded that if the Oakland Police Department used its 2010 record of drug-crimes information as the basis of an algorithm to guide policing, the department “would have dispatched officers almost exclusively to lower-income, minority neighborhoods,” despite the fact that public-health-based estimates suggest that drug use is much more widespread, taking place in many other parts of the city where my family and I live.

Those “lower-income, minority neighborhoods” contain the barbershop where I take my son for his monthly haircut and our favorite hoagie shop. Would I let him run ahead of me if I knew that simply setting foot on those sidewalks would make him more likely to be seen as a criminal in the eyes of the law?

These are honest fears.

If, as the op-ed suggested, Oakland police used drug arrest statistics to forecast where future crime would occur, then its crime predictions would be as racially discriminatory as the arrest activity. In essence, the crime prediction simply would be replicating arrest patterns (where police patrol), not drug use (where people use drugs). Police patterns might, thus, be influenced by socio-economic and racial factors — not the underlying prevalence of the crime. This would be a discriminatory result — which is why it is quite fortunate that Oakland is doing no such thing. In fact, the Human Rights Data Analysis Group (HRDAG) report that Ms. Williams cites is a hypothetical model examining how a predictive policing system could be racially biased. The HRDAG researchers received a lot of positive press about their study because it used a real predictive policing algorithm designed by PredPol, an actual predictive policing company. But, PredPol does not predict drug crimes, and does not use arrests in its algorithm, precisely because the company knows the results would be racially discriminatory. Nor does Oakland use PredPol. So, the hypothetical fear is not inaccurate, but the suggestion that this is the way predictive policing is actually being used around Oakland barbershops is slightly misleading.

Do not misunderstand this to be a minimization of the racial justice problems in Oakland. As Stanford Professor Jennifer Eberhardt and other researchers have shown, the Oakland Police Department has a demonstrated pattern of racial discrimination that impacts who gets stopped, arrested, and handcuffed — and which suggests deep systemic problems. But, linking real fears about racially unfair policing to hypothetical fears about predictive technologies (which are not being used as described) distorts the critique.

Similarly, the op-ed singles out HunchLab as a company which uses artificial intelligence to build predictive policing systems:

These downsides of A.I. are no secret. Despite this, state and local law enforcement agencies have begun to use predictive policing applications fueled by A.I. like HunchLab, which combines historical crime data, moon phases, location, census data and even professional sports team schedules to predict when and where crime will occur and even who’s likely to commit or be a victim of certain crimes.

The problem with historical crime data is that it’s based upon policing practices that already disproportionately hone in on blacks, Latinos, and those who live in low-income areas.

If the police have discriminated in the past, predictive technology reinforces and perpetuates the

The Truth About Predictive Policing and Race

IN THE DECADE after the 9/11 attacks, the New York City Police Department moved to put millions of New Yorkers under constant watch. Warning of terrorism threats, the department created a plan to carpet Manhattan’s downtown streets with thousands of cameras and had, by 2008, centralized its video surveillance operations to a single command center. Two years later, the NYPD announced that the command center, known as the Lower Manhattan Security Coordination Center, had integrated cutting-edge video analytics software into select cameras across the city.

The video analytics software captured stills of individuals caught on closed-circuit TV footage and automatically labeled the images with physical tags, such as clothing color, allowing police to quickly search through hours of video for images of individuals matching a description of interest. At the time, the software was also starting to generate alerts for unattended packages, cars speeding up a street in the wrong direction, or people entering restricted areas.
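
In general terms, this kind of attribute-tag search amounts to storing each detection with machine-generated labels and then filtering on a description of interest. The sketch below illustrates only that generic pattern; the field names and tags are invented and do not describe IBM's actual product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class Detection:
    camera_id: str
    timestamp: datetime
    frame_ref: str              # pointer back to the stored still or clip
    tags: Dict[str, str]        # machine-generated labels, e.g. {"torso_color": "red"}

def search(detections: List[Detection], **criteria) -> List[Detection]:
    """Return detections whose tags match every requested attribute.
    Illustrative of attribute-tag search in general, not any vendor's API."""
    return [d for d in detections
            if all(d.tags.get(k) == v for k, v in criteria.items())]

# e.g. search(index, torso_color="red", bag="backpack") would narrow hours of
# footage to a handful of stills matching a description of interest.
```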

Over the years, the NYPD has shared only occasional, small updates on the program’s progress. In a 2011 interview with Scientific American, for example, Inspector Salvatore DiPace, then commanding officer of the Lower Manhattan Security Initiative, said the police department was testing whether the software could box out images of people’s faces as they passed by subway cameras and subsequently cull through the images for various unspecified “facial features.”

While facial recognition technology, which measures individual faces at over 16,000 points for fine-grained comparisons with other facial images, has attracted significant legal scrutiny and media attention, this object identification software has largely evaded attention. How exactly this technology came to be developed and which particular features the software was built to catalog have never been revealed publicly by the NYPD.

Now, thanks to confidential corporate documents and interviews with many of the technologists involved in developing the software, The Intercept and the Investigative Fund have learned that IBM began developing this object identification technology using secret access to NYPD camera footage. With access to images of thousands of unknowing New Yorkers offered up by NYPD officials, as early as 2012, IBM was creating new search features that allow other police departments to search camera footage for images of people by hair color, facial hair, and skin tone.

IBM declined to comment on its use of NYPD footage to develop the software. However, in an email response to questions, the NYPD did tell The Intercept that “Video, from time to time, was provided to IBM to ensure that the product they were developing would work in the crowded urban NYC environment and help us protect the City. There is nothing in the NYPD’s agreement with IBM that prohibits sharing data with IBM for system development purposes. Further, all vendors who enter into contractual agreements with the NYPD have the absolute requirement to keep all data furnished by the NYPD confidential during the term of the agreement, after the completion of the agreement, and in the event that the agreement is terminated.”

In an email to The Intercept, the NYPD confirmed that select counterterrorism officials had access to a pre-released version of IBM’s program, which included skin tone search capabilities, as early as the summer of 2012. NYPD spokesperson Peter Donald said the search characteristics were only used for evaluation purposes and that officers were instructed not to include the skin tone search feature in their assessment. The department eventually decided not to integrate the analytics program into its larger surveillance architecture, and phased out the IBM program in 2016.

After testing out these bodily search features with the NYPD, IBM released some of these capabilities in a 2013 product release. Later versions of IBM’s software retained and expanded these bodily search capabilities. (IBM did not respond to a question about the current availability of its video analytics programs.)

Asked about the secrecy of this collaboration, the NYPD said that “various elected leaders and stakeholders” were briefed on the department’s efforts “to keep this city safe,” adding that sharing camera access with IBM was necessary for the system to work. IBM did not respond to a question about why the company didn’t make this collaboration public. Donald said IBM gave the department licenses to apply the system to 512 cameras, but said the analytics were tested on “fewer than fifty.” He added that IBM personnel had access to certain cameras for the sole purpose of configuring NYPD’s system, and that the department put safeguards in place to protect the data, including “non-disclosure agreements for each individual accessing the system; non-disclosure agreements for the companies the vendors worked for; and background checks.”

Civil liberties advocates contend that New Yorkers should have been made aware of the potential use of their physical data for a private company’s development of surveillance technology. The revelations come as a city council bill that would require NYPD transparency about surveillance acquisitions continues to languish, due, in part, to outspoken opposition from New York City Mayor Bill de Blasio and the NYPD.

Inside the New York City Police Department’s lower Manhattan security center on Sept. 20, 2010, where cops monitor surveillance cameras, environmental sensors, and license plate readers around the clock. Photo: Timothy Fadek/Corbis via Getty Images

Skin Tone Search Technology, Refined on New Yorkers

IBM’s initial breakthroughs in object recognition technology were envisioned for technologies like self-driving cars or image recognition on the internet, said Rick Kjeldsen, a former IBM researcher. But after 9/11, Kjeldsen and several of his colleagues realized their program was well suited for counterterror surveillance.

“After 9/11, the funding sources and the customer interest really got driven toward security,” said Kjeldsen, who said he worked on the NYPD program from roughly 2009 through 2013. “Even though that hadn’t been our focus up to that point, that’s where demand was.”

IBM’s first major urban video surveillance project was with the Chicago Police Department and began around 2005, according to Kjeldsen. The department let IBM experiment with the technology in downtown Chicago until 2013, but the collaboration wasn’t seen as a real business partnership. “Chicago was always known as, it’s not a real — these guys aren’t a real customer. This is kind of a development, a collaboration with Chicago,” Kjeldsen said. “Whereas New York, these guys were a customer. And they had expectations accordingly.”

The NYPD acquired IBM’s video analytics software as one part of the Domain Awareness System, a shared project of the police department and Microsoft that centralized a vast web of surveillance sensors in lower and midtown Manhattan — including cameras, license plate readers, and radiation detectors — into a unified dashboard. IBM entered the picture as a subcontractor to Microsoft subsidiary Vexcel in 2007, as part of a project worth $60.7 million over six years, according to the internal IBM documents.

In New York, the terrorist threat “was an easy selling point,” recalled Jonathan Connell, an IBM researcher who worked on the initial NYPD video analytics installation. “You say, ‘Look what the terrorists did before, they could come back, so you give us some money and we’ll put a camera there.’”

A former NYPD technologist who helped design the Lower Manhattan Security Initiative, asking to speak on background citing fears of professional reprisal, confirmed IBM’s role as a “strategic vendor.” “In our review of video analytics vendors at that time, they were well ahead of everyone else in my personal estimation,” the technologist said.

According to internal IBM planning documents, the NYPD began integrating IBM’s surveillance product in March 2010 for the Lower Manhattan Security Coordination Center, a counterterrorism command center launched by Police Commissioner Ray Kelly in 2008. In a “60 Minutes” tour of the command center in 2011, Jessica Tisch, then the NYPD’s director of policy and planning for counterterrorism, showed off the software on gleaming widescreen monitors, demonstrating how it could pull up images and video clips of people in red shirts. Tisch did not mention the partnership with IBM.

During Kelly’s tenure as police commissioner, the NYPD quietly worked with IBM as the company tested out its object recognition technology on a select number of NYPD and subway cameras, according to IBM documents. “We really needed to be able to test out the algorithm,” said Kjeldsen, who explained that the software would need to process massive quantities of diverse images in order to learn how to adjust to the differing lighting, shadows, and other environmental factors in its view. “We were almost using the video for both things at that time, taking it to the lab to resolve issues we were having or to experiment with new technology,” Kjeldsen said.

At the time, the department hoped that video analytics would improve analysts’ ability to identify suspicious objects and persons in real time in sensitive areas, according to Conor McCourt, a retired NYPD counterterrorism sergeant who said he used IBM’s program in its initial stages.

“Say you have a suspicious bag left in downtown Manhattan, as a person working in the command center,” McCourt said. “It could be that the analytics saw the object sitting there for five minutes, and says, ‘Look, there’s an object sitting there.’” Operators could then rewind the video or look at other cameras nearby, he explained, to get a few possibilities as to who had left the object behind.

Over the years, IBM employees said, they started to become more concerned as they worked with the NYPD to allow the program to identify demographic characteristics. By 2012, according to the internal IBM documents, researchers were testing out the video analytics software on the bodies and faces of New Yorkers, capturing and archiving their physical data as they walked in public or passed through subway turnstiles. With these close-up images, IBM refined its ability to search for people on camera according to a variety of previously undisclosed features, such as age, gender, hair color (called “head color”), the presence of facial hair — and skin tone. The documents reference meetings between NYPD personnel and IBM researchers to review the development of body identification searches conducted at subway turnstile cameras.

“We were certainly worried about where the heck this was going,” recalled Kjeldsen. “There were a couple of us that were always talking about this, you know, ‘If this gets better, this could be an issue.’”

According to the NYPD, counterterrorism personnel accessed IBM’s bodily search feature capabilities only for evaluation purposes, and they were accessible only to a handful of counterterrorism personnel. “While tools that featured either racial or skin tone search capabilities were offered to the NYPD, they were explicitly declined by the NYPD,” Donald, the NYPD spokesperson, said. “Where such tools came with a test version of the product, the testers were instructed only to test other features (clothing, eyeglasses, etc.), but not to test or use the skin tone feature. That is not because there would have been anything illegal or even improper about testing or using these tools to search in the area of a crime for an image of a suspect that matched a description given by a victim or a witness. It was specifically to avoid even the suggestion or appearance of any kind of technological racial profiling.” The NYPD ended its use of IBM’s video analytics program in 2016, Donald said.

Donald acknowledged that, at some point in 2016 or early 2017, IBM approached the NYPD with an upgraded version of the video analytics program that could search for people by ethnicity. “The Department explicitly rejected that product,” he said, “based on the inclusion of that new search parameter.” In 2017, IBM released Intelligent Video Analytics 2.0, a product with a body camera surveillance capability that allows users to detect people captured on camera by “ethnicity” tags, such as “Asian,” “Black,” and “White.”

Kjeldsen, the former IBM researcher who helped develop the company’s skin tone analytics with NYPD camera access, said the department’s claim that the NYPD simply tested and rejected the bodily search features was misleading. “We would have not explored it had the NYPD told us, ‘We don’t want to do that,’” he said. “No company is going to spend money where there’s not customer interest.”

Kjeldsen also added that the NYPD’s decision to allow IBM access to their cameras was crucial for the development of the skin tone search features, noting that during that period, New York City served as the company’s “primary testing area,” providing the company with considerable environmental diversity for software refinement.

“The more different situations you can use to develop your software, the better it’s going to be,” Kjeldsen said. “That obviously pertains to people, skin tones, whatever it is you might be able to classify individuals as, and it also goes for clothing.”

The NYPD’s cooperation with IBM has since served as a selling point for the product at California State University, Northridge. There, campus police chief Anne Glavin said the technology firm IXP helped sell her on IBM’s object identification product by citing the NYPD’s work with the company. “They talked about what it’s done for New York City. IBM was very much behind that, so this was obviously of great interest to us,” Glavin said.

A monitor showing surveillance footage of a New York street on Sept. 20, 2010, viewed inside the New York City Police Department’s lower Manhattan security center. Photo: Timothy Fadek/Corbis via Getty Images

Day-to-Day Policing, Civil Liberties Concerns

The NYPD-IBM video analytics program was initially envisioned as a counterterrorism tool for use in midtown and lower Manhattan, according to Kjeldsen. However, the program was integrated during its testing phase into dozens of cameras across the city. According to the former NYPD technologist, it could have been integrated into everyday criminal investigations.

“All bureaus of the department could make use of it,” said the former technologist, potentially helping detectives investigate everything from sex crimes to fraud cases. Kjeldsen spoke of cameras being placed at building entrances and near parking entrances to monitor for suspicious loiterers and abandoned bags.

Donald, the NYPD spokesperson, said the program’s access was limited to a small number of counterterrorism officials, adding, “We are not aware of any case where video analytics was a factor in an arrest or prosecution.”

Campus police at California State University, Northridge, who adopted IBM’s software, said the bodily search features have been helpful in criminal investigations. Asked about whether officers have deployed the software’s ability to filter through footage for suspects’ clothing color, hair color, and skin tone, Captain Scott VanScoy at California State University, Northridge, responded affirmatively, relaying a story about how university detectives were able to use such features to quickly filter through their cameras and find two suspects in a sexual assault case.

“We were able to pick up where they were at different locations from earlier that evening and put a story together, so it saves us a ton of time,” VanScoy said. “By the time we did the interviews, we already knew the story and they didn’t know we had known.”

Glavin, the chief of the campus police, added that surveillance cameras using IBM’s software had been placed strategically across the campus to capture potential security threats, such as car robberies or student protests. “So we mapped out some CCTV in that area and a path of travel to our main administration building, which is sometimes where people will walk to make their concerns known and they like to stand outside that building,” Glavin said. “Not that we’re a big protest campus, we’re certainly not a Berkeley, but it made sense to start to build the exterior camera system there.”

Civil liberties advocates say they are alarmed by the NYPD’s secrecy in helping to develop a program with the potential capacity for mass racial profiling.

The identification technology IBM built could be easily misused after a major terrorist attack, argued Rachel Levinson-Waldman, senior counsel in the Brennan Center’s Liberty and National Security Program. “Whether or not the perpetrator is Muslim, the presumption is often that he or she is,” she said. “It’s easy to imagine law enforcement jumping to a conclusion about the ethnic and religious identity of a suspect, hastily going to the database of stored videos and combing through it for anyone who meets that physical description, and then calling people in for questioning on that basis.” IBM did not comment on questions about the potential use of its software for racial profiling. However, the company did send a comment to The Intercept pointing out that it was “one of the first companies anywhere to adopt a set of principles for trust and transparency for new technologies, including AI systems.” The statement continued on to explain that IBM is “making publicly available to other companies a dataset of annotations for more than a million images to help solve one of the biggest issues in facial analysis — the lack of diverse data to train AI systems.”

Few laws clearly govern object recognition or the other forms of artificial intelligence incorporated into video surveillance, according to Clare Garvie, a law fellow at Georgetown Law’s Center on Privacy and Technology. “Any form of real-time location tracking may raise a Fourth Amendment inquiry,” Garvie said, citing a 2012 Supreme Court case, United States v. Jones, that involved police monitoring a car’s path without a warrant and resulted in five justices suggesting that individuals could have a reasonable expectation of privacy in their public movements. In addition, she said, any form of “identity-based surveillance” may compromise people’s right to anonymous public speech and association.

Garvie noted that while facial recognition technology has been heavily criticized for the risk of false matches, that risk is even higher for an analytics system “tracking a person by other characteristics, like the color of their clothing and their height,” that are not unique characteristics.

The former NYPD technologist acknowledged that video analytics systems can make mistakes, and noted a study where the software had trouble characterizing people of color: “It’s never 100 percent.” But the program’s identification of potential suspects was, he noted, only the first step in a chain of events that heavily relies on human expertise. “The technology operators hand the data off to the detective,” said the technologist. “You use all your databases to look for potential suspects and you give it to a witness to look at. … This is all about finding a way to shorten the time to catch the bad people.”

Object identification programs could also unfairly drag people into police suspicion just because of generic physical characteristics, according to Jerome Greco, a digital forensics staff attorney at the Legal Aid Society, New York’s largest public defenders organization. “I imagine a scenario where a vague description, like young black male in a hoodie, is fed into the system, and the software’s undisclosed algorithm identifies a person in a video walking a few blocks away from the scene of an incident,” Greco said. “The police find an excuse to stop him, and, after the stop, an officer says the individual matches a description from the earlier incident.” All of a sudden, Greco continued, “a man who was just walking in his own neighborhood” could be charged with a serious crime without him or his attorney ever knowing “that it all stemmed from a secret program which he cannot challenge.”

While the technology could be used for appropriate law enforcement work, Kjeldsen said that what bothered him most about his project was the secrecy he and his colleagues had to maintain. “We certainly couldn’t talk about what cameras we were using, what capabilities we were putting on cameras,” Kjeldsen said. “They wanted to control public perception and awareness of LMSI” — the Lower Manhattan Security Initiative — “so we always had to be cautious about even that part of it, that we’re involved, and who we were involved with, and what we were doing.” (IBM did not respond to a question about instructing its employees not to speak publicly about its work with the NYPD.)

The way the NYPD helped IBM develop this technology without the public’s consent sets a dangerous precedent, Kjeldsen argued.

“Are there certain activities that are nobody’s business no matter what?” he asked. “Are there certain places on the boundaries of public spaces that have an expectation of privacy? And then, how do we build tools to enforce that? That’s where we need the conversation. That’s exactly why knowledge of this should become more widely available — so that we can figure that out.”

IBM Used NYPD Surveillance Footage to Develop Technology that Lets Police Search by Skin Color

Introduction

The expansion of digital record-keeping by police departments across the U.S. in the 1990s ushered in the era of data-driven policing. Huge metropolises like New York City crunched reams of crime and arrest data to find and target “hot spots” for extra policing. Researchers at the time found that this reduced crime without necessarily displacing it to other parts of the city—although some of the tactics used, such as stop-and-frisk, were ultimately criticized by a federal judge, among others, as civil rights abuses.

The next development in data-informed policing was ripped from the pages of science fiction: software that promised to take a jumble of local crime data and spit out accurate forecasts of where criminals are likely to strike next, stopping crime in its tracks. One of the first, and reportedly most widely used, is PredPol, its name an amalgamation of the words “predictive policing.” The software was derived from an algorithm used to predict earthquake aftershocks that was developed by professors at UCLA and released in 2011. By sending officers to patrol these algorithmically predicted hot spots, the programs promise to deter illegal behavior.

But law enforcement critics had their own prediction: that the algorithms would send cops to patrol the same neighborhoods they say police always have, those populated by people of color. Because the software relies on past crime data, they said, it would reproduce police departments’ ingrained patterns and perpetuate racial injustice, covering it with a veneer of objective, data-driven science.

PredPol has repeatedly said those criticisms are off-base. The algorithm doesn’t incorporate race data, which, the company says, “eliminates the possibility for privacy or civil rights violations seen with other intelligence-led or predictive policing models.”

There have been few independent, empirical reviews of predictive policing software because the companies that make these programs have not publicly released their raw data.

A seminal, data-driven study about PredPol published in 2016 did not involve actual predictions. Rather the researchers, Kristian Lum and William Isaac, fed drug crime data from Oakland, California, into PredPol’s open-source algorithm to see what it would predict. They found that it would have disproportionately targeted Black and Latino neighborhoods, despite survey data that shows people of all races use drugs at similar rates.

PredPol’s founders conducted their own research two years later using Los Angeles data and said they found the overall rate of arrests for people of color was about the same whether PredPol software or human police analysts made the crime hot spot predictions. Their point was that their software was not worse in terms of arrests for people of color than nonalgorithmic policing.

However, a study published in 2018 by a team of researchers led by one of PredPol’s founders showed that Indianapolis’s Latino population would have endured “from 200% to 400% the amount of patrol as white populations” had it been deployed there, and its Black population would have been subjected to “150% to 250% the amount of patrol compared to white populations.” The researchers said they found a way to tweak the algorithm to reduce that disproportion but that it would result in less accurate predictions—though they said it would still be “potentially more accurate” than human predictions.

In written responses to our questions, the company’s CEO said the company did not change its algorithm in response to that research because the alternate version would “reduce the protection provided to vulnerable neighborhoods with the highest victimization rates.” He also said the company did not provide the study to its law enforcement clients because it “was an academic study conducted independently of PredPol.”

Other predictive police programs have also come under scrutiny. In 2017, the Chicago Sun-Times obtained a database of the city’s Strategic Subject List, which used an algorithm to identify people at risk of becoming victims or perpetrators of violent, gun-related crime. The newspaper reported that 85% of people that the algorithm saddled with the highest risk scores were Black men—some with no violent criminal record whatsoever.

Last year, the Tampa Bay Times published an investigation analyzing the list of people that were forecast to commit future crimes by the Pasco Sheriff’s Office’s predictive tools. Deputies were dispatched to check on people on the list more than 12,500 times. The newspaper reported that at least one in 10 of the people on the list were minors, and many of those young people had only one or two prior arrests yet were subjected to thousands of checks.

For our analysis, we obtained a trove of PredPol crime prediction data that has never before been released by PredPol for unaffiliated academic or journalistic analysis. Gizmodo found it exposed on the open web (the portal is now secured) and downloaded more than 7 million PredPol crime predictions for dozens of American cities and some overseas locations between 2018 and 2021.

This makes our investigation the first independent effort to examine actual PredPol crime predictions in cities around the country, bringing quantitative facts to the debate about predictive policing and whether it eliminates or perpetuates racial and ethnic bias.

We examined predictions in 38 cities and counties crisscrossing the country, from Fresno, California, to Niles, Illinois, to Orange County, Florida, to Piscataway, New Jersey. We supplemented our inquiry with Census data, including racial and ethnic identities and household incomes of people living in each jurisdiction—both in areas that the algorithm targeted for enforcement and those it did not target.

Overall, we found that PredPol’s algorithm relentlessly targeted the Census block groups in each jurisdiction that were the most heavily populated by people of color and the poor, particularly those containing public and subsidized housing. The algorithm generated far fewer predictions for block groups with more White residents.

Analyzing entire jurisdictions, we observed that the proportion of Black and Latino residents was higher in the most-targeted block groups and lower in the least-targeted block groups (about 10% of which had zero predictions) compared to the overall jurisdiction. We also observed the opposite trend for the White population: The least-targeted block groups contained a higher proportion of White residents than the jurisdiction overall, and the most-targeted block groups contained a lower proportion.

For more than half (20) of the jurisdictions in our data, the majority of White residents lived in block groups that were targeted less than the median or not at all. The same could only be said for the Black population in four jurisdictions and for the Latino population in seven.

When we ran a statistical analysis, it showed that as the number of crime predictions for block groups increased, the proportion of the Black and Latino populations also increased and the White population decreased.

We also found that PredPol’s predictions often fell disproportionately in places where the poorest residents live. For the majority of jurisdictions (27) in our data set, a higher proportion of the jurisdiction’s low-income households live in the block groups that were targeted the most. In some jurisdictions, all of its subsidized and public housing is located in block groups PredPol targeted more than the median.

We focused on census block groups, clusters of blocks that generally have a population of between 600 and 3,000 people, because these were the smallest geographic units for which recent race and income data was available at the time of our analysis (2018 American Community Survey).

Block groups are larger than the 500-by-500-foot prediction squares that PredPol’s algorithm produces. As a result, the populations of the larger block groups could differ from the populations inside the prediction squares. To measure the potential impact, we conducted a secondary analysis at the block level using 2010 Census data for blocks whose populations remained relatively stable. (See Limitations for how we define stable.)

We found that in nearly 66% of the 131 stable block groups, predictions clustered on the blocks with the most Black or Latino residents inside of those block groups. Zooming in on blocks showed that predictions that appeared to target majority-White block groups had in fact targeted the blocks nestled inside of them where more Black and Latino people lived. This was true for 78% of the 46 stable, majority-White block groups in our sample.

To try to determine the effects of PredPol predictions on crime and policing, we filed more than 100 public records requests and compiled a database of more than 600,000 arrests, police stops, and use-of-force incidents. But most agencies refused to give us any data. Only 11 provided at least some of the necessary data.

For the 11 departments that provided arrest data, we found that rates of arrest in predicted areas remained the same whether PredPol predicted a crime that day or not. In other words, we did not find a strong correlation between arrests and predictions. (See the Limitations section for more information about this analysis.)
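To make that comparison concrete, here is a minimal sketch in Python of the prediction-day test, not the analysis code actually used. The DataFrames `arrests` and `predictions` and their columns (block_group, date) are assumptions for illustration, with dates in both frames assumed to already be normalized to day-level pandas Timestamps.

import pandas as pd

def arrest_rates_by_prediction_day(arrests: pd.DataFrame, predictions: pd.DataFrame) -> pd.Series:
    # Count arrests per block group per day.
    daily_arrests = (
        arrests.groupby(["block_group", "date"]).size().rename("arrest_count")
    )

    # Build the full grid of block-group/day combinations so that days with
    # zero arrests are counted too (restricted to block groups with predictions).
    block_groups = predictions["block_group"].unique()
    days = pd.date_range(arrests["date"].min(), arrests["date"].max(), freq="D")
    grid = pd.MultiIndex.from_product([block_groups, days], names=["block_group", "date"])
    daily = daily_arrests.reindex(grid, fill_value=0).reset_index()

    # Flag days on which PredPol issued at least one prediction for that block group.
    predicted = set(zip(predictions["block_group"], predictions["date"]))
    daily["predicted"] = [
        (bg, day) in predicted for bg, day in zip(daily["block_group"], daily["date"])
    ]

    # Mean arrests per block-group-day, with and without a prediction that day.
    return daily.groupby("predicted")["arrest_count"].mean()

# rates = arrest_rates_by_prediction_day(arrests, predictions)

Comparing the two group means is the sense in which arrest rates "remained the same whether PredPol predicted a crime that day or not."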

We do not definitively know how police acted on any individual crime prediction because we were refused that data by nearly every police department. Only one department provided more than a few days’ worth of concurrent data extracted from PredPol that reports when police responded to the predictions, and that data was so sparse as to raise questions about its accuracy.

To determine whether the algorithm’s targeting mirrored existing arrest patterns for each department, we analyzed arrest statistics by race for 29 of the agencies in our data using data from the FBI’s Uniform Crime Reporting (UCR) project. We found that the socioeconomic characteristics of the neighborhoods that the algorithm targeted mirrored existing patterns of disproportionate arrests of people of color.

In 90% of the jurisdictions, per capita arrests were higher for Black people than White people—or any other racial group included in the dataset. This is in line with national trends. (See Limitations for more information about UCR data.)
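As a rough illustration of that per capita comparison (not the code used here), the sketch below assumes UCR arrest counts and ACS population counts have already been joined into one hypothetical table with columns jurisdiction, race, arrests, and population.

import pandas as pd

def per_capita_arrest_rates(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Arrests per 1,000 residents of each racial group in each jurisdiction.
    df["arrests_per_1000"] = 1000 * df["arrests"] / df["population"]
    # Pivot so each jurisdiction is a row and each racial group a column,
    # which makes the Black/White comparison a simple column comparison.
    rates = df.pivot(index="jurisdiction", columns="race", values="arrests_per_1000")
    rates["black_gt_white"] = rates["Black"] > rates["White"]
    return rates

# rates = per_capita_arrest_rates(ucr_table)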

Overall, our analysis suggests that the algorithm, at best, reproduced how officers have been policing, and at worst, would reinforce those patterns if its policing recommendations were followed.

Data Gathering and Preparation

We discovered access to PredPol prediction data through a page on the Los Angeles Police Department’s public-facing website that contained a list of PredPol reporting areas with links. Those links led to an unsecured cloud storage space on Amazon Web Services belonging to PredPol that contained tens of thousands of documents, including PDFs, geospatial data, and HTML files for dozens of departments, not just the LAPD. The data was left open and available, without asking for a password to access it. (Access has since been locked down.)

We first downloaded all the available data to our own database on June 8, 2020, using a cloud storage management tool developed by Amazon. We downloaded the data again and updated our analysis on Jan. 31, 2021. This captured a total of 7.8 million individual predictions for 70 different jurisdictions. These took the form of single-page maps indicating addresses, each marking the center of 500-by-500-foot boxes that the software recommended officers patrol during specific shifts to deter crime. Each report’s HTML code was formatted with the prediction’s date, time, and location. That allowed us to investigate patterns in PredPol predictions over time.
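A hedged sketch of that extraction step follows. The reports' actual markup has not been published, so the regular expressions, attribute names, and folder layout below are purely illustrative assumptions about how date, time, and location fields might be pulled out of each HTML report.

import re
from pathlib import Path
import pandas as pd

DATE_RE = re.compile(r'data-date="(?P<date>\d{4}-\d{2}-\d{2})"')      # assumed attribute name
SHIFT_RE = re.compile(r'data-shift="(?P<shift>[^"]+)"')               # assumed attribute name
LATLON_RE = re.compile(r'data-lat="(?P<lat>-?\d+\.\d+)"\s+data-lon="(?P<lon>-?\d+\.\d+)"')

def parse_reports(report_dir: str) -> pd.DataFrame:
    rows = []
    for path in Path(report_dir).glob("**/*.html"):
        html = path.read_text(errors="ignore")
        date = DATE_RE.search(html)
        shift = SHIFT_RE.search(html)
        for m in LATLON_RE.finditer(html):
            rows.append({
                "agency": path.parent.name,        # folder-per-agency layout is an assumption
                "date": date.group("date") if date else None,
                "shift": shift.group("shift") if shift else None,
                "lat": float(m.group("lat")),
                "lon": float(m.group("lon")),
            })
    return pd.DataFrame(rows)

# predictions = parse_reports("predpol_reports/")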

Of the 70 agencies in our dataset, we had less than six months of predictions for 10 of them and six others were empty folders. Not all the agencies were U.S.-based or even policing agencies—some were private security firms. One was using PredPol to predict oil theft and other crimes in Venezuela’s Boscán oil field, while another was using PredPol to predict protests in Bahrain. While these uses raise interesting questions, they fell outside the scope of our current investigation.

We limited our analysis to U.S. city and county law enforcement agencies for which we had at least six months’ worth of data. We confirmed with the law enforcement agency, other media reports, and/or signed contracts that they had used PredPol in the time period for which we had reports and the stop and start dates for each city. This reduced the list to 38 agencies.

For 20 of these 38 departments, some predictions in our data fell outside the stop/start dates provided by law enforcement, so we removed these predictions from the final data used for our analysis, in an abundance of caution. The final dataset we used for analysis contained more than 5.9 million predictions.

To determine which communities were singled out for additional patrol by the software, we collected demographic information from the Census Bureau for each department’s entire jurisdiction, not only the prediction locations.

For police departments, we assumed their jurisdictions included every block group in the city, an official boundary the Census calls a “census-designated place.” (See more in the Limitations section.) Sheriff’s departments were more complicated because in some cases their home county includes cities they do not patrol. For those, we obtained the sheriff departments’ patrol maps and used an online tool called Census Reporter to compile a list of every block group within the disclosed jurisdiction.

We looked up the census tracts and block groups for the coordinates of every prediction in our database using the Census’s geocoding API. The census tracts and block groups used in our analysis were drawn during the 2010 Census. We gathered demographic data for these areas from the five-year population estimates in the 2018 American Community Survey (ACS), the most recent survey available when we began our investigation.
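That lookup can be reproduced roughly as follows using the Census Bureau's public coordinates geocoder. The benchmark and vintage strings and the layer name in the JSON response vary by release, so treat those values as assumptions to verify against the geocoder documentation rather than as the exact calls made for this analysis.

import requests

GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"

def block_group_for(lat: float, lon: float) -> str:
    params = {
        "x": lon,                               # the API takes x=longitude, y=latitude
        "y": lat,
        "benchmark": "Public_AR_Census2010",    # assumed value; pick the benchmark you need
        "vintage": "Census2010_Census2010",     # assumed value; 2010 vintage matches 2010 block groups
        "format": "json",
    }
    resp = requests.get(GEOCODER_URL, params=params, timeout=30)
    resp.raise_for_status()
    geos = resp.json()["result"]["geographies"]
    # Layer names differ by vintage; "Census Blocks" is typical for 2010 vintages.
    block_geoid = geos["Census Blocks"][0]["GEOID"]   # 15 digits: state + county + tract + block
    return block_geoid[:12]                           # the first 12 digits identify the block group

# bg_id = block_group_for(34.05, -118.24)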

The ACS only provides demographic information down to the block-group level—subdivisions of a census tract that generally include between 600 and 3,000 people and take up an average of 39 blocks. These are significantly larger than the prediction boxes, which are just shy of six acres or about the size of a square city block, but we had no good alternative. Smaller, block-level demographic data from the Census Bureau for 2020 is not scheduled to be released until 2022. The block-level data available during our investigation is more than 10 years old, and we found that the demographic changes since then in the majority of block groups in our data were significant (30% or more for the block groups’ Black, Latino, or White populations). (See more in the Limitations section.)

Layering on the Census ACS data from 2018 allowed us to carry out a disparate impact analysis about the people who lived in areas the PredPol software targeted at that time—and those who lived in areas that were not targeted.

Prediction Analysis and Findings

Methods

Given the quantity and variety of data we gathered, we used several methods of analysis for this investigation, each of which will be described in detail in subsequent sections.

We carried out several disparate impact analyses seeking to discern whether predictions fell more heavily on communities of color, low-income communities, and blocks containing public housing.

For the race/ethnicity and income analyses, we merged 2018 American Community Survey data and prediction data and observed the makeup of block groups that were targeted above and below the median; those targeted the most; and those targeted the least. (We also analyzed the data in a continuous manner to confirm that our findings were due to an underlying trend, not spurious observations.)

We also conducted a limited disparate impact analysis at the smaller, block-level scale using 2010 Census data.

For the public housing disparate impact analysis, we gathered data released by the federal Department of Housing and Urban Development on the location of subsidized and public housing in all of the jurisdictions in our data, mapped them out, and observed the frequency of PredPol predictions for those locations.

To examine possible relationships between predictions and law enforcement actions, we analyzed more than 270,000 arrest records from 11 agencies, 333,000 pedestrian or traffic stops from eight agencies, and 300 use-of-force records from five agencies, all of which were released under public records laws. (Most agencies did not provide records.)

We also examined arrest rates by race/ethnicity for 29 of the 38 jurisdictions in our final dataset using data from the FBI’s Uniform Crime Reporting program.

Lastly, six agencies provided disaggregated arrest data that included race, and we examined this data to discern arrest rates across racial groups for some crime types, such as cannabis possession.

Disparate Impact Analysis

Frequent police contact, like frequent exposure to a pollutant, can have an adverse effect on individuals and result in consequences that extend across entire communities. A 2019 study published in the American Sociological Review found that increased policing in targeted hot spots in New York City under Operation Impact lowered the educational performance of Black boys from those neighborhoods. Another 2019 study found that the more times young boys are stopped by police, the more likely they are to report engaging in delinquent behavior six, 12, and 18 months later.

We carried out a disparate impact analysis to assess which, if any, demographic groups would be disproportionately exposed to potential police interactions if the agencies had acted on recommendations provided by PredPol’s software. We analyzed the distribution of PredPol predictions for each jurisdiction at the geographic level of a census block group, which is a cluster of blocks with a population of between 600 and 3,000 people, generally.

Block groups in our data were made up of 28 blocks, on average, and contained an average of 1,600 residents. As stated earlier, these were much larger than PredPol’s 500-by-500-foot prediction squares but are the smallest geographic unit for which recent government information about the race, ethnicity, and household income of its inhabitants was available at the time of our investigation.

There was significant variation in the length of time each of the 38 jurisdictions in our analysis used the software during our window of access, and which crimes they used it to predict. There was also a huge difference in the average number of predictions on block groups among jurisdictions, which varied from eight to 7,967.

The 38 jurisdictions were of varying sizes; Jacksonville, Texas, was the smallest, with 13 block groups, and Los Angeles the largest, with 2,515 block groups.

We calculated the total number of predictions per block group in each jurisdiction. We then sorted the block groups in each jurisdiction by their prediction counts and created three categories for analysis.

We defined the “most-targeted block groups” as those in each jurisdiction that encompassed the highest 5% of predictions, which corresponded to between one and 125 block groups. We defined the “median-targeted block groups” as the 5% of each jurisdiction’s block groups straddling the median block group for predictions. And we defined the “least-targeted block groups” as each jurisdiction’s block groups with the bottom 5% of predictions.
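A simplified sketch of that labeling step is below, assuming a DataFrame `bg` with one row per block group and columns jurisdiction, block_group, and predictions (assumed names). Tie-handling and the zero-prediction adjustments described further on are omitted.

import numpy as np
import pandas as pd

def label_targeting(bg: pd.DataFrame) -> pd.DataFrame:
    def label_one(group: pd.DataFrame) -> pd.DataFrame:
        group = group.sort_values("predictions", ascending=False).reset_index(drop=True)
        n = len(group)
        k = max(1, int(round(0.05 * n)))               # 5% of block groups, at least one
        labels = np.array([""] * n, dtype=object)
        start = max(0, n // 2 - k // 2)
        labels[start:start + k] = "median_targeted"    # 5% straddling the median block group
        labels[:k] = "most_targeted"                   # top 5% by prediction count
        labels[n - k:] = "least_targeted"              # bottom 5% by prediction count
        return group.assign(targeting=labels)

    return bg.groupby("jurisdiction", group_keys=False).apply(label_one)

# labeled = label_targeting(bg)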

We also calculated whether the majority (more than 50%) of a jurisdiction’s demographic group lived in the block groups targeted more or less than the median.

We chose to define the most-targeted and least-targeted groups using the 5% metric rather than using alternative methods, such as the Interquartile Range (IQR).

With the IQR method, we would consider block groups below the 25th percentile to be the least targeted and block groups above the 75th percentile to be the most targeted, but this did not fit our requirements because of the large volume of zero-prediction block groups (10%). Using the IQR method, the average percentage of a jurisdiction’s block groups in the most-targeted group would have been 7% of the jurisdiction’s block groups, whereas the average in the least-targeted group would have made up 71% of the jurisdiction’s block groups. This difference is too large to make a meaningful comparison of the demographic composition of the least- and most-targeted block groups. This is why we chose to use 5% for the least- and most-targeted groups.

In some of the larger jurisdictions, more than 5% of block groups received zero predictions. In those cases, we chose the most-populated block groups with no predictions for the 5%. We also ran an analysis in which we counted every block group with zero predictions as the least-targeted block groups, and the findings did not change significantly. (See Limitations for more.)

The analysis consisted of the following steps:

  1. Sort the list of block groups from most targeted to least targeted and label them most targeted, median targeted, or least targeted as defined above.

  2. Get ACS population data at the block-group level for the following demographic populations:

a) Race: African American, Asian, Latino, and White.

b) Household Income: Less than $45,000, $75,000–$100,000, $125,000–$150,000, Greater than $200,000.

  3. Calculate the proportion for each demographic group d in a jurisdiction’s most-targeted, median-targeted, and least-targeted block groups. Hence we calculate 3×38 values of dt (see the reconstructed formulas after this list).

  4. Calculate the proportion for each demographic group d in all the block groups in the jurisdiction j. This gives us 38 values for dj.

  5. To determine if a demographic group’s proportion in the most-, median-, or least-targeted blocks is greater than it is in the jurisdiction overall, we simply compare the values. For each jurisdiction, we compare the three values of dt to dj. We present the results aggregated across all jurisdictions.

  6. We also calculated what proportion of a jurisdiction’s demographic group d lived in the block groups targeted more and less than the median.

  7. Using these values we can calculate the number of jurisdictions where the demographic majority lives in the most- and least-targeted blocks. After carrying out the comparisons individually for each jurisdiction, we present the aggregated results.
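The formulas in steps 3, 4, and 6 appeared as images in the original write-up and did not survive into this text. The following is a plausible reconstruction from the surrounding prose, in our own notation (an assumption, not the authors' exact formulas): let pop_d(g) be the ACS population of group d in block group g, pop(g) the total population of g, G_j the block groups of jurisdiction j, T_{j,t} the 5% of block groups in targeting category t (most, median, least), and A_j the block groups targeted more than the median.

d_t = \frac{\sum_{g \in T_{j,t}} \mathrm{pop}_d(g)}{\sum_{g \in T_{j,t}} \mathrm{pop}(g)},
\qquad
d_j = \frac{\sum_{g \in G_j} \mathrm{pop}_d(g)}{\sum_{g \in G_j} \mathrm{pop}(g)},
\qquad
\mathrm{share}_d^{\mathrm{above}} = \frac{\sum_{g \in A_j} \mathrm{pop}_d(g)}{\sum_{g \in G_j} \mathrm{pop}_d(g)}

The below-median share is defined the same way over the complement of A_j, and the comparisons in steps 5 and 7 reduce to checking whether d_t exceeds d_j and whether the above- or below-median share exceeds 50%.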

We acquired block group demographic data from the Census Bureau’s 2018 American Community Survey. We conducted our analysis for race/ethnicity and household income. Not every jurisdiction had reliable estimates at the block group level for each racial or income group because some populations were too small.

For our main analysis, we focused on the demographic composition of the most- and least-targeted blocks as well as those targeted more than the median and less than the median. Doing so allowed us to measure the disparate impact in a way that is clear yet simple to understand. In order to ensure we weren’t cherry-picking statistics, we also carried out an analysis that preserved the continuous nature of the data.

For each of our 38 jurisdictions, we looked at the relationship between the following variable pairs at the level of the census block group:

Prediction count and population of Race (Asian, African American, Latino, and White)

Prediction count and number of Households at different income ranges (Greater than $200,000, Between $125,000 and $150,000, Between $75,000 and $100,000, and Less than $45,000).

We calculated the Spearman correlation coefficient and used a box plot to visualize the distribution of correlation coefficients for each pair of variables and calculated the median coefficient values across all 38 jurisdictions. This analysis allowed us to measure if, for a given jurisdiction, the prediction count that a block group received is correlated to the race/ethnicity or income of the people living in it.
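In code, the coefficient step looks roughly like the sketch below (illustrative only); `bg` and its percentage columns are assumed names, and scipy's spearmanr supplies the rank correlation.

import pandas as pd
from scipy.stats import spearmanr

RACE_COLS = ["black_pct", "latino_pct", "white_pct", "asian_pct"]   # assumed column names

def spearman_by_jurisdiction(bg: pd.DataFrame) -> pd.DataFrame:
    records = []
    for jurisdiction, group in bg.groupby("jurisdiction"):
        for col in RACE_COLS:
            rho, _pvalue = spearmanr(group["predictions"], group[col])
            records.append({"jurisdiction": jurisdiction, "group": col, "rho": rho})
    return pd.DataFrame(records)        # 38 jurisdictions x 4 groups = 152 coefficients

# coeffs = spearman_by_jurisdiction(bg)
# coeffs.boxplot(column="rho", by="group")   # one box of 38 coefficients per demographic group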

We chose to calculate individual coefficients for each jurisdiction, rather than collapsing all the block groups across jurisdictions into one analysis since they are independent distributions. There could be meaningful differences between jurisdictions’ policing practices, and there are definitely significant variations in the number of block groups and the racial and household income composition of the people living in each of them, as well as the total number of predictions they received. For this reason, we analyzed each jurisdiction individually and examined the distribution of those correlation coefficients to see if a pattern emerged.

For our final analysis, we looked at the demographic composition of the 38 jurisdictions individually by binning the block groups into discrete buckets based on the number of predictions they received. We made 10 equal-sized bins based on the percentile score of a block group in a given jurisdiction. The first bin had block groups that had between 0 predictions and the 10th percentile, and the last bin had block groups that were between the 90th and 100th percentile. We then calculated the demographic composition of the collection of block groups in each of these bins. Doing this allowed us to observe if there was any relationship between the composition of the racial/ethnic or income groups in each of these bins and the predictions it received. Unlike our previous analysis, this method includes all the block groups in each jurisdiction. We present the averaged results across all jurisdictions in the following two sections and provide the results for individual jurisdictions in our GitHub.
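A condensed sketch of that binning, again with assumed column names (black_pop, latino_pop, white_pop, asian_pop, total_pop) rather than the actual analysis code:

import pandas as pd

POP_COLS = ["black_pop", "latino_pop", "white_pop", "asian_pop"]   # assumed column names

def decile_composition(bg: pd.DataFrame) -> pd.DataFrame:
    def one_jurisdiction(group: pd.DataFrame) -> pd.DataFrame:
        # Percentile rank of each block group by prediction count, cut into 10 equal bins.
        pct = group["predictions"].rank(pct=True)
        binned = group.assign(decile=pd.cut(pct, bins=10, labels=list(range(1, 11))))
        sums = binned.groupby("decile", observed=False)[POP_COLS + ["total_pop"]].sum()
        return sums[POP_COLS].div(sums["total_pop"], axis=0)   # demographic share per bin

    per_jurisdiction = bg.groupby("jurisdiction", group_keys=True).apply(one_jurisdiction)
    # Average each decile's composition across all jurisdictions.
    return per_jurisdiction.groupby(level="decile", observed=False).mean()

# deciles = decile_composition(bg)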

In order to measure the accuracy of our findings, we used the margins of error for population estimates present in the 2018 ACS data to run our analysis on the lower and upper bounds of each block group’s population estimates. This allowed us to measure how much our findings varied due to ACS data inaccuracies. There wasn’t a significant change in our findings for African American, Asian, Latino, or White populations, or for different median household income ranges, no matter which population estimate we used.
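The bounds check can be expressed compactly as below, a sketch under assumed column names in which each population column has a matching ACS margin-of-error column; the reported figures then quote the smallest of the three results, which is why findings are phrased as "at least."

import pandas as pd

def composition(pop: pd.Series, total: pd.Series) -> float:
    # Share of the selected group within the selected block groups.
    return pop.sum() / total.sum()

def bounded_compositions(bg: pd.DataFrame, pop_col: str, moe_col: str, total_col: str) -> dict:
    low = (bg[pop_col] - bg[moe_col]).clip(lower=0)   # estimate minus margin of error, floored at zero
    high = bg[pop_col] + bg[moe_col]                  # estimate plus margin of error
    return {
        "lower": composition(low, bg[total_col]),
        "estimate": composition(bg[pop_col], bg[total_col]),
        "upper": composition(high, bg[total_col]),
    }

# bounded_compositions(most_targeted, "black_pop", "black_pop_moe", "total_pop")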

To err on the side of caution throughout this methodology, we state our findings with the lowest of the three values we calculated (e.g., “at least 63% of jurisdictions”).

The only demographic group for which the findings varied significantly was Native Americans, so we didn’t use those findings in our analysis.

To determine whether focusing on a smaller geography would affect our findings, we completed a secondary analysis at the block level using 2010 data and found even greater disparities (more in the next section and Limitations).

Race and Ethnicity Analysis

Most- and Least-Targeted Block Groups

For the majority of jurisdictions we analyzed, the most-targeted block groups had a higher Black or Latino population while block groups that were never or infrequently targeted tended to have a higher White population when compared to the jurisdiction as a whole.

In a majority of 38 jurisdictions, more Blacks and Latinos lived in block groups that were most targeted, while more Whites lived in those that were least targeted.

In at least 84% of departments (32), a higher proportion of Black or Latino residents lived in the most-targeted block groups compared to the jurisdiction overall. Looking only at Black residents, a higher proportion lived in the most-targeted block groups in 66% of jurisdictions (25), and for Latinos alone, it’s 55% of jurisdictions (21).

This same phenomenon was less common for Asian residents. In at least 34% of jurisdictions (13), the Asian population in the most-targeted block groups exceeded the jurisdiction’s median Asian population. It was least common for White people. In at least 21% of jurisdictions (8), a higher proportion of White residents lived in the block groups most targeted by PredPol’s software than in the jurisdiction overall.

Conversely, when we looked at the block groups least targeted by PredPol’s software, their demographics were reversed. For at least 74% of the policing agencies in our data (28 jurisdictions) the proportion of White residents in the least-targeted block groups was higher than the jurisdiction overall. This was true for Blacks and Latinos much less often, in at least 16% (6) and 18% (7) of jurisdictions, respectively.

Analyzing the most-targeted blocks from all 38 jurisdictions, we found the African American and Latino proportion increased by 28% and 16% on average, and the average White population decreased by 17%. The opposite trend was true for the least-targeted blocks.

As predictions increased, the proportion of Blacks and Latinos in block groups increased. The opposite was true for Whites.

In Salisbury, Maryland, at least 26% of residents in the jurisdiction’s median block group are Black, according to the Census Bureau. However, the Black population jumped to at least 5%, on average, for block groups that were most targeted by PredPol.

In Portage, Michigan, the most-targeted block groups contained at least nine times as many Black residents as the median-targeted block groups in the city and at least seven times as many Black residents as the city overall.

And the number of predictions in these most-targeted areas was often overwhelming.

In one block group in Jacksonville, Texas (block group 1 of the 950500 census tract), PredPol predicted that either an assault or a vehicle burglary would occur at one of various locations in that block group 12,187 times over nearly two years. That’s 19 predictions each and every day in an area with a population of 1,810 people. This block group’s population is at least 62% Black and Latino and between 15% and 21% White.

In fact, at least 83% of Jacksonville’s Black population lived in block groups that were targeted more than 7,500 times in two years. This was many times more than the percentage of the city’s White population that lived in those block groups (at least 23%).

When we asked PredPol about it, the company said Jacksonville was misusing the software for some of the time, using too many daily shifts, which resulted in extra predictions per day. (See more in the Company Response section.) The Jacksonville police did not respond to requests for comment.

Block Groups Above and Below the Median

We also found that for at least 76% of the jurisdictions in our data (29), a majority of a jurisdiction’s Black or Latino population lived in the block groups PredPol targeted more than the median. A majority of Asian residents lived in these block groups for at least 55% of jurisdictions in our data.

The algorithm largely spared White residents from the same level of scrutiny it recommended for Black and Latino residents.

For more than half (20) of the jurisdictions in our data, the majority of White residents lived in block groups that were targeted less than the median or not at all. The same could only be said for the Black population in four jurisdictions and for the Latino population in seven.

Block-Level Race Analysis

Advocates for hot spot policing stress that the small size of the prediction area is crucial. To determine whether focusing on a smaller geography would affect our findings, we completed a secondary analysis at the block level using 2010 Census data. To reduce the effects of population shifts over the ensuing decade, we limited this analysis to block groups with at least one prediction in our dataset where Black, Latino, and White populations did not change more than 20% between the 2010 Census and the 2018 ACS. Asian and Native American populations were too small for this secondary analysis. For our dataset, 20% proved to be a good threshold for selecting block groups where the demographic population shifts were small.
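One plausible reading of that stability filter is sketched below, with assumed column names pairing 2010 Census counts with 2018 ACS counts and the 20% threshold applied to the relative change in each group's count; it is an illustration, not the filter as actually implemented.

import pandas as pd

# Assumed column names pairing 2010 Census counts with 2018 ACS counts.
RACE_PAIRS = [
    ("black_2010", "black_2018"),
    ("latino_2010", "latino_2018"),
    ("white_2010", "white_2018"),
]

def stable_block_groups(bg: pd.DataFrame, threshold: float = 0.20) -> pd.DataFrame:
    mask = bg["predictions"] > 0                           # at least one prediction in our dataset
    for col_2010, col_2018 in RACE_PAIRS:
        base = bg[col_2010].astype(float)
        change = (bg[col_2018] - base).abs() / base.where(base > 0)   # NaN where no 2010 population
        mask &= change.le(threshold).fillna(False)         # unknown change counts as unstable
    return bg[mask]

# stable = stable_block_groups(bg)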

In the resulting 135 reasonably stable block groups (2% of the block groups in our data), we found that in 89 of them, the targeted blocks had even higher concentrations of Black and Latino residents than the overall block group. (See more in the Limitations section.)

In some cases, zooming in on blocks showed that predictions that appeared to target majority-White block groups had in fact targeted the blocks within them where people of color lived. For example, every single prediction in a majority-White block group in Los Angeles’s Northridge neighborhood (block group 2 of the 115401 census tract) occurred on a block whose residents were almost all Latino. The most-targeted block in a majority-White block group in Elgin, Illinois (block group 1 of the 851000 census tract), had seven times more Black residents than the rest of the block group.

For 36 (78%) of the 46 stable, majority-White block groups, predictions most frequently targeted the blocks inside of them that had higher percentages of Black or Latino residents. In only 18 (36%) of the 50 stable, majority-Black and -Hispanic block groups did the most-targeted blocks have higher percentages of White people than the block group overall.

Correlation Between Predictions and Race

We analyzed the relationship between the volume of predictions a block group received and its race and ethnic makeup using the Spearman correlation coefficient. We calculated the correlation coefficient for all 38 jurisdictions individually. For each jurisdiction, we calculated four coefficients, one for each race/ethnicity in our analysis. Thus, we had 38 × 4 coefficients. We visualized the distribution to surface the underlying trend.

The data suggests that as the number of predictions in a block group increases, the Black and Latino proportion of the population increases and the White and Asian proportion of the population decreases. While the median correlation is low, there is a lot of variation. This may be the result of the algorithm echoing existing policing practices or because some jurisdictions in the data are much more segregated than others.

As mentioned previously, PredPol’s prediction boxes are much smaller than a block group. Since the correlation coefficients are calculated at the level of the block group, they would not pick up the sort of targeting that we describe in the previous section, where even within some White-majority block groups, the most-targeted blocks were the ones where people of color lived. Thus this correlation analysis is more conservative than the block-level analysis.

We were not able to carry out this analysis at that more granular level due to the limitations of the block-level Census demographic data available to us.

As the number of predictions in a block group increased, the Black and Latino proportion of the population increased

Race/Ethnicity Composition of Deciles

To observe how the compositions of different race/ethnicity groups changed across block groups as a property of predictions, we binned the block groups into discrete buckets based on the number of predictions they received and calculated the proportion of the race/ethnicity and income groups in our analysis that lived in the collection of block groups in each bin.

After calculating these values for each of our 38 jurisdictions individually, we calculated the mean value for each bucket across all jurisdictions. This is shown in the chart below. The figure shows that, on average, as the number of predictions a block group received increases, the proportion of the Black and Latino populations increases and the White population decreases.

Neighborhoods with the most predictions had the lowest share of White residents.

Our analysis showed that the most-targeted block groups had a higher Black or Latino population than the jurisdiction as a whole, while block groups that were never or infrequently targeted tended to have a higher percentage of White residents than the jurisdiction as a whole.

To see how the demographic composition changed for any individual jurisdiction, see our GitHub here.

Wealth and Poverty Analysis

Joining prediction data with the Census Bureau’s 2018 American Community Survey data also gave us insight into the financial strata of those living in areas targeted by PredPol.

The federal poverty line, at $26,200 a year income for a family of four, is widely criticized as too low a measure to provide an accurate picture of all the people experiencing financial and food insecurity in America. To capture a broader swath of lower-income families than the poverty line allows, we chose a different federal metric: the income threshold for public school students to qualify for the federal free and reduced lunch program, which is $48,000 annually for a family of four. We rounded down to $45,000 because that was as close as the Census data could get us.

In our 38 jurisdictions, we observed significant variation in the upper income range. Some had almost no households that made more than $200,000, while for others they made up 15% of the jurisdiction. To account for the variation, we used three different higher income ranges to try to capture wealthier neighborhoods in different municipalities. These ranges were chosen using what was available in the Census’s table for household Income in the past 12 months.

We counted the number of households in each Census block group with an annual income of $45,000 or less as well as in the following groupings: $75,000 to $100,000, $125,000 to $150,000, and more than $200,000. We then calculated what percentage of each jurisdiction’s households in these income groups was located in block groups in the most-, median- and least-targeted areas for PredPol predictions, as we had for the racial and ethnic analysis.
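
A simplified version of that calculation looks like the following. The thresholds and column names here are placeholders for illustration; the precise definitions of most-, median-, and least-targeted block groups are described above and in our published methodology.

    # Illustrative sketch; thresholds and column names are placeholders, not our exact methodology.
    import pandas as pd

    bg = pd.read_csv("block_groups.csv")  # hypothetical file with household counts per income bracket
    income_cols = ["hh_under_45k", "hh_75k_100k", "hh_125k_150k", "hh_over_200k"]

    def share_by_targeting(sub):
        cutoff = sub["predictions"].quantile(0.95)      # placeholder cutoff for "most targeted"
        most = sub[sub["predictions"] >= cutoff]
        least = sub[sub["predictions"] == 0]            # block groups that were never targeted
        totals = sub[income_cols].sum()
        return pd.concat({
            "most_targeted_share": most[income_cols].sum() / totals,
            "least_targeted_share": least[income_cols].sum() / totals,
        })

    shares = bg.groupby("jurisdiction").apply(share_by_targeting)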

Most- and Least-Targeted Block Groups

Our analysis found that, compared to the jurisdiction as a whole, a higher proportion of a jurisdiction’s low-income households lived in the block groups PredPol’s software targeted the most, and a higher proportion of middle-class and wealthy households lived in the block groups it targeted the least.

In at least 71% of jurisdictions (27) in our data set, a higher proportion of low-income households (annual income $45,000 or less) lived in the block groups most targeted by PredPol’s software compared to the jurisdiction overall. This was true for households that made more than $200,000 in at least 21% of jurisdictions (8).

In 30 jurisdictions, the most-targeted block groups had poorer households.

Looking at the most-targeted blocks in all 38 jurisdictions in our dataset, the proportion of households that earned less than $45,000 on average increased by 18%, and the average proportion of households that earned more than $200,000 decreased by 26%. The opposite trend was true for the least-targeted blocks.

As predictions increased, poorer households increased and wealthy ones decreased.

In some places, the disparity was even more dramatic. In Haverhill, Massachusetts, for instance, at least 21% of the jurisdiction’s 4,503 low-income households were located in the most-targeted block groups. In Decatur, Georgia, at least one in three (34%) of the jurisdiction’s low-income households lived in two block groups that PredPol targeted constantly—more than 11,000 predictions each over almost three years.

We also looked at the distribution of wealthier households in jurisdictions and compared those to PredPol predictions. We found that block groups that were never targeted tended to be wealthier. For a majority of the jurisdictions in our data, Census block groups that PredPol targeted the least were composed of more households that earned at least $200,000 a year than in the jurisdiction overall.

In Merced, California, for instance, the least-targeted block groups had at least 10 wealthy households on average. The median-targeted block groups had none. And in Birmingham, Alabama, the median block group didn’t have a single wealthy household. But block groups where PredPol never made predictions had at least 34 wealthier households on average.

To see how the demographic composition of the neighborhoods changed in an individual jurisdiction based on the software’s targeting, see our GitHub here.

Block Groups Above and Below the Median

We also found that for 33 jurisdictions (87%), the majority of the jurisdiction’s low-income households were located in the block groups targeted more than the median. In only 13 jurisdictions (34%) did a majority of households earning $200,000 or more live in block groups targeted more than the median.

Correlation Between Predictions and Income

We analyzed the relationship between the volume of predictions a block group received and the income range of the people living there. For each jurisdiction, we calculated four coefficients, one for each income range in our analysis, giving 38 × 4 = 152 coefficients in total. We visualized their distribution to surface the underlying trend.

We found a weak positive correlation between the proportion of households that make less than $45,000 a year and the number of predictions a block group receives, and a weak negative correlation for the rest of the income levels. In other words, the data suggests that as the prediction count increases, the proportion of households making less than $45,000 a year increases.

The proportion of households earning less than $45,000 a year positively correlated with predictions

Income Composition of Deciles

To observe how the composition of household income ranges changed across block groups as a function of predictions, we binned the block groups into discrete buckets based on the number of predictions they received and calculated the proportion of people of each income range in our analysis that lived there.

After calculating the distribution for each of our 38 jurisdictions individually, we calculated the mean value for each bucket across all jurisdictions. This is shown in the figure below. The figure shows the same trend we observed in our previous analysis: Looking at the data for all 38 jurisdictions together, on average, as the number of predictions a block group received increases, the proportion of households that make less than $45,000 a year increases.

As predictions increased, average household income decreased

Our analysis found that, compared to the jurisdiction as a whole, a higher proportion of a jurisdiction’s low-income households lived in the block groups PredPol’s software targeted the most, and a higher proportion of wealthy households lived in the block groups it targeted the least. We also found that across the entire distribution, as the predictions a block group received increased, the proportion of households making $45,000 a year or less also increased. To see how the composition changed for individual jurisdictions, see our GitHub here.

Public Housing Analysis

As we continued to explore these most-predicted areas, we noticed a large number were in and around public housing complexes, home to some of the nation’s poorest residents.

Using HUD’s online housing lookup tool, we gathered the locations of 4,001 public or private subsidized housing communities, homeless shelters, and elderly and special needs housing in the jurisdictions in our data. We then looked at the frequency with which PredPol predicted a crime would occur there.
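
In outline, that comparison can be made by joining each facility to the prediction count of the block group that contains it, as in the sketch below. The file names and column names are placeholders, and the facility-to-block-group assignment is assumed to have been done beforehand (for example, with the Census Bureau’s geocoder); this is not the project’s actual code.

    # Illustrative sketch; assumes facilities have already been assigned to block groups.
    import pandas as pd

    bg = pd.read_csv("block_groups.csv")          # jurisdiction, block_group, predictions
    housing = pd.read_csv("hud_facilities.csv")   # one row per facility, with its block_group ID

    merged = housing.merge(bg[["jurisdiction", "block_group", "predictions"]], on="block_group")
    medians = bg.groupby("jurisdiction")["predictions"].median().rename("median_predictions")
    merged = merged.join(medians, on="jurisdiction")

    # Share of each jurisdiction's facilities sitting in block groups targeted more than the median.
    above_median = merged["predictions"] > merged["median_predictions"]
    share_above_median = above_median.groupby(merged["jurisdiction"]).mean()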

For 22 jurisdictions in our data (57%), more than three-quarters of their public housing facilities were located in block groups that PredPol targeted more than the median. For some jurisdictions, a majority of public housing was located in the most-targeted block groups:

In Jacksonville, 63% of public housing was located in the block groups PredPol targeted the most.

In Elgin, 58% of public housing was located in the block groups PredPol targeted the most.

In Portage; Livermore, California; Cocoa, Florida; South Jordan, Utah; Gloucester, New Jersey; and Piscataway, every single public housing facility was located in block groups that were targeted the most.

In 10 jurisdictions, PredPol predicted crimes in blocks with public housing communities nearly every single day the program was in use there. (Since this analysis did not require Census demographic data, we counted the number of predictions for their locations.)

We were able to get arrest data for some of these departments, but when we compared it to the rate and type of predictions made, they could be miles apart.

For example, PredPol predicted that assault would occur an average of five times a day at the Sweet Union Apartments, a public housing community in Jacksonville—3,276 predictions over the 614 days that the Jacksonville Police Department used the software during the period we analyzed. PredPol said Jacksonville had at some point created too many shifts, so it was receiving repeat predictions. The police department did not respond to requests for comment.

It is unknown whether police increased patrols in those areas as a result (see more in Limitations). Arrest data provided by the Jacksonville police showed that officers made 31 arrests there over that time. Only four were for domestic violence or assault. The majority of the other 27 violations were outstanding warrants or drug possession.

Stops, Arrests, and Use of Force

We sought to determine the effect of PredPol predictions on commonly collected law enforcement data: stops, arrests, and use of force.

To do that, we made more than 100 public records requests to 43 agencies in our data for their use-of-force, crime, stop, and arrest data from 2018 through 2020. We focused on jurisdictions where PredPol predictions disproportionately targeted Black, Latino, or low-income neighborhoods and where the software predicted nonproperty crime types.

We also requested “dosage” data, which is PredPol’s term for the records the software provides agencies showing when officers visited each prediction box and how much time they spent there. Nearly every agency denied those requests, many on the grounds that the agency had stopped using PredPol and could no longer access the information.

Some agencies refused to give us any data at all; others gave us some data. Only two—Plainfield, New Jersey, and Portage—gave us all the types of data we requested.

We obtained data for pedestrian or traffic stops from eight agencies, arrest data from 11 agencies, and officer use-of-force incidents from five agencies. Some of the use-of-force records were provided as written reports rather than data, so we pulled out the metadata to build spreadsheets. Each set of new data was then checked against the original records by another journalist on the project.

We geolocated each arrest, stop, or use of force incident to a latitude/longitude coordinate. This allowed us to check whether the incident occurred on the same day as a PredPol prediction and within 250 feet of the center of the 500-by-500-foot box suggested for patrol (called “inside the box” by PredPol).
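
The distance test itself is straightforward; the sketch below shows the idea, using a flat-earth approximation that is more than adequate at the 250-foot scale. Field names are placeholders, and this is not the project’s exact code.

    # Illustrative sketch of the same-day, "inside the box" check.
    import math

    def feet_between(lat1, lon1, lat2, lon2):
        # Approximate distance in feet between two lat/lon points (equirectangular
        # approximation, fine at the few-hundred-foot scale used here).
        ft_per_deg_lat = 364_000  # roughly 69 miles per degree of latitude
        ft_per_deg_lon = ft_per_deg_lat * math.cos(math.radians((lat1 + lat2) / 2))
        dy = (lat2 - lat1) * ft_per_deg_lat
        dx = (lon2 - lon1) * ft_per_deg_lon
        return math.hypot(dx, dy)

    def inside_the_box(incident, prediction, radius_ft=250):
        # True if the incident happened on the prediction's date and within radius_ft
        # of the center of the 500-by-500-foot prediction box.
        same_day = incident["date"] == prediction["date"]
        close_enough = feet_between(incident["lat"], incident["lon"],
                                    prediction["center_lat"], prediction["center_lon"]) <= radius_ft
        return same_day and close_enough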

When an agency did not provide us with any data, we gathered jurisdiction-level arrest statistics from the FBI’s Uniform Crime Reporting program.

Stop, Arrest, and Use of Force Analysis

PredPol claims that using its software is likely to lead to fewer arrests because sending officers to the company’s prediction boxes creates a deterrent effect. However, we did not observe PredPol having a measurable impact on arrest rates, in either direction. (See Limitations for more about this analysis.)

While these findings are limited, a closer examination of the block groups that PredPol targeted most frequently suggests that the software recommended that police return to the same majority Black and Latino blocks where they had already been making arrests.

When we compared per capita arrests in the block groups that PredPol targeted most frequently—those in the top 5% for predictions—with the rest of the jurisdiction, we found they had higher arrests per capita than both the least-targeted block groups and the jurisdiction overall. These areas of high arrests also have higher concentrations of Black and Latino residents than the overall jurisdiction, according to Census data.
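
The per capita comparison reduces to a few lines once arrests are geolocated to block groups; the sketch below is illustrative only, with placeholder column names.

    # Illustrative sketch; column names are placeholders.
    import pandas as pd

    bg = pd.read_csv("block_groups.csv")  # predictions, population, and geolocated arrest counts per block group

    def per_capita_comparison(sub):
        cutoff = sub["predictions"].quantile(0.95)       # top 5% of block groups by prediction count
        top = sub[sub["predictions"] >= cutoff]
        return pd.Series({
            "top5_arrests_per_capita": top["arrests"].sum() / top["population"].sum(),
            "overall_arrests_per_capita": sub["arrests"].sum() / sub["population"].sum(),
        })

    comparison = bg.groupby("jurisdiction").apply(per_capita_comparison)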

For example, data provided by Salisbury, Georgia, from 2018 to 2020 shows per capita arrests on the most-targeted block groups, those in the top 5% for predictions, were nearly seven times the arrest rate of the jurisdiction as a whole. The proportion of Black and Latino residents living in these most-targeted block groups is twice that of the jurisdiction as a whole, according to Census figures.

Neighborhoods with the most crime predictions had higher arrest rates.

This same pattern repeated for all 11 departments that provided us with disaggregated arrest data: The block groups most targeted by PredPol had both higher percentages of Black or Latino residents and higher arrests per capita than the jurisdiction overall.

We found a similar pattern for the agencies that provided us with data about use-of-force incidents. For three out of the five of them, per capita use-of-force rates were higher in the most-targeted block groups than the overall jurisdiction.

In Plainfield, per capita use-of-force rates in the jurisdiction’s most-targeted block groups were nearly two times the entire jurisdiction’s rate. In Niles, Illinois, per capita use-of-force in the most-targeted block groups was more than two times the jurisdiction’s rate. In Piscataway, it was more than 10 times the jurisdiction’s rate.

Arrests and use-of-force incidents are influenced by far too many variables to attribute statistical changes or any particular contact directly to PredPol predictions without further evidence.

How We Determined Predictive Policing Software Disproportionately Targeted Low-Income, Black, and Latino Neighborhoods

Between 2018 and 2021, more than one in 33 U.S. residents were potentially subject to police patrol decisions directed by crime-prediction software called PredPol.

The company that makes it sent more than 5.9 million of these crime predictions to law enforcement agencies across the country—from California to Florida, Texas to New Jersey—and we found those reports on an unsecured server.

Gizmodo and The Markup analyzed them and found persistent patterns.

Residents of neighborhoods where PredPol suggested few patrols tended to be Whiter and more middle- to upper-income. Many of these areas went years without a single crime prediction.

By contrast, neighborhoods the software targeted for increased patrols were more likely to be home to Blacks, Latinos, and families that would qualify for the federal free and reduced lunch program.

These communities weren’t just targeted more—in some cases, they were targeted relentlessly. Crimes were predicted every day, sometimes multiple times a day, sometimes in multiple locations in the same neighborhood: thousands upon thousands of crime predictions over years. A few neighborhoods in our data were the subject of more than 11,000 predictions.

The software often recommended daily patrols in and around public and subsidized housing, targeting the poorest of the poor.

“Communities with troubled relationships with police—this is not what they need,” said Jay Stanley, a senior policy analyst at the ACLU Speech, Privacy, and Technology Project. “They need resources to fill basic social needs.”

Yet the pattern repeated nearly everywhere we looked:

Neighborhoods in Portage, Michigan, where PredPol recommended police focus patrols have nine times the proportion of Black residents as the city average. Looking at predictions on a map, local activist Quinton Bryant said, “It’s just giving them a reason to patrol these areas that are predominantly Black and Brown and poor folks.”

In Birmingham, Alabama, where about half the residents are Black, the areas with the fewest crime predictions are overwhelmingly White. The neighborhoods with the most have about double the city’s average Latino population. “This higher density of police presence,” Birmingham-based anti-hunger advocate Celida Soto Garcia said, “reopens generational trauma and contributes to how these communities are hurting.”

In Los Angeles, even when crime predictions seemed to target a majority White neighborhood, like the Northridge area, they were clustered on the blocks that are almost 100% Latino. The neighborhoods in the city where the software recommended police spend the most time were disproportionately poor and more heavily Latino than the city overall. “These are the areas of L.A. that have had the greatest issues of biased policing,” said Thomas A. Saenz, president and general counsel of the LA-based Latino civil rights group MALDEF.

About 35 miles outside of Boston, in Haverhill, Massachusetts, PredPol recommended police focus patrols in neighborhoods that had three times the Latino population and twice the low-income population as the city average. “These are the communities that we serve,” said Bill Spirdione, associate pastor of the Newlife Christian Assembly of God and executive director of the Common Ground food pantry.

In the Chicago suburb of Elgin, Illinois, neighborhoods with the fewest crime predictions were richer, with a higher proportion than the city average of families earning $200,000 a year or more. The neighborhoods with the most predictions didn’t have a single such household; instead, they had twice as many low-income residents and more than double the percentage of Latino residents as the city average. “I would liken it to policing bias-by-proxy,” Elgin Police Department deputy chief Adam Schuessler said in an interview. The department has stopped using the software.

Overall, we found that the fewer White residents lived in an area—and the more Black and Latino residents lived there—the more likely PredPol would predict a crime there. The same disparity existed between richer and poorer communities.

In neighborhoods most targeted by prediction software, Black and Latino populations were higher

“No one has done the work you guys are doing, which is looking at the data,” said Andrew Ferguson, a law professor at American University, who is a national expert on predictive policing. “This isn’t a continuation of research. This is actually the first time anyone has done this, which is striking because people have been paying hundreds of thousands of dollars for this technology for a decade.”

It’s impossible for us to know with certainty whether officers spent their free time in prediction areas, as PredPol recommends, and whether this led to any particular stop, arrest, or use of force. The few police departments that answered that question either said they couldn’t recall or that it didn’t result in any arrests, and the National Association of Criminal Defense Lawyers said its members are not informed when crime prediction software leads to charges.

Jumana Musa, director of that group’s Fourth Amendment Center, called the lack of information a “fundamental hurdle” to providing a fair defense.

“It’s like trying to diagnose a patient without anyone fully telling you the symptoms,” Musa said. “The prosecution doesn’t say, ‘The tool that we purchased from this company said we should patrol here.’”

That’s because they don’t know either, according to the National District Attorneys Association, which polled a smattering of members and found that none had heard of it being part of a case.

Only one of 38 law enforcement agencies in our analysis, the Plainfield Police Department in New Jersey, provided us with more than a few days of PredPol-produced data indicating when officers were in prediction boxes—and that data was sparse. None of it matched perfectly with arrest reports during that period, which were also provided by the agency.

We found the crime predictions for our analysis through a link on the Los Angeles Police Department’s public website, which led to an open cloud storage bucket containing PredPol predictions for not just the LAPD but also for dozens of other departments. When we downloaded the data on Jan. 31, 2021, it held 7.4 million predictions dating back to Feb. 15, 2018. Public access to that page is now blocked.

We limited our analysis to U.S. law enforcement agencies with at least six months of predictions and removed predictions generated outside of contract dates, which were likely testing or trial periods. That left 5.9 million predictions provided to 38 agencies over nearly three years.

Who uses PredPol

PredPol, which renamed itself Geolitica in March, criticized our analysis as based on reports “found on the internet.” But the company did not dispute the authenticity of the prediction reports, which we provided, acknowledging that they “appeared to be generated by PredPol.”

Company CEO Brian MacDonald said our data was “incomplete,” without further explanation, and “erroneous.” The errors, he said, were that one department inadvertently doubled up on some shifts, resulting in additional predictions, and that the data for at least 20 departments in the cache included predictions that were made after the contract period and not delivered to the agencies.

We explained that we had already discovered date discrepancies for exactly 20 departments and were not using that data in our final analysis and volunteered to share the analysis dates with him for confirmation. He instead offered to allow us to use the software for free on publicly available crime data instead of reporting on the data we had gathered. After we declined, he did not respond to further emails.

Only 13 out of 38 departments responded to requests for comment about our findings and related questions, most with a written statement indicating they no longer use PredPol.

One exception was the Decatur Police Department in Georgia. “The program as well as the officers’ own knowledge of where crime is occurring assists our department in utilizing our patrol resources more efficiently and effectively,” public information officer Sgt. John Bender said in an emailed statement. A third of Decatur’s low-income households were in a pair of neighborhoods that were each the subject of more than 11,000 crime predictions in two years.

As predictions increased, average household income decreased

Except for Elgin, Illinois, whose deputy chief called the software “bias by proxy,” none of the 38 agencies that used PredPol during our analysis period expressed concern about the stark demographic differences between the neighborhoods that received the most and least predictions.

We asked MacDonald whether he was concerned about the race and income disparities. He didn’t address those questions directly, but rather said the software mirrored reported crime rates, “to help direct scarce police resources to protect the neighborhoods most at risk of victimization.” The company has long held a position that because the software doesn’t include race or other demographic information in its analysis, that “eliminates the possibility for privacy or civil rights violations seen with other intelligence-led or predictive policing models.”

Yet according to a research paper, PredPol co-founders determined in 2018 that the algorithm would have targeted Black and Latino neighborhoods up to 400% more than White residents in Indianapolis had it been used there.

MacDonald said in his email that the company did not provide the study to its law enforcement clients because it “was an academic study conducted independently of PredPol.” The authors presented the paper at an engineering conference that’s not part of the usual police circuit, the 2018 IEEE International Conference on Systems, Man and Cybernetics.

The study authors developed a potential tweak to the algorithm that they said resulted in a more even distribution of crime predictions, but they found the predictions were less in line with later crime reports, making it less accurate than the original, although still “potentially more accurate” than human predictions.

MacDonald said the company didn’t adjust its algorithm in response.

“Such a change would reduce the protection provided to vulnerable neighborhoods with the highest victimization rates,” he said.

While MacDonald responded to some written questions by email, none of the company’s leaders would agree to an interview for this story.

To use PredPol’s algorithm, police departments set up an automatic feed of crime reports, which experts and police said include incidents reported by both the public and by officers, and choose which crimes they want to be predicted. The algorithm uses three variables to come up with future crime predictions: the date and time, the location, and the type of past crime reports.

The predictions consist of 500-by-500-foot boxes marked on a map listing the police shift during which the crimes are most likely to occur. PredPol advises officers to “get in the box” during free time. Officials in some cities said officers frequently drove to prediction locations and completed paperwork there.
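
PredPol has not published its production code, but its founders’ academic papers describe a self-exciting point-process model in which recent, nearby reports of the selected crime types raise a grid cell’s estimated risk. The sketch below is a generic, heavily simplified illustration of that general idea, not PredPol’s actual algorithm; the decay rate, grid handling, and report format are assumptions made for the example.

    # Generic, simplified illustration of grid-based crime risk scoring; not PredPol's algorithm.
    import math
    from collections import defaultdict

    CELL_FT = 500          # prediction boxes are 500 by 500 feet
    DECAY_PER_DAY = 0.05   # illustrative decay rate: older reports contribute less

    def cell_of(x_ft, y_ft):
        # Map a projected coordinate (in feet) to a 500-by-500-foot grid cell.
        return (int(x_ft // CELL_FT), int(y_ft // CELL_FT))

    def score_cells(reports, today, crime_types):
        # Each report is a dict with x_ft, y_ft, day, and crime_type.
        # Sum exponentially decaying weights of past reports per grid cell.
        scores = defaultdict(float)
        for r in reports:
            age = today - r["day"]
            if r["crime_type"] in crime_types and age >= 0:
                scores[cell_of(r["x_ft"], r["y_ft"])] += math.exp(-DECAY_PER_DAY * age)
        return scores

    def top_boxes(reports, today, crime_types, k=10):
        # Return the k highest-scoring cells: the boxes suggested for the coming shift.
        scores = score_cells(reports, today, crime_types)
        return sorted(scores, key=scores.get, reverse=True)[:k]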

How predictive policing works

In his email to Gizmodo and The Markup, MacDonald said the company’s choice of input data ensures the software’s predictions are unbiased.

“We use crime data as reported to the police by the victims themselves,” he said. “If your house is burglarized or your car stolen, you are likely to file a police report.”

But that’s not always true, according to the federal Bureau of Justice Statistics (BJS). The agency found that only 40% of violent crimes and less than a third of property crimes were reported to police in 2020, which is in line with prior years.

The agency has found repeatedly that White crime victims are less likely to report violent crime to police than Black or Hispanic victims.

In a special report looking at five years of data, BJS found an income pattern as well. People earning $50,000 or more a year reported crimes to the police 12% less often than those earning $25,000 a year or less.

Wealthy and White victims of violent crime are less likely to report to police

This disparity in crime reporting would naturally be reflected in predictions.

“There’s no such thing as crime data,” said Phillip Goff, co-founder of the nonprofit Center for Policing Equity, which focuses on bias in policing. “There is only reported crime data. And the difference between the two is huge.”

MacDonald didn’t respond to questions about these studies and their implications, but PredPol’s founders acknowledged in their 2018 research paper that place-based crime prediction algorithms can focus on areas that are already receiving police attention, creating a feedback loop that leads to even more arrests and more predictions there.

We examined more than 270,000 arrests in the 11 PredPol-using cities that provided those records to us (most refused) and found that locations with lots of predictions tended to have high arrest rates in general, suggesting the software was largely recommending officers patrol areas they already frequented.

Five cities provided us with data on officer use of force, and we found a similar pattern. In Plainfield, per capita use-of-force rates were nearly double the city average in the neighborhoods with the most predictions. In Niles, Illinois, per capita use of force was more than double the city average in high-prediction neighborhoods. In Piscataway, New Jersey, the use-of-force rate was more than 10 times the city average in those neighborhoods.

Arrests per capita relative to jurisdiction average

“It’s a reason to keep doing what they’re already doing,” said Soto Garcia, the Birmingham-based activist, “which is saying, ‘This area sucks.’ And now they have the data to prove it.”

Take the 111-unit Buena Vista low-income housing complex in Elgin. Six times as many Black people live in the neighborhood where Buena Vista is located as the city average.

Police made 121 arrests at the complex between Jan. 1, 2018, and Oct. 15, 2020, according to records provided by the city, many for domestic abuse, several for outstanding warrants, and some for minor offenses, including a handful for trespassing by people excluded from the complex.

Those incidents, along with 911 calls, fed the algorithm, according to Schuessler, the Elgin Police Department’s deputy chief.

As a result, PredPol’s software predicted that burglaries, vehicle crimes, robberies, and violent crimes would occur there every day, sometimes multiple times a day—2,900 crime predictions over 29 months.

By comparison, the software only predicted about 5% as many crimes, 154, in an area about four miles north of Buena Vista where White residents are the majority.

Neighborhoods with the most predictions had the lowest share of White residents

Schuessler said police spent a lot of time at Buena Vista because of a couple of police programs, not software predictions.

Frequent police presence at Buena Vista, whatever led them there, had steep consequences for one family.

Brianna Hernandez had spent two years on a waiting list to get into Buena Vista. When she found an intent-to-evict notice on her door last year, she said she broke down in tears in the kitchen that would no longer be hers. It was November 2020. Daily covid-19 infection rates in Illinois had spiked to an all-time high, and hospitals were stuffed to capacity with the sick and the dying.

A few months earlier, Hernandez’s longtime boyfriend Jonathan King had stopped by Buena Vista to drop off cash for expenses for her and their three small children.

He was sitting on her car in the parking lot, waiting, when officer Josh Miller of the police department’s Crime Free Housing Unit rolled by in an unmarked car.

“You know you’re not supposed to be here, right?” King remembers Miller asking him.

The city’s crime-free housing ordinance requires all leases to allow eviction if the renters, their relatives, or guests are involved in criminal activity, even nearby, and allows the city to punish landlords that don’t deal with it.

King, now 31, said Buena Vista had banned him years before when he was on parole for a robbery he committed as a minor in Chicago 14 years earlier.

“They told him that once you got off probation you would be able to come back,” Hernandez said. “Apparently, that didn’t happen.”

It was King’s third arrest for trespassing at Buena Vista. He ran for it, and when officers caught up to him, they said they found a gun nearby, which King denies was his. Miller arrested him for trespassing and weapons possession. The arrest came at the time of a PredPol prediction, but Schuessler said that’s not what led to it. That case is still pending.

“I know he’s banned, but what can a man do?” Hernandez asked. “He has kids.”

She said the arrest led to the eviction notice from Buena Vista. (Buena Vista wouldn’t confirm or deny it.) Hernandez remembers her 4-year-old and 5-year-old children asking, “Why are we going to a hotel?” and struggling for an answer. “They want to know why we’re moving stuff out. Why this and why that…. I wanted to sit down and cry.”

Robert Cheetham, the creator of a PredPol competitor, HunchLab, said he wrestled with the vicious cycle crime prediction algorithms could create.

“We felt like these kinds of design decisions mattered,” he said. “We wanted to avoid a situation where people are using the patrol area maps as an excuse for being around too much and in a way that wouldn’t necessarily be helpful.” He said his company tried to solve the problem by evening out the number of predictions delivered to each neighborhood.

Advocates in at least six cities we spoke to were unaware the software was being used locally. Even those involved in government-organized social justice committees said they didn’t have a clue about it.

“It did not come up in our meetings,” said Kenneth Brown, the pastor of Haverhill’s predominantly Black and Latino Calvary Baptist Church, who chaired a citywide task force on diversity and inclusion last year.

Calcasieu Parish, La., which started receiving predictions on April 9, 2019, refused to confirm it was using the software. Robert McCorquodale, an attorney with the sheriff’s office who handles public records requests, cited “public safety and officer safety” as the reasons and said that, hypothetically, he wouldn’t want would-be criminals to outwit the software.

“I don’t confess to be an expert in this area,” he said, “but I feel like this is not a public record.”

We kept Calcasieu in our data because its predictions began in the middle of our analysis period and continued until the end, suggesting it is a legitimate new client. Calcasieu’s predictions were not among the most disparate in our data, and removing them would not meaningfully alter the results of our analysis.

Gizmodo and The Markup also found that some policing agencies were using the software to predict crimes PredPol advises against. These include drug crimes, which research has shown are not equally enforced, and sex crimes, both of which MacDonald said the company advises clients against trying to predict.

However, we found four municipalities used PredPol to predict drug crimes between 2018 and 2021: Boone County, Indiana; Niles, Illinois; Piscataway, New Jersey; and Clovis, California. Clovis was also one of three departments using the software to predict sexual assaults. The other two were Birmingham and Fort Myers, Florida.

When we asked MacDonald about it, he said policing agencies make their own decisions on how to use the software.

“We provide guidance to agencies at the time we set them up and tell them not to include event types without clear victimization that can include officer discretion, such as drug-related offenses,” he wrote. “If they decide to add other event types later that is up to them.”

Thomas Mosier, the police chief in Piscataway, said in an interview that he doesn’t recall receiving any instructions about not predicting certain crime types. The other agencies declined to comment about it or ignored our questions altogether.

Nearly every agency also combined fundamentally different crime types into a single prediction. For instance, authorities in Grass Valley, California, mixed assaults and weapons crimes with commercial burglaries and car accidents.

MacDonald said, “research and data support the fact that multiple crime types can be concentrated in specific crime hotspots.”

Christopher Herrmann, a criminologist at the John Jay College of Criminal Justice, disagreed.

“Crime is very specific,” Herrmann said. “A serial murderer is not going to wake up one day and start robbing people or start stealing cars or selling drugs. The serial shoplifter isn’t going to start stealing cars. A serial rapist isn’t going to start robbing people.”

A study looking at crime patterns in Philadelphia found that “hot spots of different crime types were not found to overlap much,” and a 2013 book about predictive policing published by the RAND Corporation recommended against mixing crimes for predictions.

When we asked police departments that made arrests at the time and locations of PredPol predictions whether the software had brought them to the locations, they generally wouldn’t comment.

Corey Moses, for instance, was stopped by the LAPD on Feb. 11, 2019, for smoking a Newport cigarette in a nonsmoking area by a train station in MacArthur Park during a crime prediction period there. The officer ran Moses’s name and discovered he had a warrant for an unpaid fine for fare evasion. Moses was cuffed, searched, and thrown in jail for the night.

“Sometimes you gotta really be doing some stupid stuff for the police to bother you, and then sometimes you don’t,” said Moses, who is Black and 41 years old. “You can just be at the wrong place at the wrong time.”

The LAPD didn’t respond to questions about whether the officer was responding to a PredPol prediction.

We did not try to determine how accurately PredPol predicted crime patterns. Its main promise is that officers responding to predictions prevent crimes by their presence.

But several police departments have dropped PredPol’s software in recent years, saying they didn’t find it useful or couldn’t judge its effectiveness. These include Piscataway; West Springfield, Massachusetts; and Los Angeles, Milpitas, and Tracy, California.

“As time went on, we realized that PredPol was not the program that we thought it was when we had first started using it,” Tracy Police Department chief of staff Sgt. Craig Koostra said in a written statement. He did not respond to a request to elaborate.

Some agencies soured on the software quickly. In 2014, a year after signing up, Milpitas Police Department lieutenant Greg Mack wrote in an evaluation that the software was “time consuming and impractical” and found no evidence that using it significantly lowered crime rates.

In his email, MacDonald declined to provide the number of clients the company has now or had during the analysis period but stated that the number of U.S. law enforcement agencies in our data set was not an accurate count of its clients since 2018. Of the 38 U.S. law enforcement agencies in our analysis, only 15 are still PredPol customers—and two of those said they aren’t using the software anymore, despite paying for it.

Even PredPol’s original partner, the LAPD, stopped using it last year.

The department said it was a financial decision. But it came after the LAPD’s inspector general said it couldn’t determine if the software was effective and members of the Stop LAPD Spying Coalition protested at a police commission meeting, waving signs reading “Data Driven Evidence Based Policing = Pseudoscience” and “Crime Data Is Racist.”

The result was an end to a relationship begun under former police chief Bill Bratton, who had sent one of his lieutenants to UCLA to find interesting research that could be applied to crime-fighting. The lieutenant ran across P. Jeffrey Brantingham, an anthropologist whose early work involved devising models for how ancient people first settled the Tibetan plateau.

“Each time mathematics interfaces itself with a new discipline, it is invigorated and renewed,” Brantingham and PredPol co-founder George Mohler, now a computer scientist at Indiana University–Purdue University Indianapolis, wrote in a National Science Foundation grant application in 2009. Brantingham’s parents were academics who pioneered the field of environmental criminology, the study of the intersection of geography and crime. And he said he learned a lot at their feet.

“I didn’t realize it, but I was accumulating knowledge by osmosis, hearing about crime and criminal behavior while spending time with my parents,” Brantingham said in a 2013 profile in UCLA’s student newspaper.

“Criminals are effectively foragers,” he added. “Choosing what car to steal is like choosing which animal to hunt.”

Collaborating with LAPD burglary detectives, Brantingham and Mohler developed an algorithm to predict property crime and tested it out. It was credited with lowering property crimes by 9% in the division using it, while these crimes rose 0.2% in the rest of the city.

The academic research that led to PredPol was funded by more than $1.7 million in grants from the National Science Foundation. UCLA Ventures and a pair of executives from telephone headset manufacturer Plantronics invested $3.7 million between 2012 and 2014 to fund the nascent commercial venture.

Around the same time, the U.S. Department of Justice began encouraging law enforcement agencies to experiment with predictive policing. It has awarded grants to at least 11 cities since March 2009, including PredPol clients in Newark, New Jersey; Temple Terrace, Florida; Carlsbad and Alhambra, California; and the LAPD, which received $3 million for various projects.

But PredPol has now lost luster in academic circles: Last year, more than 1,400 mathematicians signed an open letter begging their colleagues not to collaborate on research with law enforcement, specifically singling out PredPol. Among the signatories were 13 professors, researchers, and graduate students at UCLA.

MacDonald in turn criticized the critics. “It seems irresponsible for an entire profession to say they will not cooperate in any way to help protect vulnerable communities,” he wrote in his email to Gizmodo and The Markup.

Ferguson, the American University professor, said that whatever PredPol’s future, crime predictions made by software are here to stay—though not necessarily as a standalone product. Rather, he said, it’s becoming part of a buffet of police data offerings from larger tech firms, including Oracle, Microsoft, Accenture, and ShotSpotter, which uses sound detection to report gunshots and bought the crime prediction software HunchLab.

When we reached out to those companies for comment, all except Oracle, which declined to comment, distanced themselves from predictive policing—even though in the past all of them had pitched or publicized their products being used for it and HunchLab was a PredPol competitor.

PredPol’s original name was formed from the words predictive and policing, but even it is now distancing itself from the term—MacDonald called it a “misnomer”—and is branching out into other data services, shifting its focus to patrol-officer monitoring during its rebranding this year as Geolitica.

And that, too, was Ferguson’s point.

“These big companies that are going to hold the contracts for police [data platforms] are going to do predictive analytics,” Ferguson said.

“They’re just not going to call it predictive policing,” he added. “And it’s going to be harder to pull apart for journalists and academics.”

Read the full peer-reviewed analysis on which this report is based here.

Crime Prediction Software Promised to Be Free of Biases. New Data Shows It Perpetuates Them

Decades ago, when imagining the practical uses of artificial intelligence, science fiction writers imagined autonomous digital minds that could serve humanity. Sure, sometimes a HAL 9000 or WOPR would subvert expectations and go rogue, but that was very much unintentional, right?

And for many aspects of life, artificial intelligence is delivering on its promise. AI is, as we speak, looking for evidence of life on Mars. Scientists are using AI to try to develop more accurate and faster ways to predict the weather.

But when it comes to policing, the reality is much less optimistic. Our HAL 9000 does not impose its own decisions on the world—instead, programs that claim to use AI for policing simply reaffirm, justify, and legitimize the opinions and actions already being undertaken by police departments.

AI presents two problems: tech-washing, and a classic feedback loop. Tech-washing is the process by which proponents of the outcomes can defend those outcomes as unbiased because they were derived from “math.” And the feedback loop is how that math continues to perpetuate historically-rooted harmful outcomes. “The problem of using algorithms based on machine learning is that if these automated systems are fed with examples of biased justice, they will end up perpetuating these same biases,” as one philosopher of science notes.

Far too often, artificial intelligence in policing is fed data collected by police, and therefore can only predict crime based on data from neighborhoods that police are already policing. But crime data is notoriously inaccurate, so policing AI not only misses the crime that happens in other neighborhoods, it reinforces the idea that the neighborhoods police already over-police are exactly the neighborhoods where patrols and surveillance should be directed.

How AI tech washes unjust data created by an unjust criminal justice system is becoming more and more apparent.

In 2021, we got a better glimpse into what “data-driven policing” really means. An investigation conducted by Gizmodo and The Markup showed that the software that put PredPol, now called Geolitica, on the map disproportionately predicts that crime will be committed in neighborhoods inhabited by working-class people, people of color, and Black people in particular. You can read here about the technical and statistical analysis they did in order to show how these algorithms perpetuate racial disparities in the criminal justice system.

Gizmodo reports that, “For the 11 departments that provided arrest data, we found that rates of arrest in predicted areas remained the same whether PredPol predicted a crime that day or not. In other words, we did not find a strong correlation between arrests and predictions.” This is precisely why so-called predictive policing and other data-driven policing schemes should not be used. Police patrol neighborhoods inhabited primarily by people of color, which means these are the places where they make arrests and write citations. The algorithm factors in these arrests and determines that these areas are likely to see crime in the future, thus justifying a heavy police presence in Black neighborhoods. And so the cycle continues.

This can occur with other technologies that rely on artificial intelligence, like acoustic gunshot detection, which can send false-positive alerts to police signifying the presence of gunfire.

This year we also learned that at least one so-called artificial intelligence company, which received millions of dollars and untold amounts of government data from the state of Utah, could not actually deliver on its promises to help direct law enforcement and public services to problem areas.

This is precisely why a number of cities, including Santa Cruz and New Orleans, have banned government use of predictive policing programs. As Santa Cruz’s mayor said at the time, “If we have racial bias in policing, what that means is that the data that’s going into these algorithms is already inherently biased and will have biased outcomes, so it doesn’t make any sense to try and use technology when the likelihood that it’s going to negatively impact communities of color is apparent.”

Next year, the fight against irresponsible police use of artificial intelligence and machine learning will continue. EFF will continue to support local and state governments in their fight against so-called predictive or data-driven policing.

This article is part of our Year in Review series. Read other articles about the fight for digital rights in 2021.

Police Use of Artificial Intelligence: 2021 in Review