Incident 124: Algorithmic Health Risk Scores Underestimated Black Patients’ Needs

Description: An algorithm developed by Optum and deployed by a large academic hospital was found by researchers to have under-predicted the health needs of Black patients, effectively de-prioritizing them for extra care programs relative to white patients with the same health burden.
Alleged: Optum developed an AI system deployed by an unnamed large academic hospital, which harmed Black patients.

Suggested citation format

Lutz, Roman. (2019-10-24) Incident Number 124. In McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID: 124
Incident Date: 2019-10-24
Editors: Sean McGregor, Khoa Lam



Incident Reports

Care for some of the sickest Americans is decided in part by algorithm. New research shows that software guiding care for tens of millions of people systematically privileges white patients over black patients. Analysis of records from a major US hospital revealed that the algorithm used effectively let whites cut in line for special programs for patients with complex, chronic conditions such as diabetes or kidney problems.

The hospital, which the researchers didn’t identify but described as a “large academic hospital,” was one of many US health providers that employ algorithms to identify primary care patients with the most complex health needs. Such software is often tapped to recommend people for programs that offer extra support—including dedicated appointments and nursing teams—to people with a tangle of chronic conditions.

Researchers who dug through nearly 50,000 records discovered that the algorithm effectively low-balled the health needs of the hospital’s black patients. Using its output to help select patients for extra care favored white patients over black patients with the same health burden.

When the researchers compared black patients and white patients to whom the algorithm assigned similar risk scores, they found the black patients were significantly sicker, for example with higher blood pressure and less well-controlled diabetes. This had the effect of excluding people from the extra care program on the basis of race. The hospital automatically enrolled patients above certain risk scores into the program, or referred them for consideration by doctors.
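The comparison described above can be framed as a simple audit: bin patients by algorithmic risk score, then compare average illness across races within each bin. Below is a minimal sketch of that idea in Python; the column names and toy data are illustrative, not drawn from the study's records.

```python
# Hypothetical audit sketch: at equal algorithmic risk scores, are Black
# patients measurably sicker than white patients?
import pandas as pd

def audit_by_risk_score(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Bin patients by risk-score quantile, then compare mean illness
    (here, count of active chronic conditions) across races per bin."""
    df = df.copy()
    df["risk_bin"] = pd.qcut(df["risk_score"], n_bins,
                             labels=False, duplicates="drop")
    return (df.groupby(["risk_bin", "race"])["chronic_conditions"]
              .mean()
              .unstack("race"))

# Toy data: equal risk scores, unequal underlying illness.
toy = pd.DataFrame({
    "risk_score":         [0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8],
    "race":               ["black", "white"] * 4,
    "chronic_conditions": [3, 1, 6, 4, 4, 2, 7, 3],
})
print(audit_by_risk_score(toy, n_bins=2))
```

In the toy table, Black patients average more chronic conditions than white patients in every risk bin, which is the pattern the researchers reported finding in the hospital's records.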

The researchers calculated that the algorithm’s bias effectively reduced the proportion of black patients receiving extra help by more than half, from almost 50 percent to less than 20 percent. Those missing out on extra care potentially faced a greater chance of emergency room visits and hospital stays.

“There were stark differences in outcomes,” says Ziad Obermeyer, a physician and researcher at UC Berkeley who worked on the project with colleagues from the University of Chicago and Brigham and Women’s and Massachusetts General hospitals in Boston.

The paper, published Thursday in Science, does not identify the company behind the algorithm that produced those skewed judgments. Obermeyer says the company has confirmed the problem and is working to address it. In a talk on the project this summer, he said the algorithm is used in the care of 70 million patients and was developed by a subsidiary of an insurance company. That suggests the algorithm may be from Optum, owned by insurer UnitedHealth, which says its product that attempts to predict patient risks, including costs, is used to “manage more than 70 million lives.” Asked by WIRED whether its software was the one in the study, Optum said in a statement that doctors should not use algorithmic scores alone to make decisions about patients. “As we advise our customers, these tools should never be viewed as a substitute for a doctor’s expertise and knowledge of their patients’ individual needs,” it said.


The algorithm studied did not take account of race when estimating a person’s risk of health problems. Its skewed performance shows how even putatively race-neutral formulas can still have discriminatory effects when they lean on data that reflects inequalities in society.

The software was designed to predict patients’ future health costs, as a proxy for their health needs. It could predict costs with reasonable accuracy for both black patients and white patients. But that had the effect of priming the system to replicate unevenness in access to healthcare in America—a case study in the hazards of combining optimizing algorithms with data that reflects raw social reality.

When the hospital used risk scores to select patients for its complex care program, it was selecting patients likely to cost more in the future—not on the basis of their actual health. People with lower incomes typically run up smaller health costs because they are less likely to have the insurance coverage, free time, transportation, or job security needed to easily attend medical appointments, says Linda Goler Blount, president and CEO of the nonprofit Black Women’s Health Imperative.

A Health Care Algorithm Offered Less Care to Black Patients

A widely used algorithm that predicts which patients will benefit from extra medical care dramatically underestimates the health needs of the sickest black patients, amplifying long-standing racial disparities in medicine, researchers have found.

The problem was caught in an algorithm sold by a leading health services company, called Optum, to guide care decision-making for millions of people. But the same issue almost certainly exists in other tools used by other private companies, nonprofit health systems and government agencies to manage the health care of about 200 million people in the United States each year, the scientists reported in the journal Science.

Correcting the bias would more than double the number of black patients flagged as at risk of complicated medical needs within the health system the researchers studied, and they are already working with Optum on a fix. When the company replicated the analysis on a national data set of 3.7 million patients, it found that black patients whom the algorithm ranked as equally in need of extra care as white patients were much sicker: They collectively suffered from 48,772 additional chronic diseases.

"It's truly inconceivable to me that anyone else's algorithm doesn't suffer from this," said Sendhil Mullainathan, a professor of computation and behavioral science at the University of Chicago Booth School of Business, who oversaw the work. "I'm hopeful that this causes the entire industry to say, 'Oh, my, we've got to fix this.' "

The algorithm wasn't intentionally racist — in fact, it specifically excluded race. Instead, to identify patients who would benefit from more medical support, the algorithm used a seemingly race-blind metric: how much patients would cost the health-care system in the future. But cost isn't a race-neutral measure of health-care need. Black patients incurred about $1,800 less in medical costs per year than white patients with the same number of chronic conditions; thus the algorithm scored white patients as equally at risk of future health problems as black patients who had many more diseases.
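The proxy problem described above can be illustrated with a toy calculation. All numbers below are hypothetical (the flat $1,800 gap loosely echoes the disparity the article cites); the point is only that a cost-based score ranks a white patient above an equally sick Black patient.

```python
# Toy illustration: predicted future cost tracks past cost, and past
# cost runs lower for Black patients at the same level of sickness
# because of unequal access to care.
def expected_future_cost(chronic_conditions: int, race: str) -> float:
    base_cost_per_condition = 5_000               # dollars/year, made up
    access_gap = 1_800 if race == "black" else 0  # lower spending, same need
    return chronic_conditions * base_cost_per_condition - access_gap

# Two equally sick patients, four chronic conditions each.
black_score = expected_future_cost(4, "black")   # 18_200
white_score = expected_future_cost(4, "white")   # 20_000

# A cost-based enrollment cutoff between the two scores admits the
# white patient to the extra-care program and excludes the Black one.
enrollment_cutoff = 19_000
print(white_score >= enrollment_cutoff, black_score >= enrollment_cutoff)
# → True False
```

No race variable appears in the score itself; the disparity enters entirely through the cost label, which is the mechanism the researchers identified.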

Machines increasingly make decisions that affect human life, and big organizations — particularly in health care — are trying to leverage massive data sets to improve how they operate. They utilize data that may not appear to be racist or biased but may have been heavily influenced by longstanding social, cultural and institutional biases — such as health-care costs. As computer systems determine which job candidates should be interviewed, who should receive a loan or how to triage sick people, the proprietary algorithms that power them run the risk of automating racism or other human biases.

In medicine, there is a long history of black patients facing barriers to accessing care and receiving less effective health care. Studies have found black patients are less likely to receive pain treatment, potentially lifesaving lung cancer surgery or cholesterol-lowering drugs, compared with white patients. Such disparities probably have complicated roots, including explicit racism, access problems, lack of insurance, mistrust of the medical system, cultural misunderstandings or unconscious biases that doctors may not even know they have.

Mullainathan and his collaborators discovered that the algorithm they studied, which was designed to help health systems target patients who would have the greatest future health-care needs, was predicting how likely people were to use a lot of health care and rack up high costs in the future. Since black patients generally use health care at lower rates, the algorithm was less likely to flag them as likely to use lots of health care in the future.

The algorithm would then deepen that disparity by flagging healthier white patients as in need of more intensive care management.

"Predictive algorithms that power these tools should be continually reviewed and refined, and supplemented by information such as socio-economic data, to help clinicians make the best-informed care decisions for each patient," Optum spokesman Tyler Mason said. "As we advise our customers, these tools should never be viewed as a substitute for a doctor's expertise and knowledge of their patients' individual needs."

Ruha Benjamin, an associate professor of African American studies at Princeton University, drew a parallel to the way Henrietta Lacks, a young African American mother with cervical cancer, was treated by the medical system. Lacks is well known now because her cancer cells, taken without her consent, are used throughout modern biomedical research. She was treated in a separate wing of Johns Hopkins Hospital in an era when hospitals were segregated. Imagine if today, Benjamin wrote in an accompanying article, Lacks were "digitally triaged" with an algorithm that didn't explicitly take into account her race but underestimated her sickness because it was using data that reflected historical bias to project her future needs. Such racism, though not driven by a hateful ideology, could have the same result as earlier segregation and substandard care.

"I am struck by how many people still think that racism always has to be intentional and fueled by malice. They don't want to admit the racist effects of technology unless they can pinpoint the bigoted boogeyman behind the screen," Benjamin said.

The software used to predict patients' need for more intensive medical support was an outgrowth of the Affordable Care Act, which created financial incentives for health systems to keep people well instead of waiting to treat them when they got sick. The idea was that it would be possible to simultaneously contain costs and keep people healthier by identifying those patients at greatest risk for becoming very sick and providing more resources to them. But because wealthy, white people tend to utilize more health care, such tools could also lead health systems to focus on them, missing an opportunity to help some of the sickest people.

Christine Vogeli, director of evaluation and research at the Center for Population Health at Partners HealthCare, a nonprofit health system in Massachusetts, said when her team first tested the algorithm, they mapped the highest scores in their patient population and found them concentrated in some of the most affluent suburbs of Boston. That led them to use the tool in a limited way, supplementing it with other information, rather than using it off the shelf.

"You're going to have to make sure people are savvy about it ... or you're going to have an issue where you're only serving the richest and most wealthy folks," Vogeli said.

Such biases may seem obvious in hindsight, but algorithms are notoriously opaque because they are proprietary products that can cost hundreds of thousands of dollars. The researchers who conducted the new study had an unusual amount of access to the data that went into the algorithm and what it predicted.

They also found a relatively straightforward way to fix the problem. Instead of just predicting which patients would incur the highest costs and use the most health care in the future, they tweaked the algorithm to make predictions about their future health conditions.
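A sketch of what such a relabeling fix might look like, assuming a simple tabular pipeline: `train_risk_model`, the column names, and the toy `history` table are all illustrative, and scikit-learn's `LinearRegression` stands in for whatever model a vendor actually uses. The model is unchanged; only the training target moves from future cost to a direct health measure.

```python
# Relabeling sketch: same features, same model, different target.
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_risk_model(df: pd.DataFrame, target: str) -> LinearRegression:
    """Train an identical model on whichever label is passed in."""
    features = df[["age", "prior_conditions"]]
    return LinearRegression().fit(features, df[target])

history = pd.DataFrame({
    "age":               [40, 55, 63, 70],
    "prior_conditions":  [1, 2, 4, 5],
    "future_cost":       [4_000, 9_000, 17_000, 24_000],
    "future_conditions": [1, 3, 4, 6],
})

# Biased proxy: future dollars spent (absorbs unequal access to care).
cost_model = train_risk_model(history, target="future_cost")

# The fix: a direct health measure, e.g. count of active chronic
# conditions in the following year.
health_model = train_risk_model(history, target="future_conditions")
```

The appeal of this kind of fix, as the article notes, is that it requires changing one modeling decision rather than the behavior of thousands of providers.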

Suchi Saria, a machine learning and health-care expert at Johns Hopkins University, said the study was fascinating because it showed how, once a bias is detected, it can be corrected. Much of the scientific study of racial disparities in medicine provides evidence of inequity, but correcting those problems might require sweeping social and cultural changes, as well as individual behavior changes by thousands of providers. In contrast, once a flawed algorithm is identified, the bias can be removed.

"The cool thing is we could easily measure the bias that has historically existed, switch out the algorithm and correct the bias," Saria said. The trickier part may be developing an oversight mechanism that will detect the biases in the first place.

Saria said one possibility is that data experts could test companies' algorithms for bias, the same way security firms test whether a company's cyber defenses are sufficient.

Racial bias in a medical algorithm favors white patients over sicker black patients

Millions of dollars are being spent to develop artificial intelligence software that reads x-rays and other medical scans in hopes it can spot things doctors look for but sometimes miss, such as lung cancers. A new study reports that these algorithms can also see something doctors don’t look for on such scans: a patient’s race.

The study authors and other medical AI experts say the results make it more crucial than ever to check that health algorithms perform fairly on people with different racial identities. Complicating that task: The authors themselves aren’t sure what cues the algorithms they created use to predict a person’s race.

Evidence that algorithms can read race from a person’s medical scans emerged from tests on five types of imagery used in radiology research, including chest and hand x-rays and mammograms. The images included patients who identified as Black, white, and Asian. For each type of scan, the researchers trained algorithms using images labeled with a patient’s self-reported race. Then they challenged the algorithms to predict the race of patients in different, unlabeled images.

Radiologists don’t generally consider a person’s racial identity—which is not a biological category—to be visible on scans that look beneath the skin. Yet the algorithms somehow proved capable of accurately detecting it for all three racial groups, and across different views of the body.

For most types of scan, the algorithms could correctly identify which of two images was from a Black person more than 90 percent of the time. Even the worst performing algorithm succeeded 80 percent of the time; the best was 99 percent correct. The results and associated code were posted online late last month by a group of more than 20 researchers with expertise in medicine and machine learning, but the study has not yet been peer reviewed.
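The "which of two images" accuracy quoted above is pairwise ranking accuracy, equivalent to AUC: the probability that a randomly chosen image from a Black patient receives a higher model score than a randomly chosen image from a patient of another group. A small sketch with made-up model scores shows how it can be computed:

```python
# Pairwise ranking accuracy (AUC) from raw model scores.
from itertools import product

def pairwise_accuracy(pos_scores, neg_scores):
    """Fraction of (positive, negative) score pairs ranked correctly,
    counting ties as half-correct."""
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p, n in pairs)
    return correct / len(pairs)

# Hypothetical model outputs for images with known self-reported race.
black_scores = [0.95, 0.90, 0.80, 0.70]
white_scores = [0.40, 0.30, 0.60, 0.20]
print(pairwise_accuracy(black_scores, white_scores))  # → 1.0
```

A value of 0.5 would mean the model cannot distinguish the groups at all; the 0.80 to 0.99 range reported in the study means the scans carry a strong, consistently recoverable race signal.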

The results have spurred new concerns that AI software can amplify inequality in health care, where studies show Black patients and other marginalized racial groups often receive inferior care compared to wealthy or white people.

Machine-learning algorithms are tuned to read medical images by feeding them many labeled examples of conditions such as tumors. By digesting many examples, the algorithms can learn patterns of pixels statistically associated with those labels, such as the texture or shape of a lung nodule. Some algorithms made that way rival doctors at detecting cancers or skin problems; there is evidence they can detect signs of disease invisible to human experts.

Judy Gichoya, a radiologist and assistant professor at Emory University who worked on the new study, says the revelation that image algorithms can “see” race in internal scans likely primes them to also learn inappropriate associations.


Medical data used to train algorithms often bears traces of racial inequalities in disease and medical treatment, due to historical and socioeconomic factors. That could lead an algorithm searching for statistical patterns in scans to use its guess at a patient’s race as a kind of shortcut, suggesting diagnoses that correlate with racially biased patterns from its training data, not just the visible medical anomalies that radiologists look for. Such a system might give some patients an incorrect diagnosis or a false all-clear. An algorithm might suggest different diagnoses for a Black person and white person with similar signs of disease.

“We have to educate people about this problem and research what we can do to mitigate it,” Gichoya says. Her collaborators on the project came from institutions including Purdue, MIT, Beth Israel Deaconess Medical Center, National Tsing Hua University in Taiwan, University of Toronto, and Stanford.

Previous studies have shown that medical algorithms have caused biases in care delivery, and that image algorithms may perform unequally for different demographic groups. In 2019, a widely used algorithm for prioritizing care for the sickest patients was found to disadvantage Black people. In 2020, researchers at the University of Toronto and MIT showed that algorithms trained to flag conditions such as pneumonia on chest x-rays sometimes performed differently for people of different sexes, ages, races, and types of medical insurance.

Paul Yi, director of the University of Maryland’s Intelligent Imaging Center, who was not involved in the new study showing algorithms can detect race, describes some of its findings as “eye opening,” even “crazy.”

Radiologists like him don’t typically think about race when interpreting scans, or even know how a patient self-identifies. “Race is a social construct and not in itself a biological phenotype, even though it can be associated with differences in anatomy,” Yi says.

Frustratingly, the authors of the new study could not figure out how exactly their models could so accurately detect a patient’s self-reported race. They say that will likely make it harder to pick up biases in such algorithms.

Follow-on experiments showed that the algorithms were not making predictions based on particular patches of anatomy, or visual features that might be associated with race due to social and environmental factors such as body mass index or bone density. Nor did age, sex, or specific diagnoses that are associated with certain demographic groups appear to be functioning as clues.

The fact that algorithms trained on images from a hospital in one part of the US could accurately identify race in images from institutions in other regions appears to rule out the possibility that the software is picking up on factors unrelated to a patient’s body, says Yi, such as differences in imaging equipment or processes.

Whatever the algorithms were seeing, they saw it clearly. The software could still predict patient race with high accuracy when x-rays were degraded so that they were unreadable to even a trained eye, or blurred to remove fine detail.

Luke Oakden-Rayner, a coauthor on the new study and director of medical imaging research at Royal Adelaide Hospital, Australia, calls the AI ability the collaborators uncovered “the worst superpower.” He says that despite the unknown mechanism, it demands an immediate response from people developing or selling AI systems to analyze medical scans.

A database of AI algorithms maintained by the American College of Radiology lists dozens for analyzing chest imagery that have been approved by the Food and Drug Administration. Many were developed using standard data sets used in the new study that trained algorithms to predict race. Although the FDA recommends that companies measure and report performance on different demographic groups, such data is rarely released.

Oakden-Rayner says that such checks and disclosures should become standard. “Commercial models can almost certainly identify the race of patients, so companies need to ensure that their models are not utilizing that information to produce unequal outcomes,” he says.

Yi agrees, saying the study is a reminder that while machine-learning algorithms can help human experts with practical problems in the clinic, they work differently than people. “Neural networks are sort of like savants, they’re singularly efficient at one task,” he says. “If you train a model to detect pneumonia, it’s going to find one way or another to get that correct answer, leveraging whatever it can find in the data.”

These Algorithms Look at X-Rays-and Somehow Detect Your Race
