Incident 92: Apple Card's Credit Assessment Algorithm Allegedly Discriminated against Women
CSET Taxonomy Classifications
In November 2019, customers of the Apple Card, Goldman Sachs's first credit card offering, launched in partnership with Apple, claimed that the credit assessment algorithm that sets credit lines discriminated by gender, with men receiving significantly higher credit limits than women with equal credit qualifications. Apple co-founder Steve Wozniak confirmed the same thing happened to him and his wife, and the New York Department of Financial Services launched an investigation into the discrimination claims. In response to this incident, Goldman Sachs stated that it has not and will never make decisions based on factors like gender, race, age, sexual orientation, or any other legally prohibited factors when determining creditworthiness.
In November 2019, Apple Card customers claimed that the card's credit assessment algorithm exhibited a gender bias in favor of men.
Harm Distribution Basis
AI System Description
Goldman Sachs uses a credit assessment algorithm that factors in credit score, credit report, and reported income to determine credit lines for customers.
Sector of Deployment
Financial and insurance activities
Relevant AI functions
Perception, Cognition, Action
machine learning, data analytics
data analytics, recommendation engine, decision support
Goldman Sachs, Apple Card, Apple, Steve Wozniak, New York Department of Financial Services
The Equal Credit Opportunity Act (ECOA) prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status, age, or because you get public assistance.
credit score, credit report, reported income
What started with a viral Twitter thread metastasized into a regulatory investigation of Goldman Sachs’ credit card practices after a prominent software developer called attention to differences in Apple Card credit lines for male and female customers.
David Heinemeier Hansson, a Danish entrepreneur and developer, said in tweets last week that his wife, Jamie Hansson, was denied a credit line increase for the Apple Card, despite having a higher credit score than him.
“My wife and I filed joint tax returns, live in a community-property state, and have been married for a long time. Yet Apple’s black box algorithm thinks I deserve 20x the credit limit she does,” Hansson tweeted.
Hansson detailed the couple’s efforts to raise the issue with Apple’s customer service, which resulted in a formal internal complaint. Representatives repeatedly assured the couple there was no discrimination, citing the algorithm that makes Apple Card’s credit assessments. Jamie Hansson’s credit limit was ultimately raised to match her husband’s, but Hansson said this failed to address the root of the problem.
Hansson’s tweets drew the attention of Linda Lacewell, superintendent of New York’s State Department of Financial Services, who announced Saturday that her office would investigate the Apple Card algorithm over claims of discrimination.
“This is not just about looking into one algorithm,” she wrote in a Medium post. “DFS wants to work with the tech community to make sure consumers nationwide can have confidence that the algorithms that increasingly impact their ability to access financial services do not discriminate and instead treat all individuals equally and fairly.”
Apple didn’t immediately respond to a request for comment from The Washington Post.
With the spread of automation, more and more decisions about our lives are made by computers, from credit approval to medical care to hiring choices. The algorithms — formulas for processing information or completing tasks — that make these judgments are programmed by people and thus often reproduce human biases, unintentionally or otherwise, resulting in less favorable outcomes for women and people of color. But the public, and even companies themselves, often have little visibility into how algorithms operate.
“Women tend to be better credit risks. While it is illegal to discriminate the data indicates that controlling for income, and other things, women are better credit risks,” said Aaron Klein, a Brookings Institution fellow. “So giving men better terms of credit is both illegal and seems to be inconsistent with international experience.”
Past iterations of Google Translate have struggled with gender bias in translations. Amazon was forced to jettison an experimental recruiting tool in 2017 that used artificial intelligence to score candidates because the prevalence of male candidates resulted in the algorithm penalizing résumés that included “women’s” and downgrading candidates who attended women’s colleges. A study published last month in Science found racial bias in a widely used health-care risk-prediction algorithm made black patients significantly less likely than white patients to get important medical treatment.
“It does not matter what the intent of the individual Apple reps are, it matters what the algorithm they’ve placed their complete faith in does,” Hansson tweeted. “And what it does is discriminate.”
Dozens of people shared similar experiences after Hansson’s tweets went viral, including Apple co-founder Steve Wozniak, who indicated his credit limit is 10 times that of his wife. The outcry prompted Goldman Sachs to issue a response Sunday stressing that credit assessments are made based on individual income and creditworthiness, which could result in family members having “significantly different credit decisions.”
“In all cases, we have not and will not make decisions based on factors like gender,” Andrew Williams, a spokesman for Goldman Sachs, said in a statement.
Released in August through a partnership with Goldman Sachs, the Apple Card is a “digital first,” numberless credit card “built on simplicity, transparency and privacy,” according to a news release.
The algorithm responsible for credit decisions for the Apple Card is giving females lower credit limits than equally qualified males. Those are the allegations that began spreading as consumers took to social media with complaints about Apple's credit card designed to work with Apple Pay and on various Apple devices.
The controversy began on November 7 when entrepreneur David Heinemeier Hansson, the creator of the Ruby on Rails programming tool, posted a lengthy, and angry, thread to Twitter complaining of his wife's experience with the Apple Card.
“The @AppleCard is such a [expletive] sexist program. My wife and I filed joint tax returns, live in a community-property state, and have been married for a long time. Yet Apple’s black box algorithm thinks I deserve 20x [sic] the credit limit she does. No appeals work,” Hansson tweeted. “It gets even worse. Even when she pays off her ridiculously low limit in full, the card won’t approve any spending until the next billing period. Women apparently aren’t good credit risks even when they pay off the [expletive] balance in advance and in full.”
Hansson goes on to describe his experience dealing with Apple Card's customer support regarding the issue. He says customer service reps assured him there was no discrimination involved and that the outcomes he and his wife were seeing were due to the algorithm.
“So let’s recap here: Apple offers a credit card that bases its credit assessment on a black-box algorithm that [six] different reps across Apple and [Goldman Sachs] have no visibility into. Even several layers of management. An internal investigation. IT’S JUST THE ALGORITHM!” Hansson wrote (emphasis his). “...So nobody understands THE ALGORITHM. Nobody has the power to examine or check THE ALGORITHM. Yet everyone we’ve talked to from both Apple and [Goldman Sachs] are SO SURE that THE ALGORITHM isn’t biased and discriminating in any way. That’s some grade-A management of cognitive dissonance.”
Hansson's tweets prompted others to share similar experiences, most notably Apple co-founder Steve Wozniak. “The same thing happened to us,” Wozniak tweeted. “I got 10x [sic] the credit limit. We have no separate bank or credit card accounts or any separate assets. Hard to get to a human for a correction though. It's big tech in 2019.”
Filmmaker Lexi Alexander said she and a group of her friends applied for an Apple Card to see if the allegations were true. What they found confirmed the accounts made by Hansson and Wozniak. “A bunch of us applied [for] this card today. It takes 5 sec on your iPhone [and] it doesn’t show up on your credit history (I’ve been told). Apple Card then makes you a credit limit [and] APR offer which you can accept or deny. I’m currently trying to recover from the sexist slap in my face,” Alexander tweeted. “Like it’s really really bad. Male friends with bad credit score and irregular income got way better offers than women with perfect credit and high incomes. There were 12 of us, 6 women 6 men. We just wanted to see what’s up and it was not pretty.”
As complaints about the Apple Card went viral, Goldman Sachs, the New York-based bank which backs the Apple Card, issued a statement on November 10. In the statement Goldman Sachs said the issue stems from the fact that credit decisions regarding the Apple Card are based on individual credit lines and histories, not those shared with family members.
“As with any other individual credit card, your application is evaluated independently,” the Goldman Sachs statement said. “We look at an individual’s income and an individual’s creditworthiness, which includes factors like personal credit scores, how much personal debt you have, and how that debt has been managed. Based on these factors, it is possible for two family members to receive significantly different credit decisions...In all cases, we have not and will not make decisions based on factors like gender.”
The contributing factors cited by Goldman Sachs would seem to contradict those offered by people such as Hansson and Wozniak.
CNBC reported that the discrimination allegations have spurred the New York Department of Financial Services (DFS) to launch an official investigation into Goldman Sachs’ credit card practices. “DFS is troubled to learn of potential discriminatory treatment in regards to credit limit decisions reportedly made by an algorithm of Apple Card, issued by Goldman Sachs,” Linda Lacewell, superintendent of the DFS, told CNBC. “The Department will be conducting an investigation to determine whether New York law was violated and ensure all consumers are treated equally regardless of sex.”
According to CNBC, Goldman Sachs was aware of the potential bias when the Apple Card rolled out in August. But the bank opted to have credit decisions made on an individual basis to avoid the complexity that comes with dealing with co-signers and other shared accounts.
The Black Box Problem
While these reports of bias related to the Apple Card are surely drawing attention due to the high-profile names attached, it's far from the first case of a widely used AI algorithm exhibiting bias. Incidents of algorithmic bias in healthcare, lending, and even criminal justice applications have been discovered in recent years. And experts at many major technology companies and research institutions are working diligently to address bias in AI.
“Part of the problem here is that, as with many AI and machine learning algorithms, the Apple Card’s is a black box; meaning, there is no framework in place to trace the algorithm’s training and decision-making,” Irina Farooq, chief product officer at data analytics company, Kinetica, told Design News in a prepared statement. “For corporations, this is a significant legal and PR risk. For society, this is even more serious. If we cede our decision-making to AI, whether for ride-sharing refunds, insurance billing, or mortgage rates, we risk subjecting ourselves to judgment with no appeal, to a monarchy of machines where all the world is a data set, and all the men and women, merely data.”
Farooq echoed the statements of many concerned with bias in AI by stating that the algorithms we employ are only as fair as the data they are trained with. “The parameters of what the algorithm should take into account when analyzing a data set are still set by people. And the developers and data scientists doing this work may not be aware of the unconscious biases the parameters they’ve put in place contain,” she said. “We don’t know what the parameters were for the Apple Card’s credit determinations, but if factors included annual income without considering joint property ownership and tax filings, women, who in America still make 80.7 [cents] for every man’s dollar, would be at an inherent disadvantage.”
On November 11, following the announcement of the New York DFS investigation, Carey Halio, CEO of Goldman Sachs Bank USA, released another statement on behalf of the bank, pledging to work to ensure its algorithms are not exhibiting bias and to ask any customers who feel they may have been affected to reach out.
“We have not and never will make decisions based on factors like gender. In fact, we do not know your gender or marital status during the Apple Card application process,” Halio wrote. “We are committed to ensuring our credit decision process is fair. Together with a third party, we reviewed our credit decisioning process to guard against unintended biases and outcomes.”
When tech entrepreneur David Heinemeier Hansson recently took to Twitter saying the Apple Card gave him a credit limit that was 20 times higher than his wife's, despite the fact that she had a higher credit score, it may have been the first major headline about algorithmic bias you read that touched your everyday life. It was not the first — there have been major stories about potential algorithmic bias in child care and insurance — and it won't be the last.
Hansson, the chief technology officer of project management software firm Basecamp, was not the only tech figure speaking out about algorithmic bias and the Apple Card. In fact, Apple's own co-founder Steve Wozniak had a similar experience. Presidential candidate Elizabeth Warren even got in on the action, bashing Apple and Goldman, and regulators said they are launching a probe.
Goldman Sachs, which administers the card for Apple, has denied the allegations of algorithmic gender bias, and has also said it will examine credit evaluations on a case-by-case basis when applicants feel the card's determination is unfair.
Goldman spokesman Patrick Lenihan said algorithmic bias is an important issue, but the Apple Card is not an example of it. "Goldman Sachs has not and will never make decisions based on factors like gender, race, age, sexual orientation or any other legally prohibited factors when determining credit worthiness. There is no 'black box.'" he said, referring to a term often used to describe algorithms. "For credit decisions we make, we can identify which factors from an individual's credit bureau issued credit report or stated income contribute to the outcome. We welcome a discussion of this topic with policymakers and regulators."
As AI and the algorithms that underlie technology become an increasingly large part of everyday life, it's important to know more about the technology. One of the major claims made by technology firms using algorithms in decisions like credit scoring is that algorithms are less biased than human beings. That's being used in areas like job hiring: The state of California recently passed a rule to encourage the development of more job-based algorithms to remove human bias from the hiring process. But it is far from 100% scientifically proven that an AI that relies on code written by humans, as well as data fed into it as a learning mechanism, will not reflect the existing biases of our world.
Here are key points about AI algorithms that will factor in future headlines.
1. A.I. is already being used widely in key areas of life
As Hansson and his wife found out, AI systems are becoming more commonplace in areas that everyday people rely on.
This technology is being introduced not only in credit and job hiring but also in insurance, mortgages, and child welfare.
In 2016, Allegheny County, Pennsylvania, introduced a tool called the Allegheny Family Screening Tool. It is a predictive-risk modeling tool that is used to help with child welfare call-screening decisions when concerns of child maltreatment are raised to the county's department of human services.
The system collects data on each person in the referral and uses it to create an "overall family score." That score estimates the likelihood of a future adverse event.
Allegheny did face some backlash, but one conclusion was that it created "less bad bias." Other places, including Los Angeles, have used similar technology in an attempt to improve child welfare. It is an example of how AI systems will be used in ways that significantly affect people's lives, and as a result it is important to know how those systems can be flawed.
2. A.I. can be biased
Most AI is created through a process called machine learning, in which a computer is taught by being fed thousands of pieces of data so that it learns the patterns in the data set on its own.
An example would be giving an AI system thousands of pictures of dogs, with the purpose of teaching the system what a dog is. From there the system would be able to look at a photo and decide whether it is a dog or not based on that past data.
So what if the data you are feeding a system is 75% golden retrievers and 25% Dalmatians?
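To make that concrete, here is a deliberately naive, hypothetical sketch (the labels and the 75/25 split are invented for illustration, and no real classifier is this simple) of how a skew in the training data becomes a skew in every prediction:

```python
from collections import Counter

def train_majority_classifier(labels):
    """A toy 'model' that learns only the most common label in its
    training data and then predicts that label for every input."""
    most_common_label, _count = Counter(labels).most_common(1)[0]
    return lambda _example: most_common_label

# Hypothetical training set: 75% golden retrievers, 25% Dalmatians.
training_labels = ["golden retriever"] * 75 + ["dalmatian"] * 25
model = train_majority_classifier(training_labels)

# The imbalance in the data dictates the outcome, regardless of input:
print(model("photo of a dalmatian"))  # -> golden retriever
```

Real models are far subtler than a majority vote, but the underlying failure mode is the same: whatever group dominates the training data dominates the predictions.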
Dr. Sarah Myers West, a postdoctoral researcher at the AI Now Institute, says these systems are built to reflect the data they are fed, and that data can be built on bias.
"These systems are being trained on data that's reflective of our wider society," West said. "Thus, AI is going to reflect and really amplify back past forms of inequality and discrimination."
One real-world example: while human, manager-based hiring can undoubtedly be biased, debate remains over whether algorithmic job-application screening actually removes that bias. The AI learning process can absorb the biases of the data it is fed, such as the résumés of top-performing candidates at top firms.
3. People who program A.I. can be biased
The AI Now Institute has also found biases in the people who are creating AI systems. In an April 2019 study, it found that only 15% of the AI staff at Facebook are women, and only 4% of its total workforce is black. Google's workforce is even less diverse: only 10% of its AI staff are women and 2.5% of its workers are black.
Joy Buolamwini, a computer scientist at MIT, found while working on a project that projected digital masks onto a mirror that the generic facial recognition software she was using would not identify her face unless she wore a white mask.
The system could not identify the face of a black woman because the data set it was trained on was overwhelmingly composed of lighter-skinned faces.
"Quite clearly, it's not a solved problem," West said. "It's actually a very real problem that keeps resurfacing in AI systems on a weekly, almost daily basis."
4. Algorithms are not public information
AI algorithms are typically proprietary to the companies that created them.
"Researchers face really significant challenges understanding where there's algorithmic bias because so many of them are opaque," West said.
Even if we could see them, it doesn't mean we would understand them, says Dipayan Ghosh, co-director of the Digital Platforms and Democracy Project and a Shorenstein Fellow at Harvard University.
"It's difficult to draw any conclusions based on source code," Ghosh said. "Apple's proprietary creditworthiness algorithm is something that not even Apple can easily pin down, and say, 'Okay, here is the code for this,' because it probably involves a lot of different sources of data and a lot of different implementations of code to analyze that data in different siloed areas of the company."
To take things a step further, companies like Apple write their code to be legible to Apple employees, and it may not make sense to those outside of the company.
5. There is limited government oversight of A.I.
Right now there is little government oversight of AI systems.
"When AI systems are being used in areas that are of incredible social, political and economic importance, we have a stake in understanding how they are affecting our lives," West said. "We currently don't really have the avenues for the kind of transparency we would need for accountability."
One presidential candidate is trying to change that. New Jersey Senator Cory Booker sponsored a bill earlier this year called "The Algorithmic Accountability Act."
The bill requires companies to look at flawed algorithms that could create unfair or discriminatory situations for Americans. Under the bill, the Federal Trade Commission would be able to create regulations to 'conduct impact assessments of highly sensitive automated decision systems.' That requirement would impact systems under the FTC's jurisdiction, new or existing.
Booker isn't the first politician to call for better regulation of AI. In 2016, the Obama administration called for development within the industry of algorithmic auditing and external testing of big data systems.
6. Algorithms can be audited, but it is not a requirement
While government oversight is rare, an increasing practice is third-party auditing of algorithms.
The process involves an outside entity coming in and analyzing how the algorithm is made without revealing trade secrets, which is a large reason why algorithms are private.
Ghosh says this is happening more frequently, but not all of the time.
"It happens when companies feel compelled by public opinion or public sway to do something because they don't want to be called out having had no audits whatsoever," Ghosh said.
Ghosh also said that regulatory action can happen, as seen in the FTC's numerous investigations into Google and Facebook. "If a company is shown to harmfully discriminate, then you could have a regulatory agency come in and say 'Hey, we're either going to sue you in court, or you're going to do X,Y and Z. Which one do you want to do?'"
This story has been updated to include a comment from Goldman Sachs that it has not and will never make decisions based on factors like gender, race, age, sexual orientation or any other legally prohibited factors when determining credit worthiness.
US regulators are investigating whether Apple’s credit card, launched in August, is biased against women. Software engineer David Heinemeier Hansson reported on social media that Apple had offered him a spending limit 20 times higher than his wife’s, Jamie Heinemeier Hansson. When Jamie spoke to customer service at Goldman Sachs, the bank behind the Apple card, she was told her credit limit was determined by an algorithm, and bank reps couldn’t explain why it came to the conclusion it did.
A spokesman for Goldman told Bloomberg, “Our credit decisions are based on a customer’s creditworthiness and not on factors like gender, race, age, sexual orientation or any other basis prohibited by law.” Apple and Goldman claim to use applicants’ credit score, information in their credit report, and income to establish credit limits.
There is no evidence yet that the algorithm is sexist, beyond these anecdotes. But a lack of transparency has been a recurring theme. Goldman didn’t respond to questions from Quartz about the exact mechanisms it used to determine Jamie Heinemeier Hansson’s credit limit. Further information about which quantitative measures it used in this process—high-powered machine learning? Eighth-grade algebra?—could offer clues about what, if anything, went wrong here.
For example, in 2018 when Goldman wanted to show off its quantitative prowess by forecasting the winner of the soccer World Cup, its researchers turned to machine learning. They could have used basic statistics, but that would not have been as precise. Goldman’s quants said a prediction method that harnessed machine learning methods (such as random forest, Bayesian ridge regression, and a gradient boosted machine) was five times more accurate than using a simpler statistical regression.
The problem with using a machine learning method is that it makes it hard to explain how a prediction works. Machine learning tools are, for the most part, black boxes: For what they promise in accuracy, data scientists using them lose the ability to understand how much each factor matters to the ultimate outcome of a prediction (in statistics, this is called “inference”).
For the World Cup, Goldman’s researchers knew that the variables of team strength, individual player strength, and recent performance were important predictors, but quantifying precisely how much each matters to the outcome of a match was impossible. While a regression-based model would have been a blunter tool, it would have allowed the researchers to clearly state how much of an effect each variable had on their prediction. Basically, it would have been better on transparency, but worse on forecasting.
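The transparency trade-off described above can be illustrated with a minimal sketch. This is ordinary one-variable least squares on made-up numbers, not Goldman's actual model or data; the point is only that a regression's fitted coefficient is directly inspectable in a way a black-box model's internals are not:

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b*x, in closed form.
    Unlike a black-box model, the fitted coefficient b states plainly
    how much the outcome moves per unit change in the input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Invented data: a "team strength" rating vs. goals scored.
strength = [1.0, 2.0, 3.0, 4.0]
goals = [0.9, 2.1, 2.9, 4.1]
a, b = fit_simple_regression(strength, goals)
print(f"each unit of strength adds ~{b:.2f} goals")  # ~1.04
```

This is the "inference" a random forest or gradient boosted machine gives up: those models may predict better, but no single number in them answers "how much does this factor matter?"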
And in the end, Goldman’s fancy algorithm did a pretty poor job of predicting the World Cup anyway. A model that was at least easier to explain may have been more useful.
In the case of the Apple Card, we don’t know for sure whether Goldman used machine learning to inform its system for calculating credit limits, but it seems likely it did, and by doing so may have put primacy on precision above all else. As mathematician Cathy O’Neil recently told Slate, when companies choose to use algorithms, “[t]hey look at the upside—which is faster, scalable, quick decision-making—and they ignore the downside, which is that they’re taking on a lot of risk.”
Data science, as a field, tends to focus on making predictions. This narrow goal may lead companies further away from thinking about bias or how well they can explain decision-making methodologies to regulators and the public at large. It can also lead to less consideration of the shortcomings of data fed into algorithmic models—some research suggests credit scoring is discriminatory, and any model incorporating that data will reflect that bias. But in many cases in modern data science, if the model makes a forecast “better” in statistical terms, its other effects may be overlooked.
The possibility that Apple Card applicants were subject to gender bias opens a new frontier for the financial services sector in which regulators are largely absent, argues Karen Mills.
In late August, the Apple Card debuted with a minimalist look and completely “no fee” model, creating a frenzy of anticipation. Millions signed up to be alerted for the release. Designed to boost traffic to its slow-to-be-adopted Apple Pay system and increase consumer dependency on iPhones, the Apple Card marked another significant innovation in access to financial services.
Fast forward two months, and Apple Card may now find its place in history for a less positive reason—the dark side of the technological revolution rearing its ugly head. Last week, Danish programmer David Heinemeier Hansson tweeted that after both he and his wife Jamie applied for the Apple Card with much of the same or shared financial information, he was astonished to receive a credit limit 20 times higher, despite his wife’s higher credit score.
Cue the viral tweet storm that followed, rife with accusations of bias in Goldman Sachs’s underwriting model. (Goldman developed and issued the card.) Adding fuel to the fire, Apple co-founder Steve Wozniak shared that the same thing had happened to him and his wife. Officials from the New York Department of Financial Services quickly chimed in, assuring the Twittersphere that they would investigate.
Technology is undeniably transforming the financial services industry. Fintechs, Big Tech, and banks are using increasing volumes of data, artificial intelligence, and machine learning to build new algorithms to determine creditworthiness. The lending process, which was historically plagued by frictions, is becoming potentially more accurate, efficient, and cost effective.
For small-business lending, technology is changing the game, providing access to capital for more small businesses that need it to grow and succeed. But when lending relies on algorithms to make loan and underwriting decisions, as the Apple Card situation illustrates, the potential for discrimination grows.
Should the customers be able to see what pieces of data may have led to a loan rejection or a lower credit limit? Should regulators have access to the algorithms and test them for the impact they have on underserved or protected classes?
The Apple Card situation has raised these questions in a visible way and the public engagement has been strong and immediate. Clearly, this is a new frontier for the financial services sector—and the industry’s regulators are also operating without a roadmap. We need to stop arguing about more versus less financial regulation and begin the hard work of creating smart regulation. This would include at least three parts, all of which are hard to accomplish:
Disclosure rules on who gets to see what is in the credit algorithms.
Increased expertise at the regulatory agencies.
Data collection to know who is getting loans and where the gaps are occurring.
The Apple Card fiasco is not going to be an isolated incident—it’s the canary in the coal mine for the financial services industry and regulators playing catch up to the implications of the fintech revolution. For all the promise that comes with the Apple Card or other new innovations for deploying capital, if creditworthy customers are being shut out, that’s a problem. Even worse, if we don't understand why, we can’t fix it.
Advocates of algorithmic justice have begun to see their proverbial “days in court” with legal investigations of enterprises like UHG and Apple Card. The Apple Card case is a strong example of how current anti-discrimination laws fall short of the fast pace of scientific research in the emerging field of quantifiable fairness.
While it may be true that Apple and their underwriters were found innocent of fair lending violations, the ruling came with clear caveats that should be a warning sign to enterprises using machine learning within any regulated space. Unless executives begin to take algorithmic fairness more seriously, their days ahead will be full of legal challenges and reputational damage.
What happened with Apple Card?
In late 2019, startup leader and social media celebrity David Heinemeier Hansson raised an important issue on Twitter, to much fanfare and applause. With almost 50,000 likes and retweets, he asked Apple and their underwriting partner, Goldman Sachs, to explain why he and his wife, who share the same financial ability, would be granted different credit limits. To many in the field of algorithmic fairness, it was a watershed moment to see the issues we advocate go mainstream, culminating in an inquiry from the NY Department of Financial Services (DFS).
At first glance, it may seem heartening to credit underwriters that the DFS concluded in March that Goldman’s underwriting algorithm did not violate the strict rules of financial access created in 1974 to protect women and minorities from lending discrimination. While disappointing to activists, this result was not surprising to those of us working closely with data teams in finance.
There are some algorithmic applications for financial institutions where the risks of experimentation far outweigh any benefit, and credit underwriting is one of them. We could have predicted that Goldman would be found innocent, because the laws for fairness in lending (if outdated) are clear and strictly enforced.
And yet, there is no doubt in my mind that the Goldman/Apple algorithm discriminates, along with every other credit scoring and underwriting algorithm on the market today. Nor do I doubt that these algorithms would fall apart if researchers were ever granted access to the models and data we would need to validate this claim. I know this because the NY DFS partially released its methodology for vetting the Goldman algorithm, and as you might expect, their audit fell far short of the standards held by modern algorithm auditors today.
How did DFS (under current law) assess the fairness of Apple Card?
In order to prove the Apple algorithm was “fair,” DFS considered first whether Goldman had used “prohibited characteristics” of potential applicants like gender or marital status. This one was easy for Goldman to pass — they don’t include race, gender or marital status as an input to the model. However, we’ve known for years now that some model features can act as “proxies” for protected classes.
The DFS methodology, based on 50 years of legal precedent, failed to mention whether they considered this question, but we can guess that they did not. Because if they had, they’d have quickly found that credit score is so tightly correlated to race that some states are considering banning its use for casualty insurance. Proxy features have only stepped into the research spotlight recently, giving us our first example of how science has outpaced regulation.
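Proxy detection of this kind is straightforward to sketch. The snippet below is an illustrative Python example on synthetic data (the score distributions and the 40-point gap are assumptions for demonstration, not figures from the DFS audit): it measures how strongly a nominally "neutral" feature correlates with membership in a protected group.

```python
import numpy as np

# Hypothetical illustration of proxy detection. We simulate applicants
# whose credit scores correlate with a protected attribute, even though
# that attribute is never given to the model as an input.
rng = np.random.default_rng(0)
n = 10_000
protected = rng.integers(0, 2, size=n)              # 1 = protected group
# Structural disadvantage (assumed): the protected group draws from a
# distribution shifted 40 points lower.
credit_score = rng.normal(680, 50, size=n) - 40 * protected

# Point-biserial correlation between the "neutral" feature and group membership
corr = np.corrcoef(credit_score, protected)[0, 1]
print(f"correlation(credit_score, protected) = {corr:.2f}")
# A strongly negative correlation flags credit_score as a potential proxy:
# excluding the protected attribute from the model does not remove its signal.
```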
In the absence of protected features, DFS then looked for credit profiles that were similar in content but belonged to people of different protected classes. In a certain imprecise sense, they sought to find out what would happen to the credit decision were we to “flip” the gender on the application. Would a female version of the male applicant receive the same treatment?
Intuitively, this seems like one way to define “fair.” And it is — in the field of machine learning fairness, there is a concept called a “flip test” and it is one of many measures of a concept called “individual fairness,” which is exactly what it sounds like. I asked Patrick Hall, principal scientist at bnh.ai, a leading boutique AI law firm, about the analysis most common in investigating fair lending cases. Referring to the methods DFS used to audit Apple Card, he called it basic regression, or “a 1970s version of the flip test,” bringing us example number two of our insufficient laws.
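A flip test itself takes only a few lines. The sketch below is hypothetical: `score_applicant` is a toy scoring function with a deliberate gender penalty standing in for a real underwriting model, and the threshold is invented.

```python
# A minimal "flip test" sketch. `score_applicant` stands in for any
# underwriting model; here it is a toy function (hypothetical) that
# leaks a gender effect we hope the test will catch.
def score_applicant(income, credit_score, gender):
    base = 0.004 * income + 0.5 * credit_score
    return base - (30 if gender == "F" else 0)   # the embedded bias

def flip_test(applicants, threshold=700):
    """Count applicants whose approval decision changes when gender is flipped."""
    flips = 0
    for a in applicants:
        original = score_applicant(a["income"], a["credit_score"], a["gender"]) >= threshold
        flipped = "M" if a["gender"] == "F" else "F"
        counterfactual = score_applicant(a["income"], a["credit_score"], flipped) >= threshold
        flips += original != counterfactual
    return flips

applicants = [
    {"income": 90_000, "credit_score": 700, "gender": "F"},
    {"income": 90_000, "credit_score": 700, "gender": "M"},
    {"income": 20_000, "credit_score": 550, "gender": "F"},
]
print(flip_test(applicants))
```

A single flipped decision is evidence that gender, directly or indirectly, is doing work in the model.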
A new vocabulary for algorithmic fairness
Ever since Solon Barocas’ seminal paper “Big Data’s Disparate Impact” in 2016, researchers have been hard at work to define core philosophical concepts into mathematical terms. Several conferences have sprung into existence, with new fairness tracks emerging at the most notable AI events. The field is in a period of hypergrowth, where the law has as of yet failed to keep pace. But just like what happened to the cybersecurity industry, this legal reprieve won’t last forever.
Perhaps we can forgive DFS for its softball audit given that the laws governing fair lending are born of the civil rights movement and have not evolved much in the 50-plus years since inception. The legal precedents were set long before machine learning fairness research really took off. If DFS had been appropriately equipped to deal with the challenge of evaluating the fairness of the Apple Card, they would have used the robust vocabulary for algorithmic assessment that’s blossomed over the last five years.
The DFS report, for instance, makes no mention of measuring “equalized odds,” a line of inquiry made famous in 2018 by Joy Buolamwini and Timnit Gebru. Their “Gender Shades” paper proved that facial recognition algorithms guess wrong on dark female faces more often than they do on subjects with lighter skin, and this reasoning holds true for many applications of prediction beyond computer vision alone.
Equalized odds would ask of Apple’s algorithm: Just how often does it predict creditworthiness correctly? How often does it guess wrong? Are there disparities in these error rates among people of different genders, races or disability status? According to Hall, these measurements are important, but simply too new to have been fully codified into the legal system.
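Those error-rate questions translate directly into code. The following sketch (with made-up labels and predictions, not Goldman data) computes per-group false positive and false negative rates, the quantities equalized odds requires to match across groups.

```python
import numpy as np

# Equalized-odds check on synthetic outcomes (illustrative only).
def error_rates(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fpr = np.mean(y_pred[y_true == 0] == 1)   # approved though not creditworthy
    fnr = np.mean(y_pred[y_true == 1] == 0)   # denied though creditworthy
    return fpr, fnr

# Hypothetical outcomes for two demographic groups
y_true_a, y_pred_a = [1, 1, 1, 0, 0], [1, 1, 1, 0, 0]   # group A: perfect
y_true_b, y_pred_b = [1, 1, 1, 0, 0], [1, 0, 0, 0, 0]   # group B: underestimated

fpr_a, fnr_a = error_rates(y_true_a, y_pred_a)
fpr_b, fnr_b = error_rates(y_true_b, y_pred_b)
print(f"group A: FPR={fpr_a:.2f}, FNR={fnr_a:.2f}")
print(f"group B: FPR={fpr_b:.2f}, FNR={fnr_b:.2f}")
# Equalized odds requires these rates to match across groups; here group B's
# false negative rate reveals systematic underestimation of creditworthy people.
```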
If it turns out that Goldman regularly underestimates female applicants in the real world, or assigns Black applicants higher interest rates than they truly deserve, it’s easy to see how this would harm these underserved populations at national scale.
Financial services’ Catch-22
Modern auditors know that the methods dictated by legal precedent fail to catch nuances in fairness for intersectional combinations within minority categories — a problem that’s exacerbated by the complexity of machine learning models. If you’re Black, a woman and pregnant, for instance, your likelihood of obtaining credit may be lower than the average of the outcomes among each overarching protected category.
These underrepresented groups may never benefit from a holistic audit of the system without special attention paid to their uniqueness, given that the sample size of minorities is by definition a smaller number in the set. This is why modern auditors prefer “fairness through awareness” approaches that allow us to measure results with explicit knowledge of the demographics of the individuals in each group.
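A “fairness through awareness” audit, in this spirit, might tabulate outcomes by intersectional subgroup rather than by top-level category alone. The record layout and field names below are hypothetical:

```python
from collections import defaultdict

# Intersectional audit sketch: with demographics available, compute
# approval rates per (race, gender) subgroup, not just per category.
def subgroup_approval_rates(records):
    totals, approved = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["race"], r["gender"])            # intersectional key
        totals[key] += 1
        approved[key] += r["approved"]
    return {k: approved[k] / totals[k] for k in totals}

# Hypothetical application records
records = [
    {"race": "Black", "gender": "F", "approved": 0},
    {"race": "Black", "gender": "F", "approved": 1},
    {"race": "Black", "gender": "M", "approved": 1},
    {"race": "White", "gender": "F", "approved": 1},
    {"race": "White", "gender": "M", "approved": 1},
]
rates = subgroup_approval_rates(records)
print(rates)
# An intersectional subgroup can lag even when each top-level
# category looks acceptable in aggregate.
```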
But there’s a Catch-22. In financial services and other highly regulated fields, auditors often can’t use “fairness through awareness,” because they may be barred from collecting sensitive demographic information in the first place. The goal of this legal constraint was to prevent lenders from discriminating. In a cruel twist of fate, it now gives cover to algorithmic discrimination, giving us our third example of legal insufficiency.
The fact that we can’t collect this information hamstrings our ability to find out how models treat underserved groups. Without it, we might never prove what we know to be true in practice — full-time moms, for instance, will reliably have thinner credit files, because they don’t execute every credit-based purchase under both spousal names. Minority groups may be far more likely to be gig workers, tipped employees or participate in cash-based industries, leading to commonalities among their income profiles that prove less common for the majority.
Importantly, these differences on the applicants’ credit files do not necessarily translate to true financial responsibility or creditworthiness. If it’s your goal to predict creditworthiness accurately, you’d want to know where the method (e.g., a credit score) breaks down.
What this means for businesses using AI
In Apple’s example, it’s worth mentioning a hopeful epilogue to the story where Apple made a consequential update to their credit policy to combat the discrimination that is protected by our antiquated laws. In Apple CEO Tim Cook’s announcement, he was quick to highlight a “lack of fairness in the way the industry [calculates] credit scores.”
Their new policy allows spouses or parents to combine credit files such that the weaker credit file can benefit from the stronger. It’s a great example of a company thinking ahead to steps that may actually reduce the discrimination that exists structurally in our world. In updating their policies, Apple got ahead of the regulation that may come as a result of this inquiry.
This is a strategic advantage for Apple, because NY DFS made exhaustive mention of the insufficiency of current laws governing this space, meaning updates to regulation may be nearer than many think. To quote Superintendent of Financial Services Linda A. Lacewell: “The use of credit scoring in its current form and laws and regulations barring discrimination in lending are in need of strengthening and modernization.” In my own experience working with regulators, this is something today’s authorities are very keen to explore.
I have no doubt that American regulators are working to improve the laws that govern AI, taking advantage of this robust vocabulary for equality in automation and math. The Federal Reserve, OCC, CFPB, FTC and Congress are all eager to address algorithmic discrimination, even if their pace is slow.
In the meantime, we have every reason to believe that algorithmic discrimination is rampant, largely because industry has also been slow to adopt the language of academia that the last few years have brought. Little excuse remains for enterprises that fail to take advantage of this new field of fairness and to root out the predictive discrimination that is in some ways guaranteed. The EU agrees, with draft laws specific to AI set to be adopted within the next two years.
The field of machine learning fairness has matured quickly, with new techniques discovered every year and myriad tools to help. It is only now reaching a point where fairness testing can be prescribed with some degree of automation. Standards bodies have stepped in to provide guidance that lowers the frequency and severity of these issues, even if American law is slow to follow.
Because whether or not discrimination by algorithm is intentional, it is illegal. So anyone using advanced analytics for applications relating to healthcare, housing, hiring, financial services, education or government is likely breaking these laws without knowing it.
Until clearer regulatory guidance becomes available for the myriad applications of AI in sensitive situations, the industry is on its own to figure out which definitions of fairness are best.