Incident 188: Argentinian City Government Deployed Teenage-Pregnancy Predictive Algorithm Using Invasive Demographic Data
About the automatic prediction of teenage pregnancies
After studying the methodology of the artificial intelligence system supposedly capable of predicting teenage pregnancies, mentioned by the Governor of Salta, Juan Manuel Urtubey, we found serious technical and conceptual errors that cast doubt on the reported results and compromise the use of the tool, especially in the case of such a sensitive issue.
On 4/11/2018, in the television program “El Diario de Mariana”, the Governor of Salta, Juan Manuel Urtubey, described an artificial intelligence system supposedly capable of predicting teenage pregnancies:
“Recently we launched a program with the Ministry of Early Childhood […] to prevent adolescent pregnancy using artificial intelligence with a world-renowned software company, with which we are carrying out a pilot plan. With the technology available today, you can see, five or six years in advance, with first name, last name, and address, which girl, a future teenager, is 86% predestined to have a teenage pregnancy.”
Previously, on 3/20/2018, at the “Microsoft Data & AI Experience 2018” event, Urtubey had already mentioned this topic:
“The examples that you referred to in the case of the prevention of teenage pregnancy and the issue of school dropouts are very clear examples of that. We have clearly defined, with name and surname, 397 cases of children that we know, from a universe of 3000, who inexorably drop out of school. We have 490-odd, almost 500 cases of girls who, we know, we have to go look for today.”
Various news outlets linked these statements by Governor Urtubey to a document available on GitHub signed by Facundo Davancens, an employee of Microsoft Argentina. That document ends by thanking "the Ministry of Early Childhood of the Provincial Government of Salta" and "Microsoft".
After carefully studying the methodology detailed in that document, we found serious technical and conceptual errors that cast doubt on the results reported by Governor Urtubey and compromise the use of the resulting tool, on an issue as sensitive as teenage pregnancy.
We briefly and colloquially list some of the most serious problems we have encountered:
Problem 1: Artificially oversized results
The study details the following procedure:
1. Construct a set of statistical rules to try to determine whether a teenager will have a pregnancy in the future.
2. Build those rules from known data (the “training data”). The statistical rules are thus shaped in the image and likeness of the training data.
3. Once the statistical rules are built, test them on new, unseen data (the “evaluation data”), thus calculating their “accuracy” (how often the predictions are correct).
The problem here is that the evaluation data (in step 3) include almost identical replicas of many of the training samples. The reported results are therefore strongly overstated, leading to the erroneous conclusion that the prediction system works better than it actually does. (In the addendum below we give more details of this problem.)
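The three-step procedure above can be sketched in miniature. This is a hypothetical illustration, not the study's actual model: the data are random, and the "statistical rule" is just a majority-class baseline. The point is only the order of operations: the split into training and evaluation data happens before any rule is built.

```python
import random

random.seed(1)

# Hypothetical toy dataset: one random feature per person plus a 0/1
# label; this stands in for the survey data, which we do not have.
data = [([random.random()], random.randint(0, 1)) for _ in range(200)]

# Split into training and evaluation data BEFORE building any rule,
# so the evaluation set stays genuinely unseen (step 3's requirement).
random.shuffle(data)
train, test = data[:160], data[160:]

# Steps 1-2: "build the statistical rules" from the training data only.
# Here the rule is simply "predict the most common training label",
# a deliberately weak baseline standing in for the real model.
positives = sum(label for _, label in train)
rule = 1 if positives > len(train) / 2 else 0

# Step 3: accuracy = fraction of correct predictions on the unseen data.
accuracy = sum(rule == label for _, label in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the evaluation set here shares no samples with the training set, the reported accuracy is an honest estimate; the contamination described above breaks exactly this property.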
Problem 2: Potentially skewed data
The other problem, which is key and insurmountable, is that we strongly doubt the reliability of the data used in this study.
Data on adolescent pregnancies tend to be biased or incomplete, because the subject is sensitive, confidential, and difficult to access. For example, in many families, teenage pregnancies tend to be hidden, and even clandestinely terminated. The data used therefore risk including more adolescent pregnancies from certain sectors of society than from others.
Thus, even if the methodology used to build and evaluate the system were correct, the statistical rules built on these data would yield erroneous conclusions that reflect the distortions in the data.
Problem 3: Inadequate data
The data used were extracted from a survey of adolescents living in the province of Salta, containing personal information (age, ethnicity, country of origin, etc.), information about their environment (number of people they live with, whether they have hot water in the bathroom, etc.), and whether the respondent had completed, or was undergoing at the time of the survey, a pregnancy.
These data are not adequate to answer the question posed: whether an adolescent will have a pregnancy in the future (for example, in 5 or 6 years). For that, it would be necessary to have data collected 5 or 6 years before the pregnancy occurs.
With the current data, in the best of cases, the system could determine if an adolescent has had, or now has, a pregnancy. It is to be expected that the conditions and characteristics of an adolescent would have been very different 5 or 6 years earlier.
Both methodological problems and unreliable data pose the risk of misleading policymakers.
This case exemplifies the danger of treating the output of a computer as revealed truth. Artificial intelligence techniques are powerful and demand responsibility from those who employ them. In interdisciplinary fields such as this one, it should not be forgotten that they are just one more tool, to be complemented with others, and in no way a replacement for the knowledge or intelligence of an expert, especially in fields that directly affect public health and vulnerable sectors.
Addendum: More details of problem 1
The process used to obtain the reported results is technically incorrect. It violates a basic principle of machine learning: the data on which the system is evaluated must be different from the data used to train it. If this principle is violated, that is, if training data contaminate the data on which the system is validated, the results are invalid.
In the system the author describes on GitHub, the contamination of the evaluation data arises quite subtly. The system uses a method called SMOTE to balance the number of samples of each class. This method generates new "synthetic" samples by replicating each sample of the minority class (at risk of pregnancy, in this case) several times with small variations from the original sample. The problem arises because the author performs this data replication before splitting the data into training and evaluation sets. The split is done randomly, so it is very likely that a sample appears in the training set while some of its replicas appear in the evaluation data. When the system is evaluated on these replicated data, the reported accuracy is overstated. Given this problem, it is impossible to know the true accuracy of the system.
This can be understood with an example. Suppose that instead of using the characteristics considered in this work (age, neighborhood, ethnicity, country of origin, etc.), we simply used each adolescent's first and last name. Clearly, a system with only that information as input could not learn to extrapolate and make decisions on new data. But with SMOTE applied as it was here, the system could easily memorize the training data perfectly and then predict the evaluation data with very high accuracy, since the evaluation data would contain replicas of those same names. In the case we are studying, first and last names are not used as input, but the characteristics that are used allow, on closer inspection, the same problem to occur. For example, suppose a system learns that a 16-year-old adolescent who lives in the El Milagro neighborhood, is Creole, has no disabilities, is of Argentine origin, has hot water in the bathroom, and lives with 4 people in a household whose head did not abandon their studies is at risk of teenage pregnancy. When the system is evaluated on data containing almost identical replicas of these characteristics, it will have no trouble predicting the class of those replicas. Since, due to the use of SMOTE prior to splitting the data into training and evaluation sets, a high proportion of the minority-class samples seen in evaluation will already have been seen during training, the result is an inflated accuracy value.
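The leakage described above can be reproduced with a small, self-contained experiment. This is an assumption-laden sketch, not the study's pipeline: a crude noise-replication step stands in for the real SMOTE, a 1-nearest-neighbour classifier stands in for the actual model, and the data are synthetic. Because the features here are pure noise, any honest evaluation should do no better than a trivial baseline, yet oversampling before the split still reports much higher accuracy:

```python
import random

random.seed(0)

def make_data(n):
    # Features are pure noise and labels are independent of them, so an
    # honest accuracy estimate cannot reflect any real predictive power.
    X = [[random.random() for _ in range(5)] for _ in range(n)]
    y = [1 if random.random() < 0.2 else 0 for _ in range(n)]  # class 1 = minority
    return X, y

def oversample(X, y, times=5, noise=0.01):
    # Crude stand-in for SMOTE: replicate each minority sample several
    # times with tiny perturbations around the original.
    Xs, ys = list(X), list(y)
    for xi, yi in zip(X, y):
        if yi == 1:
            for _ in range(times):
                Xs.append([v + random.uniform(-noise, noise) for v in xi])
                ys.append(1)
    return Xs, ys

def split(X, y, frac=0.8):
    pairs = list(zip(X, y))
    random.shuffle(pairs)
    cut = int(frac * len(pairs))
    Xtr, ytr = zip(*pairs[:cut])
    Xte, yte = zip(*pairs[cut:])
    return list(Xtr), list(ytr), list(Xte), list(yte)

def knn_accuracy(Xtr, ytr, Xte, yte):
    # 1-nearest-neighbour classifier: it memorises the training set, so
    # near-duplicate replicas in the evaluation set are trivially "predicted".
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    correct = 0
    for xi, yi in zip(Xte, yte):
        pred = min(zip(Xtr, ytr), key=lambda t: sqdist(t[0], xi))[1]
        correct += pred == yi
    return correct / len(yte)

X, y = make_data(300)

# Wrong order (the error the critique describes): oversample first, then
# split, so replicas of the same sample leak across the train/test boundary.
Xs, ys = oversample(X, y)
leaky = knn_accuracy(*split(Xs, ys))

# Right order: split first, then oversample only the training half.
Xtr, ytr, Xte, yte = split(X, y)
Xtr, ytr = oversample(Xtr, ytr)
honest = knn_accuracy(Xtr, ytr, Xte, yte)

print(f"oversample-before-split accuracy: {leaky:.2f}")
print(f"split-before-oversample accuracy:  {honest:.2f}")
```

With the features carrying no signal at all, the gap between the two printed numbers is entirely an artifact of the evaluation procedure, which is exactly the claim made about the reported 86% and 98.2% figures.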
Note: At the time of writing this document, others had found and reported a very similar analysis, published on the same page as the original description of the prediction system. Link: https://github.com/facundod/case-studies/issues/2
A government leader in Argentina hailed the AI, which was fed invasive data about girls. The feminist pushback could inform the future of health tech.
In 2018, while the Argentine Congress was hotly debating whether to decriminalize abortion, the Ministry of Early Childhood in the northern province of Salta and the American tech giant Microsoft presented an algorithmic system to predict teenage pregnancy. They called it the Technology Platform for Social Intervention.
“With technology you can foresee five or six years in advance, with first name, last name, and address, which girl—future teenager—is 86 percent predestined to have an adolescent pregnancy,” Juan Manuel Urtubey, then the governor of the province, proudly declared on national television. The stated goal was to use the algorithm to predict which girls from low-income areas would become pregnant in the next five years. It was never made clear what would happen once a girl or young woman was labeled as “predestined” for motherhood or how this information would help prevent adolescent pregnancy. The social theories informing the AI system, like its algorithms, were opaque.
The system was based on data—including age, ethnicity, country of origin, disability, and whether the subject’s home had hot water in the bathroom—from 200,000 residents in the city of Salta, including 12,000 women and girls between the ages of 10 and 19. Though there is no official documentation, from reviewing media articles and two technical reviews, we know that "territorial agents" visited the houses of the girls and women in question, asked survey questions, took photos, and recorded GPS locations. What did those subjected to this intimate surveillance have in common? They were poor, some were migrants from Bolivia and other countries in South America, and others were from Indigenous Wichí, Qulla, and Guaraní communities.
Although Microsoft spokespersons proudly announced that the technology in Salta was “one of the pioneering cases in the use of AI data” in state programs, it presents little that is new. Instead, it is an extension of a long Argentine tradition: controlling the population through surveillance and force. And the reaction to it shows how grassroots Argentine feminists were able to take on this misuse of artificial intelligence.
In the 19th and early 20th centuries, successive Argentine governments carried out a genocide of Indigenous communities and promoted immigration policies based on ideologies designed to attract European settlement, all in hopes of blanquismo, or “whitening” the country. Over time, a national identity was constructed along social, cultural, and most of all racial lines.
This type of eugenic thinking has a propensity to shapeshift and adapt to new scientific paradigms and political circumstances, according to historian Marisa Miranda, who tracks Argentina’s attempts to control the population through science and technology. Take the case of immigration. Throughout Argentina’s history, opinion has oscillated between celebrating immigration as a means of “improving” the population and considering immigrants to be undesirable and a political threat to be carefully watched and managed.
More recently, the Argentine military dictatorship between 1976 and 1983 controlled the population through systematic political violence. During the dictatorship, women had the “patriotic task” of populating the country, and contraception was prohibited by a 1977 law. The cruelest expression of the dictatorship’s interest in motherhood was the practice of kidnapping pregnant women considered politically subversive. Most women were murdered after giving birth and many of their children were illegally adopted by the military to be raised by “patriotic, Catholic families.”
While Salta’s AI system to “predict pregnancy” was hailed as futuristic, it can only be understood in light of this long history, particularly, in Miranda’s words, the persistent eugenic impulse that always “contains a reference to the future” and assumes that reproduction “should be managed by the powerful.”
Due to the complete lack of national AI regulation, the Technology Platform for Social Intervention was never subject to formal review and no assessment of its impacts on girls and women has been made. There has been no official data published on its accuracy or outcomes. Like most AI systems all over the world, including those used in sensitive contexts, it lacks transparency and accountability.
Though it is unclear whether the technology program was ultimately suspended, everything we know about the system comes from the efforts of feminist activists and journalists who led what amounted to a grassroots audit of a flawed and harmful AI system. By quickly activating a well-oiled machine of community organizing, these activists brought national media attention to how an untested, unregulated technology was being used to violate the rights of girls and women.
“The idea that algorithms can predict teenage pregnancy before it happens is the perfect excuse for anti-women and anti-sexual and reproductive rights activists to declare abortion laws unnecessary,” wrote feminist scholars Paz Peña and Joana Varon at the time. Indeed, it was soon revealed that an Argentine nonprofit called the Conin Foundation, run by doctor Abel Albino, a vocal opponent of abortion rights, was behind the technology, along with Microsoft.
“[The technology program] is a patriarchal contrivance,” said Ana Pérez Declercq, director of the Observatory of Violence Against Women. “It confounds socioeconomic variables to make it seem as if the girl or woman is solely to blame for her situation. It is totally lacking any concern for context. This AI system is one more example of the state's violation of women's rights. Imagine how difficult it would be to refuse to participate in this surveillance.” She added that families depend on the program’s sponsoring agency, the Ministry of Early Childhood, for services like vaccinations and free milk. In a country that ended 2021 with half its population living in poverty, this is crucial support that vulnerable girls and women can’t afford to risk by speaking out.
The Applied Artificial Intelligence Laboratory at the University of Buenos Aires highlighted the platform’s serious technical and design errors and challenged the developers' claims that the model made “correct predictions 98.2 percent of the time.” Technical reviews were based on incomplete information because the system lacked transparency. Nevertheless, it was revealed that the system database included ethnic and socioeconomic data, but included nothing about access to sex education or contraception, which public health efforts worldwide recognize as the most important tools in reducing rates of teenage pregnancy. “Methodological problems such as the unreliability of the data pose the risk of leading policy makers to take misguided actions,” said Diego Fernandez Slezak, director of the lab.
While Salta’s plan to predict pregnancy was publicly critiqued by academics and journalists, feminist activists used this media attention to enforce a measure of public accountability, even in face of a complete lack of AI regulation by the state. This effective resistance to the AI system was possible because Argentine feminists had already built a powerful social movement.
Everything about this is a nightmare.
It’s a nightmarish mix of Big Tech overreach and state authoritarianism.
The Argentinian province of Salta approved the development of a Microsoft algorithm in 2018 that allegedly could determine which low-income “future teens” would be likely to get pregnant, a shocking investigation by Wired reveals.
The algorithm — which Microsoft called “one of the pioneering cases in the use of AI data” — used demographic data including age, ethnicity, disability, country of origin, and whether or not the subject’s home had hot water in the bathroom, to determine which of the women and girls living in a small Argentinian town were “predestined” for motherhood.
The opaque program, which was celebrated on national television by then-governor Juan Manuel Urtubey, was offered to the province by Microsoft in 2018, at the same time Argentina’s Congress was debating whether to decriminalize abortion, Wired notes.
The magazine’s reporting found that the women and girls Microsoft’s algorithm identified as would-be teen moms were often disenfranchised in various ways: many came from poor backgrounds or migrant families, and others were of Indigenous heritage.
The algorithm, known as the Technology Platform for Social Intervention, is noteworthy due to the fact that an American company like Microsoft chose to deploy such a program in a country with a long history of surveillance and population control measures.
Those leanings are apparent in the lack of transparency surrounding the program. For one, the Argentinian government never formally assessed the algorithm’s impact on girls and women.
Worse yet, according to Wired, the program involved the deployment of “territorial agents” who surveyed those identified by the AI as being predestined for pregnancy, took photos of them, and even recorded their GPS locations.
It’s still unclear what the provincial or national governments did with the data and how — or if — they related to the abortion debate.
In 2020, Argentina voted to decriminalize abortion, a historic moment for the South American nation — but the program’s very existence should be cause for concern.
The report should serve as a warning of the potentially dangerous intersection between American AI tech and authoritarianism, and offers a much-needed reminder that we have, for the time being, much less to fear from the algorithms themselves than from the humans behind them.
In one of the most WTF uses of artificial intelligence yet, Microsoft has created one of the most bizarre algorithms ever. In an effort to show off its advancements in AI, the computer software company has crafted an AI to predict the impregnation of teenage girls. Yeah, bet you wish you never read that.
Microsoft AI predicts specific teenage pregnancy
Via Wired, Microsoft presented an AI algorithm in the Argentinian province of Salta back in 2018 that could predict teenage pregnancies. Presented during a period when the government was debating the decriminalisation of abortion, the tech giant created an AI forged in the fires of dystopia.
The algorithm was developed to predict the lives of lower-income “future teens”. Microsoft’s AI would take the names and addresses of preteen girls and predict the next “five or six years” of their lives.
Microsoft's data would be used to determine which girls were “86% predestined to have an adolescent pregnancy”. The AI’s database was built on the data of “200,000 residents in the city of Salta, including 12,000 women and girls between the ages of 10 and 19.”
Wired reports that the tech giant sent “territorial agents” to citizens’ houses to question them, including young girls. These agents asked questions, took photos and recorded GPS locations of the participants.
The surveys focused on low-income families in Argentina. Additionally, a large section of the database consisted of migrant families that had moved to the region from places such as Bolivia and other parts of South America.
This is viewed as a success
The Microsoft pregnancy prediction algorithm is viewed as a success by the company. According to the report, spokespeople for the company claimed that the Argentinian project was “one of the pioneering cases in the use of AI data”.
That may indeed be the case. However, it's also an example of AI algorithms being used in an incredibly creepy and dangerous way. Additionally, talk of this algorithm has been kept on the down low, likely because it's an off-putting foray into eugenics for the Big Tech company.
Even now, years after its inception, there's no word on whether or not the project has been terminated. Additionally, there's no data on what the Argentinian government is planning to do with the girls who have been marked for “predestined” teenage pregnancies.
Only one update has happened since the use of this algorithm: abortion has been decriminalised in Argentina. This means that those who do end up facing teenage pregnancy have a way out if they choose to. However, we don't know if Microsoft's systems affected this change or not.