Associated Incidents

We present the results of seven months of work by Eticas and the Ana Bella Foundation with the available data, affected women, and other stakeholders. As mentioned above, this work is part of a broader external auditing project in which Eticas, in collaboration with other civil society organizations, reverse engineers and assesses the impact of algorithms in different fields. With this External Audit project we aim to develop methodological tools for externally auditing automated risk assessment systems in the absence of access to code, input, output, and administrative data, and to equip community organizations to audit algorithms with social impact and to advocate for policy change. In this way, we seek to support bottom-up algorithmic auditing movements conducted by third-party organizations and end-user groups.
When we started the External Audit of VioGén, we had concerns about transparency, independent oversight, accountability, end-user engagement, and the transition to machine learning (ML). After conducting the audit, we can confirm that:
VioGén is not transparent. We could not access any system data or information beyond what has been produced by experts involved in the definition of the system. Neither external auditors nor women's groups have any kind of access to the VioGén data. For a publicly funded, high-impact system like VioGén, this is unacceptable.
VioGén has not been independently assessed or audited. The publicly available resources and surveys regarding the validity and desirability of VioGén have been conducted by individuals who either work for or have vested interests in the ministry and police forces. External auditors and researchers have no official or public path to access the data, and access appears to be granted by the Ministry at its discretion.
VioGén is not accountable. While the Ministry of the Interior sees VioGén as a recommender system, the high rate of prima facie acceptance of the algorithmic results (95%) points to an automated system, which should be held to further scrutiny as per the Régimen Jurídico de la Función Pública.
VioGén does not engage end users. In our fieldwork we found that women and women's organizations have never been consulted about the system, neither in its design phase nor in the subsequent decisions on how to alter it. We also found that 80% of the women who have used the system have negative comments about it: they are not informed of what it does or how it works, which leads to distrust.
The VioGén transition to ML raises new questions. Even though the literature explores the process of transitioning to an ML version of VioGén, the nature and extent of the collaboration between SAS and the ministry has not been publicly disclosed. The lack of a public and open debate on this process would be concerning in itself, but the fact that the technical evolution of the system is being decoupled from state-of-the-art research and oversight is bound to lead to further problems.
The auditing process has also allowed us to go beyond our initial concerns to identify new issues that deserve attention.
Firstly, we want to highlight that through this audit we have found that the VioGén system adapts the clustering of risk assessments to the resources available. The system only assigns as many “extreme” risk scores as it can afford, so funding cuts have a direct and quantifiable impact on the chances that a woman will receive effective protection after turning to the police. As the number of VioGén cases grows each year, more women receive police protection: while in 2015 around 3,000 women received police protection (with medium, high, or extreme risk scores), by 2021 this number had risen to almost 9,000. Yet a large gap remains between the women who receive police protection and those who do not, despite having reported gender violence to the police. In terms of calibration, we are concerned by the number of cases that VioGén “discards” by assigning them an “unappreciated” risk score. As currently designed, the risk score given by VioGén is determined not only by the objective facts the questionnaire is intended to unearth, but also by the overall distribution of gender-violence cases, which is in turn shaped by the available resources. As a result, in 2021 only 1 out of 7 women who reached out to the police for protection actually received it.
This is even more serious if we take into account the barriers to accessing VioGén that we have identified, which are one of the reasons why only 21.7% of women victims of domestic violence seek protection. Taken together, these figures mean that only about 3% of the women who are victims of gender violence receive a risk score of “medium” or above and, therefore, effective police protection.
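The funnel implied by these figures can be reproduced with a short back-of-the-envelope calculation. The two rates come from the text above; the script itself is only an illustrative sketch:

```python
# Back-of-the-envelope reproduction of the protection "funnel" described above.
# Both rates are taken from the report text; the script is only illustrative.

seek_protection_rate = 0.217    # share of victims who report to the police
protected_given_report = 1 / 7  # share of reporting women protected (2021)

overall_protection_rate = seek_protection_rate * protected_given_report
print(f"{overall_protection_rate:.1%}")  # ≈ 3.1%, the ~3% stated in the text
```

Multiplying the two conditional rates makes explicit why the headline figure is so low: each stage of the funnel compounds the losses of the previous one.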
Secondly, we have identified that not having children has a significant negative impact on how extreme-risk cases are perceived. Our data analysis shows that women who were killed by their partners and did not have children were systematically assigned lower risk scores than those who did, with a recall difference of 44% between the two groups.
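The metric behind this finding can be made precise. Recall here is the share of women later killed whose case had been scored “medium” or above; the case counts below are hypothetical placeholders, chosen only so that the gap matches the 44 percentage points reported, and are not the audit's actual figures:

```python
# Illustrative recall computation for the two groups of homicide victims.
# The case counts are hypothetical; only the structure of the metric is real.

def recall(flagged_high_risk: int, total_victims: int) -> float:
    """Share of homicide victims whose case was scored 'medium' or above."""
    return flagged_high_risk / total_victims

recall_with_children = recall(flagged_high_risk=31, total_victims=50)    # 0.62
recall_without_children = recall(flagged_high_risk=9, total_victims=50)  # 0.18
gap = recall_with_children - recall_without_children
print(f"recall gap: {gap:.0%}")  # 44 percentage points
```

Framing the disparity as a recall gap makes it directly comparable across subgroups, which is how fairness audits typically report differential error rates.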
Thirdly, we call into question the representativeness of the AUC value of the H-scale claimed by VioGén's lead researchers. While it is true that, with the data available, the H-scale can identify extreme-risk cases that may lead to homicide, the fact that only 1 in 4 homicides occurs after a previous police report means that the majority of homicide victims will remain unprotected even with the deployment of VPR5.0-H. In other words, although VioGén is now better equipped to identify certain cases of extreme risk, most homicide cases will remain unaddressed.
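The reasoning above amounts to an upper bound: since the scale can only score cases that reach the police, its coverage of all homicide victims is capped by the prior-report rate, regardless of how good its AUC is. A minimal sketch, using the 1-in-4 rate from the text and hypothetical recall values for illustration:

```python
# Upper bound on the share of all homicide victims a police-side risk
# scale can flag: only cases with a prior police report are ever scored.

prior_report_rate = 1 / 4  # 1 in 4 homicides follow a previous report

def max_coverage(recall_on_reported_cases: float) -> float:
    """Share of ALL homicide victims flagged, given recall on reported cases."""
    return prior_report_rate * recall_on_reported_cases

print(f"{max_coverage(1.0):.0%}")  # 25%: the ceiling, even for a perfect scale
print(f"{max_coverage(0.8):.0%}")  # 20%: with a hypothetical 80% recall
```

This is why a high AUC on reported cases cannot, by itself, translate into protection for most victims: the binding constraint is who enters the system at all.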
Fourthly, we have observed that VioGén is, in practice, an automated system with minimal and inconsistent human oversight. Police officers raise the algorithmic risk score in only 5% of cases, a figure that drops further when officers feel overworked. This is highly problematic, as an unaccountable implementation of human oversight (“human in the loop”) can lead to explainability and transparency problems. If police officers do not have clear instructions on when and how to intervene, their role can re-introduce bias into the system, and women may receive different scores depending on who files their case. Assessing the role of human oversight over time should be part of any audit and transparency effort.
While our sample is not representative of the broader population of victims and lawyers and therefore our findings are not generalizable, our fieldwork raises important questions that need to be studied more systematically, and ideally addressed at the institutional level.