One in three women globally will experience some form of domestic abuse in her lifetime, with prevalence even higher in South Asian countries such as India. Given the complexity of identifying and addressing intimate partner violence, Sneha Shashidhara discusses the application of machine learning techniques in this context. While recognising the potential of this technology, she also highlights ethical concerns and the need for clear guidelines and safeguards.
Domestic violence (DV) is a widespread issue that affects communities all over the world, causing immense suffering for victims who often feel isolated. The World Health Organization (WHO) reports that one in three women globally will experience some form of domestic abuse in their lives. This issue is even more common in South Asian countries like India, where gender norms and cultural practices often reinforce the subordination of women from a young age.
In India, traditional gender roles and a deeply patriarchal society, combined with a societal preference for male children, create a toxic environment in which women are systematically marginalised and disempowered. Cultural practices such as dowry and child marriage further erode women's agency, perpetuating cycles of abuse and inequality. The statistics from the National Family Health Survey (NFHS), 2015-16 are staggering: almost 30% of Indian women aged 15-49 reported experiencing DV within a single year, highlighting the urgent need for effective interventions.
The power of machine learning
Recognising that intimate partner violence (IPV) is complex and hard to tackle, researchers are exploring new approaches such as machine learning (ML). ML uses large amounts of data to build computer models that find patterns or predict future events. These techniques are useful for identifying risk factors for DV and estimating how likely someone is to become a victim or perpetrator. Studies have shown that ML is better than traditional statistical methods at predicting risky behaviour because it can analyse many factors at once and capture how they interact, even in complicated ways (Simonian et al. 2019, Sundström and Schön 2020). Unlike traditional methods, ML does not require the data to satisfy strict distributional assumptions. For example, an ML model can combine factors like socioeconomic status, past violence, substance abuse, and social interactions to predict the risk of future IPV. This allows for more accurate identification of high-risk individuals or situations, leading to better-targeted interventions and prevention strategies.
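To make this concrete, the sketch below trains a classifier on synthetic survey-style data. The data, the feature names, and the random-forest choice are illustrative assumptions made for exposition; they are not the actual NFHS variables or any published pipeline.

```python
# A minimal, illustrative sketch of training an ML model to flag IPV risk.
# All data here are synthetic, and the feature names are hypothetical
# stand-ins for survey variables; this is not actual NFHS data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical survey-style risk factors (binary/ordinal codes).
X = pd.DataFrame({
    "husband_alcohol_use": rng.integers(0, 2, n),        # 0 = no, 1 = yes
    "mother_experienced_ipv": rng.integers(0, 2, n),
    "wealth_quintile": rng.integers(1, 6, n),            # 1 = poorest
    "education_years": rng.integers(0, 16, n),
    "controlling_behaviour_score": rng.integers(0, 6, n),
})

# Synthetic outcome loosely tied to the risk factors, for demonstration only.
logit = (1.2 * X["husband_alcohol_use"] + 1.0 * X["mother_experienced_ipv"]
         + 0.4 * X["controlling_behaviour_score"] - 0.2 * X["wealth_quintile"]
         - 0.05 * X["education_years"] - 2.0)
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

# Hold out a test set so evaluation reflects performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```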
However, models trained on a small dataset can be unreliable and may not generalise to new settings. To evaluate a model honestly, it is important to keep the data used to train it separate from the data used to test it. Cross-validation strengthens this evaluation by training and testing the model multiple times on different splits of the data and averaging the results. A perfect score is 1.0, and for metrics such as the area under the ROC curve (AUC), anything above 0.5 means the model performs better than random guessing.
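Continuing the illustrative sketch above, the snippet below runs five-fold cross-validation; the fold count and the AUC scoring rule are assumptions chosen for exposition.

```python
# Five-fold cross-validation (reusing model, X, y from the sketch above).
# Each fold trains on four-fifths of the data and tests on the held-out
# fifth; averaging the five scores gives a more stable performance estimate.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"AUC per fold: {scores.round(2)}")
print(f"Mean AUC: {scores.mean():.2f} (1.0 = perfect, 0.5 = random guessing)")
```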
Machine learning in the context of intimate partner violence
Studies such as Capaldi et al. (2012) have sorted the risk factors for IPV into four main categories: individual traits (such as age[1], history of abuse, and mental health), relationship dynamics (such as infidelity), community factors (such as poverty), and societal factors (such as the legal framework and gender and social norms). Amusa et al. (2022) used ML to identify factors contributing to IPV based on interviews with 1,816 married women in South Africa, and found that fear of the husband was the strongest indicator of experiencing IPV. McDougal et al. (2021) employed ML to uncover new factors affecting marital sexual violence (MSV) in India; their neural network model pointed to elements like exposure to violence, sexual behaviour, decision-making power, and socioeconomic status.
We developed ML models to predict IPV using data from 66,013 women in the NFHS, 2015-16 (Shashidhara et al. 2024). Our model correctly identifies 78% of actual IPV cases in the test sample (that is, its recall is 0.78). One of our models relies only on less sensitive questions, excluding items such as the woman's perception of her husband's attitudes and behaviours; this is likely to elicit more honest answers. Such a model could be used by field health workers to identify women at higher risk of experiencing IPV.
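To illustrate what identifying 78% of actual cases means, the snippet below computes recall on the synthetic example from earlier, and fits a variant restricted to a subset of questions. The subset shown is hypothetical; it is not the published 15-question tool.

```python
# Recall: the share of actual IPV cases the model correctly flags
# (the 78% figure in the text refers to this quantity).
from sklearn.metrics import recall_score

y_pred = model.predict(X_test)
print(f"Recall on test set: {recall_score(y_test, y_pred):.2f}")

# A variant using only a less sensitive subset of questions (hypothetical
# subset; the published tool selects its own 15 questions from the NFHS).
safe_features = ["wealth_quintile", "education_years", "husband_alcohol_use"]
safe_model = RandomForestClassifier(n_estimators=300, random_state=0)
safe_model.fit(X_train[safe_features], y_train)
safe_pred = safe_model.predict(X_test[safe_features])
print(f"Recall (reduced questions): {recall_score(y_test, safe_pred):.2f}")
```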
Our model finds that the strongest predictors of IPV are the husband's alcohol use and a history of violence in the wife's family (that is, her mother also experienced IPV). Alcohol impairs judgment and self-control, increases financial stress, and makes peaceful conflict resolution less likely. Additionally, women who believe it is acceptable for a partner to be violent in certain situations, such as neglecting the children, arguing, making mistakes while cooking, or leaving the house, are more likely to experience IPV.
Signs of controlling behaviour by the husband, such as restricting the wife's access to money, keeping track of her whereabouts, or limiting her contact with friends, are also strong predictors of violence. Such behaviour may be intended to make the wife feel she has no freedom or no way to exit the relationship, or it may stem from the husband's suspicion of infidelity. Other factors, such as poverty, lower education, and living in hotter regions, also increase the likelihood of violence.
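Rankings like these are typically surfaced by inspecting a fitted model's feature importances. The snippet below does so for the illustrative random forest from earlier; on real NFHS data the ordering would of course differ.

```python
# Rank the hypothetical risk factors by how heavily the fitted
# random forest relies on each one when predicting IPV.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```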
Ethical considerations and safeguards
While ML shows promise in predicting and preventing IPV, it also raises important ethical concerns. A key issue is the risk of unfairly targeting men labelled as high-risk for 'pre-crime' interventions, which could lead to unintended consequences such as stigmatisation. It is crucial to ensure that using ML in this way does not violate individual rights or result in discrimination.
Additionally, follow-up monitoring for high-risk households needs to be handled with care. Intrusive actions could unintentionally harm women by causing suspicion or conflict at home. Instead, interventions should focus on supporting communities as a whole, creating an atmosphere of support and awareness rather than surveillance.
It is essential to establish clear guidelines and safeguards to address these ethical concerns, including keeping data private, obtaining informed consent, and reducing bias in algorithms[2]. We should also prioritise community-based efforts, such as education, support services, and economic programmes, to address the root causes of IPV.
Summing up
Even with progress, challenges still exist in predicting and addressing IPV because of underreporting and the stigma victims face. Studies show that many victims stay silent out of fear of retaliation or being socially shunned, which keeps a culture of silence and impunity alive. The Covid-19 pandemic has made this worse, causing a global rise in domestic violence cases.
ML techniques should be seen as one part of a broader approach to tackling IPV. It is important to develop strategies that address systemic inequalities, challenge patriarchal norms, and offer support services for victims. Prioritising ethical issues like data privacy and reducing algorithm bias is also key to ensuring these technologies benefit everyone.
The views expressed in this post are solely those of the authors, and do not necessarily reflect those of the I4I Editorial Board.
Notes:
1. Research suggests that younger adults, particularly those in their late teens to early 30s, are more likely than older adults to be involved in IPV as perpetrators.
2. In the context of ML models predicting IPV, algorithm bias refers to the systematic errors that can occur when an ML model produces unfair outcomes, often reflecting or exacerbating existing societal biases. Bias can arise from several sources, including biased or unrepresentative training data, the choice of features used in the model, or the way the algorithm processes the data. For example, if the training data used to develop a model are skewed towards a specific demographic (such as certain age groups, genders, or socioeconomic backgrounds), the model may disproportionately predict IPV risk for that demographic, potentially leading to unfair or inaccurate risk assessments for other groups. This can result in both false positives (incorrectly identifying someone as high-risk for IPV) and false negatives (failing to identify someone who is at risk), with significant ethical and practical consequences.
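As a hedged illustration of the kind of audit this note describes, the snippet below compares error rates across a hypothetical demographic grouping, continuing the synthetic example from the main text; the group variable is an assumption made purely for exposition.

```python
# Illustrative bias check: compare false positive and false negative rates
# across a hypothetical demographic group (e.g. an age band). A large gap
# between groups would signal the kind of algorithmic bias described above.
from sklearn.metrics import confusion_matrix

group = rng.integers(0, 2, len(y_test))  # hypothetical group labels

for g in (0, 1):
    mask = group == g
    tn, fp, fn, tp = confusion_matrix(y_test[mask], y_pred[mask]).ravel()
    fpr = fp / (fp + tn)  # flagged as high-risk but not actually at risk
    fnr = fn / (fn + tp)  # actually at risk but missed by the model
    print(f"Group {g}: false positive rate = {fpr:.2f}, "
          f"false negative rate = {fnr:.2f}")
```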
Further Reading
- Amusa, Lateef B, Annah V Bengesai and Hafiz TA Khan (2022), “Predicting the Vulnerability of Women to Intimate Partner Violence in South Africa: Evidence from Tree-based Machine Learning Techniques”, Journal of Interpersonal Violence, 37(7-8): NP5228-NP5245. doi: 10.1177/0886260520960110.
- Capaldi, Deborah M, Naomi B Knoble, Joann Wu Shortt and Hyoun K Kim (2012), “A Systematic Review of Risk Factors for Intimate Partner Violence”, Partner Abuse, 3(2): 231-280. doi: 10.1891/1946-6560.3.2.231.
- McDougal, Lotus, Nabamallika Dehingia, Nandita Bhan, Abhishek Singh, Julian McAuley and Anita Raj (2021), “Opening closed doors: using machine learning to explore factors associated with marital sexual violence in a cross-sectional study from India”, BMJ Open, 11(12): e053603. doi: 10.1136/bmjopen-2021-053603.
- Shashidhara, Sneha, Pavan Mamidi, Shardul Vaidya and Ishank Daral (2024), “Using Machine Learning Prediction to Create a 15-question IPV Measurement Tool”, Journal of Interpersonal Violence, 39(1-2): 11-34. doi: 10.1177/08862605231191187.
- Simonian, Joseph, Chenwei Wu, Daniel Itano and Vyshaal Narayanam (2019), “A Machine Learning Approach to Risk Factors: A Case Study Using the Fama–French–Carhart Model”, Journal of Financial Data Science, 1(1), 32-44.
- Sundström, Johan and Thomas B Schön (2020), “Machine Learning in Risk Prediction”, Hypertension, 75(5), 1165-1166.
- World Health Organization (2021), ‘Devastatingly pervasive: 1 in 3 women globally experience violence’, Press release, 9 March.