RCTs for policymaking: Ethical and methodological considerations

The last decade has seen an increased adoption of randomised controlled trials (RCTs) for answering policy questions in developing countries. RCTs are being preferred over other research methods mainly for their reduced risk of bias. However, multiple researchers have cautioned against the acceptance of this hierarchy in research designs. In this post, Sneha P situates this debate in the Indian context and discusses when RCTs are appropriate for informing decision-making in policy.

The last decade has seen an increased adoption of randomised controlled trials (RCTs)¹ for answering policy questions in developing countries. This includes questions surrounding socio-politically sensitive subjects like selection of political candidates (Casey et al. 2019), sexual safety in partner selection (Angelucci et al. 2016), religious education (Bryan et al. 2020) and most recently, enforcement of payment for utilities in slums (Coville et al. 2020).

Part of this emerging culture of ‘evidence-based policy’ is an implied pecking order in research designs, preferring RCTs over other research methods for their reduced risk of bias.² More recently, ‘RCTs at scale’ are recommended over conventional, smaller RCTs, as they allow for observation over a greater number of variables and can help ascertain ‘general-equilibrium effects’ (Muralidharan and Niehaus 2017).³

However, multiple researchers have cautioned against the acceptance of this hierarchy (Ravallion 2020) not only for its implications for research publication norms (Jatteau 2017) but also for ethical and methodological considerations and, importantly, policy relevance. In this post, I situate this debate in the Indian context, and discuss when RCTs are appropriate for informing Indian policy decision-making (Menon 2020).

Ethical concerns

Unlike medical RCTs, social science RCTs are often not mandated to have ‘informed consent’⁴ or full disclosure. In a systematic review of economics RCTs, it was found that only 10% of the studies discuss informed consent, and 12% of studies intentionally left participants ignorant. None of the studies discussed ‘participation incentivisation’.⁵ Moreover, 84% of studies conducted in former colonies had authors based in institutions in the US or Western Europe (Hoffmann 2020).

Another ethical principle governing medical RCTs is ‘equipoise’, or genuine uncertainty about the outcomes of an intervention prior to experimentation. This is often missing in social science RCTs. Consider a study in Delhi, where participants were offered financial incentives to obtain a driving licence as quickly as procedurally possible (Bertrand et al. 2007). The authors found that participants made extra-legal payments to obtain licences without actually learning how to drive. Arguably, any Indian driver could have predicted this outcome.

RCTs also tend to adopt a ‘data-maximalist approach’ in research design wherein participants are expected to share extensive personal information and – in some cases – bodily samples, in order to explore all possible channels of impact.⁶

In other cases, the human cost of such a time-intensive methodological choice is more serious. In 2016, the state government of Jharkhand mandated Aadhaar⁷-based biometric authentication (ABBA) for receiving public distribution system (PDS) entitlements. In partnership with the government, a large-scale RCT was initiated to evaluate 6-8 months of mandated ABBA in 132 blocks of the state. During this phase, households could not opt out of ABBA, and there were no alternative arrangements for them to access their entitlements.⁸ This is despite there being early journalistic evidence of authentication failures and even instances of starvation. ABBA also does not solve the problem of quantity fraud, which is arguably a more serious issue than identity fraud in the PDS (Drèze 2016).⁹

Ultimately, three years later, the published findings of the RCT were consistent with these early reports that ABBA caused “pain without gain” (Drèze et al. 2017, Muralidharan et al. 2020). The authors estimated exclusion of up to 25,000 beneficiaries in the study blocks over the study period,¹⁰ and describe the challenges of the reconciliation exercise that the state implemented 11 months later to adjust distribution based on the new records of digital transactions.

Clearly, what policies like ABBA need by way of evaluation are shorter pilot studies, continual monitoring, and tight feedback loops into study design¹¹ – rather than a one-time, long-term evaluation.

Methodological concerns

The policy relevance of a research finding in any given context requires a certain degree of generalisability of results. Because RCTs often do not use stratified samples,¹² and report ‘intent-to-treat’¹³ effects, they are less amenable to heterogeneity analysis (comparison across sub-groups) than other methods. This includes understanding ‘heterogeneity in selection’ and in ‘treatment effects’, which are often policy-relevant insights. In other words, there is limited understanding of who chooses to receive the programme and why, and of how the intervention impacts different groups. Kabeer and Datta (2020) illustrate this with a West Bengal-based livelihoods RCT where a majority of the participants who refused treatment belonged to a religious minority. As Barrett and Carter (2010) put it, RCTs have a ‘faux exogeneity’ problem, where a treatment is seemingly exogenous (independent) in implementation but agents do not actually receive it in a uniform manner.
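To make the intent-to-treat point concrete, here is a minimal Python sketch. The eight records, the subgroup labels A and B, and the helper names `itt` and `take_up_rate` are all invented for illustration; none of them come from the studies cited in this post. The sketch shows how a pooled intent-to-treat estimate can look like a modest average effect while hiding the fact that one subgroup declined the treatment altogether.

```python
from statistics import mean

# Hypothetical records, invented for illustration only:
# (assigned_to_treatment, took_up_treatment, subgroup, outcome)
records = [
    (1, 1, "A", 12.0), (1, 1, "A", 14.0),  # assigned, took up
    (1, 0, "B", 8.0),  (1, 0, "B", 8.0),   # assigned, refused
    (0, 0, "A", 10.0), (0, 0, "A", 10.0),  # control
    (0, 0, "B", 8.0),  (0, 0, "B", 8.0),   # control
]


def itt(rows):
    """Intent-to-treat estimate: difference in mean outcomes by
    assignment, regardless of whether treatment was actually received."""
    assigned = [y for z, _, _, y in rows if z == 1]
    control = [y for z, _, _, y in rows if z == 0]
    return mean(assigned) - mean(control)


def take_up_rate(rows):
    """Share of units assigned to treatment that actually received it."""
    return mean(d for z, d, _, _ in rows if z == 1)


def subgroup(rows, label):
    """Keep only the rows belonging to one subgroup."""
    return [r for r in rows if r[2] == label]


overall_itt = itt(records)
itt_by_group = {g: itt(subgroup(records, g)) for g in ("A", "B")}
take_up_by_group = {g: take_up_rate(subgroup(records, g)) for g in ("A", "B")}

print("Pooled ITT:", overall_itt)
print("ITT by subgroup:", itt_by_group)
print("Take-up by subgroup:", take_up_by_group)
```

In this toy data the pooled estimate is 1.5, but it is an average of an effect of 3.0 in subgroup A, where everyone took up the treatment, and 0.0 in subgroup B, where no one did. Reporting only the pooled figure would say nothing about who refused or why.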

Even when RCT samples are large and variation between groups is reported, we know that these treatment effects are likely to be sensitive to economy- or institution-wide factors (Pritchett and Sandefur 2013), even more than to personal characteristics, and such factors are hard to control for. The effects are also likely to vary over time (Rosenzweig and Udry 2019).

Cartwright (2010) illustrates the danger of transporting policies across contexts through ‘simple induction’ with the relatively unsuccessful case of Bangladesh’s Integrated Nutrition Programme (INP), which was modelled on Tamil Nadu’s INP. These are important considerations in the Indian context where transportability, even across states, demands a high degree of generalisability of the results.

Further, RCTs are not passive observational research processes and are often participative/intrusive in the implementation process. For example, researchers are present at the site of implementation, eliminating technology, design and implementation flaws, in an effort to isolate the effect of the intervention on the outcomes of interest. However, not only does this introduce Hawthorne effects,¹⁴ it also mutes dynamics – including political and power dynamics – that would otherwise be present in a policy scale-up (Das 2020).

Finally, respondents may dynamically adapt their preferences and choices to an RCT. For example, Das et al. (2013) show that when households anticipate an intervention (in this case an educational grant), they pre-emptively change their consumption patterns. This obscures the observed study effects, which can no longer be attributed to the treatment alone.

Considerations for policymakers

Consider environmental conservation, an area that requires urgent policy action and multiple-stakeholder consultation, but also scientific studies. Policy research in such a discipline necessitates ‘methodological hybridity’ (Ali 2020), of which RCTs could be one component. It is important to assess the suitability of RCTs to the policy question and context, before adoption.

Based on all the concerns highlighted above, I compile the following (non-exhaustive) list of considerations for policymakers/practitioners while choosing RCTs:

Is there enough uncertainty about the nature of impact of the treatment on participants to merit an experiment?
If governments/researchers/other stakeholders have prior beliefs on impact, can the research design be informed by these priors? For example,
- If their prior on impact is positive, can treatment be randomly phased-in so that the control group is not deprived of the intervention?
- If their prior is negative, will the study be adapted upon first observation of harm?
- If certain groups are expected to be impacted differently, can the study be stratified to minimise negative effects and maximise positive effects?
Would participants have the opportunity to opt-in and opt-out in an informed manner?
Is the data collection non-intrusive, minimal, and restricted to the objective of the study under consideration?
Does the policy question require the investment of a large-scale trial (cost effectiveness)?
Can the study duration meet the urgency of the policy question (time-effectiveness)?
Does the programme or policy require continual monitoring or a one-time evaluation?
What are the limits of the context within which the potential findings of the trial will be held valid?
Will the study design be reviewed with an ethics lens, by those familiar with the context of the study?
Will the administration of the intervention within the study resemble the administration of the intervention in an eventual scale-up effort?
Are there sufficient checkpoints during the study to observe and adapt to unintended/harmful effects?
What other methodologies can be simultaneously adopted to complement and inform the trial?
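The questions above on phased-in treatment and stratification can be read together as one possible design, sketched below in Python. This is only an illustration under stated assumptions: the function name `phased_rollout`, the block names, the two strata ("remote" and "connected"), and the choice of three phases are all hypothetical. The idea is that every unit eventually receives the intervention, the order of receipt is random, and each stratum is spread evenly across phases so that later phases serve as a temporary comparison group for earlier ones.

```python
import random
from collections import defaultdict


def phased_rollout(units, strata, n_phases, seed=0):
    """Randomly assign each unit to a rollout phase (1..n_phases),
    shuffling separately within each stratum so that every stratum
    is spread as evenly as possible across phases."""
    rng = random.Random(seed)  # fixed seed so the draw can be audited

    # Group units by their stratum label.
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[strata[unit]].append(unit)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)  # random order within the stratum
        for position, unit in enumerate(members):
            # Deal units out to phases in turn, like dealing cards.
            assignment[unit] = position % n_phases + 1
    return assignment


# Hypothetical example: 12 administrative blocks in two strata.
blocks = [f"block_{i}" for i in range(12)]
strata = {b: ("remote" if i < 6 else "connected") for i, b in enumerate(blocks)}

assignment = phased_rollout(blocks, strata, n_phases=3)
for block in blocks:
    print(block, strata[block], "phase", assignment[block])
```

A design like this does not by itself answer the other questions in the list, such as informed opt-out or checkpoints for harm; it only shows that randomisation need not mean permanently withholding an intervention from a control group.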

Observation and participation

Various RCT design improvements and alternatives have been suggested by researchers to overcome some of the concerns discussed above.¹⁵ However, as Drèze (2020) describes it, “good policy requires understanding – not just evidence”. This includes, among other things, “observation, reasoning, theory, tradition and debate”. Indeed, observational surveys like the Public Evaluation of Entitlement Programmes (PEEP) Surveys and Annual Status of Education Report (ASER) surveys, often conducted with minimal resources, have been able to inform reforms in education, employment, and social security programmes.

Another important component of the policy design process is participatory and deliberative dialogue (Rao 2020), given that participants are citizens in a representative democracy with a right to express their policy preferences. For instance, interactions with workers and household interviews were more informative about the serious issue of payment delays than RCTs or other rigorous, impersonal data analysis (Drèze 2020). Qualitative research helps shed light on time-critical issues, cross-context validity (Gisselquist 2020), intangible outcomes (Kabeer and Datta 2020), and causes of programme failure (Rao et al. 2017).

Concluding remarks

It is important to clarify that much of the discussion in this post is applicable to other methodologies of policy research, but often such conditions are exacerbated in RCTs, which are more resource-intensive, time-intensive, and intellectually ambitious than other methods.

For coherence, I list ethical and methodological considerations as distinct concerns, but these are interlinked. For instance, being inconsiderate toward varied effects and/or socio-political dynamics can lead to ethically insensitive design choices. As India sees a greater adoption of evidence-based policy, how such evidence is generated will require careful examination over the coming years.

The author is grateful to Niranjan Rajadhyaksha, Jean Drèze, Vijayendra Rao, Vikram Sinha, Vaidehi Tandel, Chinmaya Kumar, Anirudh Tagat, Ashwin Nair, Ashwin MB, Tanvi Ravel Mehta, Abdul WA Mohammed, and Anmol Somanchi, for their comments and suggestions.


Notes:

An RCT is a method of experimentation which aims to evaluate an intervention by randomly allocating the intervention or 'treatment' across comparison groups. The 'control' group does not receive the intervention (immediately or ever), or receives a placebo treatment.
For example, see the CART Principles by Innovations for Poverty Action, which advises practitioners that quasi-experimental methods should only be used when an “RCT is not possible” (Cowman et al. 2016).
Put simply, RCTs, when implemented at a larger scale, allow the intervention to interact with changes in time, geography, respondent type, intervention type, and other such dynamic conditions. This allows researchers to develop a theoretical framework about how all these variables interact and provides insight into the effect of the treatment in 'general' conditions, that is, with fewer restricting assumptions. This is why the authors argue that an RCT at the state level, for instance, can discern a wider range of effects of something like private education than an RCT at a district or village level.
Informed consent is the process of providing potential research subjects with all information pertaining to the research project they would be involved in – including risks and benefits to them from the project – for them to be able to make an informed decision about participating.
There is even less discussion about the importance of communicating the results of the study to its participants. Some researchers argue that this should be viewed as a minimum compensation for study participation (Alderman et al. 2013).
For example, a National Science Foundation grant (Award Abstract #1123899) investigating the impact of microfinance in India proposed to collect data on “nutrition, food security, health expenditures, physiological indicators of stress through cortisol measurements in hair samples, and psychological stress measures” from households in the treatment and control groups.
Aadhaar or Unique Identification (UID) number is a 12-digit identification number linked to an individual’s biometrics (fingerprints, iris, and photographs), issued to Indian residents by the Unique Identification Authority of India (UIDAI) on behalf of the Government of India.
Note that these are legal entitlements under the National Food Security Act, 2013.
Quantity fraud refers to the physical leakages from the PDS such as fraudulent practices by ration dealers while identity fraud refers to problems arising from ghost beneficiaries and beneficiary impersonation.
While Jharkhand state itself revoked the mandatory status of the Aadhaar card in October 2017, as per the study, both the intervention (that is, ABBA) and the study’s endline surveys continued up to December 2017 (Muralidharan et al. 2020). In September 2018, a Supreme Court ruling restored the state’s ability to mandate Aadhaar for social programmes including PDS.
As we have seen in the COVID-19 context, medical RCTs are often called off at the first sign of harm.
Stratified sampling is a sampling method where the sample is designed to represent multiple sub-groups or ‘strata’.
Intent-to-treat estimates capture the difference in average outcomes between the treatment group (computed over all those that are assigned to the treatment group, irrespective of whether they actually receive the treatment/intervention or not) and the control group (computed over all assigned to the control group that are not meant to receive the treatment/intervention).
Hawthorne effects describe a bias introduced in empirical research as a result of participants being aware that they are being studied and consequently modifying their behaviour. The term derives from the work of Elton Mayo and Fritz Roethlisberger at the Hawthorne plant of the Western Electric Company in the 1920s.
This includes programme-driven iterative adaptation to enable feedback loops (Samji et al. 2018); Experiment-As-Markets, to exploit prior knowledge in a welfare-maximising way (Narita 2019); Bayesian additive regression trees (Green and Kern 2012) and recursive partitioning (Athey and Imbens 2016) to study heterogeneity; and multiple statistical surrogates (Athey et al. 2016) to understand long-term effects.