On the perils of embedded experiments

There is growing interest in ‘embedded experiments’, conducted by researchers and policymakers as a team. Aside from their potential scale, the main attraction of these experiments is that they seem to facilitate speedy translation of research into policy. Discussing a case study from Bihar, Jean Drèze argues that this approach carries a danger of distorting both policy and research.

Evidence-based policy is all the rage, to the extent that even village folk in Jharkhand (where I live) sometimes hold forth about the importance of ‘ebhidens’, as they call it. No one, of course, would deny the value of bringing evidence to bear on public policy, as long as evidence is understood in a broad sense and does not become the sole arbiter of decision-making. However, evidence-based policy sometimes gets reduced to an odd method that consists of using randomised controlled trials (RCTs) to find out ‘what works’ and then ‘scaling up’ whatever works. That makes short shrift of the long bridge that separates evidence from policy. Sound policy requires not only evidence – broadly understood – but also a good understanding of the issues, considered value judgements, and inclusive deliberation (Drèze 2018a, 2020a).

Enormous energy has been spent on the quest for rigorous evidence, much less on the integrity of the process that leads from evidence to policy. As illustrated in an earlier contribution to Ideas for India (Drèze et al. 2020), it is not uncommon for the scientific findings of an RCT to be embellished in the process. This follow-up post presents another case study that may help to convey the problem. It also illustrates a related danger – casual jumps from evidence to policy advice. The risk of a short-circuit is particularly serious in ‘embedded experiments’, where the research team works ‘from within’ a partner government in direct collaboration with policymakers.

The case study pertains to an experiment conducted in Bihar in 2012-2013 and reported in Banerjee, Duflo, Imbert, Mathew and Pande (2020)¹. This is a large-scale, influential experiment by some of the leading lights of the RCT movement – indeed, a formidable quartet of first-rate economists reinforced by one of India’s brightest civil servants, Santhosh Mathew. The high technical standards of the study are not in doubt, and nor is the integrity of the authors. And yet, I would argue that something is amiss in their accounts of the findings and policy implications of this study.

An experiment backfires

Briefly, the said experiment consists of testing a new financial management system for India’s Mahatma Gandhi National Rural Employment Guarantee Act (MNREGA)², optimistically called ‘just-in-time’ financing. In the old system, gram panchayats (village councils) received advance funds from the state government for MNREGA work, replenished as per requests routed via block and district offices. In the new system, work comes first, and then the requisite funds are transferred (from the state government to gram panchayat, as before) in response to direct electronic invoices³. This innovation is presented as an application of ‘e-governance’. Henceforth, I shall refer to it as the ‘Bihar experimental transfer system’ (BETS or BET system, interchangeably).

The stated purpose of this new system was to boost MNREGA spending in Bihar, where baseline expenditure was very low. As the abstract of the project’s registration document puts it: “This project evaluates an innovative policy intervention which aims at enhancing supply of public employment in Bihar…If the intervention successfully increases spending⁴, we will use it as a way to evaluate the impact of NREGA on people's lives”⁵. The intervention, however, had the opposite effect: MNREGA spending declined by 24% in the ‘treatment’ areas (subject to intervention), compared with ‘control’ areas (no intervention). It also led to much longer delays in wage payments: an extra 47 days of delay in treatment areas during the intervention period, on top of the 64 days of delay in control areas (Banerjee et al. 2020, Table 5)⁶. One hundred and eleven days must feel like an eternity to people who have no other means of subsistence than their meagre wages.

The project ran into rough weather from the word go⁷. The two-month ‘setup period’ was a little chaotic, as anyone familiar with the ways of the Bihar government would expect. For the first four months of the “intervention period” (September-December 2012), there was virtually no treatment – the BET system was more or less idle, because of a central-government freeze on the release of MNREGA funds to Bihar. After that, the treatment kicked in, but hurdles persisted. The district coordinators were “often ineffective”. A critical software patch did not materialise (due to “lack of coordination” between two ministries), forcing BETS to fall back on a cumbersome system of double data entry. Banks “could not keep pace” with BETS’ equally cumbersome system of piecemeal payment advices. There were “repeated complaints from district officials about the delays caused by the new financial system”. The experiment was called off by the Bihar government after just three months of active treatment (January-March 2013) – a minuscule window to evaluate this sort of intervention⁸.

This experiment is best seen in the context of the history of MNREGA wage payments, and also of the recent drive towards e-governance of social programmes in India.

The quest for timely payments

The history of MNREGA wage payments is a sad song. MNREGA workers have a legal right to payment within 15 days, but in practice, wages are routinely delayed for weeks or months. This has been the bane of the programme for more than 10 years – ever since cash payments were replaced with bank payments to workers’ accounts. Aside from causing serious hardships to MNREGA workers, payment delays have a destructive effect on the entire programme, because they sap workers’ interest in it. The good health of MNREGA rests on the active participation of workers at every stage – project selection, work applications, asset creation, social audits, and more. When workers lose interest, MNREGA loses its sharp edge.

One state that had been relatively successful, by 2012, in ensuring timely wage payments was Andhra Pradesh (AP)⁹. The state had fixed timelines for every step of the payment process, and the delays were tightly monitored. In this as in many other domains, AP had also been at the forefront of e-governance initiatives. From 2009 onwards, it started deploying a payment system called “electronic fund management system” (e-FMS). This was done step by step over a period of time – according to one knowledgeable observer, “it took nearly two years for AP to streamline the e-FMS”. By 2012, the e-FMS was considered a success and steps to extend the system to the whole of India had been initiated by the Ministry of Rural Development (MoRD) at the Centre. The e-FMS was a more advanced system than BETS – it involved direct payments to workers’ accounts (not gram panchayat accounts), avoided BETS’ onerous ‘double data-entry’ burden, and covered material as well as labour expenses.

We shall return to the significance of AP’s experience. Meanwhile, it is worth noting that while the deployment of the e-FMS in AP can probably be regarded as a case of successful e-governance, the recent history of e-governance and digital payments in India also includes many problematic or even counter-productive initiatives¹⁰. A lot depends on the context, including the level of digital preparedness – that is where AP had a big head-start.

All this is better understood today than it was in 2012, when the BETS experiment began. But enough was already known at that time to anticipate that rolling out this new system in a tearing hurry in a poorly prepared state like Bihar (with a set-up period of just two months) might lead to a jam in wage payments. Indeed, the additional delays were huge, whether one looks at data from MNREGA’s “management and information system” (MIS) or from the authors’ household survey (Banerjee et al. 2020, Tables 4 and 5). In terms of statistical significance, too, delays in wage payments dwarfed other treatment effects throughout the intervention period. This was a veritable elephant in the room.

Reduced leakages?

The authors, however, pay sporadic attention to this elephant. To their credit, they are upfront about the payment delays, but they do not pursue the matter very far, nor do they mention that workers have a legal right to payment within 15 days. Instead, they follow a different trail – the reduction of MNREGA expenditure in treatment areas, they argue, reflects a reduction in “leakages”. From this point of view, it may seem, the intervention did quite well.

The authors invoke three smoking guns in favour of the “reduced leakages” hypothesis. As the abstract of the paper succinctly puts it: “program expenditures dropped by 24 percent, while employment slightly increased; there were fewer fake households in the official database; and program officials’ personal wealth fell by 10 percent”. Three smoking guns is certainly a lot of smoke, if not quite enough perhaps to claim that “[w]e looked for evidence of a reduction in corruption, and we found plenty” (Duflo 2017, page 6).

I believe that there are other possible explanations for the reduction of MNREGA expenditure in treatment areas (including – yes – delays in wage payments), but let us leave that aside. More importantly, it seems to me that even if there was some reduction in leakages during the intervention period, it may or may not be due to the treatment. Indeed, as mentioned earlier, there was virtually no treatment during the first four months of the seven-month “intervention period”¹¹. Interestingly, the estimated effects of the intervention were very similar in both phases – before and after the treatment really kicked in (Banerjee et al. 2020, Figure 2 and Tables 2-5). Much was happening in the treatment areas other than the treatment: training, monitoring, field visits, damage control, and a major investment in material and human resources (“including computers, data entry operators, generators to ensure power supply, internet access, scanners, and printers” down to the gram panchayat level)¹². Differences in outcomes between control and treatment areas reflect all aspects of the intervention, but it is the treatment – the BET system – that gets credit for them. To be more specific, it is quite possible that the ‘buzz’ created by the intervention in treatment areas had a chilling effect on the crooks, unrelated to the new payment system. That would explain all the smoking guns, and the fact that the intervention had similar effects with and without the treatment.

Having said this, it is possible that the BET system did help to reduce corruption. Let us assume that this diagnosis is correct. A plausible conclusion would be as follows: “The study produced some evidence of a reduction in leakages in the intervention period. However, the new system also led to large delays in wage payments. Bihar, it appears, is not ready for it, and the system should not be scaled up until the payment issues are resolved. Meanwhile, workers should be compensated for the delays, as prescribed under the law.” Instead, the apparent decline in leakages became the main focus and message of the paper.

Perhaps the problem of additional delays would have faded away over time, as the BET system improved. We shall never know, since the experiment was discontinued after just three months of active treatment. There was no support for it, as the BET system had “failed to create any winners” (Banerjee et al. 2020, page 66), except possibly the central government in so far as it saved money. Block and district officials actively opposed the new system, arguing that it was “destroying” MNREGA (page 42). According to the authors, this opposition was self-serving: the said officials “had obvious reasons to resent the program” (page 42), that is, they were corrupt functionaries trying to protect their gravy train. Could it be, however, that some of them – corrupt or not – actually had a point? MNREGA functionaries in India, even the corrupt ones, have invaluable knowledge of the nuts and bolts of the programme. There is no indication that they received a fair hearing in this situation.

The dissemination accounts

So far, I have focussed on the academic account of the experiment (Banerjee et al. 2020). That account is very thorough, precise, and nuanced. However, troubling issues arise when we consider summary accounts of the findings presented in various forums for dissemination purposes – let us call them “dissemination accounts”. In many of these accounts, the delays in wage payments are not mentioned. This is the case, for instance, with Esther Duflo’s glowing account of the experiment in her influential Richard T. Ely lecture (Duflo 2017, pages 5-7) as well as with Abhijit Banerjee’s oral presentation of it at the Center for Global Development (CGD) on 6 December 2016, posted on the J-PAL (Abdul Latif Jameel Poverty Action Lab) website¹³. Santhosh Mathew’s follow-up remarks at the same event, and his own summary of the experiment (in Mathew and Goswami 2016), are also silent on payment delays. So is Clément Imbert’s three-minute digest of the study on YouTube, modestly called “How to cut corruption in India in one simple move” by the hosts.

The summary of the experiment on the J-PAL website is also misleading. It says: “After a year under the new system [!], the evaluation found significant impacts along two key dimensions” – these are “reduced corruption” and “more efficient fund distribution”¹⁴. Based on this, the reform is applauded as a “success” [sic]. Only those who care to click on a follow-up link for “more details” get to learn, way down, about the payment delays. When delays are finally mentioned, an impression is created that it was a temporary problem “during initial implementation stages” – in fact, it persisted throughout the intervention period. Another gushing J-PAL summary of the experiment (Walsh and Carter 2018) is completely silent on payment delays.

From evidence to policy

In some dissemination accounts of the BETS experiment, it is also stated that the treatment was smoothly “scaled up” at the national level. As the J-PAL website puts it: “Influenced by the Bihar study, the Government of India asked all states to shift to a similar fund-flow system for MGNREGS wage payments.”

This statement is intriguing at first. As correctly noted in Banerjee et al. (2020), it is the e-FMS, not BETS, that was gradually extended across the country between 2012 and 2015¹⁵. Remember, e-FMS had been patiently tested in AP from 2009 onwards, and active steps for the national deployment had already been taken by mid-2012, before the Bihar experiment began.

On closer examination, the basis of the scale-up statement is the fact that the BETS study was cited in a MoRD note to the Cabinet, dated June 2015, where the Ministry proposes a new payment system for MNREGA (Duflo 2017, footnote 9; Banerjee et al. 2020, page 42). This new payment system is based on a more centralised e-FMS platform (known as National e-FMS or Ne-FMS) and the so-called direct benefit transfer (DBT) protocol¹⁶. Interestingly, the main argument of the Cabinet Note for this new system is that it will help to ensure timely wage payments. But the note also refers to other possible benefits of “e-governance based fund flows”, including reduced leakages, and it does cite the BETS study in that context. Assuming that the Ne-FMS-cum-DBT system can be regarded as a scale-up of the BET system in some sense, one might accept that “the experiment was actually instrumental in getting this program scaled up across all states” (Duflo 2017, page 7)¹⁷.

This view, however, raises an interesting question. Why would the scale-up of BETS be a good thing, as Duflo implies, let alone a “policy success”, as the J-PAL website puts it? Even if we accept that BETS helped to reduce corruption, the fact remains that it also led to longer delays in wage payments. There is a bit of a trade-off here. Perhaps the assumption is that the delays were just a kind of ‘teething problem’, but the study provides no evidence of that. This is a good example of a casual jump from evidence to policy advice (Drèze 2018a).

J-PAL is explicitly committed to this sort of long jump, or ‘research translation efforts’ as the website puts it (J-PAL, 2018). In this case, we are told, these efforts included posting and paying a full-time “Policy Consultant” in the Ministry of Rural Development (J-PAL, 2021). The consultant’s job included “writing policy memos and coordinating meetings to generate buy-in for the scale-up”. One really wonders how this consultant ‘translated’ the factual evidence presented in the BETS study into policy prescriptions.

As it turned out, the transition to Ne-FMS-cum-DBT from 2015 onwards caused enormous confusion for years and led to a new generation of payment problems – not just continued delays but also rejected payments, diverted payments, and blocked payments¹⁸. Rejected payments alone added up to a staggering 16% of MNREGA wage payments in 2016-2017, according to the MoRD itself¹⁹. In the last few years, many MNREGA workers did not receive their wages at all due to technical complications. Some of the glitches are truly bizarre, like MNREGA wages being redirected to Airtel wallets that workers know nothing about (Drèze 2018c). New forms of corruption – what might be called ‘e-corruption’ – have also flourished in this period. All this may (or may not) get sorted out in due course, but as of now, there is little reason to celebrate the scale-up of the BET system, if it happened at all.

A missed opportunity

None of this is to deny that there is much to learn from the BETS study. Serious attention to the payment delays, however, would have made it even more enlightening. Indeed, the experiment was an opportunity to draw attention to the dangers of hasty rollout of new technologies in fragile settings. That would have been a very salutary warning, around the beginning of a long period of repeated rejigging of MNREGA payment technologies at the cost of MNREGA workers. MNREGA wage payments have gone through a whole series of new systems over the years, but the government is still unequal to the simple task of ensuring payment within 15 days, as prescribed under the Act. The latest innovations, notably the Aadhaar Payment Bridge System, have led to some improvements, but also to many new problems²⁰. The problems may be temporary, but as each new system creates its own temporary problems, lasting for years in many cases, MNREGA workers’ right to timely payment ends up being persistently denied.

Concluding remarks

The picture that emerges from this case study is a little unsettling. A counter-productive experiment, aborted within a few months, ended up being perceived as a model exercise in evidence-based policy.

Let me end with some general remarks, drawing not only on this case study but also on the companion study (Drèze et al. 2020). If you had the patience to read both, perhaps you noticed some parallels between the two focus experiments – one on the public distribution system (PDS) in Jharkhand (Muralidharan et al. 2020), the other on MNREGA in Bihar (Banerjee et al. 2020). In both cases, the study is very thorough, but the findings were misrepresented in some of the dissemination accounts. And in both cases, the dissemination accounts lean in the same direction: playing up the anti-corruption effects of the treatment and underplaying the adverse effects on the intended beneficiaries – PDS cardholders in one case, and MNREGA workers in the other.

‘Do no harm’ is often (not always) considered an important ethical principle of RCTs. Both experiments adversely affected a significant number of poor people. In the BETS experiment, the harm was fairly predictable: the authors themselves explain why one might expect the new system to lead to longer payment delays (Banerjee et al. 2020, page 47), and an average MNREGA worker would have seen it coming too. One possible defence of these experiments is that the respective governments were planning the intervention anyway, and that the research team was just evaluating it. If an intervention has harmful effects, we might as well get to know about it. This defence, however, assumes a clear separation of responsibilities between the government and the research team. In at least one case (the Bihar RCT), the distinction was more than a little blurred²¹.

Both experiments belong to a genre of RCTs that may be called ‘embedded experiments’ – at-scale experiments conducted by a research team that attaches itself to a government department. Without denying their possible value, it seems to me that these experiments are fraught with dangers. These include imbibing the perspective of the partner government, distorting its priorities, diverting senior officials from their primary responsibilities, conceding some control over research findings, exercising power without accountability, overestimating government capacity, underestimating State oppression, inappropriate randomisation, violation of consent norms, misuse of privileged access, conflicts of interest, revolving-door effects, distracting incentives, and contamination. It would be good to see more engagement with these issues in the slim literature on the ethics of RCTs in development economics²².

Just to pursue one of these issues briefly, informed consent is obviously hard to ensure in embedded experiments at scale. The registration document of the Bihar RCT asserts that “informed consent will be ensured in all cases”, but that presumably referred (implicitly) to the small minority of households being surveyed, not to all those affected by the BET system. Informed consent was impractical in this case, but then it is all the more important to be confident that the treatment will do no harm. Randomising a variable (like a wage payment system) that has the potential to disrupt people’s livelihood or well-being, if it can be justified at all, requires the strongest possible safeguards.

The perils of embedded experiments are particularly serious when the research team is dealing with corruption-ridden governments like those of Bihar and Jharkhand. No doubt senior officials of these governments know how to make a good impression on visiting researchers, and some of them (including Santhosh Mathew) have a genuine commitment to better governance. But the general attitude of government officials in Bihar and Jharkhand is not particularly kind to the underprivileged. This amplifies the hazards of embedded experiments, including the risk of doing harm.

Embedded experiments also blur the distinction between evidence and policy (one a scientific matter, the other a political issue). When a research team attaches itself to a government department, it is likely to be involved in or at least consulted on policy matters from time to time. That may seem like a good thing to someone who believes that economists are well placed to fix public policies across the table with friendly bureaucrats “before political constituencies have calcified around them” (Muralidharan and Niehaus 2017). Policy advice, however, is a very different game from economic research, and economists have no obvious gift for it on their own.

Embedded experiments, and embedded research in general, are likely to proliferate in the near future. On one side, researchers are under growing pressure from funding agencies to demonstrate ‘policy impact’ (not a very healthy trend in my view). On the other, policymakers have their own reasons to welcome resourceful research teams and feel increasingly free to do so. The bells are ringing for a fertile marriage of convenience. This does not preclude constructive partnerships, but it does call for some hard thinking about possible safeguards for embedded experiments.

For a start, let me propose a few basic safeguards. First, an embedded experiment should not involve any personal rewards for civil servants who participate in it. Second, the findings of an embedded experiment that fails to run its full course should be taken with a pinch of salt. Third, at-scale experiments should be preceded by some sort of safety trial. Fourth, responsibility for compensation in the event of harm being done should be clarified in advance. Finally, the findings of an embedded experiment should not be translated into policy without inclusive consultation.

Now let me rest my case and invite the authors – all friends of mine – to fortify the principles of this approach. If I have missed something, I shall be happy to stand corrected. Hopefully, these case studies will help to pose useful questions at least – some of these issues have been left unattended for too long.

The author is grateful to Clément Imbert for detailed clarifications, and to Harold Alderman, Amit Basole, Christopher Barrett, Diane Coffey, Angus Deaton, Nikhil Dey, Swati Dhingra, Reetika Khera, Ashok Kotwal, J.V. Meenakshi, K. Raju, Martin Ravallion, Anmol Somanchi, and Ankur Sarin for helpful advice.

Notes:

1. Earlier versions of that paper have been in the public domain since 2014 if not earlier. This case study is based on the final version, leaving out details that are not essential for my purpose.
2. MNREGA provides a guarantee of employment on local public works to any adult in rural areas who is willing to do manual work at the prescribed wage, up to 100 days per household per year.
3. For further details, see Banerjee et al. (2020), pages 44-46. Strictly speaking, all this applies to the wage component of MNREGA expenditure. The material component is of no interest in this case study.
4. Italics added.
5. Further on, the abstract mentions that “the intervention could also reduce corruption” and that the authors propose to study its “potential impacts on ghost workers and other measures of corruption”.
6. The figures are from household survey data. The corresponding figures from MNREGA’s public data portal are 20 days and 53 days. The portal, however, understates payment delays (Narayanan et al. 2019).
7. The facts and quotes in this paragraph are from Banerjee et al. (2015), pages 16-19.
8. According to the project’s registration document, the “trial” (possibly meaning the entire project) was initially due to continue until June 2014.
9. See, for example, Chopra and Khera (2012). The authors find that in the first six months of 2012, 85% of MNREGA wage payments in AP were made within 14 days – a timeline that would be considered fabulous today.
10. For some examples, see Dutta (2016, 2018), Aggarwal (2017b), Drèze et al. (2017), J-PAL (2017), Malhotra and Somanchi (2018), Mohan (2018), Dhorajiwala et al. (2019), Drèze et al. (2020), and LibTech India (2020, 2021).
11. To be more precise, by the end of that four-month period, only 20% of the gram panchayats in treatment areas had made any use of the BET system (Banerjee et al. 2020, page 51). The authors themselves consider that, for practical purposes, “the intervention was not implemented in its initial phase” (Banerjee et al. 2015, page 17).
12. Banerjee et al. (2020), page 50; see also Banerjee et al. (2015), pages 16-19.
13. All the authors except Santhosh Mathew are J-PAL affiliates.
14. The expression “a year under the new system” is baffling. Perhaps it refers to the entire project period, including setup (July-August 2012) and post-intervention survey (April-June 2013).
15. Banerjee et al. (2020), pages 66-67. By June 2015, 92% of gram panchayats in the country were covered by e-FMS (Government of India, 2015).
16. For an outline of this extraordinarily complicated system, see Government of India (2017).
17. GiveWell (2019) rejected that assumption in its evaluation of a J-PAL multi-million-dollar grant application that presents the Bihar RCT as a success story and the Cabinet Note as evidence of its policy impact. Unconvinced, GiveWell advised restricting the grant to one million dollars.
18. See Drèze (2018b, 2020b); also Aggarwal (2017a, 2017b), Dhorajiwala et al. (2019), Dutta (2019), Johri (2019), Narayanan et al. (2019), and LibTech India (2021), among others.
19. Press release, 13 July 2018 (see also Drèze 2018d). The proportion of rejected payments declined later on, but it remains a major problem to this day; see, for instance, LibTech India (2021).
20. See the literature cited earlier (note 18), and also Drèze and Khera (2021) and further reports cited there.
21. The symbiotic relationship between the government and the research team was well conveyed by Santhosh Mathew in his CGD presentation, mentioned earlier: “… they [J-PAL] have a dedicated team whose only job is to help people like me to push – so when we need a new document, it’s redone, when I need to make a presentation, it’s they who do it for me, when I need to come and talk to you, they are the ones who prime me as to which way I should spin the ball”. Santhosh Mathew was Principal Secretary in Bihar’s Department of Rural Development when the BETS experiment began, and Joint Secretary in the MoRD at the Centre at the time of J-PAL’s ‘research translation efforts’.
22. Some of these issues are discussed (without specific reference to embedded experiments) in Bédécarrats et al. (2020), Deaton (2020), Khera (2021) and the World Development “Symposium on Experimental Approaches in Development and Poverty Alleviation”, March 2020.
