Within development and philanthropy circles, there seems to be a recurring cycle of critique of randomized controlled trials. Every few months, a variety of posts and articles pop up discussing the limitations of RCTs, attempting to make the point that RCTs are overhyped or at least substantially less useful than proponents assert.
For instance, Philip Auerswald, an economist at George Mason University who focuses on entrepreneurship, rehashed some of the standard critiques, in slightly different form, this past week. After engaging in a discussion in the comments on Phil's site, I thought it might be useful to address some of these common critiques in a more public and visible space.
The most important point to make up front is that RCTs do have limitations. They are by no means a perfect instrument even theoretically; there are also serious practical limitations in the way RCTs are deployed, reported and interpreted. The second most important point is that most of these limitations are shared by the alternatives to RCTs. I am most frustrated by critiques of RCTs that do not acknowledge this.
While the blog form doesn’t allow a comprehensive layman’s review of the issues (especially given that the true experts continue to argue over many of them in peer-reviewed journal articles that I often barely understand), in this post and the next I’ll attempt to tackle the two most common critiques currently being aired (they do change over time). In this post, I’ll take a look at the External Validity Critique; in the next, I’ll review the Transcendental Significance Critique.
The External Validity Critique points out that each RCT is anchored in a highly specific context: the implementer carrying out the intervention (often an NGO), the personnel hired by that NGO, local and regional culture and customs, the survey technique, the specific way questions are asked, even the weather. Thus, the critique goes, while the results of a particular RCT may tell you a lot about the impact of a particular program in a particular place at a particular point in time, they don’t tell you much about the results of a similar program carried out in a different context. In other words, an RCT of microcredit in urban India does not necessarily tell you anything about the impact of microcredit in rural Kenya.
There’s no doubt that external validity is an issue. That’s why proponents of RCTs take replication, or conducting trials of similar interventions in several contexts, so seriously. No one should ever accept the results of a single RCT as definitive. Here I think it’s helpful to compare the use of RCTs in development to RCTs in healthcare.
RCTs form the backbone of proving the efficacy and safety of drugs and other healthcare treatments. In that realm you generally don’t hear questions about external validity. One of the main reasons is the sheer volume of randomized replications that have been carried out. Each individual drug or treatment goes through dozens of replications over the course of its development. For instance, a drug candidate is tested at the molecular level, then at the cellular level, then in simple animal models, then in complex animal models, then in Phase 1, Phase 2, and Phase 3 clinical trials in human beings, all before the drug is even submitted for approval. But there are also decades of RCTs of many drugs and treatments that have provided the basis for expecting that results shown for a sufficiently diverse population will hold for other populations in other contexts.
We are nowhere near that level of replication on development issues. Why not? Aside from the fact that the widespread use of RCTs in development is a relatively new phenomenon, replications are generally discouraged in the development and economics research cultures. Development actors often don’t want to spend the time and money required for rigorous evaluation. Academic economists, on the other hand, are tied to a research publication model that is biased toward novel studies and positive results. It’s a bit ironic that the microcredit RCTs that have received so much attention for finding relatively small impact were publishable in economics journals partly because the microcredit movement had promised so much. Without the hype, the meager results might never have attracted attention from journal editors.
The External Validity Critique doesn’t just apply to RCTs, though. It applies to field studies and experiments using any methodology. Every study is conducted in a specific context and is not necessarily valid in other contexts, at least until it is replicated in multiple contexts with similar results. David McKenzie has pointed out that there seems to be a double standard in applying the External Validity Critique to field experiments using RCTs.
Thus the fundamental flaw the External Validity Critique exposes is not truly in RCTs, but in the development policy and academic publishing arenas, where replications are discounted and discouraged.
Proponents of RCTs aren’t fully off the hook, however. As Jonathan Morduch frequently reminds me and others, those reporting on RCTs could do more to expose important contextual variables in their studies. When reading the results of an RCT you should always be on the lookout for signs that the population studied has unique characteristics, and the papers that come out of RCTs should make those signs easier to find.
Even that may not be enough, however. In a recent working paper addressing the External Validity Critique, Hunt Allcott and Sendhil Mullainathan review a case where a program had been tested using RCTs in multiple contexts. They found that RCTs weren’t great at predicting effects in new contexts, but they were better than the alternatives. Allcott and Mullainathan also point out that many factors can affect external validity, including which organizations are willing to participate. For instance, they found that MFIs participating in RCTs are systematically different from the average MFI around the world, which could bias results.
In sum, the External Validity Critique deserves consideration. But it too has to be put into context before judging a particular RCT or the approach in general.
Timothy Ogden is an executive partner at Sona Partners, the editor in chief of Philanthropy Action, and co-author of Toyota Under Fire. He also blogs at HBR and SSIR.