This post was originally published on the CGAP blog.
Rich Rosenberg blogged yesterday about an Atlantic article that profiled John Ioannidis’s critique of medical research. The article reminded me of a meeting in Washington a few years ago that brought together consumers and producers of microfinance impact studies to discuss the research agenda. One participant, who is not a researcher, concluded dolefully that microfinance research lags far behind medical research.
My immediate thought was that the claim was probably false: not because microfinance research is so far ahead, but because much medical research seems full of problems. If you read beyond the newspaper headlines, it’s common to see simple correlations between health conditions and a given diet/activity/lifestyle quickly (and falsely) interpreted as causal. Sample sizes are small. Lots of hypotheses get tested, but just a few get published.
According to Ioannidis, it doesn’t get better when you scour the academic medical literature. The willingness to broadcast that fact has made John Ioannidis a rock star in the health statistics world. Health researchers routinely test many, many hypotheses; often rely on small samples; and face fierce competition to get published in top journals. The pressure to publish, Ioannidis argues, combined with editors’ penchants for striking (often counter-intuitive) findings, means that a lot of results get published that wouldn’t hold up if the sample had been larger or the tests more robust. His analysis holds especially strongly in non-experimental studies: his simulations suggest that 80% of “results” from non-randomized studies are in fact wrong. Another big finding is that randomized controlled trials do much better (25% of health RCTs are wrong, according to Ioannidis).
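To see how numbers like those can arise, here is a minimal simulation sketch of the publication filter. It is my own illustration with assumed inputs (a 10% prior that a tested effect is real, modest effect sizes, 20 subjects per arm), not Ioannidis’s actual model:

```python
# Toy model of publication bias: many hypotheses, small samples, and only
# p < 0.05 results get "published". All parameters are assumptions chosen
# for illustration, not estimates from Ioannidis's work.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

N_TESTS = 1000      # hypotheses tested across a field
PRIOR_TRUE = 0.10   # assumed share of tested effects that are real
N_PER_ARM = 20      # small samples, as in the studies Ioannidis critiques
EFFECT = 0.5        # assumed true effect size, in standard-deviation units
ALPHA = 0.05        # conventional significance threshold

true_pos = false_pos = 0
for _ in range(N_TESTS):
    is_real = rng.random() < PRIOR_TRUE
    treated = rng.normal(EFFECT if is_real else 0.0, 1.0, N_PER_ARM)
    control = rng.normal(0.0, 1.0, N_PER_ARM)
    _, p_value = stats.ttest_ind(treated, control)
    if p_value < ALPHA:            # the "publication filter"
        if is_real:
            true_pos += 1
        else:
            false_pos += 1

published = true_pos + false_pos
print(f"published 'findings': {published}")
print(f"share that are false: {false_pos / published:.0%}")
```

Under these assumed inputs, roughly half of the “significant” findings are false positives, even though every single test used the conventional 5% threshold. Bigger samples (more power) and a narrower, better-grounded set of hypotheses (a higher prior that the effect is real) both push that share down, which is exactly the direction RCTs move in.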
Of course, being wrong just one-quarter of the time is no cause for celebration. But it does point to a real strength of RCTs, one that applies even more strongly to RCTs in microfinance.
The RCTs do better because they are designed from the ground up to test a particular (and narrow) set of hypotheses. That greatly curtails the opportunity for “fishing expeditions” and the chance that one of your 57 hypotheses happens to be statistically significant by a fluke (a quick calculation below shows just how likely such a fluke is). The questions tend to be far more focused in the microfinance context. Does access to microfinance increase business profit? Business investment? Household consumption? Those microfinance hypotheses usually stem from a clear theoretical model and should show up in clear patterns.
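To put a number on the fishing-expedition worry, suppose all 57 hypotheses were in fact null and each were tested independently at the 5% level (the 57 and the 5% just echo the figures above):

```python
# Chance that at least one of 57 independent tests at the 5% level comes
# up "significant" purely by chance, when every null hypothesis is true.
p_any_fluke = 1 - (1 - 0.05) ** 57
print(f"{p_any_fluke:.0%}")  # ~95%
```

A fluke “finding” is close to a sure thing.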
That’s far different from many medical studies, in which a much greater range of plausible hypotheses exists (along with a greater range of incorrect hypotheses). The situation persists because the specific pathways that link diet/activity/lifestyle to health conditions remain poorly understood. So lots of stuff gets tested in the medical literature, and “effects” may emerge that pass standard levels of statistical significance but are driven by odd outliers or other quirks common to small data sets, and that turn out to be wrong.
So, on this score, Ioannidis’s criticisms are far less of a concern when it comes to microfinance. We have tighter theoretical understandings, a smaller set of hypotheses, and, usually, bigger samples.
But don’t relax completely. Microfinance research has its own set of concerns. Here are a few:
1) Replication. We don’t (and can’t) replicate studies in the sense that medical researchers can. Medical researchers replicate by trying a similar test again with a different, similar sample. That kind of replication is a big help in establishing the robustness of an initial finding, and it can overturn results that fail to hold up when the test is repeated. But in microfinance, to the extent that we replicate, we do it to test the same idea in very different settings. Sure, it worked in Bosnia, but will it work in East Timor? Argentina? What we get is a mapping of the landscape, learning how financial mechanisms work in different contexts. That’s helpful to understand, but because contexts vary so greatly, a positive finding in one place rarely has the power to knock out a negative result elsewhere. Replication is crucial, but not for the reasons that replication is crucial in the medical context.
2) External validity. The problem of replication above is tied to a more general problem in extrapolating from one context to another. This is hardly a problem unique to RCTs: it is a generic problem of evaluation. On this, researchers could do a better job of explaining who’s in their samples and how the populations relate to communities in other regions or countries.
3) But is it an interesting parameter? When there’s a debate about RCT results, it’s usually not about whether the finding is wrong or right, but about whether it’s interesting. Did the experimental design yield an estimate of an aspect of microfinance impact that matters most? Researchers deserve much credit for measuring the short-term impact of new urban microfinance branches, say, but if we had magical powers we’d really like to measure the impact of established branches in more typical settings, and we’d want to see longer-term impacts. It’s not fair to downplay the findings that we have just because we lust after an idealized (and unmeasurable) set of parameters. Still, we need to accept that a given study might not give us everything we want to know.
The problem with imperfect studies is that you don’t know how big the bias is (but you know it could be really big). RCTs have taken us a huge step forward, and they promise to deliver clean estimates of various slices of microfinance impact. In the end, the big question is not the one Ioannidis asks (are the results apt to be right or wrong?). The bigger questions concern how particular results fit into our understanding of microfinance broadly.