Foundation of Sand

March 23, 2022

The last decade has seen an explosion of interest in and production of real-world evidence (RWE). But I’m afraid a day of reckoning is approaching. I predict that within the next 18 months we will open our morning paper (e.g., Twitter) to a headline trumpeting the disastrous failure of RWE to identify some catastrophic drug adverse event.

Real-world data (RWD) has allowed researchers to generate massive amounts of evidence on the epidemiology, natural history, and burden of diseases, as well as the efficacy of, and adverse events related to, medications and other interventions.

But there is a very weak link in the chain connecting RWD to accurate, clinically relevant findings: one must believe that the codes used in a database correctly capture the concepts being investigated.

If the combinations of ICD codes, or algorithms, used to identify patients, outcomes, or comorbidities are wrong, then the conclusions of the research are flawed, and RWE is a house built on a foundation of sand.

Twenty-five years ago, a group of researchers called for all publications using RWE to report the reliability and validity of their data. A decade later, an international consortium defined the lack of algorithm validation as one of the top three methodological problems in the field.

But a 2011 study found that fewer than 1 in 20 RWE papers actually cited a measure of code accuracy. And some codes offered only a 50-50 chance of identifying people with the condition of interest.

Even studies that report on algorithm performance usually report only positive predictive value (PPV), a statistic that, one could argue, is almost entirely useless for judging whether a study reaches a valid conclusion. (Mostly because PPV varies with disease prevalence, yet it is typically measured in validation samples with massively elevated prevalence rather than in populations that mimic the actual prevalence.)
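To make that concrete, here is a minimal sketch in Python, with made-up sensitivity and specificity values, of how the very same algorithm's PPV collapses when you move from an enriched validation sample to a population with realistic prevalence:

```python
# Minimal sketch with made-up accuracy numbers: how PPV depends on prevalence.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# The same hypothetical algorithm (90% sensitive, 95% specific) in two settings:
print(f"{ppv(0.90, 0.95, 0.50):.2f}")  # enriched validation sample, 50% prevalence -> ~0.95
print(f"{ppv(0.90, 0.95, 0.01):.2f}")  # realistic prevalence of 1%                 -> ~0.15
```

Same codes, same chart review, and yet only about one in seven flagged patients actually has the condition once the algorithm is applied at a realistic prevalence.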

My 20+ years’ experience publishing in the field confirms this: for every 20 studies we do using RWD, at most one or two validate the codes or algorithms.

If as a research community we fail to demonstrate the validity of our work, our audience (FDA, clinicians, payers, or the public) has every right to ignore our findings.

The main argument against routinely measuring and reporting algorithm validity is the time and money required. Innovative methods like machine learning or artificial intelligence might one day ameliorate some of these problems—the same way flying cars might reduce congestion on our roads. But, as my son used to say when I lost to him at Monopoly, “Hope is not a strategy.”

So what can we do to avoid the disaster I predicted at the top of this post?

Here are 5 things that will make a difference:

  1. As an author of RWD studies, always cite validation work if it exists.
  2. If validation was not done, be honest about how your conclusions may be wrong. Don’t just write a bland statement that “misclassification might have occurred”; give a range of plausible values for the accuracy of the codes you used (see the sketch after this list).
  3. Better yet, follow all the recommendations in the RECORD-PE and STROBE guidelines.
  4. As a reviewer, recommend that authors do 1, 2, and 3 above.
  5. Encourage your funder to pay for validation studies—if it helps, point out that RWD studies with validated codes get published in journals with higher impact factors.
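On point 2, the range does not need to be elaborate. Here is a minimal sketch, using entirely hypothetical counts and an assumed PPV/sensitivity band (none of these numbers come from a real study), of how to bound a prevalence estimate under plausible levels of code accuracy:

```python
# Hypothetical numbers only: bounding a prevalence estimate under plausible code accuracy.

observed_flagged = 4_000   # patients identified by the ICD algorithm
population = 100_000       # patients in the database

# Two assumed accuracy scenarios for the algorithm: (PPV, sensitivity).
scenarios = {"low end": (0.70, 0.95), "high end": (0.90, 0.80)}

for label, (ppv, sensitivity) in scenarios.items():
    # True positives = flagged * PPV; divide by sensitivity to add back missed cases.
    estimated_true_cases = observed_flagged * ppv / sensitivity
    print(f"{label}: prevalence ≈ {estimated_true_cases / population:.1%}")
```

Even a crude band like this (here, roughly 2.9% to 4.5%) tells your readers far more than a throwaway sentence about possible misclassification.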


I invite you to join me and a panel of experts as we discuss this topic in a webinar on Tuesday, April 12, entitled “Cracking the Code: When and How to Validate ICD Algorithms for RWE”. I hope to see you there.
