Addressing the Research Replication Crisis

Medical schools and teaching hospitals are helping early career researchers learn best practices and how to improve writing skills for research reproducibility.

What is a scientific study worth if other researchers can’t achieve the same results? 

Scientists, and medical researchers in particular, sounded the alarm a few years ago after multiple reports from scientists who were unable to duplicate high-profile studies. Some studies have suggested that roughly 50% of preclinical research may be irreproducible. Variability in biological reagents and reference materials, incomplete documentation of methods, poor study design, and inappropriate analysis are among the problems.

“The checks and balances that once ensured scientific fidelity have been hobbled. This has compromised the ability of today’s researchers to reproduce others’ findings,” declared Francis S. Collins, MD, PhD, and Lawrence A. Tabak, DDS, PhD, director and principal deputy director, respectively, of the National Institutes of Health (NIH) in a January 2014 commentary in Nature.

In response, the NIH spearheaded an effort to better train young researchers in experimental methods and ethics, including a discussion of rigor and reproducibility in grant applications. NIH funds have helped to produce online training modules aimed at early career researchers “to make sure that graduate students are getting good training in experimental design and statistics,” says Kristine Willis, PhD, program director in the NIH Division of Genetics and Developmental Biology. The NIH is also awarding grants to institutions to provide training in rigor and reproducibility. “We’re really trying to tackle this on all fronts,” Willis notes. 

In addition, some medical schools are developing research training on their own, including workshops, online training modules, and additions to curricula designed to help young researchers conduct stronger science. The AAMC recognized three of these new training efforts as templates that other schools might follow with the 2017 Innovation in Research and Research Education Awards.

Improving training for researchers through innovation

Collins and Tabak maintained in their commentary that with rare exceptions, there is no evidence to suggest that irreproducibility is caused by scientific misconduct. Instead, they blamed poor training in experimental design, too much emphasis on “provocative statements,” and incomplete reporting on design and methods. “And some scientists reputedly use a ‘secret sauce’”—a reference to nonstandard biological reagents or reference materials—“to make their experiments work—and withhold details from publication or describe them only vaguely to retain a competitive edge,” they wrote.

The 2017 AAMC Innovation in Research and Research Education Awards highlighted institutions that implemented exemplary programs to address these problems. “One of the things the AAMC is really good at is being a forum for institutions to share what they’re doing,” says Jodi Yellin, PhD, AAMC director of science policy.

The Gulf Coast Consortia, seven research institutions in the Houston, Texas, area, won the AAMC’s first prize for a rigor and reproducibility workshop that involved many institutions. A pilot workshop in early 2017 led to larger workshops with more than 130 registrants from 12 institutions.

“What we loved is that it was multi-institutional,” says Yellin.

The Gulf Coast Consortia curriculum addressed analysis and statistics, experimental design, big data, cell-based models, animal models, ethics and scientific integrity, record keeping, and record sharing. Suzanne Tomlinson, PhD, director of research programs, Gulf Coast Consortia for Quantitative Biomedical Sciences, Rice University, led the project.

“The problem is that a lot of research is being published that is not necessarily as definitive as it might seem, given the fact that it has been peer-reviewed and published in a journal, sometimes a high-impact journal,” says Carrie Cameron, PhD, associate director of the MD Anderson Cancer Prevention Research Training Program. Cameron worked on the instructional design and the pre- and postassessments. 
 
Medical research fails the rigor and reproducibility test for several reasons, which the Gulf Coast Consortia addressed in its workshops. Often, record keeping is so sketchy that researchers can’t account for mislabeled data, volume of reagents used, or living conditions of their lab animals, says Cameron.

Statistical methods are another common fault, she says. Researchers sometimes troll through data to find patterns that appear to be statistically significant or cherry-pick favorable results—practices often referred to as “p-hacking.” Says Cameron, “You’ve taken that one little bit that makes it look like your study results are significant and you simply don’t report the rest because you want to get published.”
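
To see why selective reporting is misleading, consider a study with no real effect that measures many outcomes. If each of k independent tests uses a 0.05 significance threshold, the chance that at least one looks "significant" by luck is 1 - 0.95^k, or about 64% for k = 20. The short Python simulation below is only an illustrative sketch and is not drawn from the workshops. The function names, the sample sizes, and the simplified z-test with a known standard deviation are all assumptions made for the example.

```python
import math
import random
import statistics


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known standard deviation of 1, which
    holds for the simulated data below but rarely for real measurements.
    """
    n = len(group_a)
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(outcomes_per_study, studies=1000, n=20, alpha=0.05, seed=0):
    """Share of simulated no-effect studies with at least one 'significant' outcome."""
    rng = random.Random(seed)  # fixed seed so the simulation itself is repeatable
    hits = 0
    for _ in range(studies):
        p_values = []
        for _ in range(outcomes_per_study):
            # Both groups come from the same distribution: there is no true effect.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sided_p(a, b))
        # Reporting only the best-looking outcome is the cherry-picking step.
        if min(p_values) < alpha:
            hits += 1
    return hits / studies


if __name__ == "__main__":
    print("1 outcome per study:  ", false_positive_rate(1))
    print("20 outcomes per study:", false_positive_rate(20))
```

With one outcome per study, the false positive rate should stay near the nominal 5%. With 20 outcomes and only the best one reported, it should rise to roughly the 64% predicted by the formula.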

In rare cases, data manipulation reaches the level of fraud. “There are more serious issues, such as improper manipulation of images to show desired results or fabrication of data, which are unethical,” says Cameron.

“The rigor and reproducibility training is really an effort to make everybody aware of what can happen when things aren’t done by the book and how to avoid those kinds of problems,” says Cameron.

The AAMC’s second prize went to Robert Nicholas, PhD, professor and vice chair for research and education in the Department of Pharmacology, and Mohanish Deshmukh, PhD, professor in the Department of Cell Biology and Physiology. Both are at the University of North Carolina at Chapel Hill.

Nicholas and Deshmukh developed a series of seven 2.5-hour workshops called Best Practices for Reproducibility and Rigor in Research. Workshop topics included industry best practices, experimental design, experimental rigor, record keeping, data acquisition and archiving, and data analysis and reporting. 

Says the AAMC’s Yellin, “These workshops were very detailed and specific in terms of the structure and design.”

“The course was successful and covered the most important topics for conducting rigorous and reproducible research,” Nicholas and Deshmukh report. Forty students from a wide range of research interests and departments participated and offered suggestions that will be incorporated into future workshops.

The third-place award recognized the addition of a prose and code writing component to responsible research training across many scientific disciplines at the University of Miami. The Writing Prose and Writing Code program was developed by Joanna Johnson, PhD, director of writing and director of scientific writing programs at the University of Miami, and Kenneth Goodman, PhD, director of the University of Miami Miller School of Medicine’s Institute for Bioethics and Health Policy. 

“The reviewers were drawn to this one because communication is an important part of all of this—how the code and prose melded together,” says Yellin.

Johnson and Goodman hit on the idea that bad prose and bad code both contribute to the problem of research that can’t be reproduced. As a result, the university added prose and code writing to its curriculum on data management, publication and authorship, human subjects protection, and animal research.

Scientific prose is often poor, full of passive voice, hedging, promotion, inaccuracy, and needless complexity, Johnson says. “We felt it was important to recognize that scientists had a moral responsibility to learn to write well. It’s hard, it’s difficult, and it takes time. And it’s an essential part of the scientific research process,” she says.

Similarly, writing code for analyzing data can be done poorly and with improper attention paid to where it came from, Goodman notes. The resulting analysis can be hard to reproduce. Says Goodman, “It’s an unrecognized part of what scientists do every day.”
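
One way to make analysis code easier to rerun is to store each result alongside the details needed to reproduce it. The Python sketch below is a hypothetical illustration, not part of the University of Miami curriculum. The function names, the bootstrap analysis, and the choice of recorded fields (random seed, a hash of the input data, and the interpreter version) are assumptions chosen for the example.

```python
import hashlib
import json
import platform
import random
import statistics


def bootstrap_mean(values, seed, resamples=1000):
    """Bootstrap estimate of the mean using a seeded, local random generator."""
    rng = random.Random(seed)  # local generator avoids hidden global state
    means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(resamples)
    ]
    return statistics.fmean(means)


def run_with_provenance(values, seed):
    """Return the result together with the information needed to rerun it."""
    raw = json.dumps(values).encode("utf-8")
    return {
        "result": bootstrap_mean(values, seed),
        "seed": seed,
        # The hash lets a later reader confirm they hold the same input data.
        "input_sha256": hashlib.sha256(raw).hexdigest(),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    measurements = [4.1, 5.0, 3.8, 4.6, 5.3]  # made-up values for illustration
    print(json.dumps(run_with_provenance(measurements, seed=42), indent=2))
```

Running the function twice with the same data and seed returns identical records, which is the property a later reader needs in order to check the analysis.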

Summing up the rigor and reproducibility issue, Willis of the NIH says it’s hard to pin down the portion of scientific research that is problematic. “That said,” she continues, “as a program director at NIH and as a taxpayer, any problem is too much, right? This is really important, and we want to try to get it down to the absolute minimum.”