The Limits and Power of Peer Review
By Dale E. Hammerschmidt, M.D., and Michael Franklin
The peer review process can protect against scientific fraud, but it isn’t fail-safe.
Earlier this year, the South Korean stem-cell researcher Woo Suk Hwang was discredited for having faked two studies published in the prestigious journal Science. Hwang claimed to have produced human embryonic stem cells by transferring the hereditary material in the nucleus of cells from adults into donated human egg cells. Hwang’s research was hailed as a huge step toward therapeutic cloning and in turn toward tissue transplantation without fear of immune reactions. Even Hwang was caught up in the excitement, announcing in 2004 that his goal was to “find the causes of incurable diseases and to offer a new window for cures.”1 We now know that was a bald-faced lie.
In the aftermath of the stem-cell scandal, the peer-review process on which journals depend for credibility has come under scrutiny. Is peer review up to the challenge of detecting scientific fraud? The answer based on our experience at the Journal of Laboratory and Clinical Medicine is yes and no.
Peer review, we believe, is good at detecting when scientists draw the wrong conclusions from empirical data as a result of errors in study design or analysis. The peer reviewer begins with the assumption that he’s not being lied to, and his charge is that of referee rather than sleuth. The question “Do the data support the conclusions?” comes more naturally than “Did this guy simply make this up?” One’s peers are often quite helpful at identifying flaws in experimental design or data analysis—honest mistakes or oversights on the part of the researcher. Scientific fraud of the sort involving deliberate fabrication of data or selective reporting of data is not as easy for journal editors or peer reviewers to detect. Nevertheless, journals—as the scene of the crime—have a special obligation to their readership and, in the case of medical research, to patients whose health care providers might ultimately make clinical decisions based on false information.
A Case in Point
During our tenure at the Journal of Laboratory and Clinical Medicine, we have had the misfortune of receiving papers from a few authors who have attempted to hoodwink us into publishing fake research. In these cases, the peer-review process has both fallen short and exceeded expectations.
More than a decade ago, the editors of the Journal of Laboratory and Clinical Medicine published manuscripts that turned out to be fraudulent.2 In August of 1992, we heard concerns that two clinical articles we had already published might be based on false research.3,4 At the same time, a third article by the same author—Aws S. Salim—was under review. The reviewer of this third paper raised similar concerns.
When confronted, Salim proclaimed his innocence but declined to produce the primary data for independent review. In a sense, we were stuck: Ordinarily, allegations of scientific misconduct would be referred to the appropriate academic authority. However, this author had changed affiliations several times, and no institution was eager to claim him as their own. Complicating matters, some of the research was said to have been done in Iraq, where political and military realities stood in the way of even the simplest fact-checking.
The journal’s editorial staff faced a dilemma: either do nothing or conduct an independent investigation. The editorial staff deemed that it had an obligation to its readers and to the scientific community to set the record straight if fraud had occurred. We examined all of Salim’s published works and compared them with his curriculum vitae; we also examined the sequence of the research, both within his personal publication corpus and within the context of scientific knowledge. Things didn’t fit. It just didn’t seem possible that he could in a few years have published a series of single-author papers describing more than 4,000 patients who were followed for up to seven years. At his peak, he was publishing a major single-author paper twice a month—and this with no acknowledgements to or co-authorships by the assistants he would have needed to do so.
Upon closer inspection, an analysis of the timeline of Salim’s work in relation to other developments in medicine also raised suspicion. His studies of allopurinol as an antioxidant, for example, began in humans several years before the publication of studies describing that property of the drug. A claim of independent prior discovery was a bit hard to swallow. Yet this is not the only oddity in the author’s publication record. Animal studies reported during his tenure in Scotland were carried out after the human studies he conducted in Iraq. Inexplicably, the animal study reports close with a call for additional human studies, which—according to the author—he had already conducted years earlier.
Salim’s explanations were less than satisfying, and the editorial staff reached the conclusion that the papers could no longer be considered credible enough to retain the journal’s blessing. If these concerns had been presented to the editors before publication, we would certainly not have published the papers unless the author could have produced the raw data for independent review. We therefore published our concerns, published Salim’s response, announced that the response had been found wanting, and withdrew the journal’s aegis from the papers. In taking this action, we did not brand the papers as fraudulent per se and retract them. Rather, we called attention to the serious, unresolved concerns about them. We advised readers to form their own opinions about the papers and to exercise caution in relying on them or citing them—in short, to treat them as they would treat non-peer-reviewed preliminary communications, abstracts, or letters.
Yet setting the record straight might be more challenging than simply publishing a retraction or withdrawing aegis. When a retraction is published, it is linked to the original paper in PubMed, the online database of medical literature. This cross reference is intended to alert scientists to the problem, but retracted papers often continue to be cited in the medical literature. Indeed, since we published our withdrawal of aegis, Salim’s papers have been cited 38 times, as recently as the March 2006 issue of Pharmacological Reviews. The fact that they continue to be cited in other authors’ articles even after withdrawal of aegis emphasizes the importance of preventing fake research from getting into the medical literature in the first place. Although this is not always possible, we have rejected papers for publication on the suspicion of scientific misconduct.
Fact or Fiction?
In July of 2001, the Journal of Laboratory and Clinical Medicine received a paper from a group of researchers from Athens, Greece. The question asked by the researchers was related to copper kinetics during prolonged inactivity. The method reported by the researchers was startling. Forty trained athletes (running an average of more than 10 km per day) were reportedly studied. Half were restricted to 700 level meters of walking per day for an entire year.
We thought the study was unethical on several counts: the real potential for harm, and the lack of justification for such an extreme study—or even for the use of human subjects at all. Taking a trained athlete and restricting his or her activity for an entire year would predictably lead to a number of health consequences that might take months to resolve and leave permanent residua. Against these considerations, it would have been appropriate to have the highest possible resolution of the issue in animal studies and the shortest possible confirmatory study in humans.
In the unlikely event that a year-long study really proved necessary, carrying out the study would require careful evaluation of the subjects for the potential ill effects of the experience. The authors reported no serial measurements of muscle mass, bone density, maximum VO2, body fat content, plasma lipid levels, etc. Nor did the authors report the results of the subjects’ recovery or how many returned to their previous level of fitness. Finally, there was a pragmatic question that really made our reviewing editor think “fiction” rather than “bad study”: would 40 highly trained athletes ever consent to be in a study that made them sit still for a year? Probably not.
In short, either the article was fiction or something of great potential harm was done to ask a research question of only modest importance. In the face of these rather serious ethical concerns, no evidence was presented as to how much harm actually resulted. That’s not OK.
The authors’ egregious breach of research ethics aroused our skepticism. Would the ethics committee or institutional review board of a medical center ever approve such a study? Not surprisingly, we could not find the Web site for the authors’ claimed institution of origin. Nor could we find it in an online version of the Athens phone book. Of the many institutional affiliations listed in this group of researchers’ 70-plus publications, we could only find one—Hokkaido University in Japan—with a Web presence. In e-mail exchanges, the department head at Hokkaido informed us that he had no recollection of any of the authors who listed Hokkaido University as their sponsoring institution. We then asked a colleague to visit the address for the research institution in Athens, and it turned out to be a private residence. Moreover, none of the European researchers in hypokinesia that we contacted knew the authors or their institutions.
We shared our concerns with the corresponding author, but he never responded. Methinks the gentleman doth protest too little.
Unfortunately, to this day, other—mainly international—journals continue to publish these researchers’ manuscripts. As recently as July 2005, the International Journal of Medical Sciences published a paper by this group. Fortunately, the impact of these authors’ body of published manuscripts seems to be insignificant. Citations of their work are largely self-referential. Their motivation remains a mystery.
What is of greater concern, though, is that manuscripts from this group of researchers continue to pass frequent tests of peer review. There is no mechanism for alerting the broader community of medical editors. For that reason, what might be a communal—and more rigorous—screening process becomes less so. If an author submits to multiple publications, chances increase that eventually one of them will publish the manuscript.
The Trouble with Peer Review
Why is peer review an insufficient barrier and retraction an insufficient remedy to scientific fraud?
Peer reviewers don’t judge whether or not experiments have actually been performed and reported completely. Often such judgments can only be made after extensive familiarity with an author’s work. One of the referees of Salim’s paper noted that the results seemed “almost too good,” but the only reviewer to raise serious questions of validity was one who had reviewed several of Salim’s manuscripts and was surprised by the number of his single-author works. Thus, in the context of peer review, even recurring doubts lack cumulative effect unless the same critical reviewer encounters several questionable articles from the same source. Even then, clever fraudsters can evade detection by avoiding a common downfall of many of those who have been exposed—over-reaching. Peer review simply isn’t designed to render judgments about the authenticity of a manuscript. Rather, this responsibility is shared not only by journal editors and peer reviewers but also by co-authors, chairs of medicine, division leaders, and lab assistants: the entire community of individuals involved in the day-to-day practice of medical research. Perhaps this onus falls most squarely on the co-authors: As those who share most directly in the credit for the published research, or the dishonor if it is discredited, they should be especially diligent.
Even after fraud is suspected, the confidentiality that rules in peer review abets the perpetrator by inhibiting the investigation of allegations against him. In our investigation of Salim, only two journals shared their reviews with us. Our efforts to alert editors of journals that had published manuscripts from the group in Athens were often met with defensiveness or simply ignored. Thus, even serious allegations of fraud may be insufficient to motivate editors to provide corroborative peer judgments or take decisive action.
Peer reviewers cannot be expected to take the time to study a researcher’s body of work or to reconstruct his professional biography. (We can imagine the chilling effect on referee recruitment if this task were to be added.) Nor is it reasonable to expect them routinely to review an author’s credentials if supplied.
However, potential authors can be asked to certify that, in the event that fraud is alleged, they will produce a curriculum vitae and documentation from their institution that they had adequate facilities and staff to conduct the research in question. Asking for such guarantees would not break with, but rather would extend, existing practice. Some journals already require limited biographical information and information as to provenance. We recommend that potential authors routinely give biomedical journals the right to scrutinize the actual data on which the paper was based. This would again require that these records be kept for a useful interval, which we have set to be five years at the Journal of Laboratory and Clinical Medicine. This creates the simple expectation that an author accused of fraud will respond by producing the raw data and proof of ability to have performed the research in question. If he cannot, the journal will have clear authority to withdraw aegis or to retract the article.
In deciding what steps to take in response to allegations of fraud, a journal should examine the seriousness of the concern, the strength of the evidence, and (perhaps most important) the risk of harm resulting from inaction. The best course often might be to take no specific action, letting the observation stand or fall according to its ability to be confirmed or refuted by others. On the other hand, a journal may be compelled to act if the allegation is of blatant data fabrication, the evidence is substantial, and the study has immediate clinical application. Often that responsibility will be fulfilled by reporting the concerns and preliminary findings to the appropriate academic authority. Where none exists, the journal may be forced to take a more definitive step on its own authority.
Dale Hammerschmidt is editor in chief and Michael Franklin is managing editor of the Journal of Laboratory and Clinical Medicine, which is based at the University of Minnesota.
1. Dreifus C. A conversation with Woo Suk Hwang and Shin Yong Moon; 2 Friends, 242 Eggs and a Breakthrough. New York Times. 2004;February 17:F1.
2. Hammerschmidt DE, Gross AG. Allegations of impropriety in manuscripts by Aws S. Salim: examination and withdrawal of journal aegis. The Executive Editorial Committee of the Journal of Laboratory and Clinical Medicine. J Lab Clin Med. 1994;123(6):795-9.
3. Salim AS. Allopurinol and dimethyl sulfoxide improve treatment outcomes in smokers with peptic ulcer disease. J Lab Clin Med. 1992;119(6):702-9.
4. Salim AS. Role of oxygen-derived free radical scavengers in the management of recurrent attacks of ulcerative colitis: a new approach. J Lab Clin Med. 1992;119(6):710-7.