An analysis of rater effects in reviews of scientific manuscripts

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Dana P. Turner (Creator)
Institution
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Advisor
John Willse

Abstract: In the peer review process used by scientific journals, ratings of manuscripts are obtained and used to make publication decisions. Although concerns have been raised about the quality and consistency of reviews given to scientific manuscripts, little has been done to address the effects of reviewer severity bias on decision making. In other settings, the methods of Generalizability Theory and Many-Facet Rasch Measurement have often been used to investigate and address such effects. The purpose of this study is to use Generalizability Theory and Many-Facet Rasch Measurement to examine the effects of reviewer severity on the ratings and decisions made during the peer review of scientific manuscripts. The merits of each method and their utility in this novel context are also assessed. Deidentified peer reviews (N = 635) that used a five-item rating scale were included in a two-facet, partially nested Generalizability Theory analysis and subsequent Decision Studies. Many-Facet Rasch Measurement analysis of the data produced reviewer severity measures and manuscript publishability measures corrected for reviewer severity. Multinomial logistic regression was used to compare manuscript decision categories predicted by average raw scores with those predicted by Many-Facet Rasch Measurement corrected scores. Reviewer severity rankings were also compared under the raw and adjusted methods. The Generalizability Theory analysis revealed that reviewers nested within manuscripts accounted for 35.48% of the variance in publishability scores, manuscripts for 12.21%, and items for 15.22%. Decision Studies indicated that an unrealistic number of reviewers and items would be needed to raise the generalizability coefficient and index of dependability to acceptable levels, and that other methods of improving reliability should be employed. When the average raw total score was used to predict manuscript decision category, 55.15% of manuscripts were correctly classified; when the Many-Facet Rasch Measurement publishability measure (theta) was used, 52.49% were correctly classified, suggesting that classifications would differ if publishability measures corrected for reviewer severity were used. The reviewers’ average raw ratings and the reviewers’ severity measures had a Spearman rank-order correlation of -0.6083, a divergence in rank orderings likely attributable to the severity measure’s adjustment for manuscript quality. These findings indicate that reviewers are inconsistent in their reviews of manuscripts. Reviewer severity bias can be addressed with Many-Facet Rasch Measurement adjustments, but additional reviewer training may be needed to improve the reliability of manuscript scores. Both Generalizability Theory and Many-Facet Rasch Measurement contributed to the findings of the study and to the understanding of reviewer behavior. These methods show potential for making rating methods in the peer review of scientific manuscripts fairer and more accurate.
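
For readers who want to see how the Decision Study projections behave, the sketch below applies the standard Generalizability Theory formulas for a design with reviewers (r) nested within manuscripts (p) and items (i) crossed with both. Only the manuscript, item, and reviewer-within-manuscript variance percentages come from the abstract; the split of the remaining variance between the manuscript-by-item interaction and the residual is an assumed placeholder.

    # D-study sketch for a two-facet, partially nested design: reviewers (r)
    # nested within manuscripts (p), items (i) crossed with both, (r:p) x i.
    # The error terms below are the standard Generalizability Theory results.

    def d_study(var_p, var_i, var_rp, var_pi, var_rpi_e, n_r, n_i):
        """Generalizability coefficient and index of dependability for
        n_r reviewers per manuscript and n_i rating items."""
        rel_error = var_rp / n_r + var_pi / n_i + var_rpi_e / (n_r * n_i)
        abs_error = rel_error + var_i / n_i       # item main effect counts here
        e_rho_sq = var_p / (var_p + rel_error)    # relative (rank-order) decisions
        phi = var_p / (var_p + abs_error)         # absolute decisions
        return e_rho_sq, phi

    # The p, i, and r:p proportions match the reported percentages; the
    # var_pi / var_rpi_e split of the remaining ~37% is an assumption.
    for n_r in (2, 5, 20):
        e_rho_sq, phi = d_study(var_p=0.1221, var_i=0.1522, var_rp=0.3548,
                                var_pi=0.1000, var_rpi_e=0.2709,
                                n_r=n_r, n_i=5)
        print(f"n_r={n_r:2d}: E-rho^2 = {e_rho_sq:.3f}, Phi = {phi:.3f}")

Under these assumed components, even 20 reviewers per manuscript leaves both coefficients below the conventional 0.80 threshold, consistent with the study's conclusion that adding reviewers alone is not a realistic route to acceptable reliability.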
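The severity adjustment itself rests on the Many-Facet Rasch model. In its common rating-scale form (the abstract does not state which parameterization the study used), the probability of manuscript n receiving category k rather than k-1 from reviewer j on item i is modeled as

    \log\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = B_n - C_j - D_i - F_k

where B_n is the publishability measure (theta) of manuscript n, C_j the severity of reviewer j, D_i the difficulty of item i, and F_k the step threshold for category k. Estimating and subtracting C_j is what yields publishability measures corrected for reviewer severity.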
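The classification comparison can be reproduced in outline with any multinomial logistic regression routine. The sketch below uses scikit-learn on fully synthetic data (none of the values are from the dissertation's sample); it only shows the mechanics of fitting one model per scoring method and comparing the percentage correctly classified.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: a three-category decision (0 = reject,
    # 1 = revise, 2 = accept) predicted from a single score, once per
    # scoring method. All values are simulated for illustration.
    n = 500
    raw_score = rng.normal(size=n)                     # average raw total score
    theta = raw_score + rng.normal(scale=0.5, size=n)  # severity-corrected measure
    decision = np.digitize(raw_score + rng.normal(scale=0.8, size=n),
                           [-0.6, 0.6])

    for name, x in [("raw score", raw_score), ("theta", theta)]:
        model = LogisticRegression().fit(x.reshape(-1, 1), decision)
        print(f"{name}: {model.score(x.reshape(-1, 1), decision):.1%} correctly classified")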
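Finally, the rank-order comparison between reviewers' average raw ratings and their severity measures is a one-line computation once both quantities are in hand. A minimal sketch with SciPy, using illustrative values only (the real inputs would come from the review data and the Rasch analysis output):

    from scipy.stats import spearmanr

    # One average raw rating and one severity measure (in logits) per
    # reviewer; values are illustrative. More severe reviewers give
    # lower raw ratings, so a negative correlation is expected.
    avg_raw_rating = [4.1, 3.6, 3.9, 2.8, 3.2, 4.4]
    severity_logit = [-0.7, 0.2, 0.5, 1.1, -0.2, -0.9]

    rho, p_value = spearmanr(avg_raw_rating, severity_logit)
    print(f"Spearman rho = {rho:.4f}")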

Additional Information

Publication
Dissertation
Language: English
Date: 2017
Keywords
Generalizability Theory, Many-Facet Rasch Measurement, Measurement, Peer Review
Subjects
Scientific literature -- Evaluation
Technical writing -- Evaluation
Peer review
