As comments published in response to our last blog post illustrate, the issue of arXiv and blind reviewing is controversial. The only part on which everybody seems to agree is the inadequacy of the current policy. We are rejecting non-anonymised submissions where the authors genuinely forgot to remove their names, but letting in papers previously posted to arXiv, if they are declared at submission time. While this declaration supposedly gives reviewers an option to avoid seeing author names, in practice avoiding them is almost impossible. As a result, a large fraction of submitted papers are reviewed under different conditions than the rest. If we continue with the current policy, arXiv will be a death warrant for double-blind ACL reviewing.
I am glad to report that this Saturday, Min and I participated in the winter ACL executive meeting, where this issue was discussed in depth. The big question is how to formulate the policy moving forward. The ACL exec is planning to carefully study members’ feedback on this question, and we hope that you will use this forum to express your opinions. For now, I would like to share with you a very thoughtful informational piece on this topic written by Marti Hearst, the Vice President of ACL.
Executive summary: Numerous studies have shown that single-blind review leads to bias in favor of certain types of researchers over others when the objective merit of the work is held constant. All ACL conferences and most workshops make use of double-blind reviewing for this reason. The rapid rise in popularity of online pre-print servers such as arXiv, while presenting the community with many benefits, has the potential to threaten the double-blind review process. We as a community need to make a policy decision about how to handle public pre-posting of papers that are under review at ACL conferences.
In more detail:
The arXiv online pre-print repository service has become enormously popular for distributing NLP research. Many members of the ACL community subscribe to the daily mail alert listing the latest papers that have been posted on the service in order to keep up with the most recent research.
ArXiv has the advantage of making the full text of articles easily available in open access form, and allows authors to timestamp “technical reports” of as yet unpublished work. Authors can post work-in-progress versions of papers on arXiv, receive feedback, and post revisions of those papers. If a version of a paper is eventually accepted for publication at a conference or journal, an author can indicate the citation information and the DOI in specific fields. (However, in many cases authors unfortunately fail to go back and update their paper’s arXiv entry with this information.)
The main challenge that arXiv poses for conferences is the threat to double-blind reviewing.
Many studies through the years have shown the biases that result from reviewers knowing information about the authors of the papers. The latest in a long line of such studies was ironically posted on arXiv itself and has been circulated among those who discuss conference reviewing. Its title is “Single vs. Double Blind Reviewing at WSDM”, by Andrew Tomkins, Min Zhang, and William Heavlin. https://arxiv.org/pdf/1702.00502.pdf
This paper recounts how WSDM 2017’s program chairs conducted a controlled study in which each paper was reviewed throughout the process by 4 reviewers, 2 of whom were assigned to a double-blind condition and 2 to a single-blind condition, through the processes of bidding, reviewing, and entering scores. Analyzing the results, the authors found strong biases.
Single-blind reviewers showed a measurably stronger preference than double-blind reviewers for bidding on papers from “top” institutions. Once papers were allocated to reviewers, single-blind reviewers were significantly more likely than their double-blind counterparts to recommend for acceptance papers from famous authors (p < .0006) and top institutions (p < .0004) and were significantly more positive about those papers. (Top institution = top 50 CS departments; famous author = at least 3 WSDM papers and 100 DBLP papers.)
In this case, the authors did not find differences in bidding or reviewing behavior with respect to the gender of authors; other studies have shown varying effects in terms of gender bias, some finding strong effects and some finding weak to no effects. They note, however, that their community does not have fixed conventions for “first authors”, and so they count the presence of just one female author as signifying a female author for the entire submission. (NB: I think they should have measured female-majority papers.)
I want to emphasize again that many studies across different fields and several decades have found similar bias effects, and often gender bias effects as well; the Tomkins et al. paper provides a nice overview of several such papers.
I communicated with Andrew Tomkins, asking whether, since the release of this paper, he had heard any responses about how other entities are dealing with arXiv. I would paraphrase his response as saying that other conferences are still feeling their way with this problem. For those wondering: “What about journals? They are single-blind review.” Tomkins et al. point out that the careful selection of reviewers, the opportunities for revision, and the lack of competition for limited slots in a journal potentially mitigate the concerns in journal reviewing.
In this post I am not proposing a solution, but rather am making a point. As a community, we need to consider carefully the relationship between posting to arXiv and the benefits of double-blind reviewing, and decide where we want to go from here.