As we come toward the period for author responses, we thought we’d update everyone on the status of the review process. This year, we adopted a short initial review cycle of two weeks to ensure that reviews would be completed with enough time for quality-assurance checks before authors see them. The longer discussion period also helped reviewers read and check each other’s reviews (check). Jointly with 61 Area Chairs, Regina and I focused our efforts along three dimensions: (1) chasing late reviewers, (2) ACs’ manual checking of the reviews, and (3) PC chairs’ global checking of the reviews using scripts.
- Chasing Late Reviews. While the vast majority of reviews were completed on time, around 1% of reviews were not delivered even within a week of the deadline. For those papers, we had to ask area chairs to either find new reviewers or review the papers themselves. We are happy to say that all 3900+ reviews are now in. We have collected a list of reviewers who didn’t deliver (and didn’t notify us) for future PC chairs to consider.
- AC Manual Checks. Each and every paper’s reviews were read personally by at least one area chair and vetted for quality (double check). This year we recruited many ACs, which made it manageable for each of them to closely supervise a cohort of papers. The goal of this check was to identify reviews with vague statements or unclear questions to the authors, and to closely monitor cases of inconsistency across reviewers. To resolve these issues, ACs started discussions among the reviewers and provided direct feedback to those who needed to revise their reviews.
- PC chairs’ programmatic checks for quality assurance. We implemented a spreadsheet that downloaded all of the reviews (all 2200+ long and 1700+ short) to check on their status by area. We ran status checks for inconsistent reviews (submissions whose scores had a standard deviation above 1.5; about 3% initially), low-confidence reviews (a confidence score of 1 or 2; about 15% initially), and submissions where at least one review was particularly short (50 words or less; about 3% initially), and flagged these for ACs to do a final round of checking (over 200 submissions; triple check). The excellent AC crew was already aware of most of these problems, but it definitely helped to have this layer of consistent checks applied across the board; a rough sketch of what such checks can look like appears below.
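For readers curious about what checks of this kind might look like in practice, here is a minimal sketch in Python. It is illustrative only: it assumes the reviews have been exported to a CSV file named `reviews.csv` with hypothetical columns `submission_id`, `overall_score`, `confidence`, and `review_text` (not the actual export we used), but the thresholds match the ones described above.

```python
# Sketch of per-submission quality-assurance flags over an exported review CSV.
import csv
import statistics
from collections import defaultdict

SCORE_STDEV_THRESHOLD = 1.5   # flag submissions with widely diverging scores
LOW_CONFIDENCE_THRESHOLD = 2  # flag reviews with confidence 1 or 2
MIN_REVIEW_WORDS = 50         # flag reviews of 50 words or fewer

def load_reviews(path):
    """Group exported reviews by submission id."""
    reviews = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            reviews[row["submission_id"]].append(row)
    return reviews

def flag_submissions(reviews):
    """Return {submission_id: [reasons]} for submissions needing an AC re-check."""
    flagged = defaultdict(list)
    for sub_id, revs in reviews.items():
        scores = [float(r["overall_score"]) for r in revs]
        if len(scores) > 1 and statistics.stdev(scores) > SCORE_STDEV_THRESHOLD:
            flagged[sub_id].append("inconsistent scores")
        if any(int(r["confidence"]) <= LOW_CONFIDENCE_THRESHOLD for r in revs):
            flagged[sub_id].append("low-confidence review")
        if any(len(r["review_text"].split()) <= MIN_REVIEW_WORDS for r in revs):
            flagged[sub_id].append("very short review")
    return flagged

if __name__ == "__main__":
    for sub_id, reasons in sorted(flag_submissions(load_reviews("reviews.csv")).items()):
        print(f"{sub_id}: {', '.join(reasons)}")
```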
Inevitably, some authors will not be happy with the reviews they receive, and that is statistically expected. We are writing this post to let you know the extent of the steps and checks that the reviewers, ACs, and PC chairs implemented to ensure that ACL remains the top venue for peer-reviewed, published work in NLP and CL. Also, as any reviewer knows, while the reviews themselves give authors their feedback, the discussion among ACs and peer reviewers is an unseen and often significant source of work that adds to the quality of the program. From checks #1 and #2, these discussions (which authors are not privy to) have strongly shaped many reviews before they are released to submission authors. Indeed, for some papers the peer review discussions rival the length and quality of the reviews themselves!
We are looking forward to submission authors’ responses over the next few days, and will be posting about this shortly.
P.S. (edit: added quartile info as of March 15) Below we provide some statistics on the scores per area at this midpoint juncture (excluding Biomedical and Speech, which had fewer than 10 submissions in each of the long/short categories). Hopefully, these will help you put your own scores into perspective when the initial reviews are released over the next day: