An Analysis of Work Saved over Sampling in the Evaluation of Automated Citation Screening in Systematic Literature Reviews
Citation screening is an essential and time-consuming step of the systematic literature review process in medicine. Multiple previous studies have proposed various automation techniques to assist manual annotators in this tedious task. The most widely used measure for the evaluation of automated citation screening techniques is Work Saved over Sampling (WSS). In this work, we analyse this measure and examine its drawbacks.
We then propose normalising WSS, which makes citation screening performance comparable across different systematic reviews.
We analytically show that normalised WSS is equivalent to the True Negative Rate (TNR).
Finally, we provide benchmark scores for fifteen systematic review datasets using the TNR@95% recall measure and compare this measure with Precision and AUC.
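The equivalence between normalised WSS and TNR can be checked numerically with a minimal sketch. The abstract does not define WSS or the normalisation, so two assumptions are made here: WSS is taken in its usual form from the citation screening literature, WSS = (TN + FN)/N - (1 - recall), and normalisation is taken to be min-max scaling between the worst case (TN = 0) and the best case (TN equal to the number of irrelevant citations) at a fixed recall. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def wss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Work Saved over Sampling at the recall the classifier achieved.

    Assumed definition: (TN + FN) / N - (1 - recall).
    """
    n = tp + fn + tn + fp
    recall = tp / (tp + fn)
    return (tn + fn) / n - (1.0 - recall)


def normalised_wss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Min-max normalised WSS at fixed recall (assumed normalisation).

    Requires at least one relevant and one irrelevant citation.
    """
    n = tp + fn + tn + fp
    negatives = tn + fp
    recall = tp / (tp + fn)
    # Worst case at this recall: no irrelevant citation is filtered out (TN = 0).
    wss_min = fn / n - (1.0 - recall)
    # Best case at this recall: every irrelevant citation is filtered out.
    wss_max = (negatives + fn) / n - (1.0 - recall)
    return (wss(tp, fn, tn, fp) - wss_min) / (wss_max - wss_min)


def tnr(tn: int, fp: int) -> float:
    """True Negative Rate: share of irrelevant citations correctly excluded."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical review: 1000 citations, 100 relevant, 95% recall reached.
    counts = dict(tp=95, fn=5, tn=600, fp=300)
    print(f"WSS            = {wss(**counts):.3f}")
    print(f"normalised WSS = {normalised_wss(**counts):.3f}")
    print(f"TNR            = {tnr(counts['tn'], counts['fp']):.3f}")
    assert math.isclose(normalised_wss(**counts), tnr(counts["tn"], counts["fp"]))
```

Under these assumptions the recall term and the FN term cancel in the normalisation, leaving TN / (TN + FP). This is why the normalised score no longer depends on the share of relevant citations in a given review.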
W. Kusa, A. Lipani, P. Knoth, A. Hanbury, An Analysis of Work Saved over Sampling in the Evaluation of Automated Citation Screening in Systematic Literature Reviews, Intelligent Systems with Applications (2023) 200193.