TY - JOUR
T1 - Automatic recognition of self-acknowledged limitations in clinical research literature
AU - Kilicoglu, Halil
AU - Rosemblat, Graciela
AU - Malički, Mario
AU - Ter Riet, Gerben
N1 - With file of supplementary material.
PY - 2018/7
Y1 - 2018/7
N2 - Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]).Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
AB - Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]).Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
KW - self-acknowledged limitations
KW - clinical research literature
KW - natural language processing
KW - research transparency
U2 - 10.1093/jamia/ocy038
DO - 10.1093/jamia/ocy038
M3 - Article
C2 - 29718377
SN - 1067-5027
VL - 25
SP - 855
EP - 861
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 7
ER -