Olorisade, BK, Brereton, OP and Andras, P (2019) The use of bibliography enriched features for automatic citation screening. Journal of Biomedical Informatics, 94. ISSN 1532-0480

manuscript 2 - KBO - OPB - PA.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Citation screening (also called study selection) is a phase of the systematic review process that has attracted growing interest in the use of text mining (TM) methods to reduce the time and effort it requires. Search results are usually imbalanced between the relevant and irrelevant classes of returned citations. Class imbalance, among other factors, has been a persistent problem that impairs the performance of TM models, particularly in automatic citation screening for systematic reviews. As a result, classification models that use only the basic title and abstract data often fall short of expectations.

In this study, we explore the effects of using full bibliography data in addition to title and abstract on text classification performance for automatic citation screening.

We experiment with binary and Word2vec feature representations and SVM models using 4 software engineering (SE) and 15 medical review datasets. For each dataset, we build and compare 3 types of models (binary-non-linear, Word2vec-linear and Word2vec-non-linear kernels) using the two feature sets.
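
As an illustration of what the two feature sets could look like under a binary representation, here is a minimal sketch in plain Python. It is not the authors' pipeline: the record layout (`title`, `abstract`, `references`), the tokenizer and the toy citations are all assumptions made for this example, and the SVM training step is left out.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def citation_text(citation, use_bibliography):
    """Join the fields of one citation that make up the chosen feature set."""
    parts = [citation["title"], citation["abstract"]]
    if use_bibliography:
        # Assumption: the reference list is stored as a list of strings.
        parts.extend(citation.get("references", []))
    return " ".join(parts)


def binary_features(citations, use_bibliography):
    """Return (vocabulary, matrix); matrix[i][j] is 1 if term j occurs in citation i."""
    token_sets = [set(tokenize(citation_text(c, use_bibliography))) for c in citations]
    vocabulary = sorted(set().union(*token_sets))
    matrix = [[1 if term in tokens else 0 for term in vocabulary] for tokens in token_sets]
    return vocabulary, matrix


# Toy, made-up citations for illustration only.
citations = [
    {
        "title": "Text mining for screening",
        "abstract": "We classify citations.",
        "references": ["Support vector machines in text classification"],
    },
    {
        "title": "Gardening tips",
        "abstract": "How to grow tomatoes.",
        "references": [],
    },
]

# Feature set 1: title and abstract only.
vocab_basic, X_basic = binary_features(citations, use_bibliography=False)
# Feature set 2: title, abstract and bibliography.
vocab_rich, X_rich = binary_features(citations, use_bibliography=True)
```

In this sketch the enriched vocabulary is a superset of the basic one, so terms that appear only in a citation's reference list become additional binary features available to a classifier.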

The bibliography-enriched data exhibited consistently improved performance in terms of recall, work saved over sampling (WSS) and Matthews correlation coefficient (MCC) in 3 of the 4 SE datasets that are fairly large in size. For the medical datasets the results vary; however, in the majority of cases the performance is the same or better.
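
For readers unfamiliar with the three reported measures, the sketch below computes them from confusion-matrix counts using their commonly cited definitions. The abstract does not state which WSS variant the paper uses, so the formula here (fraction of citations screened out minus the fraction of relevant citations missed) is an assumption, and the function names and example counts are illustrative only.

```python
import math


def recall(tp, fn):
    """Fraction of relevant citations that the classifier retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def wss(tp, fp, tn, fn):
    """Work saved over sampling: (TN + FN) / N - (1 - recall)."""
    n = tp + fp + tn + fn
    return (tn + fn) / n - (1.0 - recall(tp, fn))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts for an imbalanced screening task: 20 relevant citations out of 1000.
tp, fp, tn, fn = 19, 80, 900, 1

print(f"recall = {recall(tp, fn):.3f}")
print(f"WSS    = {wss(tp, fp, tn, fn):.3f}")
print(f"MCC    = {mcc(tp, fp, tn, fn):.3f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the relevant class is as rare as it typically is in citation screening.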

Item Type: Article
Additional Information: This is the accepted author manuscript (AAM). The final published version (version of record) is available online via Elsevier at https://doi.org/10.1016/j.jbi.2019.103202 - please refer to any applicable terms of use of the publisher.
Uncontrolled Keywords: Computing methodologies; Citation screening automation; Systematic reviews; Text mining; Feature enrichment
Subjects: Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Faculty of Natural Sciences > School of Computing and Mathematics
Depositing User: Symplectic
Date Deposited: 10 May 2019 08:43
Last Modified: 07 May 2020 01:30
URI: https://eprints.keele.ac.uk/id/eprint/6303
