Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27045
Full metadata record
DC Field | Value | Language
dc.contributor.author | Naseem, U | -
dc.contributor.author | Khushi, M | -
dc.contributor.author | Khan, SK | -
dc.contributor.author | Shaukat, K | -
dc.contributor.author | Moni, MA | -
dc.date.accessioned | 2023-08-24T09:15:18Z | -
dc.date.available | 2021-03-15 | -
dc.date.available | 2023-08-24T09:15:18Z | -
dc.date.issued | 2021-03-15 | -
dc.identifier | ORCID iDs: Usman Naseem https://orcid.org/0000-0003-0191-7171; Matloob Khushi https://orcid.org/0000-0001-7792-2327; Kamran Shaukat https://orcid.org/0000-0003-2174-3383. | -
dc.identifier | 23 | -
dc.identifier.citation | Naseem, U. et al. (2021) 'A comparative analysis of active learning for biomedical text mining', Applied System Innovation, 2021, 4 (1), 23, pp. 1 - 18. doi: 10.3390/asi4010023. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/27045 | -
dc.description | Data Availability Statement: The code and data are available from https://github.com/usmaann (accessed on 14 March 2021). | en_US
dc.description.abstract | Copyright © 2021 by the authors. An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning applications if the data can be transformed into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, which limits their use due to the high cost of annotation. Active learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate to address the high cost of labelling. AL has been successfully applied to labelling for speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using machine learning (ML) and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin, and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling. | en_US
dc.description.sponsorship | This research received no external funding. | en_US
dc.format.extent | 1 - 18 | -
dc.format.medium | Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | MDPI | en_US
dc.rights | Copyright © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | active learning | en_US
dc.subject | machine learning | en_US
dc.subject | biomedical natural language processing | en_US
dc.title | A comparative analysis of active learning for biomedical text mining | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.3390/asi4010023 | -
dc.relation.isPartOf | Applied System Innovation | -
pubs.issue | 1 | -
pubs.publication-status | Published | -
pubs.volume | 4 | -
dc.identifier.eissn | 2571-5577 | -
dc.rights.holder | The authors | -
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). | 1.52 MB | Adobe PDF


This item is licensed under a Creative Commons License.