Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/20698
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ghorbani, M | -
dc.contributor.author | Swift, S | -
dc.contributor.author | Taylor, SJE | -
dc.contributor.author | Payne, AM | -
dc.date.accessioned | 2020-04-21T11:42:54Z | -
dc.date.available | 2020-04-21T11:42:54Z | -
dc.date.issued | 2020-08 | -
dc.identifier.citation | Ghorbani, M., Swift, S., Taylor, S.J.E. and Payne, A.M. (2020) 'Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets', Journal of Grid Computing, 18, pp. 507–527. doi:10.1007/s10723-020-09518-y. | en_US
dc.identifier.issn | 1570-7873 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/20698 | -
dc.description.abstract | © The Author(s) 2020. The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus, for non-computing experts, the generation of such matrices proves a barrier to employing machine learning techniques. Further, since datasets are becoming larger, this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing possible in most institutions. The system makes use of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE), to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets including image, numerical and text data. | -
dc.format.medium | Print-Electronic | -
dc.language.iso | en | en_US
dc.publisher | Springer | en_US
dc.rights | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | BOINC | en_US
dc.subject | desktop grid | en_US
dc.subject | DNA sequence | en_US
dc.subject | feature subset selection | en_US
dc.subject | machine learning | en_US
dc.subject | high performance computing | -
dc.subject | WS-PGRADE | -
dc.subject | gUSE | -
dc.subject | DNA feature identification | -
dc.subject | speedup | -
dc.title | Design of a flexible, user friendly feature matrix generation system and its application on biomedical datasets | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.1007/s10723-020-09518-y | -
dc.relation.isPartOf | Journal of Grid Computing | -
pubs.publication-status | Published | -
dc.identifier.eissn | 1572-9184 | -
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | | 6.78 MB | Adobe PDF


This item is licensed under a Creative Commons License.