Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/13733
Full metadata record
DC Field | Value | Language
dc.contributor.author | Alsaad, A | -
dc.contributor.author | Abbod, M | -
dc.date.accessioned | 2016-12-21T13:10:57Z | -
dc.date.available | 2016-09-23 | -
dc.date.available | 2016-12-21T13:10:57Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Proceedings - UKSim-AMSS 17th International Conference on Computer Modelling and Simulation, UKSim 2015, pp. 90 - 94, (2016) | en_US
dc.identifier.isbn | 9781479987122 | -
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/13733 | -
dc.description.abstract | During the past few years, the production of digitized content has been increasing rapidly, raising the demand for information retrieval, data mining and automatic data tagging applications. There has been little research in this field for Arabic data, owing to the complex nature of the Arabic language and the lack of standard corpora. In addition, most work focuses on improving Arabic stemming algorithms, or on topic identification and classification methods and experiments. No work has been conducted to include an efficient stemming method within the classification algorithm, which would lead to a more efficient outcome. In this paper, we propose a new approach to identify significant keywords for Arabic corpora. This is done by implementing an advanced stemming and root extraction algorithm, together with the Term Frequency/Inverse Document Frequency (TFIDF) topic identification method. Our results show that combining advanced stemming, root extraction and TFIDF techniques leads to the extraction of highly significant terms represented by Arabic roots. These roots receive higher TFIDF weights than terms extracted without the use of advanced stemming and root extraction methods, which decreases the number of indexed words and improves the feature selection process. | en_US
dc.format.extent | 90 - 94 | -
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.subject | Root extraction | en_US
dc.subject | Feature selection | en_US
dc.subject | Topic identification | en_US
dc.subject | Natural language processing | en_US
dc.subject | Data mining | en_US
dc.subject | Text mining | en_US
dc.title | Enhanced topic identification algorithm for Arabic corpora | en_US
dc.type | Conference Paper | en_US
dc.identifier.doi | http://dx.doi.org/10.1109/UKSim.2015.77 | -
dc.relation.isPartOf | Proceedings - UKSim-AMSS 17th International Conference on Computer Modelling and Simulation, UKSim 2015 | -
pubs.publication-status | Published | -
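
The abstract in the record above describes combining root extraction with TFIDF weighting. The following is a minimal sketch of that general idea, not the authors' algorithm: the `TOY_ROOTS` lookup, the `to_root` helper, the transliterated tokens and the three sample documents are all hypothetical stand-ins for a real Arabic stemmer and corpus, and the weighting assumes the common form tf = count / document length, idf = ln(N / df).

```python
import math
from collections import Counter

# Hypothetical toy lookup: transliterated surface forms -> transliterated root.
# A real system would use an Arabic stemmer / root extractor here instead.
TOY_ROOTS = {
    "kitab": "ktb", "katib": "ktb", "maktaba": "ktb",
    "dars": "drs", "madrasa": "drs", "daris": "drs",
    "suq": "swq", "madina": "mdn",
}


def to_root(token):
    """Stand-in root extractor; tokens missing from the lookup pass through unchanged."""
    return TOY_ROOTS.get(token, token)


def tfidf(docs):
    """Return one {term: weight} dict per document (tf = count/len, idf = ln(N/df))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical tokenized documents (illustration only).
docs = [
    ["kitab", "katib", "maktaba", "dars"],
    ["madrasa", "dars", "daris"],
    ["suq", "madina"],
]
rooted_docs = [[to_root(t) for t in doc] for doc in docs]

raw = tfidf(docs)
rooted = tfidf(rooted_docs)

raw_vocab = {t for doc in docs for t in doc}
root_vocab = {t for doc in rooted_docs for t in doc}

print("index size without / with roots:", len(raw_vocab), len(root_vocab))
print("weight of 'kitab' vs root 'ktb' in doc 0:", raw[0]["kitab"], rooted[0]["ktb"])
```

In this toy corpus, mapping surface forms to roots shrinks the index from 8 terms to 4, and the root `ktb` gets a higher weight in the first document than any single surface form did, because the counts of its variants are pooled. Whether this holds on a real corpus depends on the stemmer and the data.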
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File | Description | Size | Format
FullText.docx | | 50.74 kB | Unknown


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.