Enhanced topic identification algorithm for Arabic corpora

Alsaad, A; Abbod, M

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/13733

Full metadata record

DC Field	Value	Language
dc.contributor.author	Alsaad, A	-
dc.contributor.author	Abbod, M	-
dc.date.accessioned	2016-12-21T13:10:57Z	-
dc.date.available	2016-09-23	-
dc.date.available	2016-12-21T13:10:57Z	-
dc.date.issued	2016	-
dc.identifier.citation	Proceedings - UKSim-AMSS 17th International Conference on Computer Modelling and Simulation, UKSim 2015, pp. 90 - 94, (2016)	en_US
dc.identifier.isbn	9781479987122	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/13733	-
dc.description.abstract	In recent years, the volume of digitized content has grown rapidly, raising demand for information retrieval, data mining and automatic data tagging applications. Little research in this field addresses Arabic data, owing to the complex nature of the Arabic language and the lack of standard corpora. In addition, most existing work focuses either on improving Arabic stemming algorithms or on topic identification and classification methods and experiments. No work has incorporated an efficient stemming method into the classification algorithm, which would lead to more efficient outcomes. In this paper, we propose a new approach to identifying significant keywords in Arabic corpora. It implements an advanced stemming and root extraction algorithm together with the Term Frequency/Inverse Document Frequency (TFIDF) topic identification method. Our results show that combining advanced stemming, root extraction and TFIDF techniques leads to the extraction of highly significant terms represented by Arabic roots. These roots have higher TFIDF weights than terms extracted without advanced stemming and root extraction, which decreases the size of the indexed vocabulary and improves the feature selection process.	en_US
dc.format.extent	90 - 94	-
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Root extraction	en_US
dc.subject	Feature selection	en_US
dc.subject	Topic identification	en_US
dc.subject	Natural language processing	en_US
dc.subject	Data mining	en_US
dc.subject	Text mining	en_US
dc.title	Enhanced topic identification algorithm for Arabic corpora	en_US
dc.type	Conference Paper	en_US
dc.identifier.doi	http://dx.doi.org/10.1109/UKSim.2015.77	-
dc.relation.isPartOf	Proceedings - UKSim-AMSS 17th International Conference on Computer Modelling and Simulation, UKSim 2015	-
pubs.publication-status	Published	-
Appears in Collections:	Dept of Electronic and Electrical Engineering Research Papers
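
The abstract describes combining root extraction with TFIDF weighting. The following is a minimal sketch of that general idea, not the authors' algorithm: `TOY_ROOTS` and `to_root` are hypothetical stand-ins for a real Arabic stemmer and root extractor, the three-document corpus is invented for illustration, and the weighting shown (raw term count times log(N/df)) is one common TFIDF variant that the paper may or may not use.

```python
import math
from collections import Counter

# Hypothetical toy lookup standing in for a real stemmer/root extractor.
# It maps a few surface forms to their triliteral roots.
TOY_ROOTS = {
    "كتاب": "كتب",   # book
    "كاتب": "كتب",   # writer
    "مكتبة": "كتب",  # library
    "مدرسة": "درس",  # school
    "دروس": "درس",   # lessons
}

def to_root(token):
    """Return the root for a known surface form, else the token unchanged."""
    return TOY_ROOTS.get(token, token)

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: raw-count TF multiplied by log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]

# Invented, pre-tokenized toy corpus.
docs = [
    ["كتاب", "كاتب", "مكتبة"],
    ["مدرسة", "دروس"],
    ["سوق"],
]

surface = tfidf(docs)
rooted = tfidf([[to_root(t) for t in doc] for doc in docs])

surface_vocab = {t for doc in surface for t in doc}
rooted_vocab = {t for doc in rooted for t in doc}
```

In this toy corpus, mapping tokens to roots merges several surface forms into one index term, so `rooted_vocab` is smaller than `surface_vocab` and the merged root accumulates the counts of its variants. The corpus was built to show that effect and says nothing about the results reported in the paper.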

Files in This Item:

File	Description	Size	Format
FullText.docx		50.74 kB	Unknown