Multilingual sentiment analysis of Arabic, Bahraini dialects and English

Omran, Thuraya Mohamed Maki

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27228

Title:	Multilingual sentiment analysis of Arabic, Bahraini dialects and English
Authors:	Omran, Thuraya Mohamed Maki
Keywords:	Natural language processing;Resource scarcity;Parallel dataset;Transfer learning;LSTM deep learning model
Issue Date:	2022
Publisher:	Brunel University London
Abstract:	Sentiment analysis is a key natural language processing (NLP) task that analyzes users' emotions and opinions towards entities such as events, services, or products. Arabic NLP faces numerous challenges, including: (1) the scarcity of resources, especially for modern standard Arabic (MSA) and Arabic dialects, particularly Bahraini dialects (BDs); (2) the lack of multilingual deep learning models; and (3) insufficient transfer learning studies on Arabic dialects in general and Bahraini dialects specifically. This research aims to create a balanced dataset of product reviews in Bahraini dialects by translating English Amazon product reviews into MSA and then converting the MSA text into Bahraini dialects. A second aim is to provide a multilingual long short-term memory (LSTM) deep learning model for analyzing the parallel dataset of English, MSA, and Bahraini dialects, which differ in linguistic properties. Experiments were conducted using train-validate-test splits and k-fold cross-validation, and model performance was evaluated using accuracy, F1 score, and AUC. The model's average performance across all datasets ranged from 96.72% to 97.04% in accuracy and from 97.91% to 97.93% in F1 score, while AUC ranged from 98.46% to 98.7% when an augmentation technique was used. The LSTM model was then incorporated into a stacking ensemble with other LSTM architectures as base learners and a decision tree (DT) as the meta-learner. This ensemble achieved mean accuracies of 99.52%, 99.25%, and 98.52% on the English, MSA, and BD datasets, respectively. Finally, the LSTM model was used as a pre-trained model for transfer learning: knowledge gained from analyzing product reviews in Bahraini dialects was applied to a second sentiment analysis task on a small dataset of movie comments in the same dialects. The pre-trained model achieved 96.97% accuracy, a 96.65% F1 score, and 97.94% AUC.
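The thesis's evaluation code is not part of this record. As an illustrative sketch only, the three metrics named in the abstract (accuracy, F1 score, and AUC) and a k-fold split could be computed for binary sentiment labels roughly as follows. All function names, the toy labels, and the toy scores are hypothetical and are not taken from the thesis.

```python
from typing import Iterator, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds over n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy data (hypothetical): 1 = positive review, 0 = negative review.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]  # assumed model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

In a k-fold setting, these metrics would typically be computed on each held-out fold and then averaged, which is one plausible reading of the "average" and "mean" figures reported above.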
Description:	This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London
URI:	http://bura.brunel.ac.uk/handle/2438/27228
Appears in Collections:	Computer Science;Dept of Computer Science Theses

Files in This Item:

File	Description	Size	Format
FulltextThesis.pdf		5.26 MB	Adobe PDF
