Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27736
Title: Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online
Authors: Marshan, A
Mohamed Nizar, FN
Ioannou, A
Spanaki, K
Keywords: machine learning;deep learning;hate speech;social media;text pre-processing;text representation;text analytics
Issue Date: 24-Nov-2023
Publisher: Springer Nature
Citation: Marshan, A., Nizar, F.N.M., Ioannou, A. et al. Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online. Information Systems Frontiers, 0 (ahead of print), pp. 1 - 19. doi: 10.1007/s10796-023-10446-x.
Abstract: Copyright © The Author(s) 2023. Social media platforms have become an increasingly popular tool for individuals to share their thoughts and opinions with other people. However, people very often misuse social media by posting abusive comments. Abusive and harassing behaviours can have adverse effects on people's lives. This study takes a novel approach to combating harassment on online platforms by detecting the severity of abusive comments, which has not been investigated before. The study compares the performance of machine learning models, such as Naïve Bayes, Random Forest, and Support Vector Machine, with deep learning models, such as Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM). Moreover, in this work we investigate the effect of text pre-processing on the performance of the machine and deep learning models. The feature set for the abusive comments was built using unigrams and bigrams for the machine learning models and word embeddings for the deep learning models. The comparison of the models' performances showed that Random Forest with bigrams achieved the best overall performance, with an accuracy of 0.94, a precision of 0.91, a recall of 0.94, and an F1 score of 0.92. The study develops an efficient model to detect the severity of abusive language on online platforms, offering important implications for both theory and practice.
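The best-performing setup reported in the abstract (bigram features with a Random Forest classifier, evaluated on accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' published code: the scikit-learn pipeline, the placeholder comments, the severity labels, and all parameter values below are assumptions for illustration only.

# Minimal sketch (not the authors' code): bigram features + Random Forest
# for classifying the severity of abusive comments.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical data: comments paired with severity labels (e.g. 0 = mild, 1 = moderate, 2 = severe).
comments = [
    "example comment one",
    "example comment two",
    "example comment three",
    "example comment four",
]
labels = [0, 2, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.25, random_state=42
)

# Bigram feature extraction followed by a Random Forest classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(2, 2), lowercase=True)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report the same metrics used in the study's comparison.
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted", zero_division=0))
print("Recall   :", recall_score(y_test, y_pred, average="weighted", zero_division=0))
print("F1 score :", f1_score(y_test, y_pred, average="weighted", zero_division=0))

A unigram run would use ngram_range=(1, 1) in the vectorizer; the deep learning models described in the abstract would instead feed word embeddings into a CNN or Bi-LSTM.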
Description: Data Availability: The data used in this work is a public dataset.
URI: https://bura.brunel.ac.uk/handle/2438/27736
DOI: https://doi.org/10.1007/s10796-023-10446-x
ISSN: 1387-3326
Other Identifiers: ORCID iD: Alaa Marshan https://orcid.org/0000-0001-6764-9160
ORCID iD: Farah Nasreen Mohamed Nizar https://orcid.org/0009-0006-5184-1370
Appears in Collections:Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Author(s) 2023. Rights and permissions: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
Size: 2.16 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.