Long Short-Term Memory for Hate Speech and Abusive Language Detection on Indonesian Youtube Comment Section

WCSE 2021
ISBN: 978-981-18-1791-5 DOI: 10.18178/wcse.2021.06.029

Calvin Erico Rudy Salim, Derwin Suhartono

Abstract— Hate speech is one of the most challenging problems the internet faces today. As the number of online users grows, hate speech also rises, and classifying it manually takes time, particularly in languages other than English. This research examines the hate speech detection problem for Bahasa Indonesia. Millions of comments and text posts are added to social media and discussion platforms, so manually classifying all of this content as hate speech or offensive language is a near-impossible and time-consuming task. This research uses Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (Bi-LSTM) to classify hate speech and abusive language. The final accuracy is 88.44%, obtained with the Bi-LSTM method using 200 neurons. The most common challenges are different languages, out-of-vocabulary words, long-range dependencies, and sarcasm.
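The abstract does not describe the network beyond the use of LSTM and Bi-LSTM layers with 200 neurons. The following is a minimal NumPy sketch of a Bi-LSTM forward pass under stated assumptions, not the authors' model: a single recurrent layer of 200 units per direction, pre-computed word embeddings (the 50-dimensional random vectors are placeholders), and a hypothetical two-output sigmoid head, one output for hate speech and one for abusive language. The weights are random and untrained, so the sketch shows only how the forward and backward passes are combined.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, units, rng):
    # One stacked weight matrix for the four gates: input, forget, candidate, output.
    W = rng.normal(0.0, 0.1, size=(input_dim + units, 4 * units))
    b = np.zeros(4 * units)
    return W, b

def lstm_last_state(x_seq, W, b, units):
    """Run a single-layer LSTM over x_seq of shape (T, input_dim); return the final hidden state."""
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in x_seq:
        z = np.concatenate([x_t, h]) @ W + b
        i = sigmoid(z[:units])                 # input gate
        f = sigmoid(z[units:2 * units])        # forget gate
        g = np.tanh(z[2 * units:3 * units])    # candidate cell state
        o = sigmoid(z[3 * units:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def bilstm_classify(x_seq, fwd, bwd, W_out, b_out, units):
    # Forward direction reads the comment left to right, backward reads it right to left.
    h_fwd = lstm_last_state(x_seq, *fwd, units)
    h_bwd = lstm_last_state(x_seq[::-1], *bwd, units)
    h = np.concatenate([h_fwd, h_bwd])  # shape (2 * units,)
    # Hypothetical multi-label head: independent sigmoid per label.
    return sigmoid(h @ W_out + b_out)

# Usage with placeholder data (assumed sizes; only units=200 comes from the abstract).
rng = np.random.default_rng(0)
units, emb_dim, seq_len = 200, 50, 12
x = rng.normal(size=(seq_len, emb_dim))  # stand-in for an embedded comment
fwd = init_lstm(emb_dim, units, rng)
bwd = init_lstm(emb_dim, units, rng)
W_out = rng.normal(0.0, 0.1, size=(2 * units, 2))
b_out = np.zeros(2)
probs = bilstm_classify(x, fwd, bwd, W_out, b_out, units)
print(probs.shape)  # (2,)
```

Concatenating the last forward state with the last backward state gives the classifier context from both ends of the comment, which is the usual motivation for preferring a Bi-LSTM over a single-direction LSTM when long-range dependencies matter.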

Index Terms— hate speech, machine learning, natural language processing

Calvin Erico Rudy Salim
Computer Science Department, Bina Nusantara University, INDONESIA
Derwin Suhartono
School of Computer Science, Bina Nusantara University, INDONESIA


Cite: Calvin Erico Rudy Salim, Derwin Suhartono, "Long Short-Term Memory for Hate Speech and Abusive Language Detection on Indonesian Youtube Comment Section," 2021 The 11th International Workshop on Computer Science and Engineering (WCSE 2021), pp. 193-200, Shanghai, China, June 19-21, 2021.
