Offensive Language Detection in Social Media Using Transformers and Importance of Pre-training

Home > Archive > 2023 > Volume 13 Number 2 (April 2023) >

IJML 2023 Vol.13(2): 82-87 ISSN: 2010-3700
DOI: 10.18178/ijml.2023.13.2.1133

Beyzanur Saraclar*, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli

Manuscript received on August 10, 2021; revised March 22, 2022; accepted April 4, 2023.

Abstract—Users are more likely to encounter offensive language on social media platforms than in face-to-face communication because of anonymity and the distance of online self-expression. Billions of posts are shared on these platforms every day, which makes it impossible to detect offensive posts through manual editorial review. This creates a need for automatic detection of offensive language in social media posts to ensure users' online safety. In this paper, we applied different Machine Learning (ML) models to over 36,000 manually annotated Turkish tweets to automatically detect messages containing offensive language. According to the results, the most successful model for predicting offensive language is the pre-trained transformer-based ELECTRA model, with an F1 score of 0.8216. By combining the transformer-based ELECTRA and BERT models in an ensemble, we also obtained an F1 score of 0.8342, the highest reported on this dataset to date.
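The abstract does not say how the ELECTRA and BERT outputs were combined, nor whether the reported F1 is binary or macro-averaged. As a minimal sketch under two stated assumptions, the code below shows one common combination rule, soft voting (a weighted average of each model's predicted probability that a tweet is offensive, followed by a threshold), and a macro-averaged F1 computed from scratch. The function names, the equal 0.5 weighting, and the 0.5 threshold are hypothetical illustrations, not the authors' implementation.

```python
from collections.abc import Sequence


def soft_vote(probs_a: Sequence[float], probs_b: Sequence[float],
              weight_a: float = 0.5, threshold: float = 0.5) -> list[int]:
    """Hypothetical soft-voting ensemble: average two models' P(offensive), then threshold.

    Assumption: each model outputs a probability per tweet; 1 = offensive, 0 = not offensive.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same tweets")
    weight_b = 1.0 - weight_a
    return [int(weight_a * pa + weight_b * pb >= threshold)
            for pa, pb in zip(probs_a, probs_b)]


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 = 2PR / (P + R), treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for the two labels (0 and 1)."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


if __name__ == "__main__":
    # Toy probabilities for four tweets (hypothetical values, not data from the paper).
    electra = [0.9, 0.2, 0.6, 0.4]
    bert = [0.7, 0.1, 0.3, 0.8]
    gold = [1, 0, 1, 0]
    preds = soft_vote(electra, bert)
    print(preds, macro_f1(gold, preds))
```

With these toy inputs the averaged probabilities are 0.8, 0.15, 0.45 and 0.6, so the ensemble predicts `[1, 0, 0, 1]`; each class gets one true positive, one false positive and one false negative, giving a macro F1 of 0.5. Weighted averaging is only one option; majority voting over hard labels or a learned meta-classifier (stacking) are common alternatives.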

Index Terms—NLP, deep learning, transformers, offensive language detection

Beyzanur Saraclar, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli are with the Sefamerve R&D Center, Istanbul, Turkey.


Cite: Beyzanur Saraclar, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli, "Offensive Language Detection in Social Media Using Transformers and Importance of Pre-training," International Journal of Machine Learning, vol. 13, no. 2, pp. 82-87, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


General Information

E-ISSN: 2972-368X
Abbreviated Title: Int. J. Mach. Learn.
Frequency: Quarterly
DOI: 10.18178/IJML
Editor-in-Chief: Dr. Lin Huang
Executive Editor: Ms. Cherry L. Chen
Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
E-mail: ijml@ejournal.net

