IJML 2023 Vol.13(2): 82-87 ISSN: 2010-3700
DOI: 10.18178/ijml.2023.13.2.1133

Offensive Language Detection in Social Media Using Transformers and Importance of Pre-training

Beyzanur Saraclar*, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli

Manuscript received on August 10, 2021; revised March 22, 2022; accepted April 4, 2023.

Abstract—Exposure to offensive language is more common on social media platforms than in face-to-face communication because of anonymity and the distance from which users express themselves. Billions of posts are shared daily on these platforms, making it impossible to detect offensive posts through manual editorial processes. This creates a need for automatic detection of offensive language in social media posts to protect users' online safety. In this paper, we applied different Machine Learning (ML) models to more than 36,000 manually annotated Turkish tweets to detect offensive language automatically. According to the results, the most successful individual model for predicting offensive language is the pretrained transformer-based ELECTRA model, with an F1 score of 0.8216. We also obtained the highest F1 score reported on this dataset to date, 0.8342, by combining the transformer-based ELECTRA and BERT models in an ensemble.

Index Terms—NLP, deep learning, transformers, offensive language detection
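
As a rough illustration of the ensemble idea mentioned in the abstract, the sketch below averages the offensive-class probabilities of two classifiers (soft voting) and scores the result with the F1 measure. The abstract does not say how the ELECTRA and BERT outputs were combined or which F1 averaging was used, so the soft-voting rule, the 0.5 threshold, the positive-class F1, and the function names `soft_vote` and `f1_score` are all assumptions made for this example. The probabilities and labels are toy values, not data from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              threshold: float = 0.5) -> List[int]:
    """Average two models' offensive-class probabilities, then threshold.

    Assumption: simple unweighted soft voting; the paper's actual
    combination rule is not given in the abstract.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same number of examples")
    return [1 if (a + b) / 2 >= threshold else 0
            for a, b in zip(prob_a, prob_b)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (offensive = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy inputs (hypothetical): per-tweet probabilities from two models.
electra_probs = [0.9, 0.2, 0.6, 0.4]
bert_probs = [0.7, 0.4, 0.3, 0.8]
labels = [1, 0, 1, 1]

ensemble_pred = soft_vote(electra_probs, bert_probs)
ensemble_f1 = f1_score(labels, ensemble_pred)
```

In practice, the probabilities would come from fine-tuned transformer classifiers, and a weighted average or a meta-classifier could replace the plain mean.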

Beyzanur Saraclar, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli are with the Sefamerve R&D Center, Istanbul, Turkey.


Cite: Beyzanur Saraclar*, Birol Kuyumcu, Selman Delil, and Cuneyt Aksakalli, "Offensive Language Detection in Social Media Using Transformers and Importance of Pre-training," International Journal of Machine Learning, vol. 13, no. 2, pp. 82-87, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

 

General Information

  • E-ISSN: 2972-368X
  • Abbreviated Title: Int. J. Mach. Learn.
  • Frequency: Quarterly
  • DOI: 10.18178/IJML
  • Editor-in-Chief: Dr. Lin Huang
  • Executive Editor: Ms. Cherry L. Chen
  • Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
  • E-mail: ijml@ejournal.net

