
1 Introduction

At present, the active social media population is reported as more than 4.5 billion worldwide.Footnote 1 As the amount of social media content produced by users increases, the need for better moderation techniques for unwanted content emerges. Automated detection of offensive text has therefore gained a lot of traction, focusing on concepts such as aggression, hate speech, trolling, misogyny and cyberbullying. Offensive language is considered degrading language that has a negative impact. Examples of offensive (OFF) and not offensive (NOT) texts from the Semi-Supervised Offensive Language Identification Dataset (SOLID) [54] are given in Table 1.

There is a substantial body of research and a number of previous reviews in the field of offensive language detection [28, 37, 64]. Advancements in natural language processing have also led to improvements and an increase in the variety of research in this field. The use of machine learning and deep learning algorithms for accurate classification of offensive language, and further classification of fine-grained types of offense, is widely researched. Moreover, creating high-quality datasets to train and test the models, as well as methods for evaluating dataset annotation, have been studied.

Table 1. Example texts from SOLID dataset

In this paper, we present an overview of the background and current state of offensive language detection on social media. In Sect. 2, we describe our methodology for article search and selection. In Sect. 3, we provide background on terminology, variations and definitions, application areas, shared tasks organized on the topic, existing datasets together with their differences in classes and in creation steps such as annotation agreement, and, finally, model evolution over time. In Sect. 4, we discuss challenges, gaps and potential opportunities in the area.

2 Methodology of the Literature Review

While forming the methodology, the guidelines of Kitchenham and Templier for writing literature reviews were followed for best practice [32, 61]. The following resources were considered for the search on the topic:

Conference Proceedings: According to conference rankings (by Google Scholar, in Computational Linguistics), the following top three conferences were examined for the last two years: the Meeting of the Association for Computational Linguistics (ACL), the Conference on Empirical Methods in Natural Language Processing (EMNLP) and the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL).

Digital Libraries: Among digital libraries, Web of Science was chosen for the key term search, since it is considered more reliable and provides citation information. Keyword groups were searched on the platform and the results of the searches were collected. A time interval between 2017-01-01 and 2023-05-31 was considered for the recent papers, while no time constraint was imposed for the background. Furthermore, additional resources were included for relevant sections, including areas other than computer science.

As search terms, firstly, keywords that can describe offensiveness in various ways were selected. The European Commission against Racism and Intolerance (ECRI) Glossary was also consulted for the search keyword selection.Footnote 2 These keywords are: “offensive”, “hate speech”, “racism”, “sexism”, “cyberbullying”. For the conference papers, these keywords were used directly in the search within the titles of the accepted papers. For the digital library search, these keywords were used along with complementary terms to help discriminate articles from fields such as social science. For this purpose, the following search term list was applied on Web of Science: 1. “offensive” “text classification”, 2. “hate speech” “text classification”, 3. “cyberbullying” “detection”, 4. “racism” “text classification”, 5. “sexism” “text classification” for the recent papers (2017–2022); 6. “offensive” “language detection”, 7. “hate speech” “detection” and 8. “cyberbullying” “detection” for a fundamental background. Papers were sorted by number of citations in descending order.

After obtaining the search results, overlapping papers were excluded, only publications written in English were taken into account, and publications from other fields, such as the social sciences, were excluded.

3 Background

3.1 Definition and Variations

Offensive language is defined as a term applied to hurtful, derogatory or obscene comments.Footnote 3 The United Nations, meanwhile, indicates that hate speech is used loosely in common language to refer to “offensive discourse targeting a group or an individual based on inherent characteristics - such as race, religion or gender - and that may threaten social peace”.Footnote 4

On the other hand, different terms are used in the automatic text detection literature to refer to the same concept as offensive language, such as aggressive [56, 58], toxic [38, 40, 62], abusive [66] or threatening [35] language. Another specific term that is commonly used in the field is cyberbullying. Cyberbullying is a generic term defined as “bullying that takes place over digital devices and includes sending, posting or sharing negative, harmful, false, or mean content about someone else”.Footnote 5

Furthermore, more specific concepts under the umbrella of “hate speech” have been considered. These concepts are usually based on the target group. They include, but are not limited to, particular concepts such as racism, sexism and homophobia. Additionally, there are only a few examples of solely ideological hate speech identification, such as hate speech towards the right wing in Germany [29]. Figure 1 shows a hierarchical schema of our attempt to clarify the relations of the terms around the concept.

Fig. 1. Hierarchical terminology schema

Moreover, Wiegand et al. [65] drew attention to the lack of good performance in detecting implicit abusive language (i.e. abuse not conveyed by explicit offensive words) and presented a list of sub-types of implicit abusive language with a divide-and-conquer idea behind it. The sub-types they recommended are ‘stereotypes’, ‘perpetrators’ (meaning a person committing an illegal, criminal or evil act), ‘comparisons’ (e.g. “You sing like a dying bird”), ‘dehumanization’ (the act of perceiving people as less than human, e.g. “I own my wife and her money.”), ‘euphemistic constructions’ (e.g. “You inspire my inner serial killer.” actually being an equivalent of “I want to kill you.”), ‘call for action’ (the author asking for something, typically some form of punishment), ‘multimodal abuse’ (i.e. the harmful content of a micropost is hidden in the non-textual components or results from an interplay of text and image/video), ‘phenomena requiring world knowledge and inferences’ (with sub-types jokes, sarcasm and rhetorical questions) and, finally, other implicit abuse to cover further cases.

3.2 Motivation and Application Areas

The increasing amount of social media input makes human moderation impossible, while traditional rule-based systems (e.g. word blacklists) are insufficient to provide good coverage. Therefore, the need for efficient automated detection mechanisms has gained a lot of traction.

Previous research has shown a strong negative relation between cyberbullying and young people’s mental health [37]. Earlier studies claim that derogatory language aimed at minority groups leads to political radicalization and worsens intergroup interactions [4].

Considering the ethical, sociological and psychological impacts, the demand for an efficient mechanism is quite high in various application areas. In the private sector, tech companies and platforms built on user input want to increase audience engagement and protect their brand by removing unwanted content as efficiently and quickly as possible. The 2018 Content Moderation report indicated that 27% of respondents to the Digital Trust Survey stated that they would stop using a social platform if it continued to allow harmful content.Footnote 6 As stated in the Business Journal from the Wharton School of the University of Pennsylvania (Jan. 2022), Facebook alone has committed to allocating 5% of the firm’s revenue, $3.7 billion, to content moderation (note that this covers overall content moderation, including text, image, video, etc.).Footnote 7

As Klonick [34] summarized the development of online speech moderation, major social media platforms such as Facebook and YouTube did not even have clear public policies and community standards until the late 2000s; since then, they have been developing and improving the scope and definition of their policies, their user feedback mechanisms and the internationality of their moderation. Moreover, the increase in the usage of streaming platforms (e.g. Twitch) has emphasized the need for real-time content moderation, which requires speeds that are not always possible with manual moderation.

All in all, due to its scalability and speed, AI-based content moderation is in increasing demand, with accuracy remaining the biggest challenge at the moment.

3.3 Shared Tasks

Shared tasks are challenges or competitions organized by the research community that enable teams of researchers to submit systems that solve specific tasks. Escartin et al. conducted a survey and reported that, in the NLP community, shared tasks are generally celebrated as an important factor in the advancement of the field [17]. Among the various specific tasks in the field of NLP, from news article similarityFootnote 8 to patronizing language detectionFootnote 9, the identification of various forms of offensive language is quite popular.

As shown in Fig. 2, the main data source of the datasets used in previous shared tasks is Twitter. In terms of the language of the datasets, the most common is English with 11 datasets, followed by Spanish with 7, German and Hindi with 4, Arabic with 3, Italian with 2 and then the others, including Bengali, Danish, Greek, Marathi, Turkish, Urdu and Vietnamese, with only 1 each.

Fig. 2. Source of the datasets used in the shared tasks

In terms of the participant and winner models, it can be seen that, over the years, submissions trained on neural network models have increased compared to non-neural ones. Support Vector Machines (SVM) [12], Logistic Regression, Random Forest, Naive Bayes and Decision Trees were the popular non-neural approaches, while recurrent neural networks (RNN) [55], convolutional neural networks (CNN) [21], long short-term memory (LSTM) [25], bi-LSTM and GRU were the popular deep learning architectures. Ensemble classification systems are also, in general, highly preferred among participants. For tasks on more than one language, some participants also submitted results with a multilingual approach, in which they trained their model on multiple languages.

For example, in earlier tasks such as GermEval, TRAC and EVALITA in 2018, the ratio of total participant submission models was around 48% non-neural (mostly SVM and Logistic Regression) to 52% neural networks [6, 36, 67]. In contrast, for a recent task, the latest EXIST [52], it is reported that all participants except one team used some kind of transformer-based system; more specifically, the majority used Bidirectional Encoder Representations from Transformers (BERT) [16] or versions of BERT, including multilingual BERT (mBERT), the Spanish version of BERT called BETO, RoBERTa, DeBERTa, the multilingual version of RoBERTa called XLM-R, or other transformer variants. Also in OSACT (2022), it is reported that the participating teams used different fine-tuned transformer versions such as AraBERT, mBERT, XLM-RoBERTa, etc., where the highest-ranking submissions used an ensemble of different transformers [24].
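
To make the dominant approach concrete, the sketch below shows a minimal fine-tuning setup for a BERT-style binary offensive/not-offensive classifier using the Hugging Face transformers library. It is not any particular team's submission; the model name, example texts and hyper-parameters are illustrative assumptions.

    # A minimal sketch (not any team's actual system) of fine-tuning a BERT-style
    # model for binary OFF/NOT classification with Hugging Face transformers;
    # model name, data and hyper-parameters are illustrative assumptions.
    import torch
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    texts = ["@USER you are a disgrace", "have a great day everyone"]
    labels = [1, 0]  # 1 = offensive (OFF), 0 = not offensive (NOT)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)

    class ToyDataset(torch.utils.data.Dataset):
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=ToyDataset(texts, labels),
    )
    trainer.train()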

3.4 Datasets

Reviewing the recent datasets, it can be seen that, in addition to differences in labeling and classification schemas, annotation mechanisms also differ, for instance in the number of annotators and in the evaluation of agreement between annotators. Usually, three or more annotators annotated each instance in the datasets. For annotation agreement calculations, Fleiss’ kappa was considered for some datasets [19, 71], while Krippendorff’s measure was used in another [38]. Moreover, the annotators’ profiles were also often taken into consideration. For some datasets it is stated that variety has been maintained among annotators, but the details have been kept private. For others it is revealed; for example, for the Levantine hate speech dataset by Mulki et al., the genders of the three annotators were chosen as one male and two females [46].
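
As an illustration of the agreement measures mentioned above, the following sketch computes Fleiss’ kappa for a toy annotation table; the counts are invented and the implementation follows the standard formula rather than any specific dataset’s procedure.

    # Toy Fleiss' kappa computation for a table of per-item label counts;
    # the annotation counts below are invented for illustration only.
    import numpy as np

    def fleiss_kappa(counts):
        """counts: (n_items x n_categories) matrix; each row sums to the
        number of annotators who labeled that item."""
        counts = np.asarray(counts, dtype=float)
        n_raters = counts.sum(axis=1)[0]
        # per-item observed agreement
        p_item = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
        p_bar = p_item.mean()
        # chance agreement from overall category proportions
        p_cat = counts.sum(axis=0) / counts.sum()
        p_e = np.square(p_cat).sum()
        return (p_bar - p_e) / (1 - p_e)

    # 3 annotators per instance, categories: [OFF, NOT]
    table = [[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
    print(round(fleiss_kappa(table), 3))  # ~0.444 for this toy table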

In terms of data sources, the most common appears to be social media platforms such as Twitter, due to their short text structure. However, sources such as newspapers and platforms known to be more liberal, such as ‘gab.com’, have also been considered. Data collection is usually based on certain keywords and hashtags; however, at times data is collected based on searches following important events that have gone viral. A different data collection strategy is seen in the hate speech dataset by Mubarak et al. [45], for which tweets were collected using emojis that existed in offensive texts extracted from the previous datasets of Zampieri et al. [72] and Chowdhury et al. [10].

Rosa et al. [53] examined earlier cyberbullying datasets (from 2011 to 2018) and reported that the majority of the datasets are in English, mainly labelled by three annotators, with sizes varying from 2K to 85K instances and with data sources including not only Twitter, YouTube and Instagram but also Formspring, AskFM and MySpace.

Regardless of which classes are considered, the majority of the datasets we are aware of take a binary approach to labeling; however, Hada et al. created the first dataset based on the degree of offensiveness, where each instance has a score between -1 (maximally positive) and 1 (maximally offensive) [23].

More recently, counter-narrative generation has started to be researched as an alternative solution [11, 60, 73]. Counter-narratives are texts that push back against hate speech with fact-bound arguments or alternative viewpoints. In dataset creation, examples of hate speech/counter-narrative datasets have also started to emerge [18]. In a recent study, GPT-2 was utilized to generate synthetic training data for the model [47, 50].

Further examples of dataset creation have concentrated on aspects of racial bias. Sap et al. examined annotators’ insensitivity to differences in dialect and showed that when annotators are made explicitly aware of an African-American English tweet’s dialect, they are significantly less likely to label the tweet as offensive [57]. Davidson et al. also examined racial bias by training models on different datasets and concluded that racial bias exists in the datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates [13].

Research in various languages also keeps expanding with new datasets, as in the most recent Korean, Chinese and Turkish studies [15, 30, 31].

3.5 Methodology and Model Evolution

In general, a detection system starts with pre-processing of the data, followed by splitting the data into training and test sets (typically 80% to 20%), feature extraction, model training and, finally, production of the classified output.
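
The sketch below illustrates this generic pipeline with scikit-learn, using an invented toy dataset; TF-IDF features and a linear SVM stand in for the feature-extraction and model-training steps and are only one of many possible choices.

    # A minimal sketch of the generic pipeline described above (toy data).
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.metrics import classification_report

    texts = ["you are an idiot", "lovely weather today",
             "nobody wants you here", "congrats on the new job",
             "shut up you fool", "thanks for sharing this",
             "what a pathetic loser", "see you at the meetup"]
    labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = OFF, 0 = NOT

    # 80/20 train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42, stratify=labels)

    # feature extraction: TF-IDF over word uni- and bigrams
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    # model training and classification
    clf = LinearSVC().fit(X_train_vec, y_train)
    print(classification_report(y_test, clf.predict(X_test_vec)))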

Pre-processing depends on the data format and typically involves punctuation removal, stopword removal, tokenization, stemming, lemmatization, and smiley and slang conversion.
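
A hedged sketch of such pre-processing is given below; the slang map, stopword list and regular expressions are illustrative assumptions rather than a recommended configuration.

    # Illustrative pre-processing: lowercasing, URL/mention removal,
    # punctuation stripping, slang normalisation and stopword removal.
    import re

    SLANG = {"u": "you", "gr8": "great"}         # toy slang/smiley map
    STOPWORDS = {"the", "a", "an", "is", "are"}  # toy stopword list

    def preprocess(text):
        text = text.lower()
        text = re.sub(r"https?://\S+|@\w+", " ", text)  # URLs and @mentions
        text = re.sub(r"[^\w\s]", " ", text)            # punctuation
        tokens = [SLANG.get(t, t) for t in text.split()]
        return [t for t in tokens if t not in STOPWORDS]

    print(preprocess("@USER u are gr8 ... NOT! https://t.co/xyz"))
    # ['you', 'great', 'not']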

The features used vary from lexical features such as Bag of Words (BoW), n-grams and offensive word dictionaries to syntactic features such as the use of identifiers (i.e. second person pronouns) and user-level features [9]. Common techniques for feature extraction include TF-IDF, BoW, sentiment, Part-of-Speech (PoS) tags and word embeddings (GloVe [48], Word2Vec [42], FastText [5], ELMo [49]). Jahan et al. reported in a systematic review that the most used features were word embeddings and TF-IDF; additionally, the most commonly used word embeddings were Word2Vec and FastText [28].
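
As a complement to the TF-IDF example above, the following sketch derives document features by averaging pre-trained word embeddings; the use of gensim's downloader and the "glove-twitter-25" vectors are assumptions for illustration.

    # Averaged word-embedding features for a post (illustrative setup).
    import numpy as np
    import gensim.downloader as api

    wv = api.load("glove-twitter-25")  # pre-trained 25-dim GloVe Twitter vectors

    def embed(tokens):
        vecs = [wv[t] for t in tokens if t in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

    features = embed(["you", "are", "pathetic"])
    print(features.shape)  # (25,)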

Additionally, further features have been explored by researchers. Galan et al. proposed an approach for cyberbullying detection based on the idea that behind every trolling profile there is a real profile of the user, and generated the hypothesis that it is possible to link a fake profile to the real profile and analyse different features of the profile, including text [22]. McGillivray et al. recently drew attention to the fact that the meaning of a word may change over time and proposed a time-dependent lexical feature approach, meaning that they applied an algorithm for detecting semantic change over a two-year period to find words whose semantics had changed and which had either acquired or lost offensiveness [41]. Casavantes et al. used metadata features such as the tweet creation time of day or account age and reported that utilizing metadata gave better results. In doing so, they utilized three different learning models, which they considered classical (BoW), advanced (GloVe) and state-of-the-art (BERT) text representations, and reported a statistically significant difference with the use of metadata [8].
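
A minimal sketch of combining text features with simple metadata features is shown below; it is loosely inspired by the metadata idea of Casavantes et al., but the chosen metadata fields, data and classifier are illustrative assumptions, not their actual setup.

    # Concatenating TF-IDF text features with toy metadata features
    # (posting hour, account age); the data and feature choices are invented.
    import numpy as np
    from scipy.sparse import csr_matrix, hstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["you are trash", "great game last night",
             "nobody likes you", "happy birthday!"]
    hour_of_day = [2, 20, 3, 14]            # hour the post was created
    account_age_days = [12, 900, 30, 1500]  # age of the posting account
    labels = [1, 0, 1, 0]                   # 1 = OFF, 0 = NOT

    X_text = TfidfVectorizer().fit_transform(texts)
    X_meta = csr_matrix(np.column_stack([hour_of_day, account_age_days]))
    X = hstack([X_text, X_meta])

    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(clf.predict(X))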

In terms of models, some of the earlier research utilized WEKA (standard machine learning software developed at the University of Waikato) [26] and applied traditional algorithms, as in Razavi et al. [51]. They selected a classifier, created an abusive language dictionary, assigned a weight (1–5) to its entries and then applied multi-level classifiers boosted by the dictionary. Akhter et al. performed a series of experiments and reported that character n-grams outperformed word n-grams [2].
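
The contrast between character and word n-grams can be sketched as follows; the obfuscated example texts are invented and the vectorizer settings are arbitrary, so this only illustrates the kind of comparison reported by Akhter et al. rather than reproducing their experiments.

    # Character n-grams (bounded to word edges) still share sub-sequences with
    # obfuscated spellings such as "l0ser", whereas word n-grams treat "l0ser"
    # and "loser" as unrelated tokens.
    from sklearn.feature_extraction.text import TfidfVectorizer

    texts = ["u r such a l0ser", "you are such a loser"]

    char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)).fit(texts)
    word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 2)).fit(texts)

    print(len(char_vec.vocabulary_), len(word_vec.vocabulary_))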

As Rosa et al. noted for earlier research on cyberbullying detection, dating from 2011 to 2017, detection mechanisms were mainly based on traditional machine learning algorithms such as SVM [53]. Van Hee et al. also showed that a Support Vector Machine achieved better results than baseline systems based on keywords and word unigrams [63]. Similarly, for hate-speech-specific detection, SVM has commonly been experimented with [39]. Fortuna et al. also stated in their review back in 2018 that the algorithms most frequently chosen for hate speech detection were traditional ones, SVM being the most frequent, followed by Random Forest, Decision Tree, Logistic Regression and Naive Bayes [20].

More recently, deep neural network models have gained great attention [1]. In 2017, Badjatiya et al. reported that deep learning methods utilizing CNN, LSTM and FastText significantly outperformed state-of-the-art char/word n-gram methods on hate speech detection [3]. Mozafari et al. obtained high F1 scores on hate speech detection with their transfer learning approach combining BERT [16] and CNN [44]. In their systematic review on hate speech, Jahan et al. identified 96 documents with deep learning approaches and noted that, among deep learning algorithms, BERT is the most commonly used one (38%) despite having been released quite recently, followed by LSTM, CNN, bi-LSTM, GRU and combinations of these, respectively, by percentage of usage [28].
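
For comparison with the transformer sketch in Sect. 3.3, a minimal bi-LSTM classifier of the kind surveyed here could look as follows; the Keras framework choice, vocabulary size, dimensions and random toy data are all illustrative assumptions.

    # A toy bidirectional-LSTM classifier for OFF/NOT prediction (Keras);
    # the integer sequences stand in for tokenized, index-encoded posts.
    import numpy as np
    from tensorflow.keras import layers, models

    vocab_size, max_len = 10000, 50
    model = models.Sequential([
        layers.Embedding(vocab_size, 100),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(1, activation="sigmoid"),  # P(offensive)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    X = np.random.randint(1, vocab_size, size=(32, max_len))
    y = np.random.randint(0, 2, size=(32,))
    model.fit(X, y, epochs=1, batch_size=8)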

Lately, language generation has also been involved in hate speech detection. Chung et al. proposed methods for improving counter-narrative generation for hate speech detection [11]. As another example, Wullach et al. used GPT for generating synthetic hate speech data from available labeled examples [68]. Depending on the dataset size and class distribution, data augmentation is commonly utilized on imbalanced datasets to improve performance and prevent issues such as overfitting. Ilan et al. reported that they improved performance with an augmentation method whose input is real unlabelled data, unlike real labeled or synthetic data (produced using a generative model); their approach made use of online platforms in which people specifically ask to be insulted (such as the subreddit r/RoastMe) [27]. Another recent data-oriented approach was reported by Yang et al., in which people were asked to deliberately generate offensive arguments, aiming to make models less sensitive to lexical overlap [70]. Drawing attention to the scarcity of labelled data, Sarracén et al. presented a study in which their model was composed of a Convolutional Graph Neural Network (GNN) and reported that it performed better than state-of-the-art models on small datasets [14]. Besides, Tanvir et al. reported that, with their GAN-BERT approach, they obtained promising results on a small dataset in the Bengali language [59]. Breazzano et al. experimented with a transformer-based architecture combining BERT with multi-task and generative adversarial learning (MT-GAN-BERT) for six different abusive language classification tasks, enabling semi-supervised learning, and concluded that it decreases computational costs without a considerable decrease in prediction quality [7].
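
As a rough illustration of the synthetic-data direction (not the actual setup of Wullach et al. or the other studies above), an off-the-shelf GPT-2 can be prompted through the Hugging Face pipeline to produce candidate training texts, which would then still need labeling or filtering.

    # Generating candidate synthetic training texts with GPT-2 (illustrative
    # only; the prompt and generation settings are arbitrary assumptions).
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = "@USER people like you always"
    candidates = generator(prompt, max_length=30, num_return_sequences=3,
                           do_sample=True)
    for c in candidates:
        print(c["generated_text"])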

From a more generic perspective, Minaee et al. reviewed deep learning approaches for various text classification tasks across more than 150 deep learning models and 40 datasets. They emphasized the fast progress in text classification over recent years thanks to contributions such as neural embeddings, attention mechanisms, self-attention and transformers, again showing that deep learning models have resulted in significant improvements compared to non-deep-learning models. They also discussed that there is no single best neural network for a classification task; the choice varies depending on factors such as the nature of the domain, the application area and the availability of labels [43].

4 Discussion and Conclusions

In this paper, we presented an overview of the detection of offensive language in social media text by natural language processing. We tried to shed light on the ambiguity in terminology and classification, along with the different data classes given in the datasets provided through shared tasks, each of which accepts many experiment submissions.

For this review we have not taken into account research on multi-modal approaches, which consider image and video along with text and author metadata.

4.1 Challenges

As in the ‘garbage in, garbage out’ principle, issues around data constitute an important part of the challenges in the field [64]. The complexity of the definition, as mentioned in previous sections, sometimes causes ambiguity in dataset classes, in annotation and in combining similar datasets. Moreover, due to the nature of social media, most of the text contains slang, a variety of smileys and grammatically incorrect sentences, and therefore consists of structures that are hard to predict. In addition, context switching, the use of different dialects and the lack of sufficient data sources for all languages are further language-related challenges.

Furthermore, as social media is our main consideration, social media entries are subject to relatively quick change over time. For example, a new term or acronym that did not exist in a language before, or that previously had a neutral or positive meaning, might emerge from a new incident such as a newly released television advertisement, a political scandal or a viral video, after which people might start using it with a secondary negative meaning.

4.2 Gaps in the Research

In spite of the traction in the field, some of the gaps in the research can be identified as follows:

  • Although there are numerous datasets and experiments on different languages, the majority of the research is on the English language. However, we are not aware of any research comparing perception from a ‘native speaker’ and a ‘non-native speaker’ point of view.

  • The amount of research on more particular topics such as migration or refugee status, disability, etc. is relatively low in comparison to more generic classification such as offensive/non-offensive. There is an opportunity to create datasets and work on classification in more specialized areas of hate speech.

  • Even though there has recently been more research on multilingual classification, the research so far is still limited, and there is an opportunity to study languages other than English and to research multilingual models.

  • Dataset sources are mostly social media user input, sometimes along with user metadata, and there are very few examples of other sources such as song lyrics or movie and TV show dialogue (subtitles) (informative documents such as Wikipedia are not under consideration here).

  • Images and videos are important parts of social media. For instance, Instagram users share over one million memes daily.Footnote 10 However, research that combines text with image or video is quite limited. Although there has been some research, such as Yang et al.’s (2019) multi-modal approach [69] and Kiela et al.’s (2020) ‘Hateful Memes Challenge’ [33], there is still opportunity in this aspect.