key: cord-0666257-blsgzofv
authors: Shekhar, Chander; Bagla, Bhavya; Maurya, Kaushal Kumar; Desarkar, Maunendra Sankar
title: Walk in Wild: An Ensemble Approach for Hostility Detection in Hindi Posts
date: 2021-01-15
journal: nan
DOI: nan
sha: 1862e6e608561f24d5dd5b0acd0a3c0839e625bb
doc_id: 666257
cord_uid: blsgzofv

As the reach of the internet increases, pejorative content has flooded social media platforms, making it necessary to identify hostile content on these platforms. Identifying hostile content in a low-resource language like Hindi poses additional challenges, since its syntactic structure differs markedly from English. In this paper, we develop a simple ensemble-based model on top of pre-trained mBERT and popular classification algorithms, such as the Artificial Neural Network (ANN) and XGBoost, for hostility detection in Hindi posts. We formulate the problem both as binary classification (hostile vs. non-hostile) and as multi-label multi-class classification (for more fine-grained hostile classes). We secured the third rank overall in the competition, with weighted F1-scores of ~0.969 and ~0.61 on the binary and multi-label multi-class classification tasks, respectively.

1 Introduction

During the coronavirus lockdown, the number of active internet users across the globe increased rapidly. Government-enforced lockdowns pushed people to stay indoors, increasing engagement with social media platforms like Facebook, Twitter, Instagram, WhatsApp, etc. This led to a rise in hostile posts on social media, including cyberbullying, trolling, hate speech, death threats, etc. A major challenge for ordinary users in the digital space is identifying misinformation (aka fake news) in online content. In addition, according to a recent survey 2 , there has been a 900% increase in hate speech on Twitter directed towards Chinese people, and a 200% increase in traffic to hate sites and posts written against the Asian community.
It has also been found 3 that the share of non-English tweets in India has jumped by 50%. This motivates research on hostility detection in posts written in low-resource but widely used languages like Hindi. Since billions of posts appear on social media each day, and anti-social elements enjoy full anonymity while expressing hostile behavior on the internet, identifying trustworthy information calls for a reliable automated system. Even though Hindi is the third most spoken language globally, it is considered a low-resource language due to the unavailability of accurate tools and suitable datasets for various tasks in Hindi. This motivates us to take up the task of hostility detection in Hindi posts on social media.

We view hostility detection as a two-stage process. First, a coarse-grained classification marks a post as hostile or non-hostile. If the post is detected as hostile, the second stage performs a finer-grained classification into hostile classes. We briefly define the two stages:

1. Coarse-grained classification: a binary classification problem in which each post is categorized as hostile or non-hostile.
2. Fine-grained classification: a multi-label multi-class classification over the hostile classes. Each hostile post belongs to one or more of the following categories: fake news, hate speech, offensive, and defamation.

In our proposed approach, we leverage the pre-trained multilingual BERT (mBERT) [6] 4 for input post representation; these representations are then used as input to an Artificial Neural Network (ANN) and other machine learning models for the binary and multi-label multi-class classification problems. The base architecture of mBERT is the same as that of BERT, and both have proven to be state-of-the-art models across multiple NLU and NLG tasks.

The rest of this paper is organized as follows: Section 2 presents related work; Section 3 gives an in-depth explanation of our model.
Section 4 presents the experimental setup; Section 5 provides results and our analysis; finally, Section 6 gives conclusions and directions for future work.

2 Related Work

A hostility detection dataset in Hindi was presented in [1]. The authors developed their hostility detection models using traditional machine learning algorithms, i.e., Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR), on top of post embeddings extracted using pre-trained mBERT. Several researchers have tackled the problem of hostility detection in the past few years [7], [2], but their work was limited to English. Detection of hate speech restricted to racism and sexism using character n-grams was discussed in [7]. In [2], the authors used crowd-sourcing to collect a lexicon of hateful/offensive language and annotated tweets as hate, offensive, or neither. Their model was keyword-based: it could detect hateful/offensive language only if the text contained hate words explicitly. There have been several recent advances in hostility/toxicity detection for non-English languages, such as Brazilian Portuguese [5], Hindi [4], and Bengali [3]. These works focus on generating new annotated datasets in a particular language and provide benchmarks on those datasets to facilitate further research.

3 Proposed Approach

A flow diagram of our coarse-grained and fine-grained classification techniques is shown in Figure 1. The approach consists of three stages: pre-processing, embedding extraction, and the classification model. In general, tweets and online posts contain unstructured language in which each token carries meaningful information, and removing any token may distort the meaning of the sentence. The dataset consists of tokens like emojis, hashtags, URLs, etc.
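As an illustration, a minimal cleaning pass of this kind might strip URLs while leaving emojis and hashtags intact. This is a sketch under our own assumptions (the regular expression and function name are ours, not the authors' code):

```python
import re

# Hypothetical URL pattern; emojis and hashtags are deliberately left untouched.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

def preprocess(post: str) -> str:
    """Remove URLs from a post and normalize the resulting whitespace."""
    return re.sub(r"\s+", " ", URL_RE.sub(" ", post)).strip()
```

For example, `preprocess("bad post https://t.co/x #hate")` returns `"bad post #hate"`: the URL is dropped while the hashtag survives.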
Experimental results show that emojis and hashtags are essential features for hostility detection, whereas URLs are not; we therefore removed all URLs in the pre-processing step. The pre-processed posts are tokenized with the mBERT tokenizer. We set the input length threshold to 128 tokens and truncate (or pad) posts that are longer (or shorter). We extract post embeddings in two ways:

1. Raw Representation: The pre-processed, tokenized input post is fed through the mBERT [6] model, and the [CLS] representation (dimension 768) is taken as the final post representation. The Raw Representation of each post is used to train an ANN model for both classification tasks.
2. Fine-tuned Representation: Once the ANN is trained, a single forward pass of the Raw Representations through it yields the Fine-tuned Representation.

An architectural diagram of the proposed model is presented in Figure 2. In another setting, the Fine-tuned Representation extracted from the ANN acts as input to an XGBoost classifier, which we train for both the binary and the multi-label multi-class classification tasks to obtain the final label distributions.

Ensemble Model: We propose a novel ensemble model to effectively combine the outputs of the existing models. Assume there are n posts in the test/validation dataset and m models with outputs O_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n), where O_ij is the multi-hot vector output of the i-th model for the j-th post. The multi-hot vector holds a binary value for each class (i.e., non-hostile, fake, hate, offensive, and defamation). We also obtain fine-grained F1-scores f1_i from all m models. Note that f1_i is computed on the validation dataset, and the same values are used for the test data (as test labels are not available for computing f1_i). The ensembled multi-hot output Ô_j is then obtained by combining the model outputs O_ij, with each model's vote weighted by its fine-grained F1-score f1_i.

4 Experimental Set-up

We use the dataset released by [1], a collection of ~8,200 social media posts in Hindi.
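The F1-weighted combination described in Section 3 can be sketched as follows. The decision rule used here (keep a class when the weighted votes reach half the total weight) and the function name are illustrative assumptions, not the paper's exact rule:

```python
# Sketch of an F1-weighted ensemble over multi-hot model outputs.
# outputs[i][j] is the multi-hot vector of model i for post j;
# f1_scores[i] is model i's fine-grained F1 on the validation set.

def ensemble(outputs, f1_scores, num_classes=5):
    """Combine m models' multi-hot outputs by F1-weighted voting."""
    m, n = len(outputs), len(outputs[0])
    total = sum(f1_scores)
    combined = []
    for j in range(n):
        # Weighted vote per class, summed over the m models.
        votes = [sum(f1_scores[i] * outputs[i][j][k] for i in range(m))
                 for k in range(num_classes)]
        # Assumed threshold: half of the total weight.
        combined.append([1 if v >= total / 2 else 0 for v in votes])
    return combined
```

For two models with F1 weights 0.6 and 0.4, a class predicted only by the weaker model (weighted vote 0.4 < 0.5) is dropped, while agreement or a vote from the stronger model keeps the class.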
Each post belongs either to the hostile class (i.e., fake, defamation, hate, or offensive) or to the non-hostile class, and each hostile post can carry more than one hostile label. [1] defines the classes as follows:

- Fake News: A claim or information that is verified to be not true.

The weighted F1-score is our primary metric for evaluating the proposed models. For coarse-grained evaluation, it is computed over the hostile and non-hostile classes; for fine-grained evaluation, it is computed across the four hostile classes. The weighted F1-score is the average of the per-class F1-scores weighted by support (the number of true instances of each label). We also report the individual F1-score for each hostile class (see Tables 2 and 3).

We take the baseline models from [1]. Similar to our approach, the authors first extract post representations from mBERT and then apply the following algorithms to both tasks: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR). They report weighted F1-scores on the validation dataset.

For the ANN classifier, we use the AdamW optimizer with weight decay 0.001, input size 128, learning rate 0.00003, one hidden layer with 256 neurons, and the ReLU activation function. After generating sentence embeddings, we apply dropout to avoid overfitting. Binary cross-entropy with logits is used as the loss function. For the XGBoost classifier, the learning objective is set to 'binary:logistic' and the max depth to 4.

5 Results and Analysis

In the competition, we submitted our five best results, which are variants of the proposed models (see Section 3). Table 2 and Table 3 summarize our results on the validation and test datasets, respectively, as submitted to the competition. We observed a similar trend of F1-scores across the test and validation datasets, and found that the best F1-scores for different labels are distributed across different submissions.
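The support-weighted F1 metric defined above can be computed directly from per-class counts. This is a generic sketch (the function names and input format are ours, not from the paper):

```python
# Support-weighted F1: per-class F1 averaged with weights proportional
# to each class's number of true instances (its support).

def f1(tp, fp, fn):
    """Standard F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def weighted_f1(per_class):
    """per_class: list of (tp, fp, fn, support) tuples, one per class."""
    total = sum(s for *_, s in per_class)
    return sum(f1(tp, fp, fn) * s for tp, fp, fn, s in per_class) / total
```

For instance, one perfectly predicted class and one entirely missed class, each with support 5, yield a weighted F1 of 0.5, since each class contributes half the total support.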
However, the differences across submissions are minor. Coarse-grained classification attains the highest score, as it is only a binary classification setup. On the weighted fine-grained F1-score, submission-4 achieved the best result, which is close to that of the ensemble method. Our model consistently outperformed all the baseline models on the validation dataset (baseline results are not available for the test dataset); the comparison is included in Table 4. The defamation class has the lowest score, mostly because it has very few training points. Except for the defamation class, all our proposed models consistently outperformed all the baselines across all categories by a large margin. Among the hostile categories, the F1-score is highest for the fake label, which has the highest number of training points (see Table 1). We can infer that model performance depends heavily on the number of training points, even with large multilingual pre-trained models like mBERT. Table 5 shows the effect of the Raw Representation and the Fine-tuned Representation on the coarse-grained and fine-grained tasks with the XGBoost (XGB) classifier. It highlights the importance of the Fine-tuned Representation: for fine-grained classification, the absolute F1-score improves by 0.24 when fine-tuned representations are used.

6 Conclusion and Future Work

In this paper, we presented a novel yet simple ensemble-based architecture for hostility detection in Hindi posts. We developed multiple architectures based on mBERT and different classifiers (i.e., ANN and XGBoost). Our proposed model outperformed all the existing baselines and secured 3rd rank in the competition. In the future, we will try to tackle class imbalance using weighting strategies during model training.

References

[1] ..., Amitava Das, and Tanmoy Chakraborty: Hostility Detection Dataset in Hindi (2020)
[2] Automated Hate Speech Detection and the Problem of Offensive Language
[3] BanFakeNews: A Dataset for Detecting Fake News in Bangla
[4] DHOT-Repository and Classification of Offensive Tweets in the Hindi Language
[5] Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
[6] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[7] Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
[8] XGBoost
[9] Aggression and Misogyny Detection using BERT: A Multi-Task Approach