Detecting Harmful Memes and Their Targets
Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
2021-09-24

Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent years have witnessed a proliferation of harmful memes targeted at various social entities. As most harmful memes are highly satirical and abstruse without appropriate context, off-the-shelf multimodal models may not be adequate to understand their underlying semantics. In this work, we propose two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target. To this end, we present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19. Each meme went through a rigorous two-stage annotation process. In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to: individual, organization, community, or society/general public/other. The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks. We further discuss the limitations of these models, and we argue that more research is needed to address these problems.

The growing popularity of social media has led to the rise of multimodal content as a way to express ideas and emotions. As a result, a brand new type of message was born: the meme. A meme is typically formed by an image and a short piece of text on top of it, embedded as part of the image. Memes are typically innocent and designed to look funny. Over time, however, memes started being used for harmful purposes in the context of contemporary political and socio-cultural events, targeting individuals, groups, businesses, and society as a whole. At the same time, their multimodal nature and often camouflaged semantics make their analysis highly challenging (Sabat et al., 2019).

Meme analysis. The proliferation of memes online and their increasing importance have led to a growing body of research on meme analysis (Sharma et al., 2020a; Reis et al., 2020). It has also been shown that off-the-shelf multimodal tools may be inadequate to unfold the underlying semantics of a meme, as (i) memes are often context-dependent, (ii) the visual and the textual content are often uncorrelated, and (iii) meme images are mostly morphed, and the embedded text is sometimes hard to extract using standard OCR tools (Bonheme and Grzes, 2020).

The dark side of memes. Recently, there has been a lot of effort to explore the dark side of memes, e.g., focusing on hateful (Kiela et al., 2020) and offensive (Suryawanshi et al., 2020) memes. However, the harm a meme can cause can be much broader. For instance, the meme in Figure 1c is neither hateful nor offensive, but it is harmful to the media outlets shown on the top left (ABC, CNN, etc.), as it compares them to China, suggesting that they adopt strong censorship policies. In short, the scope of harmful meme detection is much broader, and it may encompass other aspects such as cyberbullying, fake news, etc.
Moreover, harmful memes have a target (e.g., news organizations such as ABC and CNN in our previous example), which requires separate analysis, not only to decipher their underlying semantics, but also to help with the explainability of the detection models.

Our contributions. In this paper, we study harmful memes, and we formulate two problems. Problem 1 (Harmful meme detection): Given a meme, detect whether it is very harmful, partially harmful, or harmless. Problem 2 (Target identification of harmful memes): Given a harmful meme, identify whether it targets an individual, an organization, a community/country, or the society/general public/others. To this end, we develop a novel dataset, HarMeme, containing 3,544 real memes related to COVID-19, which we collected from the web and carefully annotated. Figure 1 shows several examples of memes from our collection, whether they are harmful, as well as the types of their targets. We prepare detailed annotation guidelines for both tasks. We further experiment with ten state-of-the-art unimodal and multimodal models to benchmark the two problems. Our experiments demonstrate that a systematic combination of multimodal signals is needed to tackle these problems. Interpreting the models further reveals some of the biases that the best multimodal model exhibits, which lead to a drop in performance. Finally, we argue that off-the-shelf models are inadequate in this context and that there is a need for specialized models.

Our contributions can be summarized as follows:
• We study two new problems: (i) detecting harmful memes and (ii) detecting their targets.
• We release a new benchmark dataset, HarMeme, developed based on comprehensive annotation guidelines.
• We perform initial experiments with state-of-the-art textual, visual, and multimodal models to establish the baselines. We further discuss the limitations of these models.

Reproducibility. The full dataset and the source code of the baseline models are publicly available. The appendix contains the values of the hyper-parameters and the detailed annotation guidelines.

Below, we present an overview of the datasets and the methods used for multimodal meme analysis.

Hate speech detection in memes. Sabat et al. (2019) developed a collection of 5,020 memes for hate speech detection. Similarly, the Hateful Memes Challenge by Facebook introduced a dataset consisting of 10k+ memes, annotated as hateful or non-hateful (Kiela et al., 2020). The memes were generated artificially, so that they resemble real ones shared on social media, along with "benign confounders." As part of this challenge, an array of approaches with different architectures and features have been tried, including Visual BERT, ViLBERT, VLP, UNITER, LXMERT, VILLA, ERNIE-ViL, Oscar, and other Transformers (Vaswani et al., 2017; Li et al., 2019; Tan and Bansal, 2019; Su et al., 2020; Yu et al., 2021; Li et al., 2020; Lippe et al., 2020; Zhu, 2020; Muennighoff, 2020). Other approaches include multimodal feature augmentation and cross-modal attention mechanisms using inferred image descriptions (Das et al., 2020; Sandulescu, 2020; Zhou and Chen, 2020; Atri et al., 2021), as well as up-sampling confounders and loss re-weighting to complement multimodality (Lippe et al., 2020), web entity detection along with fair face classification (Karkkainen and Joo, 2021) from memes (Zhu, 2020), and cross-validation ensemble learning and semi-supervised learning (Zhong, 2020) to improve robustness.

Meme sentiment/emotion analysis.
Hu and Flaxman (2018) developed the TUMBLR dataset for emotion analysis, consisting of image-text pairs along with associated tags, by collecting posts from the Tumblr platform. Thang Duong et al. (2017) prepared a multimodal dataset containing images, titles, upvotes, downvotes, #comments, etc., all collected from Reddit. Recently, SemEval-2020 Task 8 on Memotion Analysis (Sharma et al., 2020a) introduced a dataset of 10k memes, annotated with sentiment, emotions, and emotion intensity. Most participating systems in this challenge used a fusion of visual and textual features computed using models such as Inception, ResNet, CNN, VGG-16, and DenseNet for image representation (Morishita et al., 2020; Sharma et al., 2020b; Yuan et al., 2020), and BERT, XLNet, LSTM, GRU, and DistilBERT for text representation (Liu et al., 2020; Gundapu and Mamidi, 2020). Due to the class imbalance in the dataset, approaches such as GMM-based sampling and Training Signal Annealing (TSA) were also found useful. Morishita et al. (2020), Bonheme and Grzes (2020), Guo et al. (2020), and Sharma et al. (2020b) proposed ensemble learning, whereas Gundapu and Mamidi (2020), De la Peña Sarracén et al. (2020), and several others used multimodal approaches. A few others leveraged transfer learning using pre-trained models such as BERT (Devlin et al., 2019), VGG-16 (Simonyan and Zisserman, 2015), and ResNet (He et al., 2016). Finally, state-of-the-art results for all three tasks on this dataset (sentiment classification, emotion classification, and emotion quantification) were reported using a deep neural model that combines sentence demarcation and multi-hop attention, whose interpretability was also studied using the LIME framework (Ribeiro et al., 2016).

Meme propagation. Dupuis and Williams (2019) surveyed the personality traits of social media users who are more active in spreading misinformation in the form of memes. Crovitz and Moran (2020) studied the characteristics of memes as a vehicle for spreading potential misinformation and disinformation. Zannettou et al. (2020a) discussed the quantitative aspects of the large-scale dissemination of racist and hateful memes among polarized communities on platforms such as 4chan's /pol/. Ling et al. (2021) examined the artistic composition and the aesthetics of memes, the subjects they communicate, and their potential for virality; based on this analysis, they manually annotated 50 memes as viral vs. non-viral, reported reasonable agreement for most manually annotated labels, and established a characterization of meme virality. Zannettou et al. (2020b) analyzed the "Happy Merchant" memes and showed how online fringe communities influence their spread to mainstream social networking platforms.

Other studies on memes. Reis et al. (2020) built a dataset of memes related to the 2018 and 2019 elections in Brazil (34k images, 17k users) and India (810k images, 63k users), with a focus on misinformation. Another dataset of 950 memes targeted the propaganda techniques used in memes (Dimitrov et al., 2021a), and it was also featured in a shared task at SemEval-2021 (Dimitrov et al., 2021b). Leskovec et al. (2009) introduced a dataset of 96 million memes collected from various links and blog posts between August 2008 and April 2009 for tracking the most frequently appearing stories, phrases, and information.
Topic modeling of the textual and visual cues of hateful and racially abusive multimodal content on sites such as 4chan has been studied for scenarios that leverage genetic testing to claim superiority over minorities (Mittos et al., 2020). Zannettou et al. (2020a) examined the content of meme images and the online posting activities of trolls to model how the occurrence of one event in a given background process affects the occurrence of events in the remaining processes, i.e., using a Hawkes process (Hawkes, 1971). Another study observed that fauxtographic content tends to attract more attention and established how such content becomes a meme in social media. Finally, there is a recent survey on multimodal disinformation detection (Alam et al., 2021).

Differences with existing studies. Hate speech detection in multimodal memes (Kiela et al., 2020) is the closest work to ours. However, our work is substantially different from it and from other related studies as (i) we deal with harmful meme detection, which is a more general problem than hateful meme detection; (ii) along with harmful meme detection, we also identify the entities that the harmful meme targets; (iii) our HarMeme dataset comprises real-world memes posted on the web, as opposed to the synthetic memes used in (Kiela et al., 2020); and (iv) we present a unique dataset and benchmark results both for harmful meme detection and for identifying the targets of harmful memes.

Here, we define harmful memes as follows: multimodal units consisting of an image and an embedded piece of text that have the potential to cause harm to an individual, an organization, a community, or the society more generally. Here, harm includes mental abuse, defamation, psycho-physiological injury, proprietary damage, emotional disturbance, and a compromised public image.

Harmful vs. hateful/offensive. Harmful is a more general term than offensive and hateful: offensive and hateful memes are harmful, but not all harmful memes are offensive or hateful. For instance, the memes in Figures 1b and 1c are neither offensive nor hateful, but they are harmful to Donald Trump and to news media such as CNN, respectively. Offensive memes typically aim to mock or to bully a social entity. A hateful meme contains offensive content that targets an entity (e.g., an individual, a community, or an organization) based on personal/sensitive attributes such as gender, ethnicity, religion, nationality, sexual orientation, color, race, country of origin, and/or immigration status. The harmful content in a harmful meme is often camouflaged and might require critical judgment to establish its potential to do harm. Moreover, the social entities attacked or targeted by harmful memes can be any individual, organization, or community, as opposed to hateful memes, where entities are attacked based on personal attributes.

Below, we describe the data collection, the annotation process and the guidelines, and we give detailed statistics about the HarMeme dataset.

To collect potentially harmful memes in the context of COVID-19, we searched using different services, mainly Google Image Search. We used keywords such as Wuhan Virus Memes, US Election and COVID Memes, COVID Vaccine Memes, Work From Home Memes, and Trump Not Wearing Mask Memes. We then used a Google Chrome extension to download the memes. We further scraped various publicly available groups on Instagram to collect memes.
Note that, adhering to the terms of service of the social media platforms, we did not use content from any private/restricted pages. Unlike the Hateful Memes Challenge (Kiela et al., 2020), which used synthetically generated memes, our HarMeme dataset contains original memes that were actually shared on social media. As all memes were gathered from real sources, we maintained strict filtering criteria on the resolution of the meme images and on the readability of the meme text during the collection process. We ended up collecting 5,027 memes. However, as we collected memes from independent sources, we had some duplicates. We thus used two de-duplication tools sequentially, and we preserved the meme with the highest resolution from each group of duplicates. We removed 1,483 duplicate memes, thus ending up with a dataset of 3,544 memes. Although we tried to collect only harmful memes, the dataset contained memes with various levels of harmfulness, which we manually labeled during the annotation process, as discussed in Section 4.3. We further used Google's OCR Vision API to extract the textual content of each meme (an illustrative sketch of these two pre-processing steps is given below, after the description of the target classes).

As discussed in Section 3, we consider a meme as harmful only if it is implicitly or explicitly intended to cause harm to an entity, depending on the personal, political, social, educational, or industrial background of that entity. The intended harm can be expressed in an obvious manner, such as by abusing, offending, disrespecting, insulting, demeaning, or disregarding the entity or any socio-cultural or political ideology, belief, principle, or doctrine associated with that entity. Likewise, the harm can also take the form of a more subtle attack, such as mocking or ridiculing a person or an idea. We asked the annotators to label the intensity of the harm as very harmful or partially harmful, depending upon the context and the ingrained explication of the meme. Moreover, we formally defined four different classes of targets and compiled well-defined guidelines that the annotators adhered to while manually annotating the memes. The four target entities are as follows (cf. Figure 1):

1. Individual: A person, usually a celebrity (e.g., a well-known politician, an actor, an artist, a scientist, an environmentalist, etc., such as Donald Trump, Joe Biden, Vladimir Putin, Hillary Clinton, Barack Obama, Chuck Norris, Greta Thunberg, Michelle Obama).
2. Organization: A group of people with a particular purpose, such as a business, a governmental department, a company, an institution, or an association comprising more than one person, e.g., research organizations (WTO, Google) and political organizations (the Democratic Party).
3. Community: A social unit with commonalities based on personal, professional, social, cultural, or political attributes such as religious views, country of origin, gender identity, etc. Communities may share a sense of place situated in a given geographical area (e.g., a country, a village, a town, or a neighborhood) or in virtual space through communication platforms (e.g., online forums based on religion, country of origin, gender).
4. Society: When a meme promotes conspiracies or hate crimes, it becomes harmful to the general public, i.e., to the entire society.
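To make the pre-processing above concrete, the following is a minimal sketch of the de-duplication and OCR steps. It is illustrative only: the paper does not name its de-duplication tools, so the perceptual-hashing approach (via the imagehash library), the file layout, and the distance threshold are our assumptions; the OCR call uses the Google Cloud Vision Python client, which requires valid credentials.

```python
# Illustrative pre-processing sketch (assumed file layout, libraries, and threshold).
from pathlib import Path

import imagehash                      # pip install imagehash
from PIL import Image
from google.cloud import vision       # pip install google-cloud-vision


def deduplicate(image_dir, max_distance=4):
    """Group near-duplicate memes by perceptual hash; keep the highest-resolution image per group."""
    groups = []  # each entry: (representative_hash, [paths])
    for path in sorted(Path(image_dir).glob("*.jpg")):
        h = imagehash.phash(Image.open(path))
        for rep, paths in groups:
            if rep - h <= max_distance:   # Hamming distance between perceptual hashes
                paths.append(path)
                break
        else:
            groups.append((h, [path]))
    # Resolution = width * height; keep the largest image of each duplicate group.
    return [max(paths, key=lambda p: Image.open(p).size[0] * Image.open(p).size[1])
            for _, paths in groups]


def extract_text(image_path):
    """Extract the embedded meme text with Google's OCR Vision API."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    annotations = response.text_annotations
    return annotations[0].description if annotations else ""


if __name__ == "__main__":
    kept = deduplicate("memes/")                          # assumed directory of collected memes
    meme_texts = {p.name: extract_text(p) for p in kept}
```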
During the process of collection and annotation, we rejected memes based on the following four criteria: (i) the meme text is code-mixed or in a non-English language; (ii) the meme text is not readable (e.g., blurry text, incomplete text, etc.); (iii) the meme is unimodal, containing only textual or only visual content; (iv) the meme contains cartoons (we added this last criterion as cartoons can be hard to analyze by AI systems).

For the annotation process, we had 15 annotators, including professional linguists and researchers in Natural Language Processing (NLP): 10 of them were male and the other 5 were female, and their age ranged between 24 and 45 years. We used the PyBossa crowdsourcing framework for our annotations (cf. Figure 3). We split the annotators into five groups of three people, and each group annotated a different subset of the data. Each annotator spent about 8.5 minutes on average to annotate one meme. At first, we trained our annotators with the definition of harmful memes and their targets, along with the annotation guidelines. To achieve quality annotation, our main focus was to make sure that the annotators were able to understand well what harmful content is and how to differentiate it from humorous, satirical, hateful, and non-harmful content.

Dry run. We conducted a dry run on a subset of 200 memes, which helped the annotators understand well the definitions of harmful memes and their targets, and eliminated uncertainties about the annotation guidelines. Let αi denote a single annotator. For this preliminary data, we computed the inter-annotator agreement in terms of Cohen's κ (Bobicev and Sokolova, 2017) over three randomly chosen annotators α1, α2, and α3 for each meme, for both tasks. The results are shown in Table 1. We can see that the score is low for both tasks (0.295 and 0.373), which is expected for an initial dry run. With the progression of the annotation phases, we observed much higher agreement, thus confirming that the dry run helped to train the annotators.

Final annotation. After the dry run, we started the final annotation process. Figure 3a shows an example annotation in the PyBossa annotation platform. We asked the annotators to check whether a given meme falls under any of the four rejection criteria given in the annotation guidelines. After confirming the validity of the meme, it was rated by three annotators for both tasks.

Consolidation. In the consolidation phase, for high agreement, we used majority voting to decide the final label, and we added a fourth annotator otherwise. Table 2 shows statistics about the labels and the data splits. After the final annotation, Cohen's κ increased to 0.695 and 0.797 for the two tasks, which indicates moderate and high agreement, respectively. These scores show the difficulty and the variability of gauging harmfulness, even for human experts. For example, we found memes where two annotators independently chose partially harmful, but the third annotator annotated them as very harmful.

Figure 4 shows the length distribution of the meme text for both tasks, and Table 3 shows the top-5 most frequent words in the union of the validation and the test sets. We can see that the names of politicians and words related to COVID-19 are frequent in very harmful and partially harmful memes. For the targets of the harmful memes, we notice the presence of various class-specific words such as president, trump, obama, and china.
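Per-class word lists like those in Table 3 can be produced with a simple tf-idf ranking. The snippet below is a minimal sketch of one way to do this with scikit-learn; the helper name and the assumption that the OCR'd meme texts and their labels are already in memory are ours, not the paper's.

```python
# Minimal sketch: top-k words per class by mean tf-idf score (assumed in-memory data).
from sklearn.feature_extraction.text import TfidfVectorizer


def top_words_per_class(texts, labels, k=5):
    """texts: list of OCR'd meme strings; labels: parallel list of class names."""
    vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
    tfidf = vectorizer.fit_transform(texts)             # shape: (num_memes, vocab_size)
    vocab = vectorizer.get_feature_names_out()
    result = {}
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        # Average tf-idf weight of each vocabulary item over the memes of this class.
        mean_scores = tfidf[rows].mean(axis=0).A1
        top = mean_scores.argsort()[::-1][:k]
        result[cls] = [(vocab[i], round(float(mean_scores[i]), 3)) for i in top]
    return result


# Example with hypothetical data:
# top_words_per_class(["trump corona mask ...", ...], ["very harmful", ...])
```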
These words often introduce bias into machine learning models, which makes the dataset more challenging and difficult to learn from (see Section 6.4 for more detail).

We provide benchmark evaluations on HarMeme with a variety of state-of-the-art unimodal textual models, unimodal visual models, and models using both modalities. Except for the unimodal visual models, we use MMF (Multimodal Framework) to conduct the necessary experiments.

• Text BERT: We use textual BERT (Devlin et al., 2019) as the unimodal text-only model.
• VGG19, DenseNet, ResNet, ResNeXt: For the unimodal visual-only models, we used four well-known architectures: VGG19 (Simonyan and Zisserman, 2015), DenseNet-161 (Huang et al., 2017), ResNet-152 (He et al., 2016), and ResNeXt-101 (Xie et al., 2017), pre-trained on the ImageNet dataset (Deng et al., 2009). We extracted the feature maps from the last pooling layer of each architecture and fed them to a fully connected layer.
• Late Fusion: This model uses the mean of the prediction scores of pre-trained unimodal ResNet-152 and BERT.
• Concat BERT: It concatenates the features extracted by pre-trained unimodal ResNet-152 and text BERT, and uses a simple MLP as the classifier. (An illustrative sketch of these two fusion schemes is given below.)
• MMBT: Supervised Multimodal Bitransformers (Kiela et al., 2019) is a multimodal architecture that inherently captures the intra-modal and the inter-modal dynamics within the various input modalities.
• ViLBERT CC: Vision and Language BERT (ViLBERT) (Lu et al., 2019), trained on an intermediate multimodal objective over Conceptual Captions (Sharma et al., 2018), is a strong model with a task-agnostic joint representation of image + text.
• Visual BERT COCO: Visual BERT (V-BERT) (Li et al., 2019), pre-trained on the multimodal COCO dataset (Lin et al., 2014), is another strong multimodal model used for a broad range of vision and language tasks.

Table 3: Top-5 most frequent words per class. The tf-idf score of each word is given within parentheses.

Below, we report the performance of the models described in the previous section for each of the two tasks. We further discuss some biases that negatively impact the performance. Appendix A gives additional details about the training procedure and the values of the hyper-parameters we used in our experiments. We used six evaluation measures: Accuracy, Precision, Recall, Macro-averaged F1, Mean Absolute Error (MAE), and Macro-Averaged Mean Absolute Error (MMAE) (Baccianella et al., 2009). For the first four measures, higher values are better, while for the last two, lower values are better. Since the test set is imbalanced, measures such as Macro F1 and MMAE are more relevant.

Table 4 shows the results for the harmful meme detection task. We start our experiments by merging the very harmful and the partially harmful classes, thus turning the problem into an easier binary classification task. Afterwards, we perform the 3-class classification task. Since the test set is imbalanced, the majority-class baseline achieves 64.76% accuracy. We observe that the unimodal visual models perform only marginally better than the majority-class baseline, which indicates that they are insufficient to learn the underlying semantics of the memes. Moving down the table, we see that the unimodal text model is marginally better than the visual models. Then, for the multimodal models, the performance improves noticeably, and more sophisticated fusion techniques yield better results. We also notice the effectiveness of multimodal pre-training over unimodal pre-training, which supports recent findings on visio-linguistic pre-training.
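For concreteness, the following is a minimal sketch of the Late Fusion and Concat BERT baselines described above, together with the MMAE measure. It is not the authors' implementation: the classifier sizes, the label encoding (0 = harmless, 1 = partially harmful, 2 = very harmful), and the choice of standard torchvision and HuggingFace backbones are our assumptions for illustration.

```python
# Minimal sketch of the Late Fusion and Concat BERT baselines (assumed dimensions and label order).
import torch
import torch.nn as nn
import torchvision.models as tv_models
from transformers import BertModel


class ConcatBERT(nn.Module):
    """Concatenate ResNet-152 image features with BERT pooled features; classify with a small MLP."""

    def __init__(self, num_classes=3):
        super().__init__()
        resnet = tv_models.resnet152(weights="IMAGENET1K_V1")
        self.visual = nn.Sequential(*list(resnet.children())[:-1])   # drop the final fc -> 2048-d
        self.text = BertModel.from_pretrained("bert-base-uncased")   # 768-d pooled output
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 768, 512), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

    def forward(self, images, input_ids, attention_mask):
        img_feat = self.visual(images).flatten(1)                                    # (B, 2048)
        txt_feat = self.text(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output            # (B, 768)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=1))


def late_fusion(image_logits, text_logits):
    """Late fusion: average the class probabilities of independently trained unimodal models."""
    return (image_logits.softmax(dim=1) + text_logits.softmax(dim=1)) / 2


def mmae(y_true, y_pred, num_classes=3):
    """Macro-averaged MAE: mean over classes of the per-class mean absolute error on ordinal labels."""
    errors = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            errors.append((y_pred[mask] - c).abs().float().mean())
    return torch.stack(errors).mean()
```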
While both ViLBERT CC and V-BERT COCO perform similarly, the latter achieves better Macro F1 and MMAE, which are the most relevant measures.

Table 5 shows the results for the target identification task. This is an imbalanced 4-class classification problem, and the majority-class baseline yields 46.60% accuracy. The unimodal models perform relatively better here, achieving 63%-70% accuracy; their Macro F1 and MMAE scores also improve over the majority-class baseline. However, the overall performance of the unimodal models is poor. Incorporating multimodal signals with fine-grained fusion improves the results substantially, and advanced multimodal fusion techniques with multimodal pre-training perform much better than simple late fusion with unimodal pre-training. Moreover, V-BERT COCO outperforms ViLBERT CC by 8% absolute in terms of F1 score and by nearly 0.3 in terms of MMAE.

To understand how human subjects perceive these tasks, we further hired a different set of experts (not the annotators) to label the test set. We observed 86%-91% accuracy on average for both tasks, which is much higher than V-BERT, the best-performing model. This shows that there is potential for enriched multimodal models that better understand the ingrained semantics of memes.

Since the HarMeme dataset was compiled from memes related to COVID-19, we expected that models with enriched contextual knowledge and sophisticated techniques would have superior performance. Thus, to examine the interpretability of V-BERT (the best model), we used LIME (Local Interpretable Model-agnostic Explanations) (Ribeiro et al., 2016), a consistent model-agnostic explainer, to interpret its predictions. We chose two memes from the test set to analyze the potential explainability of V-BERT. The first meme, which is shown in Figure 5a, was manually labeled as very harmful, and V-BERT classified it correctly, with prediction probabilities of 0.651, 0.260, and 0.089 corresponding to the very harmful, the partially harmful, and the harmless classes, respectively. Figure 5b highlights the super-pixels that contributed most to the very harmful (green) class. As expected, the face of Donald Trump, as highlighted by the green pixels, prominently contributed to the prediction. Figure 5c demonstrates the contribution of different meme words to the model prediction. We can see that words like CORONA and MASK contribute significantly to the very harmful class, thus supporting the lexical analysis of HarMeme shown in Table 3. The second meme, which is shown in Figure 5d, was manually labeled as harmless, but V-BERT incorrectly predicted it to be very harmful. Figure 5e shows that, similarly to the previous example, the face of Donald Trump contributed to the prediction of the model. We looked closer into our dataset, and we found that it contained many memes with the image of Donald Trump, and that the majority of these memes fall under the very harmful category and target an individual. Therefore, instead of learning the underlying semantics of each particular meme, the model was easily biased by the presence of Donald Trump's image and blindly classified the meme as very harmful.

We presented HarMeme, the first large-scale benchmark dataset, containing 3,544 memes related to COVID-19, with annotations for the degree of harmfulness (very harmful, partially harmful, or harmless), as well as for the target of the harm (an individual, an organization, a community, or society).
The evaluation results using several unimodal and multimodal models highlighted the importance of modeling the multimodal signal for both tasks, (i) detecting harmful memes and (ii) detecting their targets, and indicated the need for more sophisticated methods. We also analyzed the best model and identified its limitations. In future work, we plan to design new multimodal models and to extend HarMeme with examples from other topics, as well as to other languages. Alleviating the biases in the dataset and in the models is another important research direction.

User Privacy. Our dataset only includes memes, and it does not contain any user information.

Biases. Any biases found in the dataset are unintentional, and we do not intend to do harm to any group or individual. We note that determining whether a meme is harmful can be subjective, and thus it is inevitable that there would be biases in our gold-labeled data or in the label distribution. We address these concerns by collecting examples using general keywords about COVID-19, and also by following a well-defined schema, which sets explicit definitions during annotation. Our high inter-annotator agreement makes us confident that the assignment of the schema to the data is correct most of the time.

Misuse Potential. We ask researchers to be aware that our dataset can be maliciously used to unfairly moderate memes based on biases that may or may not be related to demographics and other information within the text. Intervention with human moderation would be required in order to ensure that this does not occur.

Intended Use. We present our dataset to encourage research in studying harmful memes on the web. We distribute the dataset for research purposes only, without a license for commercial use. We believe that it represents a useful resource when used in the appropriate manner.

Environmental Impact. Finally, we would also like to warn that the use of large-scale Transformers requires a lot of computation and the use of GPUs/TPUs for training, which contributes to global warming (Strubell et al., 2019). This is less of an issue in our case, as we do not train such models from scratch; rather, we fine-tune them on relatively small datasets. Moreover, running the fine-tuned models on a CPU for inference is perfectly feasible, and CPUs contribute much less to global warming.

Figure B.1: Examples of memes that we rejected during the process of data collection and annotation.

References
Hamed Firooz, and Preslav Nakov. 2021. A survey on multimodal disinformation detection. arXiv 2103.
See, hear, read: Leveraging multimodality with guided attention for abstractive text summarization. Knowledge-Based Systems.
Evaluation measures for ordinal regression.
Inter-annotator agreement in sentiment analysis: Machine learning perspective.
SESAM at SemEval-2020 task 8: Investigating the relationship between image and text in sentiment analysis of memes.
Analyzing disruptive memes in an age of international interference.
Detecting hate speech in multi-modal memes.
ImageNet: A large-scale hierarchical image database.
BERT: Pre-training of deep bidirectional transformers for language understanding.
Detecting propaganda techniques in memes.
SemEval-2021 Task 6: Detection of persuasion techniques in texts and images.
The spread of disinformation on the Web: An examination of memes on social networking.
Gundapusunil at SemEval-2020 task 8: Multimodal Memotion Analysis.
Guoym at SemEval-2020 task 8: Ensemble-based classification of visuo-lingual metaphor in memes.
Spectra of some self-exciting and mutually exciting point processes.
Deep residual learning for image recognition.
Multimodal sentiment analysis to explore the structure of emotions.
Densely connected convolutional networks.
FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation.
Supervised multimodal bitransformers for classifying images and text.
Pratik Ringshia, and Davide Testuggine. 2020. The hateful memes challenge: Detecting hate speech in multimodal memes.
Adam: A method for stochastic optimization.
Meme-tracking and the dynamics of the news cycle.
VisualBERT: A simple and performant baseline for vision and language.
Oscar: Object-semantics aligned pre-training for vision-language tasks.
Microsoft COCO: Common Objects in Context.
Dissecting the meme magic: Understanding indicators of virality in image memes.
Santhosh Rajamanickam, Georgios Antoniou, Ekaterina Shutova, and Helen Yannakoudakis. 2020. A multimodal framework for the detection of hateful memes.
UoR at SemEval-2020 task 8: Gaussian mixture modelling (GMM) based sampling approach for multi-modal memotion analysis.
ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.
"And we will fight for our race!" A measurement study of genetic testing conversations on Reddit and 4chan.
Hitachi at SemEval-2020 task 8: Simple but effective modality ensemble for meme emotion recognition.
Vilio: State-of-the-art visio-linguistic models applied to Hateful Memes.
PRHLT-UPV at SemEval-2020 task 8: Study of multimodal techniques for memes analysis.
Exercise? I thought you said 'extra fries': Leveraging sentence demarcations and multi-hop attention for meme affect analysis.
A dataset of fact-checked images shared on WhatsApp during the Brazilian and Indian elections.
Explaining the predictions of any classifier.
Hate speech in pixels: Detection of offensive memes towards automatic moderation.
Detecting hateful memes using a multimodal deep ensemble.
Viswanath Pulabaigari, and Björn Gambäck. 2020a. SemEval-2020 task 8: Memotion analysis - the visuo-lingual metaphor! In Proceedings of the Fourteenth Workshop on Semantic Evaluation, SemEval '20.
Vasantha. 2020b. Memebusters at SemEval-2020 task 8: Feature fusion model for sentiment analysis on memes using transfer learning.
Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning.
Very deep convolutional networks for large-scale image recognition.
Are we pretraining it right? Digging deeper into visio-linguistic pretraining.
Energy and policy considerations for deep learning in NLP.
VL-BERT: Pre-training of generic visual-linguistic representations.
Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text.
LXMERT: Learning cross-modality encoder representations from transformers.
Multimodal classification for analysing social media.
Attention is all you need.
Savvas Zannettou, and Gianluca Stringhini. 2020. Understanding the use of fauxtography on social media.
Aggregated residual transformations for deep neural networks.
ERNIE-ViL: Knowledge enhanced vision-language representations through scene graph.
YNU-HPCC at SemEval-2020 task 8: Using a parallel-channel model for memotion analysis.
Characterizing the use of images in state-sponsored information warfare operations by Russian trolls on Twitter.
A quantitative approach to understanding online antisemitism.
Classification of multimodal hate speech - the winning solution of the hateful memes challenge.
Unified vision-language pre-training for image captioning and VQA.
Multimodal learning for hateful memes detection.
Enhance multimodal transformer with external label and in-domain pretrain: Hateful meme challenge winning solution.

Acknowledgments. The work was partially supported by the Wipro research grant and the Infosys Centre for AI, IIIT Delhi, India. It is also part of the Tanbih mega-project, developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of "fake news," propaganda, and media bias by making users aware of what they are reading.

A Training details and hyper-parameters
We trained all the models using the PyTorch framework on an NVIDIA Tesla T4 GPU with 16 GB of dedicated memory, with CUDA-10 and cuDNN-11 installed. For the unimodal models, we imported all the pre-trained weights from the TORCHVISION.MODELS subpackage of PyTorch (http://pytorch.org/docs/stable/torchvision/models.html). We initialized the non-pre-trained weights randomly from a zero-mean Gaussian distribution with a standard deviation of 0.02. To minimize the impact of the label imbalance on the loss calculation, we assigned larger weights to the minority classes. We trained our models using the Adam optimizer (Kingma and Ba, 2014) and the negative log-likelihood loss as the objective function. We trained the models end-to-end for the two classification tasks, i.e., the memes that were classified as Very Harmful or Partially Harmful in the first classification stage were sent to the second stage for target identification. (An illustrative sketch of this class-weighted training setup is given at the end of the appendix.)

B.1 What do we mean by harmful memes?
A harmful meme is targeted towards a social entity (e.g., an individual, an organization, a community, etc.) and is likely to cause calumny/vilification/defamation, depending on the background of that entity (bias, social background, educational background, etc.). The harm caused by a meme can be in the form of mental abuse, psycho-physiological injury, proprietary damage, emotional disturbance, or a compromised public image. A harmful meme typically attacks celebrities or well-known organizations, with the intent to expose their professional demeanor.
• Harmful memes may or may not be offensive, hateful, or biased in nature.
• Harmful memes expose vices, allegations, and other negative aspects of an entity based on verified or unfounded claims or mockery.
• Harmful memes leave an open-ended connotation to the word community, including antisocial communities such as terrorist groups.
• The harmful content in harmful memes is often implicit and might require critical judgment to establish its potential to do harm.
• Harmful memes can be classified at multiple levels, based on the intensity of the harm they could cause, e.g., very harmful or partially harmful.
• One harmful meme can target multiple individuals, organizations, and/or communities at the same time. In that case, we asked the annotators to go with their best personal judgment.
• Harm can be expressed in the form of sarcasm and/or political satire. Sarcasm is praise that is actually an insult; sarcasm generally involves malice, the desire to put someone down. On the other hand, satire is the ironical exposure of the vices or the follies of an individual, a group, an institution, an idea, the society, etc., usually with the aim of correcting them.

An organization is a group of people with a particular purpose, such as a business or a government department. Examples include a company, an institution, or an association comprising one or more people with a particular purpose, e.g., a research organization, a political organization, etc. On the other hand, a community is a social unit (a group of living things) with a commonality such as norms, religion, values, ideology, customs, or identity. Communities may share a sense of place situated in a given geographical area (e.g., a country, a village, a town, or a neighborhood) or in virtual space through communication platforms.

We apply the following rejection criteria during the process of data collection and annotation:
1. The meme's text is code-mixed or not in English.
2. The meme's text is not readable (e.g., blurry text, incomplete text, etc.).
3. The meme is unimodal in nature, containing only textual or only visual content.
4. The meme contains a cartoon.
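To complement the training details in Appendix A, below is a minimal sketch of the class-weighted training setup described there (inverse-frequency class weights, the Adam optimizer, and the negative log-likelihood loss). The weighting scheme, learning rate, and batch layout are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of class-weighted training with Adam and NLL loss (assumed hyper-parameters).
from collections import Counter

import torch
import torch.nn as nn


def make_class_weights(labels, num_classes):
    """Inverse-frequency weights so that minority classes contribute more to the loss."""
    counts = Counter(labels)
    total = len(labels)
    weights = [total / (num_classes * counts.get(c, 1)) for c in range(num_classes)]
    return torch.tensor(weights, dtype=torch.float)


def train(model, loader, train_labels, num_classes=3, epochs=10, lr=1e-5, device="cuda"):
    model.to(device)
    criterion = nn.NLLLoss(weight=make_class_weights(train_labels, num_classes).to(device))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    log_softmax = nn.LogSoftmax(dim=1)
    for _ in range(epochs):
        for images, input_ids, attention_mask, targets in loader:   # assumed batch layout
            optimizer.zero_grad()
            logits = model(images.to(device), input_ids.to(device), attention_mask.to(device))
            loss = criterion(log_softmax(logits), targets.to(device))  # NLL over log-probabilities
            loss.backward()
            optimizer.step()
    return model
```

In the two-stage setup described in Appendix A, the same loop would be run twice: once for the 3-class harmfulness model, and once, on the memes predicted as harmful, for the 4-class target model.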