key: cord-0585006-p2vwcdhy authors: Karmakar, Sayar; Das, Sanchari title: Evaluating the Impact of COVID-19 on Cyberbullying through Bayesian Trend Analysis date: 2020-08-08 journal: nan DOI: nan sha: a27b51560d7853b4c147e2611ee27f10a0f5a1a2 doc_id: 585006 cord_uid: p2vwcdhy

COVID-19's impact has extended beyond personal and global health to our social life. In terms of digital presence, it is speculated that there has been a significant rise in cyberbullying during the pandemic. In this paper, we examine whether cyberbullying, and the reporting of such incidents, have increased in recent times. To evaluate this speculation, we collected cyberbullying-related public tweets (N=454,046) posted between January 1st, 2020 and June 7th, 2020. A simple visual frequentist analysis ignores serial correlation and does not reveal changepoints as such. To account for this correlation and for the relatively small number of time points, we propose Bayesian estimation of the trends in the collected data via an autoregressive Poisson model. We show that this Bayesian method clearly reveals an upward trend in cyberbullying-related tweets since mid-March 2020. However, this evidence by itself does not signify a rise in cyberbullying; rather, it shows a correlation between the crisis and individuals' discussion of such incidents. Our work highlights the critical issue of cyberbullying and how a global crisis affects social media abuse, and it provides a trend analysis model that can be used for social media data analysis in general.

Bullying is characterized as the "repeated oppression, psychological or physical, of a less powerful person by a more powerful one" [18]. With the rise of online communication, the dynamics of bullying have moved beyond physical boundaries into the digital realm, referred to as "cyberbullying".
Cyberbullying has become increasingly pervasive as targeted abuse has moved from face-to-face settings to digital platforms, reaching users regardless of geographic constraints [44, 52]. Victims of cyberbullying can be targeted through various channels, including mobile phones, video cameras, emails, and web pages [53]. Cyberbullying can negatively impact mental health, with 32% of victims reporting symptoms of stress and 38% experiencing emotional distress, even after the online abuse stops [19, 54]. Earlier investigations have indicated that web-based social networking has expanded the impact of cyberbullying [54]. Cyberbullying is particularly common on social networking sites and applications, with 66% of all cyberbullying episodes occurring on these platforms¹. Twitter allows individuals to occasionally interact with strangers (including celebrities) [11]; however, this also allows others to imitate and forge identities online to deceive users [47]. Profile verification only works for celebrities or people who are well known in their field, making it difficult to verify an individual's identity [31], and it is even more challenging to identify abusers when they are impersonating someone else. Because cyberbullying is correlated with social media usage, individuals have often reported negative experiences on these platforms [36]. Moreover, during the current COVID-19 pandemic, individuals have increased their social media use to stay connected with others while social distancing [50]. However, there have also been reports of incivility on such platforms [23]. A sudden rise in social media use, combined with children and adolescents constantly using these platforms, could create a concerning spike in cyberbullying². Accordingly, our goal was to explore explicitly: How has a crisis, such as a pandemic (COVID-19), impacted the reporting and discussion of cyberbullying incidents on Twitter?
To understand users' perspectives, we collected 454,046 publicly available tweets about cyberbullying. We first tried a simple visual analysis to detect a significant rise in the daily count of tweets containing these keywords at any point around March. However, as Figures 1 and 2 show, such a changepoint is not very prominent. This led us to address the shortcomings of such a simplistic model, which ignores both the possibility of a smooth change and the inherent dependence in a time series of counts. This initial analysis motivated us to build a suitable autoregressive Bayesian model, as described in Section 5. We chose a time-varying Bayesian method previously detailed by Karmakar et al. [40]. As hypothesized, we observed an increase in discussions of cyberbullying incidents during the pandemic, which shows an impact of the crisis on cyberbullying trends. Our method allowed us to construct posterior samples of the time-varying parameter functions from the collected data. To the best of our knowledge, this work is the first quantitative trend analysis of cyberbullying discussions on a large data sample. Our results also reveal a clear effect of COVID-19 on worsening cyberbullying incidents as reported and discussed through tweets. Building on this quantitative work, an in-depth qualitative analysis of the discussed tweets is a natural future extension.

Cyberbullying has expanded significantly with the advent of social media and billions of users being online every day [44]. Users have reported experiencing cyberbullying on several social networking platforms, chat rooms, and mobile messaging applications; such abuse transcends geographical proximity [45, 48]. Furthermore, it has been speculated that the current crisis has increased cyberbullying incidents [13].
To address the issue of cyberbullying, prior research has investigated online maltreatment and developed effective technical and policy-focused mitigation strategies [9, 10]. However, although some of these strategies are being implemented through social media policy management, they raise several privacy concerns for social media users [4, 14]. Thus, speculation about a rise in cyberbullying due to a pandemic is a natural progression, but it requires detailed analysis to verify. To understand this further, we start by analyzing cyberbullying discussion trends on Twitter.

Mason defined cyberbullying as "an individual or a group willfully using information and communication involving electronic technologies to facilitate deliberate and repeated harassment or threat to another individual or group by sending or posting cruel text and/or graphics using technological means" [28]. To investigate further, Nocentini et al. studied attacker behaviour across different types of cyberbullying, in terms of imbalance of power, intention, repetition, anonymity, and publicity [35]. Previous works have explored the effects of cyberbullying on targets, especially teenagers; such abuse can affect both cyberaggressors and cybervictims [5, 29, 43, 51]. Dredge et al. noted the detrimental effects of cyberbullying on the social and emotional lives of targets, with the severity of the impact depending on factors including the anonymity of the perpetrators and the presence of bystanders [17]. These studies, along with several others [24, 38], indicate the severity of cyberbullying for individuals and for society; thus, it is critical to develop strong defenses against it.

2.2.1 Technical Mitigation Techniques. Several technical mitigation techniques have been proposed, with the goal of automatically detecting and intervening in cyberbullying incidents online [2, 3].
For instance, Dinakar et al. proposed an online dashboard that would allow moderators to track potential bullying incidents on a forum through natural language processing [16]. Mondal et al. analyzed hate speech on Twitter and Whisper to improve automated detection of bullying [34], while Cortis and Handschuh analyzed bullying tweets in the context of major world events [7]. Such technical mitigation strategies are helpful, yet it is also critical to understand individuals' perspectives and the policy implementation strategies adopted by social media organizations.

Policies. As a mitigating measure, some prior work has focused on improving social media policies to prevent perpetrators from abusing their victims [8]. Pater et al. compared the social media policies of 15 different platforms and found that these policies vary from mild censoring to the involvement of law enforcement [39]. Given the legal implications, such incidents can have severe consequences for social media organizations [33]. Milosevic examined the responsibilities of social media companies in addressing cyberbullying among children [32]. To improve anti-harassment measures, previous research has examined the motivations of cyberbullies [15, 25, 30, 42]. Lee and Kim interviewed 110 subjects to investigate why social media users leave benevolent or malicious comments [26]. Whittaker and Kowalski found that cyber aggression was more prevalent in online comment sections and forum replies than on Facebook, again suggesting the importance of anonymity [49]. Overall, these defensive mechanisms are helpful and improve individuals' online experience. Yet even with such defensive tactics in place, it is speculated that cyberbullying has increased, especially during the COVID-19 pandemic. Thus, in our work we seek to understand the problem better through detailed quantitative analysis.
Studies that analyze trends in cyberbullying help us understand how events can affect digital users. Schneider et al. conducted four surveys across 17 high schools and found that the overall rate of cyberbullying increased from 2006 to 2012 [22]. Through survey-based analysis, Snell and Englander found that females are more likely to be involved in cyberbullying both as victims and as perpetrators, indicating the importance of gender as a factor in mitigating online bullying [46]. Mangaonkar et al. used a distributed design to analyze tweets and detect cyberbullying in real time [27]. Twitter allows users to express themselves in 280-character "tweets"; prior studies have analyzed these messages for cyberbullying [1, 37]. Cortis and Handschuh analyzed bullying tweets in the context of two trending events (the Ebola outbreak and the shooting of Michael Brown in Ferguson, Missouri) and identified commonly used hashtags and named entities in bullying tweets [7]. They tried to identify cyberbullies through these discussions, but whether such crisis situations increased bullying tweets was not studied. Because individuals' online presence has increased, it has been assumed that the COVID-19 pandemic can increase cyberbullying attacks. Thus, our goal was to find concrete evidence to support or contradict this hypothesis. Rather than relying on surveys, we chose to collect data from Twitter itself to capture the real-time trend of such critical incidents. With over 300 million monthly active users, Twitter³ is an ideal data source⁴.

To assess the impact of COVID-19 on cyberbullying, we collected 454,046 public tweets, all of which mentioned cyberbullying. We scraped Twitter for user-posted, publicly available tweets related to the topics of cyberbullying, social media bullying, online harassment, etc. The data was collected using the Get Old Tweets API⁵, which allowed us to access tweets older than one week.
This API was used in a web crawler written in Python, and the data was stored in MongoDB. The data collection spanned January 1st, 2020 to June 7th, 2020. This timeline was selected specifically to capture the impact of COVID-19 on online users and determine whether the crisis situation led to an increase in online abuse. We used the following key terms when conducting our search: Internet bullying, Internet bully, Internet bullies, online abuse, online harassment, online shaming, online stalking, cyberbullying, social media bullying, stop cyberbullying, cyberbully, cyberbullies, FB bullying, FB cyberbullying, FB harassment, FB victim, Facebook bullying, Facebook cyberbullying, Facebook victim, Facebook harassment, Twitter bullying, Twitter cyberbullying, Twitter harassment, Twitter victim, Insta bullying, Insta cyberbullying, Insta harassment, Insta victim. We only collected direct tweets and removed any retweets or duplicate tweets.

After completing the data collection, we performed trend analysis to evaluate the impact of a crisis situation, such as the COVID-19 pandemic, on cyberbullying. Using the timestamp of each post, we obtained the daily count of tweets that include at least one of these keywords. Figure 1 shows the daily count for the 159 days from January 1st, 2020 to June 7th, 2020. Some keywords had few tweets, with a negligible impact on the analysis. Thus, we broadly divide the keywords into 3 sub-classes: keywords containing "cyber" (CY, 7 keywords, 235,542 tweets), "online/internet" (ON, 6 keywords, 96,629 tweets), and "twitter" (TW, 3 keywords, 96,147 tweets). The daily count distributions for these sub-classes are shown in Figure 2.

In a traditional change-point analysis, one looks for an abrupt change; i.e., after observing $X_1, \dots, X_T$, if we suspect there is at most one change point, then we are looking for the unknown location $1 \le \tau \le T$ such that

$$X_1, \dots, X_\tau \sim F_1, \qquad X_{\tau+1}, \dots, X_T \sim F_2, \qquad F_1 \ne F_2. \tag{1}$$

The analysis shows a clear pattern that is common to the total count and all the sub-classes presented here. Overall, except for the sub-class ON, there does not seem to be a large change in the mean, although the mean has drifted slightly upward since mid-March in all categories, including the total. We notice a large spike in cyberbullying-related tweets in the second half of May, which may be due to the untimely death of a Japanese TV star⁶ that was attributed to cyberbullying. For the class ON, one can also see a significant spike in the second half of February, and its overall mean also trended upward; this may or may not be due to the pandemic. Note that, except for the spike in the second half of May, there is no abrupt break attributable to COVID-19. We explored such a simple change-point model as work in progress in [12]. However, a simple model like (1) can fail to adequately capture several features that are particular to the data we collected here.

Abrupt change vs. smooth change: The change-point model (1) addresses only an abrupt change. However, due to the heterogeneous nature of Twitter, the change might not be abrupt, which can explain why the daily count summaries of the total tweets and of the sub-classes do not reveal any abrupt change in either mean or variance. A more meaningful model is one whose parameters change smoothly over time; we can estimate these parameters as functions of time and check whether the trend is increasing due to COVID-19.

Dependence: The daily count of occurrences is a time series. Any visual change-point analysis disregards the inherent dependence present in a time series.
These counts depend heavily on the current trend and are expected to be strongly correlated with the recent past, as illustrated in Figure 3. Given such heavy dependence in the total count and the three sub-series, one needs to take the dependence into account; otherwise, no analysis, be it an abrupt change-point model or a smooth time-varying parameter model, is justified.

Poisson count time series: The daily number of occurrences is a count series, but traditional change-point analysis often assumes normality. A further advantage of a Poisson random variable is that it models both the mean and the variance through a single parameter.

Small sample size: A wide range of frequentist time-varying models relying on kernel-based methods are discussed in [21] and [20]. However, these methods need a large sample (at least 500 time points) to estimate a time trend with precision. This is a major shortcoming of the kernel-based methods that are essential to such frequentist models. Here, our tweets span January 1st to June 7th, 2020, a sample of only 159 time points.

To address the inadequacy of visual change-point detection, we propose the following time-varying Bayesian autoregressive count (TVBARC) model from [40], which addresses the shortcomings mentioned above. Because the data are possibly non-stationary over time, we use a time-varying version of the linear Poisson autoregressive model [6, 55]. The conditional distribution of the count-valued time series $X_t$ given $\mathcal{F}_{t-1} = \{X_i : i \le t-1\}$ is

$$X_t \mid \mathcal{F}_{t-1} \sim \text{Poisson}(\lambda_t), \qquad \lambda_t = \mu(t/T) + \sum_{i=1}^{p} a_i(t/T)\, X_{t-i}. \tag{2}$$

Due to the Poisson link in (2), both the conditional mean and the conditional variance depend on past observations. The conditional expectation of $X_t$ under model (2) is $E(X_t \mid \mathcal{F}_{t-1}) = \mu(t/T) + \sum_{i=1}^{p} a_i(t/T)\, X_{t-i}$, which is positive-valued.
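To make the conditional mean in model (2) concrete, the sketch below evaluates $\lambda_t$ and the resulting conditional Poisson log-likelihood for given coefficient functions. It is a minimal illustration under stated assumptions, not the TVBARC implementation of [40]: the function names are hypothetical, the coefficient functions $\mu(\cdot)$ and $a_i(\cdot)$ are passed as plain Python callables of rescaled time $u = t/T$, and the likelihood conditions on the first $p$ observations.

```python
import math
from typing import Callable, Sequence

Coef = Callable[[float], float]  # a coefficient function of rescaled time u = t/T


def conditional_means(x: Sequence[int], mu: Coef, a: Sequence[Coef]) -> list[float]:
    """Return lambda_t = mu(t/T) + sum_{i=1}^p a_i(t/T) * X_{t-i} for t = p+1, ..., T."""
    T, p = len(x), len(a)
    lam = []
    for t in range(p + 1, T + 1):  # 1-based time index, as in the text
        u = t / T
        # x is 0-based, so X_{t-i} is stored at x[t - 1 - i]
        lam_t = mu(u) + sum(a[i - 1](u) * x[t - 1 - i] for i in range(1, p + 1))
        if lam_t <= 0:
            raise ValueError(f"non-positive conditional mean at t={t}")
        lam.append(lam_t)
    return lam


def conditional_loglik(x: Sequence[int], mu: Coef, a: Sequence[Coef]) -> float:
    """Poisson log-likelihood of X_{p+1}, ..., X_T given the first p observations."""
    p = len(a)
    lam = conditional_means(x, mu, a)
    # log P(X = k | lambda) = k log(lambda) - lambda - log(k!)
    return sum(k * math.log(l) - l - math.lgamma(k + 1) for k, l in zip(x[p:], lam))


if __name__ == "__main__":
    counts = [2, 4, 3, 5]  # toy series, not the tweet data
    # AR(1) with a constant mean trend 1.0 and a constant coefficient 0.5
    print(conditional_means(counts, lambda u: 1.0, [lambda u: 0.5]))  # [2.0, 3.0, 2.5]
    print(conditional_loglik(counts, lambda u: 1.0, [lambda u: 0.5]))
```

With $p = 0$ the same functions reduce to an independent Poisson regression on $\mu(t/T)$, mirroring the special case discussed in the text; the explicit positivity check reflects the requirement that the conditional mean be positive.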
Additionally, we impose the following constraints on the parameter space of the time-varying parameters. When $p = 0$, the proposed model reduces to the routinely used nonparametric independent Poisson regression model [41]. The function $\mu(\cdot)$ corresponds to the general mean trend at time $t$, and $a_i(\cdot)$, the $i$-th autoregressive (AR hereafter) coefficient function, describes how the observation at time $t$ is affected by the observation at lag $i$. The strong correlation pattern in Figure 3 indicates that we should choose $p > 0$.

To proceed with Bayesian computation, we put priors on the unknown functions $\mu(\cdot)$ and $a_i(\cdot)$ such that they are supported in $\mathcal{P}_1$. The prior distributions on these functions are induced through B-spline basis expansions, with suitable constraints on the coefficients to impose the shape constraints of $\mathcal{P}_1$. A detailed description of the priors is given below. Here, the $B_j$'s are the B-spline basis functions and the $\delta_j$'s are unbounded. The prior induced by this construction is supported in $\mathcal{P}_1$; the verification is straightforward and can be found in [40]. We develop an efficient MCMC algorithm to sample the parameters $\beta$, $\theta$, and $\delta$ from the resulting posterior. Interested readers can see [40] for the computation of the likelihood and its partial derivatives.

We first analyze the total count trends through two different choices of lag $p$: an AR(1) and an AR(10) model. Daily data often have weekly patterns, which would produce high correlation at lag 7, and also at lag 8 if lag 1 is significant. To check whether there really is a weekly pattern, we chose a lag slightly higher than 7. The trend functions with their corresponding credible intervals are shown in Figures 4, 5, and 6; for clarity, we do not report credible intervals for the AR(10) model and its 10 AR coefficients. The trends and credible intervals are the means and quantiles of 20,000 posterior MCMC samples obtained after 10,000 burn-in iterations.
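The pointwise posterior summaries described above (means and quantiles of the draws kept after burn-in) can be sketched as follows. This is a hypothetical helper, not code from [40]: it assumes the sampler's output for a curve such as $\mu(t/T)$ is stored as a list of draws, each holding one value per time point, and it assumes equal-tailed intervals with linearly interpolated quantiles.

```python
import math
import random
import statistics
from typing import Sequence


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def pointwise_summary(draws: Sequence[Sequence[float]], burn_in: int = 10_000,
                      level: float = 0.95):
    """Posterior mean and equal-tailed credible band at each time point.

    draws[s][j] is the value of the curve at time point j in MCMC iteration s
    (a hypothetical storage layout).
    """
    kept = draws[burn_in:]  # discard burn-in iterations
    if not kept:
        raise ValueError("burn_in discards every draw")
    alpha = (1 - level) / 2
    means, lower, upper = [], [], []
    for j in range(len(kept[0])):
        vals = sorted(d[j] for d in kept)
        means.append(statistics.fmean(vals))
        lower.append(_quantile(vals, alpha))
        upper.append(_quantile(vals, 1 - alpha))
    return means, lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    # Fake sampler output: 30,000 iterations of a 3-point curve
    # (10,000 burn-in + 20,000 kept), not output from the actual model.
    draws = [[rng.gauss(m, 0.1) for m in (1.0, 1.5, 2.0)] for _ in range(30_000)]
    means, lower, upper = pointwise_summary(draws)
    # the means should be close to the simulated centres 1.0, 1.5, 2.0
    print([round(m, 2) for m in means])
```

Summarizing each time point separately yields a pointwise band; a simultaneous band over all of $t$ would be wider and needs a different construction.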
From the above figures, we summarize the findings as follows:
• The mean trend increased from around March 9th. Contrasting this with Figure 1 shows the significant role of dependence in such time-series data.
• The AR(1) and AR(10) models are comparable, and usually only the first lag accounts for most of the correlation. The mean trend $\mu(t)$ under the AR(10) model is very similar to that in Figure 4.
• The 95% credible intervals are very narrow, which gives us considerable confidence that the true trends are of a similar nature.

Next, we analyze the three sub-classes under the AR(1) model and show them together in Figure 7. Their trends are very similar to that of the total count. All three show a rise around the first week of March, and, except for class TW, the rise continues for a while. Nothing significant can be said about the AR coefficients except that they are also non-stationary. As expected, the credible intervals are wider for the classes ON and TW, which have fewer tweets.

Cyberbullying is a primary concern, and there has been considerable speculation about an increase in cyberbullying incidents during COVID-19. To begin this investigation, we performed a comprehensive Bayesian analysis of the daily counts of cyberbullying-related public tweets (N = 454,046) and their dominant classes, posted between January 1st, 2020 and June 7th, 2020. Our Bayesian model exhibited a sharp increase in the general mean trend for most of the Twitter keywords related to cyberbullying. The significant AR correlation was at lag 1, with a coefficient that evolved within roughly 0.4 to 0.6, though not monotonically. This analysis showed an increase in the cyberbullying discussion trend among Twitter users during COVID-19, which may or may not be due to the pandemic's direct impact. To further analyze the content discussed in these tweets, we plan to perform an in-depth qualitative analysis.
Our work is novel as the first quantitative trend analysis of users' discussions regarding cyberbullying, especially during a pandemic crisis. Since COVID-19 has spread in multiple phases, it will be of utmost importance to detect such sharp rises in social media trends.

References
Analysis of tweets related to cyberbullying: Exploring information diffusion and advice available for cyberbullying victims
A Study of Cyberbullying Detection and Mitigation on Instagram
Designing cyberbullying mitigation and prevention solutions through participatory design with teenagers
Effect of penitence on social media trust and privacy concerns: The case of Facebook
Cyber Bullying and Internalizing Difficulties: Above and Beyond the Impact of Traditional Forms of Bullying
A linear Poisson autoregressive model: The Poisson AR(p) model
Analysis of Cyberbullying Tweets in Trending World Events
What is a Flag for? Social Media Reporting Tools and the Vocabulary of Complaint
Privacy Preserving Policy Model Framework
Modularity is the Key: A New Approach to Social Media Privacy Policies
How Celebrities Feed Tweeples with Personal and Promotional Tweets: Celebrity Twitter Use and Audience Engagement
Change-Point Analysis of Cyberbullying-Related Twitter Discussions During COVID-19
Annelies Wilder-Smith, and Heidi Larson. 2020. The Pandemic of Social Media Panic Travels Faster than the COVID-19 Outbreak
Privacy Practices, Preferences, and Compunctions: WhatsApp Users in India
Upstanding by design: Bystander intervention in cyberbullying
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
Cyberbullying in Social Networking Sites: An Adolescent Victim's Perspective
Understanding and Preventing Bullying
Online Victimization: A Report on the Nation's Youth
Asymptotic Theory for Simultaneous Inference Under Dependence
Simultaneous inference for time-varying models
Trends in cyberbullying and school bullying victimization in a regional census of high school students
Effects of Social Grooming on Incivility in COVID-19
Cyberbullying: Bullying in the digital age
Cyberbullying in the Social Networking Sites: An Online Disinhibition Effect Perspective
Why People Post Benevolent and Malicious Comments Online
Collaborative detection of cyberbullying behavior in Twitter data
Cyberbullying: A preliminary assessment for school personnel
Most Teens Bounce Back: Using Diary Methods to Examine How Quickly Teens Recover from Episodic Online Risk Exposure
Co-Designing Mobile Online Safety Applications with Children
Identity Verification Mechanism for Detecting Fake Profiles in Online Social Networks
Protecting Children Online? Cyberbullying Policies of Social Media Companies
A Measurement Study of Hate Speech in Social Media
Cyberbullying: Labels, behaviours and definition in three European countries
Techies Against Facebook: Understanding Negative Sentiment Toward Facebook via User Generated Content
Indonesian Twitter Cyberbullying Detection using Text Classification and User Credibility
Cyberbullying and self-esteem
Characterizations of Online Harassment: Comparing Policies Across Social Media Platforms
Bayesian semiparametric time varying model for count data to study the spread of the COVID-19 cases
Adaptive Bayesian procedures using random series priors
On Newer App Features and Cyberbullying in Schools
Cyberbullying in Adolescent Victims: Perception and Coping
Cyberbullying: Another Main Type of Bullying?
An Investigation into Cyberbullying, its Forms, Awareness and Impact, and the Relationship Between Age and Gender in Cyberbullying
Cyberbullying victimization and behaviors among girls: Applying research findings in the field
Tackling Twitter and Facebook Fakes: ID Theft in Social Media
Social Comparison, Social Media, and Self-Esteem
Cyberbullying via Social Media
Social Media Use During Social Distancing
Dear Diary: Teens Reflect on Their Weekly Online Risk Experiences
Examining the Overlap in Internet Harassment and School Bullying: Implications for School Intervention
Online aggressor/targets, aggressors, and targets: A comparison of associated youth characteristics
Examining Characteristics and Associated Distress Related to Internet Harassment: Findings from the Second Youth Internet Safety Survey
A regression model for time series of counts

We would like to thank Umang Mehta for his help with the data collection. We would also like to acknowledge the support of the Secure and Privacy Research in New-Age Technology (SPRINT) Lab, University of Denver; the Human and Technical Security (HATS) Lab, Indiana University; and the University of Florida.
Any opinions, findings, and conclusions or recommendations expressed in this material are solely those of the author(s).