key: cord-235946-6vu34vce
authors: Beskow, David M.; Carley, Kathleen M.
title: Social Cybersecurity Chapter 13: Casestudy with COVID-19 Pandemic
date: 2020-08-23
journal: nan
DOI: nan
sha: 
doc_id: 235946
cord_uid: 6vu34vce

The purpose of this case study is to leverage the concepts and tools presented in the preceding chapters and apply them in a real world social cybersecurity context. With the COVID-19 pandemic emerging as a defining event of the 21st Century and a magnet for disinformation maneuver, we have selected the pandemic and its related social media conversation to focus our efforts on. This chapter therefore applies the tools of information operation maneuver, bot detection and characterization, meme detection and characterization, and information mapping to the COVID-19 related conversation on Twitter. This chapter uses these tools to analyze a stream containing 206 million tweets from 27 million unique users from 15 March 2020 to 30 April 2020. Our results shed light on elaborate information operations that leverage the full breadth of the BEND maneuvers and use bots for important shaping operations.

The COVID-19 pandemic is a defining event of the modern era, and there are few events more appropriate for applying social cybersecurity tools and concepts. At the time of this writing, the pandemic has reached almost every society in the world, with massive impact not only on the lives of those who contract the disease but also on the social, economic, and institutional fabric of these societies. The pandemic and differing opinions on how to react to it have created a virtual battle of ideas across social media. Actors ranging from soccer moms to well-resourced nation states have entered the virtual marketplace of beliefs and ideas, trying to sway the beliefs, actions, and decisions of both leaders and followers. With the pandemic as the backdrop of life as we write this book, it seemed appropriate to use the social cybersecurity tools that we discussed in the previous chapters to identify and understand information operations related to the pandemic.

There are still many questions as well as competing narratives about the origins and nature of the COVID-19 coronavirus disease. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

The virus and resulting pandemic have radically altered the world landscape and daily life for many people. With a large portion of the world's consumers sheltering in or near their homes, the world's markets have slowed, inducing the greatest global recession since the Great Depression [10]. The pandemic has led to the cancellation or deferment of most travel, sporting and entertainment events, and religious and political gatherings. As of 14 April 2020, schools and universities had closed or otherwise been disrupted in 191 countries, affecting 1.57 billion students (pre-primary, primary, lower-secondary, and upper-secondary levels of education), or 90.1% of the total enrolled students for these categories [1].

The COVID-19 pandemic has induced information warfare at multiple levels, both within and between nations. At the individual level, much of the information is in regards to safety and health during the pandemic. Social media often doesn't have structural filters for good ideas, and at times poor ideas regarding health and safety have been promoted by segments of the crowd. Within countries, the pandemic has created two warring factions: 1) those who think policy should prioritize the health and safety of citizens and 2) those who think policy should prioritize the national economy and maintaining the jobs and livelihoods of citizens. These factions often seem to fall along existing political party lines, with conservatives emphasizing the economy and liberals emphasizing health and safety. This policy oriented friction has contributed a large portion of the information conflict within nations. Finally, the origin and nature of the COVID-19 coronavirus has aggravated existing geopolitical fault lines between nations, particularly between the United States and China, with the European Union, Russia, Brazil, and other nations participating in the conflict of narratives. As we look at information warfare and social cybersecurity in the COVID-19 pandemic conversation, we attempt to illuminate information conflict across this spectrum.

Each country's response has varied based on its society, form of government, and health care system. The COVID-19 pandemic has served as a worldwide test of the resiliency of these systems. Nations are therefore turning to information warfare to strengthen their test results while attacking the response and results of other nations, particularly those whose form of government differs from theirs. This is particularly true in the information conflict between China and the United States.

This chapter will showcase the use of social cybersecurity tools and theory to identify and characterize information operations in the COVID-19 related Twitter Stream. To do this, we will start by discussing data collection and initial exploratory data analysis. Then we will conduct bot detection and characterization, merging bot classification with other quantitative methods. We will then look at meme classification and analysis, highlighting the use of multi-modal data for information warfare. Finally, we will briefly illustrate the use of Sketch-IO.

Throughout this chapter we will highlight the importance of BEND as a foundational construct for understanding modern information warfare.

Like most event oriented social cybersecurity data collection on Twitter, our team started by establishing a Twitter Stream with select COVID-19 related terms, periodically expanding the terms as appropriate. This gives us the main foundation of data for assessment. Once we began analysis of the stream, we turned to the Twitter REST API to collect other data as necessary. This additional data is often account oriented and includes timelines, friends, followers, and at times account ID "re-hydration". We will discuss each of these below.

For COVID-19, our team began collecting data on 27 January 2020 with a limited list of keywords and expanded this list on 13 March 2020. For the data that we focus on in this chapter, the keywords are: "coronaravirus", "coronavirus", "wuhan virus", "wuhanvirus", "2019nCoV", "NCoV", "NCoV2019", "covid-19", "covid19", "covid 19". The resulting stream provides the primary data for our study. In this chapter, we focus on the data from 15 March 2020 to 30 April 2020. The summary statistics of this data are provided in Table 1. This is not the entire COVID-19 conversation. Many tweets may not contain these keywords, or may be in other languages. Additionally, our stream is limited to 50 tweets per second, or approximately 4.3 million tweets per day. The temporal analysis below highlights this 4.3 million daily limit on our stream.

As we go through our analysis, we will return at times to the Twitter REST API to collect additional data that is not found in the stream. For example, the stream only contains an account's COVID-19 content, but not all other content and topics. To get access to all content shared by an account, we will at times collect their account timeline (aka account history). There are also times we need to identify an account's friends and followers, which will also require us to call the Twitter REST API. Finally, at the end of our exploratory data analysis, we will try to find out if any accounts have been suspended by Twitter since contributing content to our stream. We will do this by "re-hydrating" account IDs in order to see if the account still exists. If the account doesn't exist, we will then test to see if the account was suspended or deleted by the user.

All data analysis begins with a thorough exploration of the data. The analyst should explore all of the distributions and possible relationships in the data. In our case, this means exploring the temporal, categorical, geospatial, and quantitative distributions of various fields in the Twitter data. We collected and parsed the data using the twitter_col Python package that we created to assist with collecting and manipulating Twitter data. The primary statistics for the data are provided in Table 1. We have 206 million tweets produced by 27 million users. We see that the majority of the tweets are actually retweets. In fact, only 23 million tweets (11% of the stream) are original content produced by the respective account. It is also interesting that quotes significantly outnumber replies. A quote is produced when a user references another tweet and comments on it, starting a new thread. Replies, by contrast, are addressed to the original author and don't start a new thread. We see a modest number of state sponsored tweets, with substantially more amplification of state sponsored accounts. Finally, for reference, this primary stream is 177 gigabytes in compressed gzip format or 1.54 terabytes in uncompressed format.

Next we analyze the temporal distribution of our stream. This is primarily done to study the peaks and valleys to understand when the conversation surges and ebbs. This analysis will also ensure that our tweets fall within the scope of our study (from 15 March 2020 to 30 April 2020).

The temporal distribution of our stream is provided in Figure 1. This is not what we expect to see, and is definitely not what you would observe in smaller streams. Given the size of our stream, we are artificially limited by Twitter rate limiting. Twitter limits basic streaming APIs to no more than 50 tweets per second, or 4.3 million tweets per day. It is unknown what portion of the total conversation we are getting, but given that several days fall below the 4.3 million limit, we believe we are collecting the majority. Twitter did open up an unlimited COVID-19 stream in mid-May [15], but to our knowledge this was not available during the time frame we were interested in.

In order to measure the geospatial density of the data, we used a country level geo-inference algorithm created by Huang and Carley [11]. This model infers the country level location of tweets with 92.1% accuracy. We ran this on data from 13 April and plotted it on a choropleth map in Figure 2 with a logarithmic scale. This shows that the vast majority of the data is from the United States, with notable contributions from Canada, South America, Western Europe, Nigeria, and South and Southeast Asia. This geospatial inference and visualization provides yet another facet of our exploratory data analysis. Given both the keywords that we used and the distribution of Twitter users by country, this spatial distribution is essentially what one would expect.

It's important to explore categorical data distributions in Twitter data. In addition to the retweet/reply/quote/original content distribution that we explored in Table 1, we also want to look at top languages, hashtags, mentions, and URL domains. The importance of hashtags ebbs and flows with time and events. We've developed the visualization found in Figure 3 to understand this changing dynamic for the top 12 hashtags found in the stream. We observe decreasing use of "corona" and increasing use of "COVID19" as a hashtag to identify pandemic related content and conversation. This reflects both the naming of the virus and the increasing knowledge in the world about the virus.

The full counts for languages and domains are provided in Table 2. We see that the majority of the content is in English, followed by many of the prominent world languages. The only major world language that is underrepresented is Chinese, and that's because Twitter is blocked by the "Great Firewall of China." We note that much of the data from China in this data set is actually from state sponsored Chinese media. The domains include several link shorteners, multimedia companies, as well as a smattering of news media companies, including both traditional and alternative news, independent and state owned media, and both conservative and liberal leaning news sites.

By its very nature social media creates links between people, and these links combine to create various types of networks. Most social media have a friend or follow functionality, which creates the most obvious social network on these platforms. Additionally, the online conversation itself will create links between accounts. Whenever an account retweets, mentions, replies to, or quotes another account, a link is created between the two.

With Twitter data, collecting friend/follower links is limited by strict rate limiting on the part of Twitter. Most developer accounts are only allowed to scrape 5,000 friends/followers for one account every minute. The Twitter REST API will only return 5,000 friends/followers per request. For example, at the time of this writing the primary Twitter account for the United States Centers for Disease Control and Prevention (@CDCgov) has 2.7 million followers. To get all of the followers for @CDCgov would take approximately 2,700,000 / 5,000 = 540 minutes. In other words, it would take 9 hours to scrape the followers for this single account. Scraping links for all of the accounts in the COVID-19 conversation (and most conversations) is therefore not realistic. For this reason we often visualize parts of the conversation network, for which we already have the data and which are more useful in understanding the conversation. In our visualization of the network, we visualized the mention network, though at other times the retweet or reply network may be appropriate.
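The follower arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the limits quoted in the text (5,000 IDs per request and one request per minute); the function name and its parameters are hypothetical and do not come from the twitter_col package or the Twitter API.

```python
import math


def follower_scrape_minutes(follower_count, ids_per_request=5000, requests_per_minute=1):
    """Estimate the minutes needed to page through one account's followers.

    The default limits mirror the figures quoted in the text and are
    assumptions; real rate limits depend on the developer account.
    """
    requests_needed = math.ceil(follower_count / ids_per_request)
    return requests_needed / requests_per_minute


cdc_minutes = follower_scrape_minutes(2_700_000)
cdc_hours = cdc_minutes / 60
print(f"{cdc_minutes:.0f} minutes, about {cdc_hours:.0f} hours")
```

Multiplying this per-account cost by the number of accounts in a stream shows quickly why a full follower graph is out of reach for a conversation with millions of participants.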

The Twitter JSON structure has all of the information we need to create mention, retweet, or reply networks. As discussed elsewhere in this book, we created the public facing twitter_col Python package to collect and manipulate Twitter data. The twitter_col package has several functions that make it easy to parse networks from raw Twitter JSON data. The ORA software [6, 7], both ORA-PRO and ORA-Lite, also has functions for parsing Twitter data into networks. These functions were used on the COVID-19 stream.

Visualizing large networks can be difficult. The COVID-19 mention network contains 25,673,160 nodes and 152,481,141 edges (density = 0.000000231). Very few software packages can visualize a network of this size, and none of the software solutions that our team commonly uses can visualize this network. For this reason, we chose to visualize just the core of the network. To do this, we found the k-core of the network. The k-core of graph G is the maximal connected subgraph of G where all the nodes (vertices) have a degree of at least k. While experimenting with k on our mention network, we found that k = 100 was adequate. This means we will visualize the core of the mention network in which all nodes have a degree of at least 100. This core network is comparatively dense (density = 0.000947), which means there are more edges than we are able to visualize. To sparsify the graph, we sampled one million edges to visualize. This final core network contains 159,533 nodes and 1,000,000 edges with density = 0.0000393.
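The core extraction and sparsification described here can be sketched with the NetworkX library. This is an illustration under stated assumptions rather than the workflow the authors ran (the chapter names ORA and twitter_col, not NetworkX): the helper names `directed_density` and `sparsified_core` are hypothetical, the graph is treated as undirected for the k-core step, and `nx.k_core` returns the maximal subgraph with minimum degree k without requiring it to be connected. The density helper counts ordered node pairs, which appears consistent with the densities reported in the text.

```python
import random

import networkx as nx


def directed_density(node_count, edge_count):
    """Edges divided by the number of ordered node pairs."""
    return edge_count / (node_count * (node_count - 1))


def sparsified_core(graph, k, max_edges, seed=0):
    """Return the k-core of `graph` and a copy limited to `max_edges` sampled edges."""
    simple = nx.Graph(graph)  # assumption: ignore direction and parallel edges
    simple.remove_edges_from(list(nx.selfloop_edges(simple)))  # k_core rejects self-loops
    core = nx.k_core(simple, k=k)
    edges = list(core.edges())
    if len(edges) > max_edges:
        edges = random.Random(seed).sample(edges, max_edges)
    return core, nx.Graph(edges)


# Toy graph: a 5-clique with a two-node tail hanging off node 4.
toy = nx.complete_graph(5)
toy.add_edges_from([(4, 5), (5, 6)])
core, sampled = sparsified_core(toy, k=4, max_edges=6)
print(sorted(core.nodes()), sampled.number_of_edges())

# Densities recomputed from the node and edge counts given in the text.
full_density = directed_density(25_673_160, 152_481_141)
final_density = directed_density(159_533, 1_000_000)
print(f"{full_density:.3g} {final_density:.3g}")
```

On the toy graph the tail nodes are peeled away and only the clique survives, which is the same effect that k = 100 has on the periphery of the mention network.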

In Figure 4 we visualize the core of the mention network colored by language using the Graphistry software. We could also visualize larger networks with ORA-PRO or the sigmaNet package in R. In Figure 4 we see the inter-connection between the English, French, Spanish, and Portuguese conversations. We also can see the community groups that are clearly evident in the larger conversations, particularly the English and Spanish conversations.

Given that we now have networks parsed from our COVID-19 stream, there are a number of network science techniques that we can use to help understand this network. For now we will focus on measuring which accounts (or nodes in the network) are the most influential (or central) to the network. Measuring centrality is an important step in network science, and many different methods have been published for doing this, with each technique measuring a slightly different definition of influence and importance. For example, degree centrality measures influence by number of connections, whereas betweenness centrality measures influence by identifying those accounts that bridge communities.

For our analysis, we chose to measure centrality and influence by eigenvector centrality [12] . Eigenvector centrality measures influence by finding accounts that have the most connections to influential accounts. In other words, not all links are created equal, and an account is more influential if it is connected to many nodes who themselves have high scores. We believe that this mirrors the way that many people view influence in the physical world, and therefore use eigenvector centrality for our analysis. We also find that eigenvector centrality, unlike measures like betweenness, is computationally practical on large networks.
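The idea that an account is influential when it is connected to other high-scoring accounts can be illustrated with a small power iteration. This is a minimal sketch on an undirected toy edge list, not the implementation used for the COVID-19 networks; the function name, the node names, and the toy edges are all hypothetical.

```python
def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration on an undirected edge list; returns {node: score}."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # New score = own score + sum of neighbor scores. Keeping the node's
        # own score (iterating with A + I) avoids oscillation on bipartite graphs.
        updated = {
            node: scores[node] + sum(scores[other] for other in nbrs)
            for node, nbrs in neighbors.items()
        }
        norm = sum(value * value for value in updated.values()) ** 0.5
        updated = {node: value / norm for node, value in updated.items()}
        change = sum(abs(updated[node] - scores[node]) for node in updated)
        scores = updated
        if change < tol:
            break
    return scores


# "a" and "c" both have two links, but "a" sits in a triangle with the hub,
# while "c" is tied to a leaf, so "a" ends up with the higher score.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]
scores = eigenvector_centrality(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])
```

Each pass costs time proportional to the number of edges, which is one reason this measure stays practical on networks where all-pairs measures such as betweenness do not.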

The top influencers as measured by eigenvector centrality for both the mention and the retweet networks are shown in Table 3. As we will discuss later, the central accounts in the mention network are the politicians and celebrities that we would expect. The central accounts in the retweet network, however, include many bots, which we will elaborate on later.

It is often important to evaluate the quantity and nature of suspended accounts in any event oriented stream. Twitter and most social media companies suspend accounts that frequently violate their terms of service. These violations could include frequently posting violent, racist, or other unauthorized content. It also could be that the accounts display unauthorized automated activity (i.e. they are a bot). By identifying and evaluating suspended accounts, we get a sense of how the social media company has been "cleaning" this particular stream.

To identify suspended accounts, we "re-hydrate" account IDs in order to see if the account still exists. If the account doesn't exist, we will then test to see if the account was suspended by Twitter or deleted by the user. The workflow begins by identifying all unique account IDs in the stream. We then "re-hydrate" them in batch mode using the Twitter REST API. Using batch mode is a fast method to "re-hydrate", but does not provide any feedback for missing accounts. Having re-hydrated account IDs, we then identify those that are missing with Missing = Total − Rehydrated. There are several reasons why an account could be missing. The two most common are that the account was deleted by the user or was suspended by Twitter. To determine which of these is the case, we individually attempt to re-hydrate the missing IDs (not in batch mode). This provides a detailed response that indicates whether the account was deleted or suspended.

Given the size of the COVID-19 stream, we randomly sampled 1 million users from the available 27 million unique users in the stream. After "re-hydrating" in batch mode, we determined that 70,842 were missing. Using this number, we estimate that 7.0842 ± 0.05% of the accounts in the stream have been deleted or suspended (estimated using 95% confidence interval).
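The interval reported above can be reproduced with a normal-approximation (Wald) confidence interval for a proportion. The text does not say which interval method was used, so the formula here is an assumption that happens to match the reported margin; the function name is hypothetical, and no finite population correction is applied.

```python
import math


def proportion_ci(successes, sample_size, z=1.96):
    """Return (proportion, margin) for a normal-approximation interval.

    z = 1.96 corresponds to a 95% confidence level.
    """
    p = successes / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# 70,842 missing accounts out of a random sample of 1,000,000 users.
missing_share, missing_margin = proportion_ci(70_842, 1_000_000)
print(f"{missing_share:.4%} ± {missing_margin:.2%}")
```

Because the sample is large, the margin is only a few hundredths of a percentage point, so the estimate for the full stream is tight even though fewer than 4% of the 27 million accounts were checked.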

Next we want to estimate the number that have been suspended. To do this we now individually attempt to "re-hydrate" the missing IDs. When we do this, we find that 9,639 accounts had been suspended by Twitter. Using this number, we estimate that Twitter has suspended 0.9639 ± 0.019% of the accounts in the stream (estimated using a 95% confidence interval). Using these IDs, we made another pass through our stream, and determined that these 9,639 accounts produced 93,841 tweets that contain the terms we were filtering for the stream. Running the Bot-Hunter Tier 1 algorithm on this data, we find that 73.4% of the suspended users have strong bot-like characteristics. It is often helpful to sample several of the suspended accounts and view their tweets to determine if they were participating in information operations, and if so what the message was and who the target audience was. For example, the tweets of suspended account @Scopatumanigga are provided in Table 4. Our first observation is that nearly all statuses are retweets, which is highly indicative of bot behavior. Given the account screen name and content, it appears that this likely bot account is attempting to infiltrate and influence African American virtual communities on Twitter. The likely intent is to amplify racial divides in order to create instability in the United States.

Table 4: Tweets from suspended account @Scopatumanigga
RT @tonyhawk: Ive been sick lately (not sick AF just sick) with symptoms other than COVID-19. But I know two friends in the U.S. with co
RT @mvrlyns: so after the Coronavirus blows over, will yall continue to practice good hygiene and sanitation? ... or will yall go back to
RT @KenichiAL: Joe Biden: The tests for the coronavirus should be free Bernie Sanders: The vaccines and treatment for the coronavirus shou
RT @workerism: Haven't been able to stop thinking about this. A US pharma company with a potential COVID-19 vaccine is in court trying to P
RT @CrypticNoOne: 123movies would never do this
RT @Carnage45 : Black athletes give back during every crisis. I'm not saying it doesn't happen but I don't be seeing Tom Brady, Mike Trout
RT @shanalala : The elite getting tested without any symptoms and commoners with all the symptoms are denied tests.
RT @elliecampbbell: I know its necessary to stop the spread of covid-19, but self isolation, no school etc and everything being shut/cance
RT @EricHaywood: There are 16 people in this photograph
RT @eugenegu: @realDonaldTrump There it is. Ive been deathly afraid of this exact moment where Trump turns to racism and xenophobia and ca
RT @baeonda: Im 22 years old and I tested positive for COVID-19. Ive been debating on posting, but I want to share my experience especi
RT @SocialistWitch: Coronavirus is not mother nature's 'cure' for 'evil humans'. The Earth doesn't suffer from humanity. It suffers from
RT @RaeOfLite: Hi. Yes, it originated in China, but the technical term is Covid-19 Your mom originated from the back of a Buick Skylark
@swevenpjm @ashleytwo @lydiakahill How's this covid19 treating you
Bernie Sanders on his way to the hospital after he sees this tweet URL
RT @ChapatiPapa: So they cashed out. Hoarded supplies. Moved money to companies they believe could make the vaccine, and testing kits all t
RT @Claireific: Hey remember that time I said that tell me about Tuskegee should be a required interview question for medical applicants
RT @Nigensei: Empty hotels all over the city of Las Vegas and theyre putting the homeless in a fucking parking lot.
RT @DecaturDane: SECOND WAVE??? URL
RT @hoodcuIture: I swear Christians and colonizers never stop!! How an ISOLATED GROUP get infected
RT @BreeNewsome: whyyyyyyyyyyy do we accept such a lower quality of life in this country in exchange for nothing but slogans & confetti
RT @ iamtiredLord: this baby was shot 20 times in the head by a grown ass man because he wanted her girlfriend who rejected him. THEY ARE 1
RT @thespinsterymc: The family members of the Johnsons have made it clear that medical neglect killed them. Stop romanticizing this antibla
RT @LackingSaint: got to be honest, it's starting to feel like we're just doing things we feel like doing and saying it's in support of hea

Bot detection is a critical step for social cybersecurity workflows. As discussed in Chapter 7, bot detection often helps to delineate an information warfare campaign, as well as illuminate lines of effort (topics) and target audiences. It also sheds light on the scale, level of sophistication, and at times attribution for the operation.

In this section we will discuss how to appropriately deploy bot detection algorithms for social cybersecurity. We will start by estimating the accuracy of our algorithms on the given data stream as well as selecting an appropriate threshold for the data at hand (in our case the COVID-19 stream). We will discuss where each of the Bot-Hunter tiers should be used in the workflow. Having run Bot-Hunter on all 27 million accounts, we will use it to find influential bots, lines of effort, target audiences, and foreign influence.

Before using any bot detection tool on a given event or topic oriented data stream, the analyst should verify its accuracy on the stream as well as determine an appropriate threshold. As we discussed in Chapter 7, training data matters for bot detection. We need to verify that the model that we are trying to use, and its respective training data, are appropriate and predictive for our event stream. In our case that means verifying that the bot detection model works on our COVID-19 stream. To make this evaluation, we need a small labeled dataset from our event stream. For the COVID-19 data, we created a list of all unique user IDs that were found in the data, and then randomly sampled 200 accounts from this list. We then manually labeled the accounts using a custom workflow that we've developed. After manually labeling the 200 accounts, we evaluated the proposed bot-detection algorithms on the data.

With labels and bot-detection scores in hand, we can measure performance with various metrics (accuracy, precision, recall, F1 score, ROC-AUC, etc.). These scores tell us how well our models generalize to COVID-19 data. The scores are shown in Table 5 for the default setting of threshold = 0.5 for all models. From Table 5 we first and foremost determine that Bot-Hunter Tier 1 should be our primary bot detection model, with a higher F1 score and a good balance of precision and recall. From Table 5 we can also determine that the Botometer model as well as the Bot-Hunter Tier 2 model don't seem to generalize as well to the COVID-19 Twitter stream. The Bot-Hunter Tier 0 model, while appearing to perform well, will only be used on specific tasks because it is only able to score English speaking accounts and because it tends to have a higher false positive rate (in this test its false positive rate is three times larger than that of Bot-Hunter Tier 1).

These scores, however, are sensitive to the threshold that we choose. In order to choose an appropriate threshold, we use our labels and bot detection scores to plot precision-recall curves, as shown in Figure 5. Remember that precision and recall often have an inverse relationship: as precision increases, recall decreases, and vice versa. As the threshold is raised, recall can only decrease, whereas precision does not necessarily increase monotonically. The exact choice of the threshold will depend on the context and any related policy decisions. If the policy decision requires a low false-positive rate, then choose a threshold with high precision. If the policy decision or analytical goal requires a low false-negative rate, then choose a threshold with higher recall. For most tasks it is best to have a balance of precision and recall, which is why we often use F1 score to measure the performance of bot detection algorithms.

In choosing a bot detection threshold, our goal for the COVID-19 stream is to characterize the entire conversation. We want to have a robust characterization of the entire forest, not necessarily a precise analysis of individual trees. This goal requires a good balance between precision and recall. As indicated above, our Bot-Hunter Tier 0 Text model tends to produce a high false positive rate. For this reason, we will use a threshold of 0.7 for this model in our COVID-19 stream, which cuts the false positive rate by a third. For Bot-Hunter Tier 1 we will retain the 0.5 default threshold, since it provides a good balance of precision and recall. For Botometer and Bot-Hunter Tier 2, we will use a threshold of 0.3 in order to increase recall. The adjusted performance is provided in the accompanying table.

Since the Bot-Hunter Tier 1 algorithm is our primary algorithm, we've visualized the distribution of its bot probabilities for all COVID-19 accounts in Figure 6a, with threshold = 0.5 and threshold = 0.65 marked. Here we see a large number of human accounts, as well as a large number of accounts that are in the middle, with decreasing numbers of high probability bots. The threshold = 0.5 is our default, while we can at times use threshold = 0.65 for increased precision. Note that each actor has a separate bot score for each bot detection tool. These algorithm-identified bots can be more accurately described as actors with bot-like characteristics. The number of "bots" that are identified will vary depending on which tool and which threshold is used. For example, on any given day, depending on what is used, the proportion of "bots" may vary from approximately 25% to 48%.
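The effect of the threshold on precision, recall, and F1 can be made concrete with a short sweep. The labels and scores below are hypothetical toy values invented for illustration; they are not the 200 manually labeled COVID-19 accounts, and the function is a plain-Python sketch rather than the Bot-Hunter evaluation code.

```python
def precision_recall_f1(labels, scores, threshold):
    """Score accounts as bots when score >= threshold and compare with labels."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labeled sample: 1 = bot, 0 = human, with model bot scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.35, 0.75, 0.45, 0.40, 0.20, 0.10, 0.05]

results = {t: precision_recall_f1(labels, scores, t) for t in (0.3, 0.5, 0.7)}
for threshold, (precision, recall, f1) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy sample, recall falls at every step as the threshold rises, while precision first rises and then falls, which is the non-monotonic behavior the text warns about. A low threshold such as 0.3 trades precision for recall, and a high threshold such as 0.7 does the reverse.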

Now that we have bot prediction thresholds, we can begin to use the models to understand the COVID-19 stream and the conversation and actors involved in it. In Figure 7 we visualize the sparsified core of the mention network (k core = 100), colored by bot prediction with Bot-Hunter Tier 1 and threshold = 0.5. We see that bot-like accounts are highly embedded in the core of the conversation, and connect to and mention influential accounts in an effort to manipulate these personalities and their followers. In this visualization, we have also zoomed in to get a better feeling of the structure of the network and so that we can highlight the location of prominent English speaking accounts.

The most surprising bot detection analysis that we found was in regards to account creation date. Anytime we analyze a list of accounts, it is often enlightening to visualize a histogram of their account creation date. The Twitter JSON contains a field in the user object that records the date, hour, minute, and second that the account was originally created. This date will be some date between March 21, 2006 (the day Twitter started) and the current date. We've found it best to bin this by day. Each bar in the resulting histogram will contain all accounts that were created on that day. Any large spike of accounts created around the same time should cause us to dig deeper to check for the presence of a bot "army".
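The daily binning described above can be sketched in a few lines. The example assumes Twitter API v1.1 user objects, whose `created_at` field is a string such as "Sat Mar 21 08:15:00 +0000 2020"; the function name and the sample users are hypothetical, and parsing the English day and month names assumes the default C or an English locale.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout of the `created_at` field in Twitter API v1.1 objects.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def accounts_created_per_day(users):
    """Count unique accounts per creation date (ISO date string -> count)."""
    seen = set()
    counts = Counter()
    for user in users:
        if user["id_str"] in seen:
            continue  # the same account appears once per tweet in a stream
        seen.add(user["id_str"])
        created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
        counts[created.date().isoformat()] += 1
    return counts


# Hypothetical user objects, including one duplicate account.
sample_users = [
    {"id_str": "10", "created_at": "Sat Mar 21 08:15:00 +0000 2020"},
    {"id_str": "11", "created_at": "Sat Mar 21 23:59:59 +0000 2020"},
    {"id_str": "12", "created_at": "Sun Mar 22 00:00:01 +0000 2020"},
    {"id_str": "10", "created_at": "Sat Mar 21 08:15:00 +0000 2020"},
]
daily_counts = accounts_created_per_day(sample_users)
print(dict(daily_counts))
```

The resulting counts can be plotted as a bar chart with one bar per day; a day whose count is far above its neighbors is the kind of spike that warrants a closer look.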

In Figure 8 we visualize the account density plot for COVID-19 colored by bot percentage. The coloring indicates what portion of accounts in that bar have a Bot-Hunter Tier 1 score greater than 0.5. Green indicates bars that have few bots, while red indicates bars that have a higher proportion of bots.

From Figure 8 we see that a large number of bot-like accounts have been created since the pandemic began and then immediately deployed into the conversation. In fact, of the 27 million accounts participating in the conversation, 1.5 million were created since 1 February and also have a Bot-Hunter Tier 1 score greater than 0.5. Undoubtedly part of this surge in accounts comes from individuals who are stuck at home and decided to create a Twitter account. A significant portion, however, appears to be bot armies. These bot armies are produced by a number of actors with a variety of agendas, but all likely involve manipulation of the marketplace of beliefs, ideas, and collective action.

Next we want to intersect our measure of influence (eigenvector centrality) with a bot prediction score in order to identify influential bots. In Table 3 we list the top 50 most influential accounts in the mention network and the retweet network as measured by eigenvector centrality. Table 3 also provides the Bot-Hunter Tier 1 bot score, with red text indicating accounts that have a score greater than 0.5. Looking at the macro-level comparison, we see that more bots are involved in the retweet network than in the mention network. This makes sense given that bots are often used to scale amplification, and the easiest way to amplify is with retweets. We also see many more verified politician, news, and government accounts in the mention network. Many news accounts have "bot-like" behavior, with the @FoxNews, @MSNBC, and @CBSNews accounts surpassing the 0.5 threshold. It has been documented that many news and celebrity accounts can have bot-like behavior [9], which is supported by our analysis. We also see in the retweet network that several accounts that are not classified as bots nonetheless have bot [...]. We also note that it is possible for users to employ software or a bot on occasion from the same account. We refer to this hybrid of human and bot as a cyborg. Cyborgs will also tend to exhibit bot-like characteristics; however, their bot scores are likely to be lower than those of a totally automated account.

Next we want to separate the stream into topic groups. We do this with a Latent Dirichlet Allocation (LDA) model [4]. We concatenated all English hashtags by account and then performed LDA with k = 5. We chose to concatenate hashtags rather than use raw text for computational tractability and because hashtags provide tokens that capture the essence of topic and meaning. Word clouds of the resulting five topic groups are shown in Figure 9. We see that the topic groups are differentiated in some ways by geography and in other ways by politics. These topic groups allow us to segment the conversation and focus on a topic of interest, for example a certain geography (the Nigerian conversation) or a specific political affiliation (the conservative political conversation).
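
The document preparation step that precedes LDA can be sketched as follows. This is an illustration under stated assumptions: tweets are assumed to arrive as (account, text) pairs already filtered to English, the tweets shown are invented, and the function names are hypothetical. The sketch covers only the concatenation of hashtags by account and the resulting count matrix; the matrix would then be passed to an LDA implementation of choice with k = 5.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def hashtag_documents(tweets):
    """Concatenate the hashtags of each account into one token list per account."""
    docs = defaultdict(list)
    for account, text in tweets:
        docs[account].extend(tag.lower() for tag in HASHTAG.findall(text))
    # Accounts that never used a hashtag contribute no document.
    return {account: tags for account, tags in docs.items() if tags}


def document_term_counts(docs):
    """Build a dense account-by-hashtag count matrix (rows keyed by account)."""
    vocab = sorted({tag for tags in docs.values() for tag in tags})
    index = {tag: i for i, tag in enumerate(vocab)}
    rows = {}
    for account, tags in docs.items():
        row = [0] * len(vocab)
        for tag in tags:
            row[index[tag]] += 1
        rows[account] = row
    return vocab, rows


# Invented example tweets.
tweets = [
    ("user_1", "Stay home! #COVID19 #lockdown"),
    ("user_1", "Numbers rising again #covid19"),
    ("user_2", "Lagos update #Nigeria #COVID19"),
    ("user_3", "no hashtags in this tweet"),
]

docs = hashtag_documents(tweets)
vocab, rows = document_term_counts(docs)
```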

The choice of k is as much an art as a science. If you want a view of the macro topics, use a smaller k, as we did here with k = 5. If you want to extract a very specific conversation (e.g., the liberal conversation in Canada), you will have to increase k in order to sufficiently isolate the topic of interest.

Similar to topic analysis, we also want to look at community groups. While topic analysis looks at semantic topics that cluster together, community groups look at accounts that tend to cluster together regardless of topic. We ran Louvain community detection [5] on both the retweet and mention networks, and then looked at the influential accounts and top hashtags for each of the top 10 communities. The community groups in the retweet network are summarized in Table 7. Here we see once again that some community groups are geographically and/or linguistically oriented, while others are politically affiliated.
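
Louvain community detection works by greedily moving nodes between communities to increase modularity. The sketch below is not the Louvain algorithm itself; it only computes the modularity objective for a given partition of an undirected, unweighted graph, which is the quantity Louvain tries to maximize. The toy graph and the function name are assumptions made for illustration.

```python
from collections import Counter


def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    L_c is the number of edges inside community c, d_c is the total degree
    of its nodes, and m is the number of edges in the graph.
    """
    m = len(edges)
    intra = Counter()
    degree = Counter()
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Splitting the graph along the bridge yields a higher modularity than placing every node in one community, which is why a modularity-maximizing method would recover the two triangles.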

Now that we've explored bots in the data, we will begin to look at other ways to characterize accounts. These include analysis of biased or questionable sources, abusive language, and use of national flags.

Next we will look at the political bias found in the URLs. To do this, we will use the dictionary approach presented in Chapter 11. Using this approach, 18.2% of the total URL domains were found in the dictionary lookup that we built in Chapter 11. Having estimated the bias and factual content of the URLs, we visualize the distribution in Figure 10, where the bar plots are colored by the proportion of bot involvement. From this we first see that the highest number of URLs come from the Center and Center Left political bias categories, and that these generally have high factual content. We do see the presence of fake, satire, and conspiracy theory sources in this stream, which are often correlated with the low factual content seen in Figure 10b. We also see that bots have a higher degree of correlation with URLs from the far-right and fake-news categories, and to a lesser extent from the far-left. We also see high bot correlation with URLs containing low factual content. These discoveries largely confirm our assumptions going into the analysis.
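
The dictionary lookup described above can be sketched as a domain extraction followed by a table lookup. This is a minimal illustration: the lookup table, its labels, and the URLs are invented placeholders (using the reserved `.example` top-level domain), and the function names are hypothetical. The real dictionary is the one built in Chapter 11.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup: domain -> (political bias, factual content).
BIAS_LOOKUP = {
    "center-news.example": ("center", "high"),
    "farright-news.example": ("far-right", "low"),
    "satire-site.example": ("satire", "low"),
}


def domain_of(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def summarize_bias(urls):
    """Count URLs per bias label and report the share of distinct domains matched."""
    bias_counts = Counter()
    domains = set()
    for url in urls:
        domain = domain_of(url)
        if not domain:
            continue
        domains.add(domain)
        label = BIAS_LOOKUP.get(domain)
        if label is not None:
            bias_counts[label[0]] += 1
    matched = {d for d in domains if d in BIAS_LOOKUP}
    coverage = len(matched) / len(domains) if domains else 0.0
    return bias_counts, coverage


urls = [
    "https://www.center-news.example/story/1",
    "https://center-news.example/story/2",
    "http://farright-news.example/post",
    "https://unknown-blog.example/a",
]

bias_counts, coverage = summarize_bias(urls)
```

The coverage value plays the role of the 18.2% figure reported above: the fraction of distinct domains in the stream that appear in the dictionary.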

We used the multi-lingual dictionary-based algorithm presented in Chapter 11 to identify tweets that contain abusive language. The daily volume of abusive tweets is presented in Figure 11. We did not normalize this visualization since our total daily count of tweets was held constant at 4.3 million tweets. If this were not the case, it would be appropriate to normalize (plot the proportion instead of the raw count). The daily volume allows us to identify events that seem to aggravate the population of active Twitter users. The two prominent spikes are tied to political events and voices in the United States, with the first spike tied to actions by the US Congress and the second spike tied to comments by the US Executive Branch.
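
The normalization mentioned above, which is needed when the daily tweet count is not constant, amounts to dividing each day's abusive count by that day's total. A minimal sketch follows; the counts are invented and the function name is hypothetical.

```python
def abusive_proportion(abusive_by_day, total_by_day):
    """Return the share of tweets flagged as abusive for each day with data."""
    return {
        day: abusive_by_day.get(day, 0) / total
        for day, total in total_by_day.items()
        if total > 0
    }


# Invented counts: the raw abusive volume is identical on both days,
# but the second day has far fewer tweets overall.
abusive_by_day = {"2020-04-01": 100_000, "2020-04-02": 100_000}
total_by_day = {"2020-04-01": 4_300_000, "2020-04-02": 2_000_000}

proportions = abusive_proportion(abusive_by_day, total_by_day)
```

With raw counts the two days look the same; with proportions the second day stands out, which is the reason to normalize whenever the daily total varies.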

We found that 57.7% of the accounts that share abusive content had bot-like characteristics. This is significantly higher than the 41.8% of non-abusive accounts that have bot-like characteristics. This suggests that, within the COVID-19 stream, bot-like accounts are disproportionately used to produce or promote abusive content.

As discussed in Chapter 11, flags in the user description can at times indicate suspicious accounts. This is especially true with multiple flags. To explore this in the COVID-19 stream, we extracted all flags in the account descriptions for all 27 million accounts. We've plotted the distribution of these in Figure 12. Reviewing the distributions in Figure 12, nothing in the one- and two-flag distributions is unexpected or necessarily cause for further exploration. Once we get to the three-, four-, five-, and six-flag distributions, however, the accounts are likely suspicious. In particular, many of these accounts display the flags of multiple Western nations (the US, Canada, and European nations), and may be used to manipulate multiple Western nations while appearing to be an expatriate.
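
Flag extraction from account descriptions can be sketched with a regular expression. This sketch relies on the fact that national flag emoji are encoded as pairs of Unicode regional indicator symbols (U+1F1E6 to U+1F1FF), each standing for one letter of a two-letter country code. It does not handle subdivision flags, the descriptions are invented, and the function names are hypothetical.

```python
import re
from collections import Counter

# A national flag emoji is two consecutive regional indicator symbols.
FLAG = re.compile("[\U0001F1E6-\U0001F1FF]{2}")


def extract_flags(description):
    """Return the two-letter country codes of all flag emoji in a description."""
    codes = []
    for pair in FLAG.findall(description):
        codes.append("".join(chr(ord(ch) - 0x1F1E6 + ord("A")) for ch in pair))
    return codes


def flag_count_distribution(descriptions):
    """Count how many accounts display 0, 1, 2, ... distinct flags."""
    return Counter(len(set(extract_flags(d))) for d in descriptions)


# Invented descriptions; flags are written as escapes to keep the source ASCII.
descriptions = [
    "Proud dad \U0001F1FA\U0001F1F8",
    "\U0001F1FA\U0001F1F8\U0001F1E8\U0001F1E6\U0001F1EC\U0001F1E7 patriot",
    "no flags here",
]

distribution = flag_count_distribution(descriptions)
```

Accounts falling in the three-or-more bucket of the distribution are the ones the text above singles out for closer review.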

6 Image/Meme Analysis

As discussed in Chapter 10, memes are a powerful way to connect a message to a target audience. Memes evolve as they propagate through a society. Given the size of the COVID-19 Stream, we sampled 1 million images (approximately 10% of the total) and conducted meme classification on these. The Meme-Hunter model classified 37,473 images as memes. A collage of these memes is found in Figure 13 .

Given the massive impact of COVID-19 on society and daily life, many memes were innocent humor designed to help people get through some very tough circumstances. We did find a number of political memes, however, some targeting domestic pandemic policy discussions and others targeting geo-political competition. The domestic policy memes used image and text to argue for one of two competing priorities: the safety of society or the economic foundation of that society. The geo-political memes were likely created by nation-states or nation-state proxies, with many memes created by Russia, China, and Iran, which will be discussed in more detail below.

As discussed in Chapter 10, the Meme-Hunter suite of tools includes a special Optical Character Recognition (OCR) pipeline for meme images. This was also presented in detail in [3]. We ran meme OCR on the memes extracted from the 1 million sampled images, and conducted word cloud visual analysis of the results. These results are found in Figure 14.

From these results we see a number of general coronavirus memes. We also see that a number of memes target government leaders, ministers, and agencies.

Next we used open source facial recognition software to identify prominent politicians and world leaders in the memes. Facial recognition software simply identifies personalities; it does not indicate whether the meme is supporting or attacking the personality found. A distribution of memes about prominent US politicians and other world leaders is given in Figure 15. Here we see that prominent world leaders are the target of most of the memes in our sample, with Xi Jinping and Donald Trump in the first and second positions, respectively. This also highlights that much of the COVID-19 discussion and information conflict is between the US, China, and Europe.

We next calculated the evolution of the 37,000 memes that Meme-Hunter classified in our sample. The network was created by using a VGG deep learning model [14] and extracting the last layer before softmax. Using this 25,088-dimensional vector to represent each image, we then conducted radius nearest neighbor graph learning with distance = 1200. In Figure 16 we see that many of the darker image memes are clustered together in the center, with other prominent memes evolving in clusters that are separate components. We've highlighted the evolution and links between two of the memes.

We were able to use the COVID-19 data to test the Sketch-IO prototype application for sketching and analyzing information operation campaigns. Given that our prototype application was not able to ingest the entire stream due to its size, we instead ingested all data produced by or propagating state sponsored media. The Sketch-IO application proved effective and responsive at quickly performing a number of battle drills to analyze this data. Example usage and screenshots are provided in Figure 17.
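
The radius nearest neighbor graph used for the meme evolution network above can be sketched as follows. This is a brute-force illustration under stated assumptions: the distance is taken to be Euclidean (the metric is not stated in the text), the two-dimensional vectors are invented stand-ins for the image feature vectors, and the function names are hypothetical. Comparing all pairs is quadratic, so a corpus of tens of thousands of images would call for a spatial index or an approximate method.

```python
import math
from itertools import combinations


def radius_neighbor_graph(vectors, radius):
    """Link every pair of items whose feature vectors lie within the radius."""
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(vectors.items(), 2)
        if math.dist(va, vb) <= radius
    ]


def connected_components(nodes, edges):
    """Group nodes into connected components with an iterative depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Invented feature vectors: meme_a and meme_b are 1000 units apart.
vectors = {
    "meme_a": (0.0, 0.0),
    "meme_b": (600.0, 800.0),
    "meme_c": (5000.0, 5000.0),
}

edges = radius_neighbor_graph(vectors, radius=1200)
components = connected_components(list(vectors), edges)
```

Each connected component corresponds to one family of visually similar memes, and edges within a component trace the small variations between them.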

In this section we will identify overt and seemingly overt propaganda operations by nation state actors. Covert and black propaganda are much harder to detect and attribute to a nation state actor (black propaganda is designed to make the victim appear to be the perpetrator).

We will use the bot labeler function for finding state sponsored media and state sponsored media amplification as a proxy for finding propaganda. In Figure 18 we show the number of retweets of state sponsored media accounts as measured by the bot labeler. This image is colored by the percentage of bots that are retweeting these accounts. As indicated in Chapter 11, these state sponsored media vary drastically in purpose and independence (the purpose of Russian RT is different from that of the US Voice of America). That being said, we clearly see that Russia and China are investing heavily in producing and promoting state sponsored media and messages. These messages are amplified by both bots and legitimate accounts.

To some extent, the COVID-19 pandemic seems to be a turning point for Chinese information operations. Traditionally, these operations focused on positive narratives, largely pushed by the "50 cent Army", a low-paid or otherwise co-opted group of online netizens. They did not conduct "higher risk, higher reward" operations that relied on negative, antagonizing, and controversial statements. This slowly started to change with the Hong Kong protests in 2019 and changed fully with the COVID-19 pandemic, with China seeming to adopt more aggressive IO practices and operations. In February 2020, Lijian Zhao was promoted to Deputy Director General of the Information Department of the Foreign Ministry. Zhao has a history of aggressive information operations, and seems to be implementing this approach in Chinese information operations as well as personally on social media. On March 12, Zhao posted a tweet suggesting that the US Army might have brought the coronavirus to Wuhan, China (see Figure 19). Zhao seems to be a key personality leading and directing China's more aggressive approach to information operations.

Fig. 19: Lijian Zhao's rise within the Foreign Ministry Information Department seems to correlate with a more aggressive approach than historical Chinese information operations (Tweet ID: 1238111898828066823).

Chinese propaganda is amplified by Chinese government representatives. Chinese government officials around the world are able to amplify state sponsored media without questioning it, knowing that only approved messages are published. In the COVID-19 stream, the Chinese Ambassador to Venezuela has the 2nd highest number of state sponsored mentions/retweets. He retweeted or mentioned Chinese state sponsored Spanish and English COVID-19 content 769 times. The increasing penetration of Chinese state sponsored media around the world is pushed by these legitimate accounts.

It also appears that China often uses trolls instead of bots. With easy access to human capital, the Chinese seem to prefer the control and nuance that trolls allow compared to a bot army. Many of the suspicious Chinese accounts show enough nuance and human-like temporal patterns to be considered trolls rather than bots.

We see increased use of meme warfare by China in the COVID-19 stream. Historically, China has seemed hesitant to use memes, potentially because memes propagate through evolution, and this evolution is outside of the control of the state [2]. China has especially been concerned with the evolution of memes within its own population, and has banned some memes [3]. In COVID-19, however, we see them developing and deploying memes in a way that is more akin to Russian information operations. An example of a Chinese meme is provided in Figure 20a. Here we see a message tied to an image that has clear cultural relevance and traction within the target audience. Additional examples of Chinese memes that were found using the Bot-Match methodology are provided in Figure 26 below.

We also found evidence that China is starting to interlace adult-oriented and humorous content with its information in order to increase traffic, particularly from certain demographics. Examples of this are found in Figure 21. This has historically been a key part of Russian IO, and it seems that China is increasingly adopting similar practices. Explicit adult content is often designed to attract and manipulate young, impressionable men.

In the Chinese propaganda, we see that some content uses Chinese while other content uses English. The English text is often accompanied by memes that connect the message with American audiences. The English language propaganda arguably targets an American audience, attacking leaders and institutions in America. It is also designed to improve Americans' view of China, the Chinese leadership, and the Chinese Communist Party (CCP). The propaganda that uses Chinese language text arguably targets Chinese audiences within China's borders as well as in other Asian countries. This propaganda is designed to increase nationalistic fervor within China's borders.

Within the Chinese propaganda that we viewed, the vast majority was singularly focused on the United States. This differs from Russian information operations, which focus more broadly on the West, adding European actors to their list of targets.

Within the COVID-19 stream, Russia appears to be staying with their historical information playbook. With extensive experience in manipulating world opinion with their "active measures" throughout the Cold War, Russia has long had one of the most aggressive and well-resourced information operations capabilities among nation states. Russia's information operations are tightly coupled with their other cyber operations.

Russian information operations rely on increasing the penetration of their state sponsored media around the world. RT, Sputnik, and other state sponsored media outlets offer news in many languages around the world. These media outlets offer news stories that support Russian information narratives, as seen in Figure 22. In this figure we see the narrative that Russia provided more help to the Italian population than did the European Union.

As seen in Figure 22b , Russian operations still use large and sophisticated bot "armies" to push their content. This observation is supported by the quantitative analysis seen in Figure 18 , which shows the amplification of Russian state sponsored messages is 78% bots.

As indicated above, Russian operations target the West in general, with European targets receiving almost as much emphasis as the United States. This differs from Chinese operations, which seem to primarily target the United States, with smaller efforts directed at Europe. In Figure 23 we see examples of Russian state media trolling the West. While Russian state sponsored media advertise themselves as news organizations, this content is arguably well beyond reporting unbiased news.

On 6 April 2020, Iran initiated a concerted attack on the United States, trying to encourage California to exit the union. Most of this effort was tagged with #CalExit. By the time the dust settled, approximately 30K tweets had been launched by a bot/troll army several thousand strong. This is not the first time that #CalExit (or similar messages like #Texit) has become a trending hashtag on Twitter largely due to foreign information operations. The 6 April surge appears to be largely an Iranian influence operation that was triggered by domestic political tension in the United States. In the days and weeks preceding 6 April, tensions between US national leadership and California leadership intensified, and the governor of California referred to California as a "nation-state" [13]. The Iranian government and/or its proxies appeared to be monitoring these political tensions in the United States, and timed their #CalExit campaign to capitalize on them. This 24-hour information operation had some creative content, with many human/troll accounts and limited automation. The creative content can be seen in the meme collage in Figure 24.

As discussed in Chapter 9, Bot-Match can be a very powerful tool for finding similar accounts given a seed account. Bot-Match allows you to find similar accounts where similarity can be defined by network proximity, semantic proximity, or a combination of both. The size of our network and tweet corpus limits the number of models available for measuring similarity. We first concatenated all user text, thereby aggregating content to the account level. Because of the size of our corpus, we then used cosine similarity on a document-term matrix (also known as a bag-of-words) restricted to the 4,000 most frequent words. This was created for English tweets.
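
The semantic similarity step described above can be sketched with a bag-of-words representation and cosine similarity. This is a minimal illustration, not the Bot-Match implementation: it assumes whitespace tokenization, uses invented account names and text, and keeps sparse count vectors in place of a full document-term matrix. The function names are hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(account_text, size=4000):
    """Keep the `size` most frequent words across all account-level documents."""
    counts = Counter(w for text in account_text.values() for w in text.lower().split())
    return {word for word, _ in counts.most_common(size)}


def term_vector(text, vocabulary):
    """Sparse bag-of-words counts restricted to the vocabulary."""
    return Counter(w for w in text.lower().split() if w in vocabulary)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def most_similar(seed, account_text, size=4000, top_n=5):
    """Rank all other accounts by cosine similarity to the seed account."""
    vocabulary = build_vocabulary(account_text, size)
    vectors = {a: term_vector(t, vocabulary) for a, t in account_text.items()}
    scores = {a: cosine_similarity(vectors[seed], v) for a, v in vectors.items() if a != seed}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Invented account-level documents (all tweets of an account concatenated).
account_text = {
    "seed_account": "china helps world fight virus",
    "account_1": "china helps world fight virus together",
    "account_2": "my cat sleeps all day",
}

ranking = most_similar("seed_account", account_text)
```

Applying the same ranking to each newly found account in turn yields the recursive build-out from a single seed, with no labeling or training step required.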

We used this to identify accounts that were propagating Chinese propaganda in English. We used @SafiSina1 as our seed account (this account was discovered above in Figure 20b). We illustrate in Figure 25 how we recursively build out the Chinese propaganda network with Bot-Match. Notice that we did not have to conduct any elaborate labeling or training process; we just needed to start with our seed node and the document-term matrix. All accounts seen in Figure 25 are amplifying Chinese state sponsored media, and each is embedded in a slightly different network.

Using the Bot-Match methods illustrated in Figure 25, we were able to identify approximately 20 additional accounts that appeared to be propagating Chinese propaganda targeting America. We then used the Twitter REST API to scrape the timeline history of these accounts, extract image links, and run Meme-Hunter on the images shared by these accounts. We found that all of these accounts were indeed conducting very targeted information operations against the United States around the COVID-19 pandemic as well as the American protests that followed the death of George Floyd at the hands of a Minneapolis police officer. A collage of a sample of these targeted memes is provided in Figure 26.

Throughout our analysis of foreign influence, we observed the use of the BEND forms of maneuver. We continue to observe that Russia closely intertwines narrative and network maneuvers. They conduct long and protracted efforts to infiltrate target audiences before interjecting narrative. China, while working hard to control and manipulate the narrative, does not appear to infiltrate targeted subcultures. They do appear to tie their narrative to the target audience's culture, as seen above with memes based on the Friends sitcom as well as memes that use the George Floyd protests to support pro-CCP policies. With this limited network maneuver, it remains to be seen whether their information operations gain traction or simply become a "shot in the dark." Iran also appeared to launch large attributed information campaigns focused on a specific narrative, such as #CalExit, without first preparing target networks. Once again, these operations may become information warfare "chaff" with limited effects.

The primary goal of this case study was to illustrate social cybersecurity workflows in the context of a relevant event. Using the COVID-19 Twitter stream from 15 March to 30 April 2020, we demonstrated how to collect the data, conduct initial exploratory data analysis, conduct and use bot and meme classification and exploration, and conduct account characterization, and we demonstrated the role of Sketch-IO and the BEND Framework. This chapter therefore serves as an example for social cybersecurity researchers of how to leverage these tools to identify and characterize offensive information operations targeting their society, institutions, and culture.

With regard to the COVID-19 pandemic, we found large information operations trying to manipulate domestic and international perceptions, beliefs, and actions. At the domestic level the information conflict was largely over pandemic policy, particularly whether public safety or the economy was more important. At the international level we identified attempts to manipulate international perception of the origins of the disease as well as perceptions of each country's handling of the disease. We also see nation-state efforts to amplify tensions and drive wedges into existing fissures in rival nations. Throughout the data we see bots and trolls used to scale and spread narrative, thereby acting as a "force multiplier" in information operations. We see Russia continue to use memes, and China massively increase its use of memes in information warfare. Both nations study their target audience and choose relevant cultural artifacts to connect their memes and messages to that audience.

Even as the COVID-19 coronavirus moved much of business and society to use virtual platforms for social interaction as well as business and collaboration, it also allowed many nation-states to increasingly use virtual platforms for competition in the information space with ramifications for geo-politics. While the effects of these campaigns are hard to measure, their scale and persistence require social cybersecurity policy and process.

Characterization and comparison of Russian and Chinese disinformation campaigns

The evolution of political memes: Detecting and characterizing internet memes with multi-modal deep learning

Latent Dirichlet allocation

Fast unfolding of communities in large networks

ORA: A toolkit for dynamic network analysis and visualization

ORA user's guide 2020

The rise of social bots

Classification of Twitter accounts into automated agents and human users

IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

The great lockdown: Worst economic downturn since the Great Depression. IMF Blog

On predicting geolocation of tweets using convolutional neural networks

A new status index derived from sociometric analysis

Coronavirus in California: Gavin Newsom's response. The Atlantic

Very deep convolutional networks for large-scale image recognition

Overview. Twitter Developers

This work was supported in part by the Office of Naval Research (ONR) Multidisciplinary University Research Initiative Award N000140811186, Award N000141812108, ONR Award N00014182106 and the Center for Computational Analysis of Social and Organization Systems (CASOS). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ONR or the U.S. Government.