key: cord-0432945-tak9evml authors: Chire-Saire, Josimar E. title: Characterizing Twitter Interaction during COVID-19 pandemic using Complex Networks and Text Mining date: 2020-09-11 journal: nan DOI: nan sha: 0fe9cd00d482996ad281c5d588a00b90ab110adb doc_id: 432945 cord_uid: tak9evml

The COVID-19 outbreak started many months ago, with its reported origin in a market in Wuhan, China. The virus spread quickly to other countries because international travel is affordable, many countries are only a few flight hours apart, and borders carry a constant flow of people. On the other hand, Internet users habitually share content on Social Networks, and issues, problems, and thoughts about COVID-19 were no exception. Therefore, it is possible to analyze Social Network interaction from a city or a country to understand the impact of this global issue. South America is a region of developing countries facing challenges related to politics, the economy, public health, and other areas. The scope of this paper is therefore to analyze the Twitter interaction of South American countries and to characterize the flow of data among users using a Complex Network representation and Text Mining. The preliminary experiments suggest the existence of patterns similar to those of Complex Systems. In addition, the degree distribution supports the idea of a system, and visualization of the adjacency matrices shows groups of users publishing and interacting together over time; it may be possible to identify robots that post constantly.

Nowadays, the use of Social Networks to communicate and to share information, thoughts, and ideas is very common. People usually create posts, write during the day, and tag friends, colleagues, and others. Therefore, this flow of data can represent the current status of citizens.
Moreover, in the context of the COVID-19 pandemic, users may express what they think and feel about this global issue in their cities and countries. Consequently, this behaviour can be analyzed to monitor the situation of the population and of the health area: Infodemiology studies behaviour through data, and Infoveillance is its application using computational tools and directed/undirected sources of data. Currently, many studies related to COVID-19 analyze people's behaviour using Social Networks, in particular Twitter, because this Social Network makes data easy to access and the quantity of data can be representative. Examples include the top concerns of users [1] and studies focused on specific countries: Italy [2], Peru [3], United Kingdom, United States [4], Mexico [ChireSaire2020.05.07.20094466], Colombia [5], United States [6], Ghana [7], France [8]. These studies use a Data Mining approach and Natural Language Processing techniques to describe and understand the phenomenon at many levels: public health, social, mental health, and more. However, this flow of data can include misinformation produced intentionally by robots.

One approach to representing and analyzing these networks of users exchanging data is to use Complex Networks. Complex Networks have had many applications in different areas: Physics, Biology, and Social Sciences. Complex Networks are therefore a suitable representation for studying the interaction of users; for example, [9] studies the top users in Spain using Twitter as the data source.

The contributions of this paper are:
• Select South America as the scope and study the influence of the COVID-19 pandemic on this region, opening possibilities for studying this phenomenon and providing a proposal for this analysis (section II).
• Introduce Complex Networks to study this phenomenon, obtaining data from Social Networks and finding patterns related to Complex Systems.
• Find an affordable way to identify networks of users with a constant flow of data, with the possibility of finding robots and fake users through this mechanism (section III).

This paper analyzes the interaction, through the Twitter Social Network, of users from South American countries where Spanish is an official language. Considering Internet access and population density, the capital of each country was selected for the analysis. The dataset is a collection of tweets from 08 March to 11 July, gathered using the Twitter API; Table I describes the dataset.

The Complex Network was created according to the following criteria:
• Pick the N_1 users with the most posts during the period of study
• Search for @tag mentions of users inside the tweets to find connections between users
• Collect a global list of users and convert it to a set to avoid duplicated users
• Build a single text per country from its tweets, then search it for the users and count their occurrences
• Create the edges between the N_2 top users and the set of users, with the mention frequency as weight

This section explains the experiments performed to describe, analyze, and understand the interaction of South American Twitter users. The first experiment used N_1 = 500, N_2 = 100 and created a directed graph with no associated weights. Figure 1 presents the results for Peru, where a kind of reticular pattern can be noticed. This suggests the possible existence of groups of users. For the next experiment, the frequency of @tag mentions is used to set the weight of the edges. This experiment used N_1 = 2000, N_2 = 200 and created a directed graph with weighted edges. Figure 2 shows the result for Argentina: the number of edges in this Complex Network exceeds 20,000 and the weights are widely dispersed, so a log scale is introduced. Even with this adaptation, it is not possible to distinguish the edges or connections between users.
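The construction criteria and the log scaling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the input format (pairs of author and tweet text), the function names `build_mention_network` and `log_adjacency`, the mention pattern, and the use of log(1 + w) as the log scale are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Assumed pattern for @tag mentions: "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")


def build_mention_network(tweets, n1, n2):
    """Return {(source, target): weight} for a directed mention network.

    tweets is a list of (author, text) pairs (hypothetical input format).
    n1 is the number of most active users whose tweets are scanned;
    n2 (assumed <= n1) is the number of top users kept as edge sources.
    """
    tweets = list(tweets)
    # Rank users by number of posts and keep the n1 most active ones.
    posts = Counter(author.lower() for author, _ in tweets)
    top_n1 = [user for user, _ in posts.most_common(n1)]
    top_set = set(top_n1)

    # Count how often each selected user mentions every other user.
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        if source not in top_set:
            continue
        for target in MENTION.findall(text.lower()):
            edges[(source, target)] += 1

    # Keep only edges whose source is among the n2 most active users;
    # the mention frequency is the edge weight.
    sources = set(top_n1[:n2])
    return {edge: w for edge, w in edges.items() if edge[0] in sources}


def log_adjacency(edges):
    """Return (users, matrix) with log(1 + weight) in each cell."""
    # A set removes duplicated users; sorting fixes the row/column order.
    users = sorted({user for edge in edges for user in edge})
    index = {user: i for i, user in enumerate(users)}
    matrix = [[0.0] * len(users) for _ in users]
    for (source, target), weight in edges.items():
        matrix[index[source]][index[target]] = math.log1p(weight)
    return users, matrix
```

Calling `build_mention_network(tweets, n1=2000, n2=200)` followed by `log_adjacency` on its result would give a matrix comparable in spirit to the one visualized for the weighted experiment, although the exact edge rules used in the paper may differ from this reading of the criteria.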
Because the individual edges cannot be distinguished, a filter is applied based on the degree distribution (see Figure 3), and only edges with weight higher than 200 are kept. It is also important to note that the distribution resembles a Lévy distribution. Figure 4 shows the results for Argentina, Bolivia, Chile, Colombia, Ecuador, Paraguay, Peru, Uruguay, and Venezuela; a color bar expresses the strength, or weight, of each edge. From these results, it is possible to identify networks of users in each country. Bolivia has fewer users, so its filtered matrix is smaller than those of the other countries. Line structures are also visible for Colombia, Ecuador, Peru, Paraguay, Uruguay, and Venezuela. These lines may represent groups of users who constantly tag one another. Venezuela stands out: considering the size of its filtered matrix and the length of the lines, it is possible to identify two large groups of 100 and 150 users tagging one another during the period of study.

To analyze the existence of groups of users in more depth, a community detection algorithm is run over the complete graph. The results are presented in Table II and Figure 5. The earlier hypothesis about Venezuela can be confirmed by the number of communities present in the interaction of its users. In addition, the images representing the communities show a large concentration of nodes in the center, reflecting the tagging of some specific users.

This preliminary study of the Twitter interaction of South American users around the COVID-19 pandemic shows promising results. First, a text mining approach is used to process text and find users. Second, a proposal is carried out in the experiments section, showing the viability of creating a Complex Network with it.
Finally, visualization techniques are proposed to analyze the adjacency matrix of each country, together with a filtering process to select the most representative behaviour and the discovery of communities; Venezuela raises a concern about intentional groups of users publishing content during the period of study. Future work involves performing a weekly analysis with the aim of finding changes in the interaction and in the number of top users, and describing this behaviour using Complex Networks and features related to degree measurements of the nodes and edges.

The author wants to mention Research4Tech, an Artificial Intelligence community of Latin American researchers that promotes science and collaboration in Latin American countries; its role as an integrator between professional, research, and technology communities is key to developing the Latin American region as a strong body.

Top concerns of tweeters during the covid-19 pandemic: infoveillance study
An infoveillance system for detecting and tracking relevant topics from italian tweets during the covid-19 event
Covid19 surveillance in peru on april using text mining
Public opinions towards covid-19 in california and new york on twitter
How was the mental health of colombian people on march during pandemics covid19? medRxiv
Feeling positive about reopening? new normal scenarios from covid-19 reopen sentiment analytics
Study of coronavirus impact on parisian population from april to june using twitter and text mining approach
Twitter interaction to analyze covid-19 impact in ghana, africa from march to july
Characterizing information leaders in twitter during covid-19 crisis