key: cord-0559742-a7c6n4ja authors: Saire, Josimar Edinson Chire; Zuniga, Esteban Wilfredo Vilca title: Analysis of Users Reaction around Impeachment in Peru using Twitter date: 2020-10-10 sha: a25c0a2ea347cdeef122681758a9ad8be2556450 doc_id: 559742 cord_uid: a7c6n4ja

The Covid-19 pandemic generated many problems and exposed other hidden issues in South American countries. Each government analyzed its own context and decided which health policies to adopt. Peru is a country in western South America, and its first case was reported on March 6. In addition, its land, sea, and air borders were closed. The Peruvian government analyzed the context and proposed many policies on health, the economy, employment, and transport. However, these actions were not enough because of a pre-existing lack of hospital infrastructure, a result of past governments. On the other hand, the variety of political parties in Parliament, and their pursuit of their own interests, became evident during the pandemic. Given the lack of success of the health and economic policies, a discussion about a possible impeachment started. Therefore, the main aim of this work is to find evidence, using Twitter, of what users were talking about and of the impact on the Peruvian population.

The Covid-19 pandemic is a special event that is transforming society. Many governments around the world have presented strategies to stop the spread of the virus [1]. Peru, a country in South America, implemented one of the strictest quarantines in the world, with poor results [2]. Moreover, the current government was involved in a cover-up scandal over possible undue payments [3]. For this reason, Congress decided to initiate an impeachment process against the Peruvian president [4]. Analyzing public opinion in this kind of context is extremely important, because understanding collective reactions provides crucial information for decision-making.
Survey statistics gathered during the impeachment show that Peruvians were against it [5]. Thus, we want to analyze what people were interested in on Twitter during the impeachment. Twitter is a powerful social network that has been used in many political studies [6], [7]. In this study, we analyze a variety of topics related to the impeachment. The work is organized in five parts: data collection, query setup, preprocessing, filtering, and visualization. The analysis shows the Peruvian cities with the most activity, the most frequent words, and their emotional charge before, during, and after the impeachment.

In this section, we review some investigations related to the proposal. In [7], the authors explore Trump's political discourse on Twitter. They analyze social polarization, e.g., around "white supremacists", and find that the main topics of Trump's discourse were populism and conspiracy rather than detailed policy prescriptions. The analysis also provides information about people's interactions on Twitter, showing that Trump's tweets focused on discrediting technology companies, claiming that Twitter is biased against him, and alleging "shadow banning" (soft censorship) of Republicans. In [8], the authors explore the change in presidential talk brought about by Trump. They focus on two main points: the message and its spread on Twitter. They analyze some controversial tweets by the president and how they changed the usual way of doing politics, producing an anthropological analysis of his tweets. The results signal a shift in discourse strategy from a broad electorate to a core one. In [9], the authors analyze tweets containing fake news and tweets containing valid information, focusing on verified and unverified accounts, with and without verified information. The results show that fake news spreads more easily than verified information.
They use hashtags such as "COVID-19", "Corona", and "#2019 ncov". In [10], the authors study global sentiment around the Covid-19 pandemic using Twitter trends. They conclude that fear was the dominant emotion at the beginning, but anger increased as fear decreased. Finally, sadness increased, related to the loss of family and friends.

All these articles show that it is possible to analyze the current emotional and political situation of a country using Twitter. In this project, we focus on sentiment analysis, using some strategies from these articles to analyze the impeachment of the Peruvian president. The paper is organized as follows: Section II presents the literature review, Section III the proposal, Section IV the results, Section V the conclusion, and Section VI future work.

III. PROPOSAL

This section explains the steps used to collect the data and to analyze the impeachment of the Peruvian president. The proposal follows a well-known data mining approach, with some adaptations for this study. The Peruvian impeachment was selected as the topic because of the relevance of this political action. Currently, Peru has many problems resulting from insufficient or inefficient health, economic, and employment policies. Therefore, studying how Peruvian users interacted during these weeks is relevant.

The study needs data from every region of Peru to obtain a country-wide view of opinion and of the event's impact on the Peruvian population. Note that Peru has 25 regions. The data are collected through the Twitter API (Application Programming Interface) with the following parameters:
• Date ranges (DD-MM, 2020): 05-09 to 12-09, 14-09 to 21-09, and 20-09 to 28-09
• Keywords: none; given the context, all tweets posted during the date ranges were collected
• Geolocation and radius: the latitude, longitude, and radius of each region are presented in Tab.
III-B; the radii were selected manually, based on prior knowledge of where the population is concentrated in each region.
• Language: Spanish

The preprocessing step is necessary to clean the data so that they can later be filtered and visualized properly.
• First, because the date ranges overlap, duplicates must be dropped.
• Natural language processing steps are applied: all text is lowercased and URLs are removed using regular expressions.
• Stopwords are removed, i.e., articles, pronouns, and custom words such as mas (but), si (if), and rt.
• The topic of study revolves around the impeachment, the president, and Martin Vizcarra (the Peruvian president), so the data are filtered using these terms.
• Then, a second removal is performed that deletes these terms.

The visualization step is important because it provides graphical support for the analysis. Frequency histograms show the number of tweets per day and per city, word clouds show the most frequent words per city, and line plots show different trends and help to visualize several sources together.

This section presents the results of exploring the data with text mining techniques: the description of the dataset and the results after the filtering steps described in the previous section. In addition, Google Trends data is added to strengthen the analysis. The dataset used for this study has the following features:

From this graphic, it is possible to notice that the three regions with the most tweets are Lima, La Libertad, and Arequipa, whereas the three regions with the fewest publications are Apurimac, Huancavelica, and Pasco. The former are located on the coast: Lima is the capital of Peru, La Libertad is a nearby region, and Arequipa is one of the most populated regions. On the other hand, Apurimac, Huancavelica, and Pasco are located in the highlands, so these regions have smaller populations and Internet access can be an issue.

The next question is how many publications were generated during these weeks.
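The preprocessing steps listed in Section III (deduplication, lowercasing, URL removal, stopword removal) and the per-day, per-region count used to answer this question can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: each tweet is assumed to be a dict with hypothetical keys `id`, `text`, `region`, and `created_at` (a string starting with YYYY-MM-DD), and the stopword set is a tiny illustrative stand-in for a full Spanish list. Accent stripping is an added assumption so that "martín" and "más" match "martin" and "mas"; note that it also maps "ñ" to "n".

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stopword set: a few Spanish articles and
# pronouns plus the custom words named in the text ("mas", "si", "rt").
STOPWORDS = {"el", "la", "los", "las", "un", "una", "de", "del", "que", "y",
             "en", "a", "se", "lo", "su", "mas", "si", "rt"}
URL_RE = re.compile(r"https?://\S+")


def strip_accents(text):
    """Map e.g. 'martín' to 'martin' (assumption: accents are not meaningful here)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Lowercase, strip accents, remove URLs, split into words, drop stopwords."""
    text = URL_RE.sub(" ", strip_accents(text.lower()))
    return [tok for tok in re.findall(r"\w+", text) if tok not in STOPWORDS]


def preprocess(tweets):
    """Drop duplicates caused by the overlapping date ranges, tokenize, add a day key.

    Each tweet is assumed to be a dict with hypothetical keys 'id', 'text',
    'region' and 'created_at' (a string that starts with YYYY-MM-DD).
    """
    seen, rows = set(), []
    for tweet in tweets:
        if tweet["id"] in seen:
            continue
        seen.add(tweet["id"])
        rows.append({**tweet,
                     "tokens": tokenize(tweet["text"]),
                     "day": tweet["created_at"][:10]})  # YYYY-MM-DD substring
    return rows


def daily_counts(rows):
    """Number of tweets per (day, region) pair."""
    return Counter((row["day"], row["region"]) for row in rows)
```

Here `daily_counts(preprocess(tweets))` yields the per-day, per-region totals behind a plot such as Fig. 2. Deduplicating on the tweet id rather than on the text keeps identical retweets by different users as separate posts.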
The date field can be used to generate a new field from its substring in the format YYYY-MM-DD, and the number of tweets can then be counted for each value of this new field. The objective of this graphic is to check the flow of publications during the date range from 05-09 to 28-09. Given the events that were happening during these three weeks, the daily posts are presented in Fig. 2. There was more interaction around September 11 and 21 in the majority of regions: these two days were the peaks of the impeachment topic in Peru. The impeachment was ultimately rejected [11]. Therefore, the next question is formulated to be more specific about the impeachment.

To focus on the impeachment specifically, the available data can be filtered. Keywords related to the impeachment are used to extract only the tweets on this topic: vacancia (impeachment), martin, vizcarra, and presidente (president). These words were chosen because they are strongly related to the Peruvian impeachment, and the results are presented in Fig. 3. This step makes it possible to check the number of publications per region on this topic during the study period.

The next question is which terms or words appear together with, or are related to, the impeachment. The filtered data from the previous subsection can be used to generate a word cloud, a representation that helps to see the most frequent terms around the keywords. After the filtering step, the keywords themselves are removed, because they are not useful for identifying the terms around them; a new word cloud is then generated, Fig. 4. According to the news about the impeachment, this action was supported by some members of Parliament.
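A word cloud is essentially a rendering of term frequencies, so the keyword filter and the removal of the filter terms described above can be sketched as a per-region frequency count. This is a minimal sketch, assuming the tweets have already been tokenized, lowercased, and accent-stripped; the `region` and `tokens` keys and the `top_n` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict

# Filter terms from the text; tokens are assumed lowercased and accent-stripped.
FILTER_TERMS = {"vacancia", "martin", "vizcarra", "presidente"}


def filter_topic(rows):
    """Keep tweets mentioning any filter term, then drop the filter terms themselves."""
    kept = []
    for row in rows:
        if FILTER_TERMS & set(row["tokens"]):
            remaining = [tok for tok in row["tokens"] if tok not in FILTER_TERMS]
            kept.append({**row, "tokens": remaining})
    return kept


def tweets_per_region(rows):
    """Number of on-topic tweets per region."""
    return Counter(row["region"] for row in rows)


def term_frequencies_by_region(rows, top_n=10):
    """Most frequent remaining terms per region: the raw input of a word cloud."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["region"]].update(row["tokens"])
    return {region: counter.most_common(top_n) for region, counter in counts.items()}
```

The per-region counts could then be drawn with a word cloud library, for example the `wordcloud` package's `WordCloud.generate_from_frequencies`; the text does not say which tool produced Fig. 4.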
Edgar Alarcon, Karem Roca, and Richard Swing are people involved in audio recordings about possible irregular actions [12]. These terms are present in the Lima region. In addition, many regions, e.g., Callao, Huanuco, Puno, Tacna, and Tumbes, mention the terms "audios" and "grabaciones" (recordings), because many audio recordings of statements by Karem Roca were released during this time. On the other hand, the term "cuellos blancos" (white collars) appears in the Junin region, recalling a past corruption case involving judges in the Callao region.

Fig. 4. Cloud of words after filtering and without the selected terms

The next analysis uses all the data from Peru to create bigrams (see Fig. 5) that support the analysis of how Peruvian users reacted to the impeachment. On Saturday 05-09, people were commenting on ending the Sunday lockdown, which was mandatory in some regions, depending on their number of Covid-19 infections, to avoid the spread. Starting on 10-09, the bigrams edgar alarcon, incapacidad moral (moral incapacity), contenido audio (audio content), and richard swing appear, related to the scandal caused by Karem Roca's audio recordings. The next day, Friday 11-09, the bigrams incapacidad moral (moral incapacity), contenido audio (audio content), and richard swing are present. Later, on Tuesday 15-09, the following bigrams are present: karem roca, roca confirma (roca confirms), and confirma grabo (confirms recorded), all related to the previous event. The most important topic of the week was the rejection of the impeachment of the Peruvian president on 19-09, when the following bigrams are present:

A sentiment analysis task was performed, producing scores between 0 and 1. A threshold of 0.5 is used to establish which comments were negative or positive, and this analysis is repeated for every day. The results are presented in Fig. 6: negative and positive comments were balanced before 09-08, followed by two main peaks, on 11-09 and 19-09.
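The per-day bigram ranking and the 0.5 threshold described above can be sketched as follows. The text does not name the sentiment model, so each tweet is assumed to already carry a hypothetical `score` in [0, 1] next to its `day` and `tokens`; treating scores at or above 0.5 as positive is also an assumption, since the text does not say which side of the threshold is which.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.5  # value taken from the text


def bigrams(tokens):
    """Adjacent token pairs, e.g. ['incapacidad', 'moral'] -> ['incapacidad moral']."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def top_bigrams_per_day(rows, top_n=5):
    """Most frequent bigrams for each day, ordered by day."""
    per_day = defaultdict(Counter)
    for row in rows:
        per_day[row["day"]].update(bigrams(row["tokens"]))
    return {day: counter.most_common(top_n) for day, counter in sorted(per_day.items())}


def polarity_per_day(rows):
    """Count positive and negative tweets per day from a score in [0, 1].

    Assumption: scores >= THRESHOLD count as positive, lower ones as negative.
    """
    result = defaultdict(lambda: {"positive": 0, "negative": 0})
    for row in rows:
        label = "positive" if row["score"] >= THRESHOLD else "negative"
        result[row["day"]][label] += 1
    return dict(result)
```

Because stopwords are removed before pairing, a phrase such as "contenido del audio" becomes the bigram "contenido audio", which matches the form of the bigrams reported above.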
The first peak is related to the impeachment process after the release of the audio recordings, and the second to the rejection of the impeachment. It is important to mention that the number of negative publications decreased meaningfully. This analysis therefore helps to understand how Peruvian users perceived the impeachment process before, during, and after it.

Building on the experience of other works that use digital data, Google Trends can be an interesting source for measuring search trends on the Google search engine: it shows the trend around a term or keyword over a chosen date range. Fig. 7 presents the frequency of tweets from Peruvian users and the Google Trends series for the following keywords: Congreso (Congress), Martin Vizcarra, and Vacancia (impeachment). In addition, to dig deeper into the Google Trends results, YouTube search results about Martin Vizcarra are included, because people were interested in or curious about the impeachment topic. The scale of each series is

Data from the social network Twitter can be useful for analyzing and detecting events that happened, or are happening, in a specific location. The analysis of Twitter data reflects the state of Peruvian users before, during, and after the impeachment process. Sentiment analysis can confirm how negatively the population perceived the event. Finally, adding Google Trends strengthens the previous conclusions, because users were using the Google search engine to search for these topics during these weeks.

For future work, news from Peru can be analyzed to support the analysis. In addition, publications from other social networks, e.g., Facebook, or comments on blogs and websites where users express their opinions and ideas, can be analyzed.

[1] A structured open dataset of government interventions in response to Covid-19.
[2] Peru took early, aggressive measures against the coronavirus. It's still suffering one of Latin America's largest outbreaks.
[3] Caso Richard Swing: exasesor de Martín Vizcarra podría ir hasta 15 años a la cárcel.
[4] Congreso: convocan al pleno para este viernes para debatir moción de vacancia contra Vizcarra.
[5] Encuesta nacional urbana setiembre 2020: la crisis política.
[6] Predicting elections with Twitter: What 140 characters reveal about political sentiment.
[7] Trump tweets the truth: Metric populism and media conspiracy.
[8] Twitter, Trump, and the base: A shift to a new form of presidential talk?
[9] Coronavirus goes viral: Quantifying the Covid-19 misinformation epidemic on Twitter.
[10] Global sentiments surrounding the Covid-19 pandemic on Twitter: Analysis of Twitter trends.
[11] Vacancia contra Martín Vizcarra: el Congreso rechaza la destitución del presidente de Perú.
[12] Edgar Alarcón más cerca de responder por graves denuncias.

The authors wish to thank Research4tech, an Artificial Intelligence (AI) community of Latin American researchers that aims to promote AI and build science communities to catapult and strengthen the development of Latin American countries through science and technology, integrating the academic community, technology groups and communities, government, and society.