key: cord-0716465-n9a81yin
authors: Nurmawiya; Harvian, Khalista Arkania
title: Public sentiment towards face-to-face activities during the COVID-19 pandemic in Indonesia
date: 2022-12-31
journal: Procedia Computer Science
DOI: 10.1016/j.procs.2021.12.170
sha: d40e0cae49783c69656a3259fcde7c61a79d1fbe
doc_id: 716465
cord_uid: n9a81yin
A year after the COVID-19 pandemic began, activities that had been carried out online gradually switched back to face-to-face. This has caused controversy given the high transmission. Therefore, this study analyzes public sentiment using Twitter data. Latent Dirichlet Allocation (LDA) was also applied to group public opinion into topics. It was found that face-to-face learning was the main subject of public conversation and was dominated by negative sentiment, followed by neutral and positive sentiment. Meanwhile, the LDA model produced topics about vaccination, public preference, school reopening, public sentiment, students' longing for face-to-face learning, and the face-to-face learning plan.
The World Health Organization (WHO) declared COVID-19 a pandemic in its media briefing on March 11th, 2020. This statement was made after more than 118,000 cases of COVID-19 had been reported in 114 countries, with a death toll of 4,291 people [1]. According to WHO, a pandemic is not only a public health crisis but also a crisis that affects various sectors, including the achievement of the Sustainable Development Goals (SDGs).
After COVID-19 reached Indonesia in March 2020, the Indonesian government immediately responded by enacting the Large-Scale Social Restriction (LSSR) policy [2]. LSSR is implemented for 14 days and can be extended as required. According to the Center for Disease, social distancing means staying away from associations, avoiding mass gatherings, and maintaining distance between people. Social distancing is carried out to suppress the spread of COVID-19 in Indonesia. The government also urges people not to leave the house and to avoid gatherings. As a result, face-to-face activities are minimized and replaced by online activities.
The COVID-19 pandemic has also changed teaching and learning patterns. When activities are carried out online during the pandemic, several things need to be considered, including teacher readiness, ICT infrastructure that needs to be optimized, and internet technology for poor and vulnerable families. The risks should also be anticipated, such as a likely increase in the drop-out rate, a decline in reading and counting ability, and disruption of basic education [3]. In online learning, unequal access to technology creates difficulties for some students, especially those who live in areas without adequate internet networks [4]. The additional cost of internet access is also an obstacle to online learning [5]. Furthermore, online learning is considered to have negative social impacts on students [6]. With the implementation of the new normal, online activities are gradually shifting back to face-to-face activities, including school activities. However, school activities, particularly face-to-face learning, have also caused controversy given the high transmission of COVID-19 in Indonesia [7].
This poses a challenge to achieving the 4th SDG, quality education, and the 3rd SDG, good health and well-being, at the same time. It has prompted various responses in the community. Therefore, this research aims to analyze public opinion on this issue. The data were obtained from the social media platform Twitter, a microblogging site that allows users to write about various topics and discuss current issues. The recommendation to stay home during the pandemic has significantly affected media consumption, by up to 60% [8]. Social media has become a part of daily life for most people and the main source of news about COVID-19.
Sentiment analysis has been applied in many studies, including research by Fauziyyah [9] on the COVID-19 pandemic. In that study, the neutral category had the highest polarity values for COVID-19 and corona virus, at 58.94% and 55.10%, respectively. This means that public opinion regarding the COVID-19 pandemic and the corona virus was still within neutral limits. Research by Samsir [10] on online learning in early November 2020 found 30% positive sentiment, 69% negative sentiment, and 1% neutral sentiment. This negative perception was caused by public discontent with online learning and its less-than-optimal implementation in Indonesia.
Previous research has discussed sentiment analysis of COVID-19 and of online learning, but none has discussed public opinion on face-to-face activities during the pandemic. Therefore, this research was conducted to capture public sentiment on face-to-face activities during the pandemic and to identify the topics circulating in the community about these activities.
The data used in this research are Indonesian-language opinions on face-to-face activities, retrieved with the keyword "tatap muka" (face-to-face). The data were collected from Twitter by web scraping with Twitter Intelligence Tools (Twint), covering March 11th, 2020 to July 17th, 2021. The start date was chosen because the WHO declared COVID-19 a pandemic on that date. Only tweets from personal accounts were used: a list of non-personal accounts was compiled and the tweets from those accounts were deleted.
Data preprocessing was then carried out to make the dataset suitable for analysis. Preprocessing consisted of data cleansing (removing URLs, hashtags, usernames, punctuation marks, and emoji, and converting all words to lowercase), tokenizing (splitting sentences into their constituent words), and word normalization. Next, stopwords were filtered out and stemming was performed with Sastrawi [11] to reduce affixed words to their base forms. After that, the words were rejoined and duplicate tweets were removed.
Sentiment analysis was used to analyze the opinions, sentiments, evaluations, attitudes, and emotions expressed by the public towards face-to-face activities. Sentiment analysis focuses on opinions that express positive or negative sentiments, but neutral expressions also need to be considered [12]. Sentiment was determined with a lexicon-based approach, which adds up the scores of the sentiment words in a sentence. A lexicon is a collection of positive and negative sentiment words. The first step is to determine the score of each sentiment word: a positive word is given a score of 1, a negative word a score of -1 [13] [14], and a neutral word a score of 0. Next, the scores of the sentiment words in a sentence are added up.
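The cleansing, tokenizing, stopword filtering, deduplication, and lexicon scoring steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the lexicon and stopword entries are hypothetical toy values, the function names and the example URL are invented for the sketch, and word normalization and Sastrawi stemming are left out. The sign of the summed score is mapped to a positive, negative, or neutral label.

```python
import re

# Hypothetical toy resources for illustration only; the study's actual
# sentiment lexicon and stopword list are not reproduced here.
LEXICON = {"senang": 1, "aman": 1, "takut": -1, "bahaya": -1}
STOPWORDS = {"yang", "dan", "di", "ke", "saya"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, emoji


def preprocess(tweet):
    """Clean, lowercase, tokenize, and remove stopwords (no stemming here)."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def deduplicate(token_lists):
    """Rejoin tokens and keep only the first occurrence of each tweet."""
    seen, unique = set(), []
    for tokens in token_lists:
        joined = " ".join(tokens)
        if joined not in seen:
            seen.add(joined)
            unique.append(joined)
    return unique


def sentiment(tokens):
    """Sum word scores (+1, -1, or 0) and map the sign of the total to a label."""
    total = sum(LEXICON.get(tok, 0) for tok in tokens)
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", total


# Made-up example tweets; the URL and username are placeholders.
tweets = [
    "Saya TAKUT sekolah tatap muka!! https://example.com/x @user #covid",
    "saya takut sekolah tatap muka",
    "Senang dan aman di sekolah",
]
cleaned = deduplicate(preprocess(t) for t in tweets)
labels = [sentiment(text.split()) for text in cleaned]
```

In this toy run the first two tweets collapse into one after cleaning, so only two tweets are scored. A real pipeline would add the normalization and stemming steps before the stopword filter or the lexicon lookup, depending on how the lexicon entries are stored.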
If the total for a sentence is greater than 0, the sentence has positive sentiment; if it is less than 0, the sentence has negative sentiment; a total of 0 indicates neutral sentiment.
The Naive Bayes Classifier (NBC) was also used to assess the accuracy of this classification. NBC classifies a tweet based on its text. Classification with Naive Bayes involves two main processes, training and testing. NBC assigns each tweet to the class with the highest probability value, and its accuracy depends on the amount of training data used. NBC was chosen because it is considered to have good potential in terms of accuracy and computation compared with other classification methods [15].
The next step is grouping public opinion using Latent Dirichlet Allocation (LDA). LDA was used in this study to identify topics of public concern regarding face-to-face activities during the COVID-19 pandemic. Thus, the study is not limited to public sentiment but also captures topics that reflect real-world phenomena around face-to-face activities. LDA is a generative probabilistic model of a corpus, based on the idea that documents are represented as random mixtures over latent topics and that each topic is characterized by a distribution over words [16]. The purpose of LDA is to find the latent structure of topics or concepts in text. LDA has two parameters, α and β: α is the Dirichlet parameter of the document-topic distribution and β is the Dirichlet parameter of the topic-word distribution. The Gibbs sampling algorithm was used to determine the probability of topics in documents and of words in topics. The calculation is repeated until the difference between iterations converges or is close to zero. LDA is illustrated in Fig. 1 [17].
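The generative view of LDA described above can be written compactly. The following is a sketch in standard notation, assuming symmetric Dirichlet priors α and β. The symbols θ_d (topic mixture of document d), φ_k (word distribution of topic k), z and w (topic assignment and word at position n of document d), K (number of topics), V (vocabulary size), and the counts n are introduced here for illustration and do not appear in the original text. The last expression is the collapsed Gibbs sampling update commonly paired with this model; the paper does not state which Gibbs variant was used, so it should be read as an assumption.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{align*}

% Collapsed Gibbs update; counts exclude the current position (d, n):
\[
P\bigl(z_{d,n} = k \mid \mathbf{z}_{-(d,n)}, \mathbf{w}\bigr)
\;\propto\;
\bigl(n_{d,k}^{-(d,n)} + \alpha\bigr)\,
\frac{n_{k,w_{d,n}}^{-(d,n)} + \beta}{n_{k}^{-(d,n)} + V\beta}
\]
```

Here n_{d,k} counts the words in document d assigned to topic k, n_{k,w} counts how often word w is assigned to topic k, and n_k is the total number of words assigned to topic k. Sampling repeats over all positions until the assignments stabilize, which corresponds to the convergence criterion mentioned in the text.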
Before forming an LDA model, the number of topics must be determined. In this research, four metrics were used for this purpose: Griffiths [18], Arun [19], Cao [20], and Deveaud [21]. The coherence and prevalence of each topic were then evaluated from the fitted LDA model. Coherence measures the semantic similarity between the high-scoring words in a topic, and thus helps distinguish topics that are semantically interpretable from topics that are artifacts of statistical inference [22]. A higher coherence score indicates better agreement with what humans judge to be a coherent topic [23]. Prevalence indicates which topics occur most frequently; it shows how the topics are distributed throughout the documents [24].
Web scraping yielded 136,583 Indonesian tweets discussing face-to-face activities. Fig. 2 shows that the number of tweets fluctuated from day to day. The number of tweets increased sharply on March 17th, 2020 and November 20th, 2020. The first increase occurred because some universities began implementing online learning at the beginning of the pandemic, while the increase on November 20th was caused by the plan for face-to-face learning at the beginning of 2021. These events caught the public's attention and prompted people to post their opinions on Twitter, which indicates that Twitter data can capture real-world events. After preprocessing, 114,384 tweets remained and were ready to be analyzed. Fig. 3 shows the most frequent words. From Fig. 3, it can be seen that the face-to-face activity most discussed on Twitter is face-to-face learning in school.
Face-to-face learning is to be carried out in consideration of the negative social impacts of distance learning, including a decrease in learning achievement (learning loss), students dropping out of school, and violence against children [6]. On the other hand, Indonesia's positivity rate is still around 13%, which means that transmission is high and that going to school is dangerous for students [7]. The policy on face-to-face learning is therefore a dilemma for the public.
From Fig. 4 it can be observed that the word 'covid' is associated with words such as 'papar' (exposure), 'lonjak' (increase), 'vaksin' (vaccine), and 'pandemi' (pandemic). Furthermore, the word 'vaksin' (vaccine) is associated with 'anak' (kid), 'murid' (pupil), 'guru' (teacher), 'cepat' (fast), and 'sekolah' (school). These associations indicate high exposure to COVID-19, which calls for fast vaccination of teachers and pupils so that schools can reopen.
The sentiment analysis yielded 59,927 tweets with negative sentiment, 19,391 tweets with positive sentiment, and 35,066 tweets with neutral sentiment. Thus, the negative polarity is 52.39%, the positive polarity is 16.95%, and the neutral polarity is 30.66%. Fig. 5 shows the word clouds of tweets with positive and negative sentiments. In Fig. 5, the word "takut" (fear) appears quite often among negative sentiments. Based on observation, this reflects public fear of exposure to COVID-19 through face-to-face learning. Reopening schools without robust COVID-19 mitigation can increase the risk of transmission with a more varied virus [25].
Furthermore, the Naïve Bayes Classifier algorithm was used to assess the accuracy of the sentiment determination. This study used 80% of the data as training data and 20% as testing data.
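A Naive Bayes accuracy check with an 80/20 split can be sketched as follows. The paper does not state which Naive Bayes variant or software was used, so this sketch assumes a multinomial model with add-one smoothing and a per-class (stratified) random split; all function names and the toy data are hypothetical and serve only to show the mechanics.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(samples, test_ratio=0.2, seed=0):
    """Split (tokens, label) pairs so each label keeps the same share in both subsets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * test_ratio)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test


def train_nb(train):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in train)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in train:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def accuracy(model, test):
    """Share of test samples whose predicted label matches the given label."""
    correct = sum(predict_nb(model, tokens) == label for tokens, label in test)
    return correct / len(test)


# Toy data (hypothetical): neutral tweets are left out, as in the study.
data = ([(["takut", "covid"], "negative")] * 10
        + [(["senang", "sekolah"], "positive")] * 10)
train, test = stratified_split(data)
model = train_nb(train)
toy_accuracy = accuracy(model, test)
```

The toy data are perfectly separable, so the resulting accuracy says nothing about the 91.47% reported for the real tweets. Because the labels being predicted come from the lexicon rule, an accuracy of this kind measures how consistently a classifier can reproduce the lexicon labels rather than agreement with human annotation.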
In this accuracy test, tweets with neutral sentiment were excluded, so 79,318 tweets were used. The data were divided into training and testing sets by random sampling, keeping the proportions of positive and negative sentiments roughly the same in the two subsets. This process yielded an accuracy of 91.47%.
The next step is to group public opinion into several topics. The grouping was done with the LDA method, which first requires determining the number of topics. The determination was based on several metrics, as shown in Fig. 6. According to Fig. 6, the result of the LDAtuning algorithm indicates that the appropriate number of topics is 6. Both the CaoJuan and the Griffiths metrics reached a minimum close to the global minimum at 6 topics. At the same number of topics, the Arun and the Deveaud metrics also reached maximum values that were quite close to the global maximum. Furthermore, certain terms started to repeat when the grouping exceeded 6 topics. Therefore, 6 topics were used in this study.
After evaluation with the four metrics, the LDA model generated 6 topics, as shown in Table 1. Table 1 contains the name of each topic along with the words or terms that represent it. The first topic is about vaccination for teachers and students. The second topic is about public preference between online learning and face-to-face learning. The third topic concerns the steps taken by the government to reopen schools. The fourth and fifth topics capture public sentiment and students' longing for face-to-face learning, respectively. The sixth topic is about face-to-face learning that will be held based on the virus-spread zone and with health protocols enforced. For these topics, the coherence of the words in each topic was assessed along with the prevalence of each topic.
Fig. 7 shows that the six topics have the same coherence value. This means that the words belonging to each topic are associated to the same degree as in the other topics. Meanwhile, the third topic had the highest prevalence. This means that users often use the word combinations contained in topic 3, which is about school reopening. The fitted LDA model has an r-square value of 0.9787, which can be interpreted as 97.87% of the variability in the data being explained by the model.
Table 2 contains sample tweets with negative, positive, and neutral sentiment, along with the topic of each tweet. The first tweet shows the user's anger about face-to-face learning in school; it was classified as a negative tweet in the topic of public sentiment. The second tweet mentions vaccination for teachers and the user's hope that face-to-face learning will begin soon; it was classified as a positive tweet in the topic of vaccination. The third tweet shows neutral sentiment and reports on a school that has held face-to-face learning; it belongs to the topic of school reopening.
Finally, to assess the soundness of this study, it is compared with several similar studies. Sahir [26], Pastor [27], and Bhagat [28] analyzed public opinion towards online learning during the pandemic. Public responses were dominated by negative sentiments in the studies by Sahir and Pastor, while the opposite was found in Bhagat's study. Research by Barkur [29] on sentiment towards the nationwide lockdown due to the COVID-19 outbreak in India showed that the lockdown gave rise to positive sentiment. People were clear that they had to flatten the curve and were committed to it.
Meanwhile, this study has shown that public responses to face-to-face activities were dominated by negative sentiments due to the high transmission of COVID-19, public dissatisfaction with the government's decision to open tourist destinations, and the slow progress of vaccination. In addition, this study identified several topics that the public discussed regarding face-to-face activities during the pandemic through the LDA method.
According to the sentiment analysis, public responses regarding face-to-face activities were dominated by negative sentiments, followed by neutral and positive sentiments. Grouping public opinion with the LDA method resulted in 6 topics: vaccination, public preference, school reopening, public sentiment, students' longing for face-to-face learning, and the implementation of the face-to-face learning plan. In this regard, the government is expected to ensure the implementation of health protocols, speed up vaccination, anticipate the worst-case scenario, and prepare the best possible mitigation. Future research is expected to provide a more in-depth analysis of face-to-face activities, for example by including the location of posted tweets or by incorporating likes and retweets in determining public sentiment. The analysis could also be extended by predicting the spread of COVID-19 through public mobility based on the location of posted tweets.
[1] WHO Director-General's opening remarks at the media briefing on COVID-19 -11
[2] Inilah PP Pembatasan Sosial Berskala Besar untuk Percepatan Penanganan Covid-19
[3] SDGs: Solusi Bersama Pulihkan Indonesia Pascapandemi Covid-19 [Title in English: SDGs: A Joint Solution to Restore Indonesia Post-Covid-19 Pandemic]
[4] Best practices for implementing remote learning during a pandemic
[5] Faktor pemicu kecemasan siswa dalam melakukan pembelajaran daring di masa pandemi covid-19
[6] Kemendikbud Siapkan Kebijakan Pembelajaran Tatap Muka Terbatas [Title in English: Ministry of Education and Culture Prepares Limited Face-to-Face Learning Policy]
[7] Pendidikan anak: Mendikbud tegaskan sekolah tatap muka harus dibuka lagi setelah semua guru divaksinasi Covid-19 [Title in English: Children's education: Minister of Education and Culture confirms face-to-face schools must be reopened after all teachers are vaccinated against Covid-19]
[8] COVID-19 and The State of Media in North Asia
[9] Analisis Sentimen Pandemi Covid19 Pada Streaming Twitter Dengan Text Mining Python [Title in English: Analysis of Covid19 Pandemic Sentiment On Twitter Streaming With Text Mining Python]
[10] Analisis Sentimen Pembelajaran Daring Pada Twitter di Masa Pandemi COVID-19 Menggunakan Metode Naïve Bayes
[11] Stemming Indonesian: A confix-stripping approach
[12] Sentiment analysis: Mining opinions, sentiments, and emotions
[13] Opinion Observer: Analyzing and Comparing Opinions on the Web
[14] Peringkasan Sentimen Ekstraktif di Twitter Menggunakan Hybrid TF-IDF dan Cosine Similarity
[15] Is Naive Bayes a good classifier for document classification
[16] Latent dirichlet allocation
[17] A survey of topic modeling in text mining
[18] Integrating topics and syntax
[19] On finding the natural number of topics with latent dirichlet allocation: Some observations
[20] A density-based method for adaptive LDA model selection
[21] Accurate and effective latent concept modeling for ad hoc information retrieval
[22] Exploring topic coherence over many models and many topics
[23] An evaluation of topic modelling techniques for twitter
[24] Aplikasi Topic Modeling Pada Pemberitaan Portal Berita Online Selama Masa PSBB Pertama [Title in English: Topic Modeling Application for Online News Portal Reporting During the First PSBB Period]
[25] School reopening without robust COVID-19 mitigation risks accelerating the pandemic
[26] Online learning sentiment analysis during the covid-19 Indonesia pandemic using twitter data
[27] Sentiment analysis on synchronous online delivery of instruction due to extreme community quarantine in the Philippines caused by COVID-19 pandemic
[28] Public Opinions about Online Learning during COVID-19: A Sentiment Analysis Approach
[29] Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India