key: cord-0773229-yb9shj71
authors: Imtiaz Khan, Nafiz; Mahmud, Tahasin; Nazrul Islam, Muhammad
title: COVID‐19 and black fungus: Analysis of the public perceptions through machine learning
date: 2021-11-14
journal: Eng Rep
DOI: 10.1002/eng2.12475
sha: 82a6e92f05fc8bfa6564506cbde871f7e199783a
doc_id: 773229
cord_uid: yb9shj71
While COVID‐19 is ravaging the lives of millions of people across the globe, a second pandemic, “black fungus,” has surfaced, robbing people of their lives, especially those who are recovering from coronavirus. The objective of this article is therefore to analyze public perceptions of black fungus during the COVID‐19 pandemic through sentiment analysis. To attain this objective, a support vector machine (SVM) model, with an average AUC of 82.75%, was first developed to classify user sentiments in terms of anger, fear, joy, and sad. Next, this SVM model was used to predict the class labels of public tweets (n = 6477) related to COVID‐19 and black fungus. As an outcome, this article found that public perceptions of black fungus during the COVID‐19 pandemic belong mostly to sad (n = 2370, 36.59%), followed by joy (n = 2095, 32.34%), fear (n = 1914, 29.55%), and anger (n = 98, 1.51%). This article also found that public perceptions vary across critical concerns such as education, lockdown, hospital, oxygen, quarantine, and vaccine. For example, people mostly exhibited fear on social media about education, hospital, and vaccine, while some people expressed joy about education, hospital, vaccine, and oxygen. It was also found that the general public tends to disregard lockdown, COVID‐19 restrictions, and prescribed hygiene rules, although the coronavirus and black fungus infection rates broke previous infection records.
COVID-19 is an infectious disease caused by "Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)," which is broadly termed coronavirus.
1, 2 Coronavirus disease has created a global pandemic, and the death toll continues to rise worldwide. 3, 4 Amidst the coronavirus crisis, a new epidemic called "black fungus" 5 is spreading fear among people. Black fungus, formally known as mucormycosis, is a potentially deadly fungal infection caused by a group of molds called mucormycetes. It is more likely to affect people with compromised immune systems, such as those with diabetes, cancer, HIV or AIDS, or an organ transplant. 6 Cases of mucormycosis have been found in patients who are recovering from coronavirus. 5, 7, 8 As coronavirus leaves its patients' immune systems debilitated, they are more susceptible to mucormycosis. As of June 21, 2021, 31,216 cases of infection and 2109 deaths due to black fungus had been reported globally. 9 Almost 71% of the global cases of mucormycosis have been reported in India. 10
Due to the COVID-19 outbreak, around half of the world's population was under complete or partial lockdown, which is still ongoing in some countries. 11 To control this outbreak, social distancing, staying at home, and quarantine are considered the most effective measures. Thus, social media and social networking sites became fundamental for expressing opinions and emotions. 12 COVID-19 has altered the way people use the internet, since more individuals are logging on to various social media sites. 13, 14 It is possible to comprehend people's mental states by analyzing their views, opinions, comments, and posts on various platforms. After the surge of coronavirus, some studies were carried out focusing on sentiment analysis of Twitter data. 15, 16 During the first phase of the COVID pandemic, false information and misinformation spread like wildfire. This gave rise to various physiological and mental issues among social media users. 17 Black fungus may affect people in the same way.
Moreover, people's perceptions of black fungus can be explored through sentiment analysis of social media data. Machine learning (ML) algorithms learn hidden patterns in data and can predict the class labels of unknown samples. Thus, ML is widely used in health informatics, robotics, building intelligent systems, and so forth. [18] [19] [20] [21] [22] Similarly, ML is widely used in sentiment analysis for predicting public sentiment. 23, 24 A significant number of studies have focused on sentiment analysis related to the COVID-19 pandemic in country-specific as well as cross-country scenarios. 17, 25 However, to the best of our knowledge, the views and feelings of social media users towards black fungus have not yet been explored. Therefore, this article aimed to explore public views of black fungus during the COVID-19 pandemic, in terms of joy, fear, anger, and sad, through sentiment analysis of Twitter data.
The study was carried out in six phases: data acquisition, data pre-processing, data embedding, developing the ML model, analyzing the ML model, and analyzing public perceptions. The phases are briefly discussed in the following subsections.
In the data acquisition phase, two datasets were acquired. First, a total of 8308 tweets were collected by searching Twitter for tweets whose text contained keywords such as "COVID-19," "coronavirus," "COVID pandemic," "micromycetes," "black fungus," and "COVID delta variant." The timestamps of these tweets range from May 2021 to June 24, 2021, since the first black fungus infection case during the COVID-19 pandemic turned up in early May. 26 Second, a publicly available Twitter dataset * was acquired for developing the ML model. This dataset contains a total of 3046 tweets delineating Indian sentiment regarding COVID-19, coronavirus, and lockdown, labeled in terms of joy, fear, anger, and sad.
Since the objective of this research was to analyze public sentiments in terms of joy, fear, anger, and sad, this particular dataset was chosen.
In the data pre-processing phase, all the tweets collected in the first phase went through a series of pre-processing steps: (a) conversion of tweets to lower case; (b) removal of usernames, URLs, punctuation, links, tabs, white space at the start and end of tweets, and stop words; (c) expansion of contractions; and (d) removal of duplicate tweets. After the pre-processing steps, the first dataset (prepared dataset) contained 6477 tweets, while the second dataset (open-access dataset) contained 3030 tweets.
In the data embedding phase, the sentences of the tweets from both datasets were encoded into machine-understandable embedding vectors using the Universal Sentence Encoder (USE). USE encodes text into high-dimensional vectors, 27 which can then be used for text classification, semantic similarity, clustering, and other natural language tasks. The model is designed to handle text that is longer than a word, such as sentences, phrases, or brief paragraphs. It has been trained on a wide range of data sources and tasks so that it can flexibly accommodate a wide range of natural language understanding tasks. The input is variable-length English text, and the output is a 512-dimensional vector. The USE model used here is trained with a deep averaging network (DAN) encoder.
The ML model was developed in the next phase using the open-access dataset (second dataset). A random 80-20 train-test split was performed, with 80% of the data used as the training dataset and 20% as the test dataset. Next, a support vector machine (SVM) model was developed on the training data to classify user sentiments into the pre-defined class labels.
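The pre-processing steps (a)-(d) listed above can be illustrated with a minimal Python sketch. It is not the authors' code: the stop-word set and contraction map are small illustrative stand-ins (the article does not say which lists were used), the names `clean_tweet` and `preprocess` are hypothetical, and two ordering choices are assumptions, namely that contractions are expanded before punctuation is stripped (so the apostrophes are still present) and that duplicates are removed after cleaning.

```python
import re
import string

# Illustrative subsets only; the article does not specify these lists.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not",
                "it's": "it is", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Apply steps (a)-(c) to a single tweet and return the cleaned string."""
    text = text.lower()                                    # (a) lower case
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)     # (b) URLs / links
    text = re.sub(r"@\w+", " ", text)                      # (b) usernames
    text = text.replace("\u2019", "'")                     # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():               # (c) contractions
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = text.translate(PUNCT_TO_SPACE)                  # (b) punctuation
    # split() also discards tabs and leading/trailing white space.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def preprocess(tweets):
    """Clean every tweet, then drop empty results and duplicates (step d)."""
    seen = set()
    cleaned_tweets = []
    for tweet in tweets:
        cleaned = clean_tweet(tweet)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            cleaned_tweets.append(cleaned)
    return cleaned_tweets
```

Removing duplicates after cleaning means that two tweets differing only in case, mentions, or links collapse into one entry; deduplicating the raw text instead would keep both.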
The SVM algorithm is one of the most widely used supervised ML algorithms for classifying user sentiments and has shown comparatively good performance. [28] [29] [30] [31] Moreover, SVMs can handle very large feature spaces, since their protection against overfitting does not depend on the number of features. 32
In the model analysis phase, the performance of the developed SVM model was analyzed first. The model was evaluated in terms of precision, recall, and f1-score, while ROC curves and confusion matrices were also generated for further analysis of the model. The equations for calculating precision, recall, and f1-score are shown in Equations (1)-(3), respectively.
\[ \text{precision} = \frac{tp}{tp + fp} \tag{1} \]
\[ \text{recall} = \frac{tp}{tp + fn} \tag{2} \]
\[ \text{f1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{3} \]
Each of the evaluation metrics was obtained by performing "weighted" averaging on the actual and model-predicted class labels. In weighted averaging, the metrics are first calculated for each class label; their average over all class labels is then computed, weighted by the support (the number of true instances for each label). In the equations, the term "tp" refers to true positive, "tn" to true negative, "fp" to false positive, and "fn" to false negative.
In the last phase, the public perceptions towards black fungus were analyzed. The developed SVM model was used to predict the class labels of the 6477 unlabeled tweets (dataset 1). Some of the tweets and their model-predicted sentiments are shown in Table 1. The model-classified data were analyzed to understand public perceptions towards black fungus during the COVID-19 outbreak. The findings of these analyses are discussed in the following section.
The ROC curves for the train and test datasets are shown in Figure 1A,B, respectively, while confusion matrices for the train and test datasets are shown in Figure 2A,B, respectively. In Figures 1 and 2, labels 0, 1, 2, and 3 represent anger, fear, joy, and sad, respectively.
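The support-weighted averaging of precision, recall, and f1-score described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the evaluation code used in the study: the function name `weighted_scores` is hypothetical, the toy label lists are invented, and a class with no predicted (or no true) samples is given a score of 0 for the affected metric.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, f1) for multi-class labels."""
    support = Counter(y_true)          # number of true instances per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Per-class metrics, defined as 0 when the denominator is empty.
        p_c = tp / (tp + fp) if (tp + fp) else 0.0
        r_c = tp / (tp + fn) if (tp + fn) else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total        # share of true samples in this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy example with the label encoding 0 = anger, 1 = fear, 2 = joy, 3 = sad.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2]
print(weighted_scores(y_true, y_pred))
```

One property worth noting is that the support-weighted recall always equals the overall accuracy, because each class's recall is weighted by exactly the number of true samples it was computed from.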
It can be seen from Figure 1B that, for the test data, the area under the curve (AUC) for class labels 0, 1, 2, and 3 was 80%, 77%, 89%, and 85%, respectively (82.75% average AUC), while for the train dataset the AUC for class labels 0, 1, 2, and 3 was 95%, 93%, 97%, and 97%, respectively (95.5% average AUC; see Figure 1A). Again, it can be seen from the diagonal of Figure 2B that 113, 111, 118, and 112 test samples were correctly classified as labels 0, 1, 2, and 3, respectively, while 566, 547, 565, and 622 train samples were correctly classified as labels 0, 1, 2, and 3, respectively (see Figure 2A). Therefore, the ROC curves and the confusion matrices suggest that the model achieved satisfactory performance on both the train and the test datasets.
The trained ML model was used to predict the sentiments of the 6477 unlabeled tweets (as stated in Section 2.1). The classification of tweets into four sentiments showed that 36.59% (n = 2370) of the tweets expressed sadness, followed by joy (32.34%, n = 2095), fear (29.55%, n = 1914), and anger (1.51%, n = 98). Surprisingly, the results showed more tweets expressing joy than fear. However, receiving vaccines, being cured of COVID-19, or being optimistic about fighting the pandemic could be possible reasons for finding more joyful tweets than fearful tweets. The largest share of the tweets (36.59%) expressed sadness, which is natural since the COVID pandemic has already put mental stress on people of all ages. On top of this, black fungus infections have added an extra layer of mental burden, and thus people were expressing sorrow in their tweets. Only a few tweets were found expressing anger. The word clouds for the anger, fear, joy, and sad sentiments are presented in Figure 3.
The results showed that the words fungus, COVID, black, white, new, board, exam, and yellow recur in each of the sentiments (see Figure 3A-D). In the case of the anger sentiment (see Figure 3A), the most frequent words are fungus, black, COVID, government, cancel, white, spread, infected, wave, go, and pandemic. This indicates the public's dissatisfaction with the government initiatives taken to restrain black fungus infection during the COVID-19 pandemic. The word "China" also appears in the word cloud of the anger sentiment, since China is regarded as the origin of coronavirus. Again, the words fungus, black, white, COVID, rare, rise, deadly, disease, mucormycosis, treatment, new, news, and exam were found in tweets expressing fear and sadness (see Figure 3B,D). These words indicate that people are concerned that COVID-19 and black fungus are taking a toll on human lives. The word "news" in both word clouds suggests that people rely on news media for pandemic updates. Also, because of the recent emergence of black fungus amidst the COVID pandemic, people think that black fungus is a new deadly infectious disease even though it has been around for centuries. 33 As such, the words new, deadly, and disease were found in the word clouds of the negative sentiments (fear and sad) (see Figure 3B,D). Words related to education, such as exam, board, class, and board exam, were also found in the word clouds. During the pandemic, all educational activities were postponed; hence, education became a trending topic on social media. Moreover, the words white, black, and yellow were found in the word clouds since cases of all three of these types of fungal disease were found during the COVID pandemic.
FIGURE 1 ROC curves for the developed SVM model: (A) train data, (B) test data
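Word clouds such as those in Figure 3 are driven by per-sentiment word frequencies, and the "recurring" words are those present under every sentiment. The following minimal sketch shows one way to compute both; it assumes the tweets are already pre-processed and labeled, the function names are hypothetical, and the toy tweets are invented for illustration (they are not data from the study).

```python
from collections import Counter, defaultdict


def word_counts_by_sentiment(tweets, labels):
    """Count word occurrences separately for each sentiment label."""
    counts = defaultdict(Counter)
    for text, label in zip(tweets, labels):
        counts[label].update(text.split())
    return counts


def recurring_words(counts):
    """Return the words that occur at least once under every sentiment."""
    vocabularies = [set(counter) for counter in counts.values()]
    return set.intersection(*vocabularies) if vocabularies else set()


# Toy, made-up tweets (already cleaned); not taken from the study's dataset.
tweets = [
    "black fungus deadly disease",
    "fungus deadly news",
    "government cancel exam black fungus",
    "vaccine hope black fungus",
    "black fungus took my friend",
]
labels = ["fear", "fear", "anger", "joy", "sad"]

counts = word_counts_by_sentiment(tweets, labels)
print(counts["fear"].most_common(3))    # most frequent words in fear tweets
print(sorted(recurring_words(counts)))  # words shared by all four sentiments
```

A word cloud library would then scale each word by its count; the counting step itself needs nothing beyond the standard library.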
To provide a better understanding of people's perspectives on black fungus, a few concerns were highlighted based on the related recurring words, and the sentiments (anger, fear, sad, and joy) were analyzed with regard to these concerns. Figure 4 depicts the number of tweets related to each concern and people's sentiment towards each concern. Education-, vaccination-, and hospital-related tweets had more negative sentiments (fear and sadness) than positive sentiments, whereas hospital- and vaccine-related tweets expressed sadness more often than any of the other three sentiments. This indicates that people's perceptions of healthcare management for black fungus and COVID-19 patients during this pandemic were not satisfactory. During the coronavirus pandemic, hospital management systems in several nations crumbled due to the unrestrained rate of COVID-19 infection. 34 Many hospitals were even forced to turn away patients in dire need. People expressed fewer positive feelings (joy) than negative feelings about education, hospitals, quarantine, oxygen, and vaccines. Since people's attention has turned away from lockdown, few tweets on lockdown and quarantine were detected. This also indicates people's tendency to disregard lockdown, COVID-19 restrictions, and prescribed hygiene rules, even though the coronavirus and black fungus infection rates broke previous infection records many times.
In this article, data extracted from Twitter were mined to understand public sentiments towards black fungus during the COVID-19 pandemic. A support vector classifier was used to classify tweets into sad, joy, fear, and anger sentiments. The classifier had an average AUC of 82.75% on the test dataset. As an outcome, the public perceptions were distributed across the four sentiments as follows: sad (n = 2370, 36.59%), joy (n = 2095, 32.34%), fear (n = 1914, 29.55%), and anger (n = 98, 1.51%).
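The reported shares follow directly from the class counts and the total of 6477 predicted tweets. The short check below recomputes them; the counts are the figures stated in the article, while the variable names are illustrative. Each recomputed share agrees with the reported percentage to within 0.01 percentage point (for joy, the value rounded to two decimals differs from the reported 32.34% only in the last digit).

```python
# Class counts reported in the article for the predicted tweets.
counts = {"sad": 2370, "joy": 2095, "fear": 1914, "anger": 98}

total = sum(counts.values())  # should equal the 6477 analyzed tweets
shares = {label: 100 * n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: n = {counts[label]}, share = {share:.2f}%")
```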
This article also revealed public perceptions of several important concerns, in particular education, lockdown, hospital, oxygen, quarantine, and vaccine. It was found that people had more negative feelings (fear and sadness) than positive feelings about education, hospital, oxygen, quarantine, and vaccine, while they paid almost no attention to lockdown and quarantine. These findings indicate that people are less likely to stay at home and maintain social distance in this pandemic. Thus, many countries are found to be using synonyms of lockdown, such as shutdown and strict lockdown, just to motivate people to stay at home. 35, 36 The findings of this article can provide a deeper understanding of public perceptions towards black fungus during the COVID-19 epidemic around the world. Moreover, this article can help governments and policymakers take important decisions and actions for controlling the black fungus and COVID-19 outbreaks. As limitations, this article used only one NLP technique, USE, to encode text into high-dimensional vectors. Other modern techniques are available, and their use may provide better performance. Also, the article used only one ML algorithm to predict user sentiments. Thus, future research may focus on exploring different NLP techniques as well as building different ML models to find the best-performing NLP technique and ML model and achieve more generalized and better results.
The peer review history for this article is available at https://publons.com/publon/10.1002/eng2.12475.
Author elects to not share data.
The COVID-19 epidemic
WHO declares COVID-19 a pandemic
A systematic review of the digital interventions for fighting COVID-19: the Bangladesh perspective
What drives unverified information sharing and cyberchondria during the COVID-19 pandemic?
Mucormycosis in COVID-19: a systematic review of cases reported worldwide and in India
Mucormycosis-the black fungus
The surge in covid related mucormycosis
Fulminant Mucormycosis Complicating Coronavirus Disease 2019 (COVID-19). vol. 11 of International Forum of Allergy & Rhinology
Tens of thousands of COVID-19 survivors in India are developing deadly 'black fungus' infections that can lead to blindness
When uncontrolled diabetes mellitus and severe COVID-19 converge: the perfect storm for mucormycosis
A systematic review on the use of AI and ML for fighting the COVID-19 pandemic
Exploring the performance of ensemble machine learning classifiers for sentiment analysis of COVID-19 tweets
ICT intervention in the containment of the pandemic spread of COVID-19: an exploratory study
Why do people share misinformation during the Covid-19 pandemic?
Sentiment analysis on COVID-19 twitter data
Sentiment analysis on the impact of coronavirus in social life using the BERT model
Twitter sentiment analysis during COVID19 outbreak 3572023
Prediction of cesarean childbirth using ensemble machine learning methods
Exploring machine learning algorithms to find the best features for predicting modes of childbirth
VGG-SCNet: a VGG net based deep learning framework for brain tumor detection on MRI images
Exploring the machine learning algorithms to find the best features for predicting the breast cancer and its recurrence
Evaluation of user's emotional experience through neurological and physiological measures in playing serious games
Sentiment analysis in twitter using machine learning techniques
Sentiment analysis of movie reviews using machine learning techniques
Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: evidence from India
Mucormycosis: the 'black fungus' maiming Covid patients in India
Universal sentence encoder
Sentiment analysis using support vector machine
Sentiment analysis on Bangladesh cricket with support vector machine
Comparison of Naïve Bayes, support vector machine, decision trees and random forest on sentiment analysis
Comparison of Naive Bayes algorithm and support vector machine using PSO feature selection for sentiment analysis on e-wallet review
Text categorization with support vector machines: learning with many relevant features
Textbook of Medical Mycology
Key factors fuelling India's second COVID surge
Global lockdown: an effective safeguard in responding to the threat of COVID-19
He is the author of more than ten peer-reviewed publications in international journals and conferences. His research interests include machine learning, artificial intelligence, data science
The authors declare no potential conflict of interest.
AUTHOR CONTRIBUTIONS Nafiz Imtiaz Khan: investigation (lead); methodology (lead); project administration (equal); writing - original draft (equal). Tahasin Mahmud: visualization (equal); writing - original draft (equal). Muhammad Nazrul Islam: conceptualization (equal); supervision (lead); writing - original draft (equal); writing - review and editing (equal).
ENDNOTE * https://www.kaggle.com/surajkum1198/twitterdata
ORCID Nafiz Imtiaz Khan https://orcid.org/0000-0003-0149-6012 Tahasin Mahmud https://orcid.org/0000-0002-5065-514X Muhammad Nazrul Islam https://orcid.org/0000-0002-7189-4879
Tahasin Mahmud is a graduate in Computer Science and Engineering (CSE) from the Military Institute of Science and Technology (MIST), Dhaka, Bangladesh. His research interests include machine learning, artificial intelligence, computer vision, and computer security.