key: cord-0791941-0qqg10y4
authors: Chiroma, H.; Ezugwu, A. E.; Jauro, F.; Al-Garadi, M. A.; Abdullahi, I. N.; Shuib, L.
title: Early survey with bibliometric analysis on machine learning approaches in controlling coronavirus
date: 2020-11-05
journal: nan
DOI: 10.1101/2020.11.04.20225698
sha: 79ac98aa3545ee5f6b14f551e27f6b3c67b95605
doc_id: 791941
cord_uid: 0qqg10y4

Background and Objective: The COVID-19 pandemic has caused severe mortality across the globe, with the USA as the current epicenter, although the initial outbreak was in Wuhan, China. Many studies have successfully applied machine learning to fight the COVID-19 pandemic from different perspectives. To the best of the authors' knowledge, no comprehensive survey with bibliometric analysis has been conducted on the adoption of machine learning for fighting COVID-19. Therefore, the main goal of this study is to bridge this gap by carrying out an in-depth survey of the adoption of machine-learning-based technologies to fight the COVID-19 pandemic from different perspectives, including an extensive systematic literature review and a bibliometric analysis. Methods: A literature survey methodology is applied to retrieve data from academic databases, and a bibliometric technique is subsequently employed to analyze the retrieved records. Moreover, a concise summary, sources of COVID-19 datasets, a taxonomy, a synthesis, and an analysis are presented. The convolutional neural network (CNN) is found to be mainly used in developing COVID-19 diagnosis and prognosis tools, mostly from chest X-ray and chest computed tomography (CT) scan images. Similarly, a bibliometric analysis of machine-learning-based COVID-19-related publications in the Scopus and Web of Science citation indexes is performed. Finally, a new perspective is proposed to solve the identified challenges, as directions for future research. We believe that the survey with bibliometric analysis can help researchers easily detect areas that require further development and identify potential collaborators. Results: The findings of this study reveal that machine-learning-based COVID-19 diagnostic tools received the most attention from researchers. 
Specifically, the analyses show that more effort and resources have been directed toward automated COVID-19 diagnostic tools, while COVID-19 drug and vaccine development remain grossly underexploited. Moreover, the machine learning algorithm predominantly used by researchers to develop diagnostic tools is the CNN, mainly applied to X-ray and CT scan images. Conclusions: This study presents the challenges hindering practical work on the application of machine-learning-based technologies to fight COVID-19, together with a new perspective on solving the identified problems. We believe that the presented survey with bibliometric analysis can help researchers determine areas that need further development and identify potential collaborators at the author, country, and institutional levels to advance research on machine learning applications for disease control.

… study. Similar bibliometric analyses have been reported in the literature by Chahrour et al. (2020), Hossain (2020), and Lou et al. (2020). However, these analyses differ from the bibliometric analysis in this study, which focuses on the application of machine learning techniques to combat the COVID-19 pandemic, as opposed to the various studies reporting general medical practices on … In this study, we conduct a dedicated comprehensive survey on the adoption of machine learning to fight the COVID-19 pandemic from different perspectives, including an extensive literature review and a bibliometric analysis. To the best of our knowledge, this study is the first comprehensive analysis of research output focusing on the possible applications of machine learning techniques for mitigating the worldwide spread of the ongoing COVID-19 pandemic. We are mindful that some publications might not be captured, because the current study is limited to the eight academic databases listed in Table 1. We also depend on the indexing of the databases used, as does any other bibliometric study.

The rest of the study is organized as follows: Section 2 presents the survey methodology. Section 3 presents the rudiments of the major machine learning algorithms used to fight the COVID-19 pandemic. Section 4 presents the adoption of machine learning to fight COVID-19. Section 5 describes the different sources of COVID-19 datasets. Section 6 discusses the survey and bibliometric analysis. Section 7 presents challenges and future research directions, followed by the conclusion in Section 8. Figure 1 presents the visual structure of the survey paper, which is similar to that of Mohammadi et al. (2018). 

Inclusion/exclusion criteria were set up based on the research aim to decide which articles were eligible for the next review stage. Articles that met the inclusion criteria were considered relevant to the research, and those that did not were excluded. The inclusion/exclusion criteria are provided in Table 2. 

Table 2. Inclusion/exclusion criteria

Inclusion criteria | Exclusion criteria
The review only focuses on COVID-19. | Other viral infections and health issues were not considered relevant in the survey.
Only articles that applied machine learning techniques to fight COVID-19 were considered. | Articles using techniques other than machine learning techniques were excluded.
Articles/conference papers published by prominent and indexed journals were included. | Articles/conference papers published by nonindexed journals were excluded. Articles uploaded as preprints to preprint servers such as bioRxiv, medRxiv, arXiv, etc. without peer review were excluded.
Only articles written in the English language were considered for inclusion. | Articles written in languages other than English were excluded.

This version posted November 5, 2020. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. https://doi.org/10.1101/2020.11.04.20225698

Article selection for this research followed a three-stage analysis. The first stage considered only the titles and abstracts of the papers to extract relevant papers. The second stage analyzed the abstract, introduction, and conclusion to refine the selection from the first stage. In the third and final stage, papers were read thoroughly, and a threshold was set to rate the quality of papers in terms of their relevance to the research. A paper was selected if it reported an empirical application of machine learning to fight COVID-19, similar to Rodriguez-Morales et al. (2020). Articles that met the threshold value were selected, and those below it were dropped. Figure 2 shows the total number of papers obtained from the academic databases and the final number of papers considered for the research after applying all the extraction criteria. 
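The three-stage selection above can be sketched as a chain of filters. This is a minimal illustration under stated assumptions: the keyword list, the `Paper` fields, and the numeric quality score are hypothetical, because the survey does not publish its exact queries or its threshold value.

```python
from dataclasses import dataclass

# Hypothetical screening terms; the survey does not publish its exact query.
KEYWORDS = ("covid-19", "machine learning")


@dataclass
class Paper:
    title: str
    abstract: str
    intro_and_conclusion: str  # text read in stage 2 (hypothetical field)
    empirical_ml: bool         # reports an empirical ML application to COVID-19
    quality_score: float       # hypothetical relevance rating from the full-text read


def mentions_all(text: str) -> bool:
    """True if every screening keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(k in lowered for k in KEYWORDS)


def three_stage_screen(papers, threshold):
    # Stage 1: titles and abstracts only.
    stage1 = [p for p in papers if mentions_all(p.title + " " + p.abstract)]
    # Stage 2: abstract, introduction, and conclusion.
    stage2 = [p for p in stage1 if mentions_all(p.abstract + " " + p.intro_and_conclusion)]
    # Stage 3: full-text reading; keep empirical studies that meet the quality threshold.
    return [p for p in stage2 if p.empirical_ml and p.quality_score >= threshold]
```

Each stage only sees the survivors of the previous one, so the cheaper title/abstract check runs on the full record set and the expensive full-text judgment runs on the smallest set.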

VOSviewer software was used to present a bibliometric analysis of the existing literature on COVID-19. VOSviewer is a tool for constructing and visualizing bibliometric maps of items such as journals, researchers, or individual publications. These maps can be created based on citation, bibliographic coupling, co-citation, or co-authorship relations. The software also offers text-mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature (see www.vosviewer.com). For the bibliometric analysis presented in this study, we used only the 1,178 publications retrieved with the keyword "novel coronavirus" and the 98 publications retrieved with the keyword "COVID-19 and artificial intelligence" from the Scopus and Web of Science academic databases. Only 57 documents were retrieved with the keyword "COVID-19 and machine learning" from the same databases.

… deployed to enable continual learning of tasks by resetting the state of the LSTM (Greff et al., 2016). Its updated architecture consists of multiple LSTM units, each with an input gate, a forget gate, an output gate, and a memory cell. Sak et al. (2014) described the underlying architecture of LSTM as consisting of memory blocks in its hidden layer. The memory blocks contain memory cells that store the temporal state of the network, together with additional units, known as gates, that control the information flow. Each memory block has an input gate that controls the inflow of input activations into the memory cell and an output gate that controls the outflow of cell activations into the rest of the network. The forget gate is incorporated to forget or reset the memory of the cell adaptively. 
LSTM computes a mapping from an input sequence $x = (x_1, x_2, \ldots, x_T)$ to an output sequence $y = (y_1, y_2, \ldots, y_T)$ by iterating over the time steps $t = 1, \ldots, T$. LSTM is good at addressing complex sequential machine learning problems (Karim et al., 2018). Deep LSTM architectures consist of stacked LSTM layers (Sak et al., 2014). LSTMs are strong in handling temporal dependencies in sequences but weak in dealing with long sequence dependencies (Karim et al., 2018).
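To make the gate description concrete, the following is a minimal pure-Python sketch of one common LSTM formulation with input, forget, and output gates and a memory cell, iterated over $t = 1, \ldots, T$. It is an illustration, not the exact architecture of any cited study: it omits the peephole connections and projection layer described by Sak et al. (2014), and the weight initialization and layer sizes are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate (W: n_hidden x n_in, U: n_hidden x n_hidden)."""
    return [
        sum(w * xj for w, xj in zip(W[i], x)) + sum(u * hj for u, hj in zip(U[i], h)) + b[i]
        for i in range(len(b))
    ]


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases for the i, f, o gates and the cell input g."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for gate in ("i", "f", "o", "g")}


def lstm_step(params, x, h_prev, c_prev):
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate cell input
    # Memory cell: keep part of the old state (via f) and write part of the new input (via i).
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Output: the output gate filters the squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def lstm_forward(params, xs, n_hidden):
    """Map an input sequence x_1..x_T to outputs y_1..y_T, iterating t = 1..T."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    ys = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        ys.append(h)
    return ys
```

Driving the forget-gate bias strongly negative makes $f_t \approx 0$, which wipes the previous cell state; this is the adaptive "reset" behavior the forget gate was introduced for. Stacking several such layers, each feeding its outputs to the next, gives the deep LSTM architectures mentioned above.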

In the fight against COVID-19, different aspects of artificial intelligence (AI) have been applied to curtail its adverse effects (Dananjayan & Raj, 2020). The taxonomy in Figure 3 was created from the projects that involved machine learning in fighting COVID-19; the data used to create it were extracted from the papers that applied machine learning algorithms to fight COVID-19. 

Currently, the reverse transcription-polymerase chain reaction (RT-PCR)-based viral nucleic acid assay is used as the reference standard method to confirm COVID-19 infection (Corman et al., 2020). However, such a laboratory test is time consuming, and the supply of test kits may become a bottleneck for a rapidly growing suspected population, even in many developed countries such as the USA. More importantly, initial false-negative or weakly positive RT-PCR results were found in several later-confirmed cases in which highly suspicious computed tomography (CT) imaging features were present (Xie et al., 2020). The treatment and screening of COVID-19 can be more effective when a deep learning approach, CT features, and real-time RT-PCR results are integrated. AI and deep learning can assist in developing diagnostic tools and deciding on treatment (Rao and Vazquez, 2020; Shi et al., 2020). As a result, many diagnostic tools based on machine learning algorithms have been developed to fight COVID-19. For example, Apostolopoulos and Mpesiana (2020) applied transfer learning with CNNs to detect COVID-19 from X-ray datasets containing confirmed COVID-19, common bacterial pneumonia, and normal cases. The results indicated that VGG19 diagnosed confirmed COVID-19 cases with better accuracy on the two- and three-class classification problems than MobileNet v2, Inception, Xception, and Inception ResNet v2. The proposed approach can help develop a cost-effective, fast, and automatic COVID-19 diagnostic tool and reduce the exposure of medical workers to COVID-19. Similarly, Rahaman et al. (2020) developed an automated computer-aided diagnosis (CAD) system for distinguishing COVID-19 samples from healthy cases and cases with pneumonia using chest X-ray (CXR) images. Their study demonstrated the effectiveness of deep transfer learning techniques for identifying COVID-19 cases from CXR images. Ardakani et al. (2020), motivated by the time and high cost of the traditional laboratory COVID-19 test, investigated the performance of 10 well-known CNNs in diagnosing COVID-19. The 10 CNN variants included AlexNet, Xception, SqueezeNet, GoogleNet, … All the variants were applied to CT scan images because CT is a fast method of diagnosing patients with COVID-19. The results indicated that ResNet-101 and Xception outperformed the other CNN variants in diagnosing COVID-19. The authors concluded that ResNet-101 has high sensitivity in characterizing and diagnosing COVID-19 infections and can therefore be used as an alternative tool in radiology departments, being cheaper and faster than traditional laboratory analysis. Butt et al. (2020) applied CNN to detect COVID-19 from patients' chest CT scans and found it fast and reliable compared with conventional RT-PCR testing. Huang et al. (2020) applied a deep learning algorithm to chest CT scans of patients with COVID-19 to quantify changes in lung burden. The patients were grouped as mild, moderate, severe, or critical based on chest CT findings, clinical evaluation, and laboratory results, and the deep learning algorithm was applied to assess lung burden changes. They found that lung opacification measured on chest CT differed substantially among the clinical groups. The approach can remove subjectivity from the initial assessment of COVID-19 findings.
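The studies above compare diagnostic tools by accuracy and sensitivity, and later sections also mention the F-measure. As a reference point, here is a minimal sketch of how these standard metrics follow from a binary confusion matrix; the counts in the usage line are hypothetical and do not come from any study in this survey.

```python
def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from a confusion matrix.

    tp: COVID-19 cases flagged positive    fn: COVID-19 cases missed
    tn: non-COVID cases flagged negative   fp: non-COVID cases flagged positive
    """
    sensitivity = _ratio(tp, tp + fn)   # recall / true-positive rate
    specificity = _ratio(tn, tn + fp)   # true-negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f_measure": f_measure}


# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=90, fp=20, tn=80, fn=10))
```

Sensitivity is the metric that matters most for screening, because a missed COVID-19 case (a false negative) is usually costlier than a false alarm; accuracy alone can hide poor sensitivity when classes are imbalanced.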


Machine learning algorithm | Accuracy: 70%-80% | Detect COVID-19 severity in a patient at the initial presentation | Help in optimal utilization of scarce …

Mei et al. (2020) proposed a joint model combining a CNN, support vector machine (SVM), random forest (RF), and multilayer perceptron, integrating chest CT scan results with non-image clinical information to predict COVID-19 infection in a patient. The CNN was run on the CT images, while the other algorithms classified COVID-19 using the non-image clinical information. The outputs of the CNN and the other algorithms were combined to predict the patient's COVID-19 status. The diagnostic tool can rapidly detect COVID-19 infection in patients. Another study used logistic regression to predict the progression of patients with COVID-19 to a severe stage. The results showed that CT quantification of pneumonia lesions could predict progression to a severe stage early and non-invasively, providing a prognostic indicator for the clinical management of COVID-19. Jiang et al. (2020) applied a machine learning algorithm to predict COVID-19 clinical severity, developing a predictive tool that identifies patients at risk of increased COVID-19 severity at first presentation. The study can help in the optimal utilization of scarce resources to cope with the COVID-19 pandemic. Hurt et al. (2020) collected CXR images from patients with COVID-19 in China and America and applied a deep learning algorithm for the early diagnosis of COVID-19 from the CXRs. They found that deep learning predicted and consistently localized areas of pneumonia, so the algorithm can support the early diagnosis of COVID-19 infection. Loey et al. (2020), motivated by the insufficient size of COVID-19 datasets, proposed combining a generative adversarial network (GAN) with CNN variants to detect COVID-19 in patients. The GAN was used to generate additional X-ray images, and GoogLeNet, AlexNet, and ResNet18 were applied as deep transfer learning models. 
They found that GoogLeNet scored 80.6% in the four-class problem, AlexNet 85.2% in the three-class problem, and GoogLeNet 100% in the two-class problem. The method can facilitate the early detection of COVID-19 and reduce the workload of radiologists. Wu et al. (2020) proposed a multi-view ResNet50 for screening COVID-19 from chest CT scan images. ResNet50 was trained on multi-view chest CT images, and the multi-view ResNet50 fusion outperformed the single-view model. The resulting diagnostic tool can reduce radiologists' workload by offering fast, accurate COVID-19 diagnosis. Ucar and Korkmaz (2020) developed a rapid COVID-19 diagnostic tool for X-ray images based on SqueezeNet (a pre-defined CNN) whose hyperparameters were tuned with Bayesian optimization. The Bayesian-optimized SqueezeNet, applied to X-ray images labeled normal, pneumonia, and COVID-19, outperformed the baseline diagnostic tools. Togaçar et al. (2020) diagnosed COVID-19 by combining CNNs with social mimic optimization on CXR images preprocessed with fuzzy color and stacking techniques. The stacked data were used to train CNNs, the resulting features were processed with social mimic optimization, and the most informative features were classified by SVM into COVID-19, pneumonia, and normal X-ray images. Another study used CNN and multi-objective differential evolution (MODE) for the early detection of COVID-19 from chest CT scan images. The initial parameters of the CNN were tuned using MODE to create a MODE-based CNN that classifies chest CT scan images as COVID-19 positive or negative. The MODE-based CNN outperformed the competing models (ANN, ANFIS, and traditional CNN) and, owing to its speed, is beneficial for real-time COVID-19 classification. Salman et al. (2020) constructed a CNN-based diagnostic tool for detecting COVID-19 from CXR images. CNN-InceptionV3 was applied to 130 X-ray images of patients infected with COVID-19 and 130 normal X-ray images; the results indicated that it could detect COVID-19 and reduce the testing time required by a radiologist. Ozturk et al. (2020) used CNN to develop an automated tool for diagnosing COVID-19 from raw CXR images. Binary and multi-class classification experiments were performed with a CNN of 17 convolutional layers, each with different filtering. The model can be used for the early screening of patients with COVID-19 and can assist radiologists in validating COVID-19 screening. Another study developed an automated CNN-based framework for detecting COVID-19 from chest CT scans and differentiating it from community-acquired pneumonia. The study collected data from 3,322 patients comprising 4,356 chest CT scans, and the experimental results showed that the CNN can distinguish patients with COVID-19 from those with community-acquired pneumonia and other similar lung diseases. The framework automated COVID-19 testing and reduced testing time and fatigue. Yang et al. (2020b) applied densely connected convolutional networks optimized with stochastic gradient descent to detect COVID-19 from chest CT scan images. Oh et al. (2020) applied a patch-based CNN built on ResNet-18 (P-CNN) to diagnose COVID-19 from CXR images because sufficient training data were lacking. The study used imaging biomarkers of the CXR radiographs, and P-CNN produced clinically salient maps that are useful for diagnosing COVID-19 and triaging patients. P-CNN ResNet-18 achieved the best result compared with the baseline algorithms, and the approach can diagnose COVID-19 from a limited amount of data with interpretable results. Table 3 summarizes the diagnostic tools developed based on machine learning. Refer to Dong et al. (2020) for an engaging review of the role of imaging in the detection and management of COVID-19.
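Patch-based diagnosis, as used with limited CXR training data, classifies many small crops of an image instead of the whole image and then aggregates the patch-level labels. The sketch below assumes aggregation by majority voting over randomly cropped patches, which is our reading of the patch-based approach; it ignores lung segmentation, and `toy_patch_classifier` is a hypothetical stand-in for a trained patch-level CNN.

```python
import random
from collections import Counter


def random_patches(image, size, k, seed=0):
    """Crop k random size x size patches from a 2-D image given as a list of rows."""
    rng = random.Random(seed)
    n_rows, n_cols = len(image), len(image[0])
    patches = []
    for _ in range(k):
        r = rng.randrange(n_rows - size + 1)
        c = rng.randrange(n_cols - size + 1)
        patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


def patch_majority_vote(image, classify_patch, size=4, k=9, seed=0):
    """Label each patch with classify_patch and return the most frequent label.

    An odd k avoids ties for a two-class problem.
    """
    votes = Counter(classify_patch(p) for p in random_patches(image, size, k, seed))
    return votes.most_common(1)[0][0]


def toy_patch_classifier(patch):
    # Hypothetical stand-in for a trained patch-level CNN: thresholds mean intensity.
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "COVID-19" if mean > 0.5 else "normal"
```

Because each image yields many training patches, this design multiplies the effective number of training samples, which is why it suits small datasets; the per-patch decisions can also be mapped back to image locations for interpretability.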

A decision support system for COVID-19 can help decision makers and policymakers formulate policies to curtail the pandemic, and many such systems have been developed based on machine learning approaches. For example, one study applied LSTM and linear regression to Google search data to predict the number of positive COVID-19 cases in Iran. The results indicated that linear regression outperformed LSTM in predicting positive cases. The algorithm can predict the trend of the COVID-19 pandemic in Iran, which can help policymakers plan the allocation of medical resources. Chimmula and Zhang (2020) applied deep LSTM to forecast COVID-19 transmission and its possible ending period in Canada and other parts of the world, comparing the transmission rate in Canada with those of Italy and the USA. The future outbreak was predicted to help Canadian decision makers monitor the situation and prevent future transmission of the epidemic in Canada. Liu et al. (2020b) proposed an ANN for modeling the trend of COVID-19 and restoring the operational capability of medical services in China. The ANN was used to model the pattern of COVID-19 in Wuhan, Beijing, Shanghai, and Guangzhou, and an Autoregressive Integrated Moving Average (ARIMA) model was applied to estimate nonlocal hospital demand during the COVID-19 pandemic in Beijing, Shanghai, and Guangzhou. The results indicated that the number of people infected with COVID-19 would increase by 45%, while deaths would increase by 567%, and that COVID-19 would reach its peak by March 2020 and toward the end of April 2020. These findings can assist policymakers and health officials in planning for the unmet medical needs of patients with other diseases during the COVID-19 pandemic.
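The linear regression baseline mentioned above fits a straight line between a predictor and the case count. The sketch below shows ordinary least squares with a single predictor; the predictor could be a search-interest index or simply the day number, but the exact features and setup of the cited study are not given here, so treat the inputs as hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for y ≈ a + b * x with a single predictor x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx          # assumes the predictor is not constant
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, xs_new):
    """Apply the fitted line to new predictor values."""
    return [intercept + slope * x for x in xs_new]
```

For example, fitting `xs = [1, 2, 3, 4]` against `ys = [3, 5, 7, 9]` recovers the line y = 1 + 2x. A model this simple can still beat an LSTM on short, noisy epidemic series, because it has far fewer parameters to estimate from limited data.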


… (2020) proposed a group method of data handling (GMDH) neural network to predict the number of confirmed COVID-19 cases based on weather conditions. The input factors included temperature, city density, humidity, and wind speed. The results indicated that humidity and temperature have a substantial influence on confirmed COVID-19 cases: temperature influences them negatively and humidity positively. These results can be used by decision makers to manage the COVID-19 pandemic. Yang et al. (2020b) applied LSTM to predict the COVID-19 trend in China. The prediction model indicated that the pandemic should peak toward the end of February 2020 and start declining at the end of April 2020, which can help authorities in China decide on measures to control it. Vaid et al. (2020) adopted a machine learning approach to predict potential COVID-19 infections based on reported cases in North America. Critical parameters were identified through dimension reduction, and past infections were inferred from recent fatalities using a hierarchical Bayesian estimator. The model predicted potential COVID-19 infections in North America, and policymakers there can use the projections to curtail the effect of the pandemic. Tuli et al. (2020) developed a machine learning COVID-19 predictive model and deployed it in a cloud computing environment for real-time tracking of COVID-19, predicting the growth and potential threat of COVID-19 in different countries worldwide. Governments and citizens can use the results to take proactive measures against COVID-19.


Tiwari et al. (2020) used a machine learning approach to predict the number of COVID-19 cases, recoveries, and deaths in India based on data from China. The predictions indicated that COVID-19 would peak between the third and fourth weeks of April 2020, and the Indian government can use the study to formulate policies and decide on mitigating the spread of COVID-19. Ribeiro et al. (2020) evaluated six machine learning algorithms, namely, Cubist regression (CUBIST), RF, ridge regression (RIDGE), support vector regression (SVR), ARIMA, and stacking-ensemble learning (SEL), on COVID-19 datasets collected in Brazil to predict confirmed cases one, three, and six days ahead. They found that SVR outperformed RIDGE, ARIMA, RF, CUBIST, and SEL. The study can help monitor COVID-19 cases in Brazil and support critical decisions on COVID-19. Tummers et al. (2020) applied k-means to cluster documents on COVID-19 and people with intellectual disabilities. Table 4 summarizes the studies on COVID-19 decision support systems.
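The k-means clustering mentioned above alternates between assigning each item to its nearest centroid and moving each centroid to the mean of its members. The sketch below is standard Lloyd's k-means; to cluster documents, each document would first need to be turned into a numeric vector (for example, term counts), and the 2-D points used to exercise it are hypothetical stand-ins for such document vectors.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

The result depends on the initial centroids, so in practice k-means is usually run from several random seeds and the clustering with the lowest within-cluster distance is kept.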

The protein sequences of the COVID-19 virus can be collected to apply machine learning approaches to COVID-19 prediction (Qiang et al., 2020). For example, Qiang et al. (2020) used RF to predict, from the spike protein, the infection risk of COVID-19 viruses of non-human origin for prompt alarm. The data comprised sequences of non-human COVID-19 origin (positive) and human COVID-19 origin (negative), and RF was trained to predict non-human origin. The results showed that the RF model achieved high accuracy in predicting non-human COVID-19 origin. The study can support COVID-19 genome mutation surveillance and the exploration of evolutionary dynamics in a simple, fast, and large-scale manner. Another study combined decision trees and digital signal processing (DT-DSP) to detect the COVID-19 virus genome and identify the intrinsic genomic signature of COVID-19 viruses. DT-DSP was applied to over 5,000 unique viral genome sequences totaling 61.8 million base pairs, including 29 COVID-19 virus sequences. The results supported the bat origin of COVID-19 and classified the COVID-19 virus with 100% accuracy as belonging to the sub-genus Sarbecovirus within Betacoronavirus. DT-DSP is a reliable real-time alternative for taxonomic classification. Table 5 summarizes the studies.
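The digital signal processing step behind genome classifiers of this kind converts a DNA sequence into a numeric signal, takes the magnitude of its discrete Fourier transform, and compares sequences by the correlation of their spectra. The sketch below covers only that feature step, under stated assumptions: it uses a purine/pyrimidine mapping (one of several numeric DNA representations), a naive DFT for clarity, equal-length sequences, and a (1 - r)/2 Pearson distance as one common convention. The decision tree classifier itself is not shown.

```python
import cmath
import math

# Purine/pyrimidine (PP) mapping: one of several numeric DNA representations.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def numeric_signal(seq):
    """Map a nucleotide string to a numeric signal; unknown symbols (e.g. N) become 0."""
    return [PP_MAP.get(base, 0.0) for base in seq.upper()]


def magnitude_spectrum(signal):
    """|DFT| of the signal, computed naively in O(n^2) for clarity."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
            for k in range(n)]


def pearson_distance(u, v):
    """(1 - r) / 2, where r is the Pearson correlation of two equal-length spectra."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return (1.0 - cov / (su * sv)) / 2.0
```

The pairwise distances between spectra form a feature matrix that a classifier can then use to assign sequences to taxonomic groups. Real genomes differ in length, so a practical pipeline must first normalize sequence lengths and would use an FFT rather than this quadratic loop.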

Machine learning and AI provide approaches for rapidly processing the large amounts of medical data generated daily and for extracting new information across different applications. In disease prediction, viral mutations can be forecast before new strains emerge. These approaches also allow the prediction of new structures and provide broader structural information, and efficient drug repurposing can be achieved by mining existing data. The stages of COVID-19 drug development are as follows:
Disease prediction: Future-generation viral mutations can be predicted by AI and machine learning approaches.
Structural analysis: The structure and primary functional sites of the COVID-19 virus are characterized.
Drug repurposing: Existing drug data are mined for insight into new disease treatments.
New drug development: Rapid processing yields efficiencies across the entire pharmaceutical life cycle.
Ke et al. (2020) applied machine learning to identify already marketed drugs that could treat COVID-19. They compiled two independent datasets to develop two machine learning models: the first was built on drugs known to have antiviral activity, and the second on 3C-like protease inhibitors. The database of market-approved drugs was screened by the models to predict drugs with potential antiviral activity. The predicted drugs were then evaluated in a cell-based feline infectious peritonitis virus replication assay, and the assay results were fed back to the models for incremental learning. Finally, 80 marketed drugs were identified as having potential antiviral activity, revealing old drugs with antiviral activity against feline infectious peritonitis virus, a coronavirus, as candidates for COVID-19 treatment.
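The screen-test-retrain cycle described above can be sketched as a loop in which a model ranks untested drugs, the top batch is assayed, and each result is added back to the training data. This is a hypothetical illustration of the feedback loop only: the cited study's models are not reproduced here, a nearest-neighbor scorer stands in for them, and the `assay` callback represents the wet-lab experiment.

```python
def nearest_sq_distance(x, pool):
    return min(sum((a - b) ** 2 for a, b in zip(x, p)) for p in pool)


def score(features, actives, inactives):
    """Higher when a drug's features sit closer to known actives than to known inactives."""
    return nearest_sq_distance(features, inactives) - nearest_sq_distance(features, actives)


def screen_with_feedback(library, actives, inactives, assay, rounds=2, batch=2):
    """Rank untested drugs, assay the top batch, and fold the results back in each round.

    library: dict of drug name -> feature vector (hypothetical descriptors).
    assay: callback returning True for a hit. actives and inactives must start non-empty.
    """
    actives, inactives = list(actives), list(inactives)
    untested = dict(library)
    hits = []
    for _ in range(rounds):
        ranked = sorted(untested, key=lambda name: score(untested[name], actives, inactives),
                        reverse=True)
        for name in ranked[:batch]:
            features = untested.pop(name)
            if assay(name):            # wet-lab result feeds back into the model
                hits.append(name)
                actives.append(features)
            else:
                inactives.append(features)
        if not untested:
            break
    return hits
```

Each round's assay results change the scores of the remaining drugs, so later rounds benefit from what earlier rounds learned; this is the incremental-learning idea, independent of the particular scoring model.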

Typically, a vaccine protects the body from infectious diseases by preparing the immune system to elicit antibody or cell-mediated responses against a pathogen. Immunogenicity is the ability of a vaccine to elicit such a response. For long-lasting, effective immunity, the vaccine has to properly activate both innate and adaptive responses (Klein et al., 2010).

The following phases should be adopted to develop a COVID-19 vaccine (Gonzalez-Dias et al., 2020):

Dataset preparation: The quality of the data influences the performance of the machine learning algorithm, so preparing quality data before feeding it into the algorithm is essential. Datasets come in different sizes (small, medium, and large). Data quality must be ensured because the aim is to model a quality immune response. If the serological assay has not been validated against known parameters (linearity, specificity, lower limit of quantification (LLOQ), upper limit of quantification (ULOQ), lower limit of detection (LLOD), ruggedness, and reproducibility), it must be properly qualified to guarantee the reliability of the data.

Vaccines and relevant genes: In vaccinology, the machine learning algorithm is trained to discover the best combination of genes and vaccine parameters. The training data are extracted from omics experiments. Feature selection is performed to find the gene signatures that best discriminate between groups, after which predictions are made for new vaccines. The three main feature selection methods are filter, wrapper, and embedded methods.

Machine learning algorithm selection: This task is not straightforward because many factors must be considered before selecting an appropriate algorithm. The choice depends on the nature of the data, and the options include supervised, unsupervised, and semi-supervised learning. For instance, if the data have no output labels, an unsupervised algorithm such as k-means clustering is a possible candidate, although it is not guaranteed to be the best choice. Several algorithms usually have to be tested on the same problem before the one that produces the best output is selected.

Model testing: The data are partitioned into training and testing sets; the former is used to train the algorithm, and the latter to evaluate the performance of the model using metrics such as mean squared error (MSE), accuracy, and F-measure (Gonzalez-Dias et al., 2020).

Using a machine learning algorithm to sift through trillions of candidate compounds for vaccine adjuvants can shorten vaccine development time; such algorithms can screen compounds for potential adjuvant candidates for a SARS-CoV-2 vaccine (Ahuja et al., 2020).

As Ahuja et al. (2020) reported, the amount of COVID-19 data is growing. In this section, we present sources of COVID-19 data to the machine learning community. Given the novelty of the virus, centralizing the sources of data will help researchers access different types of COVID-19-related data and give them opportunities to work on different aspects of COVID-19 that may lead to novel discoveries. Table 6 has five columns: reference, data, owners, source/accessibility, and remarks. We only present the projects that disclosed and fully discussed their data sources.
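The dataset-preparation, feature-selection, and model-testing phases described above for vaccine development can be illustrated with a small, dependency-free Python sketch. It is not the procedure of Gonzalez-Dias et al. (2020): it assumes a hypothetical gene-expression matrix (rows are vaccinees, columns are genes, label 1 marks responders), uses a signal-to-noise filter score as the feature selection method, a nearest-centroid classifier as a stand-in learning algorithm, and a stratified hold-out split scored with accuracy and F-measure. All function names and numbers are illustrative.

```python
import random
import statistics

def snr_scores(X, y):
    """Filter score per feature: |mean_pos - mean_neg| / (sd_pos + sd_neg)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.pstdev(pos) + statistics.pstdev(neg)
        gap = abs(statistics.mean(pos) - statistics.mean(neg))
        scores.append(gap / spread if spread else (float("inf") if gap else 0.0))
    return scores

def select_top_k(X, y, k):
    """Indices of the k features with the highest filter score."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def stratified_split(y, test_fraction=0.4, seed=0):
    """Hold out the same fraction of each class so both sets contain every class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)

def fit_centroids(X, y, features):
    """Per-class mean of each selected feature."""
    return {label: [statistics.mean(X[i][j] for i in range(len(X)) if y[i] == label) for j in features]
            for label in set(y)}

def predict(centroids, row, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda label: sum((row[j] - c) ** 2
                                                for j, c in zip(features, centroids[label])))

def accuracy_and_f_measure(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f_measure

# Hypothetical data: 10 vaccinees x 3 genes; only gene 0 separates responders (1) from non-responders (0).
X = [[5.0, 1.2, 0.3], [5.5, 0.9, 0.1], [4.8, 1.1, 0.4], [5.2, 1.0, 0.2], [5.1, 1.3, 0.5],
     [1.0, 1.1, 0.3], [1.4, 0.8, 0.2], [0.9, 1.2, 0.4], [1.2, 1.0, 0.1], [1.1, 0.9, 0.5]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

train, test = stratified_split(y)
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
features = select_top_k(X_train, y_train, k=1)  # feature selection uses training data only
centroids = fit_centroids(X_train, y_train, features)
y_pred = [predict(centroids, X[i], features) for i in test]
accuracy, f_measure = accuracy_and_f_measure([y[i] for i in test], y_pred)
```

Feature selection is fitted on the training split only, so the held-out samples do not influence which genes are chosen; selecting features on the full dataset first would inflate the test scores.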

This version was posted November 5, 2020 as a medRxiv preprint (doi: https://doi.org/10.1101/2020.11.04.20225698). The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Table 6 (partially recovered entries):
... The data are a chaos game representation of SARS-CoV-2, containing both the raw and processed data for 100 instances of the SARS-CoV-2 genome.
Butt et al. (2020); CT scan; Butt et al.
Huang et al. (2020); 126 COVID-19 patients who underwent a chest CT scan from 1/1/2020 to 3/2/2020.
Hurt et al. (2020); X-ray; Hurt et al.
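One of the Table 6 entries describes a chaos game representation (CGR) of SARS-CoV-2 genomes. As a hedged illustration of what such a representation is, the sketch below computes CGR points for a DNA sequence: starting from the centre of the unit square, each nucleotide moves the current point halfway toward that nucleotide's corner. It then counts the points on a 2^k x 2^k grid, in which each cell corresponds to one k-mer. The corner layout (A=(0,0), C=(0,1), G=(1,1), T=(1,0)) is one common convention assumed here, not necessarily the one used by the dataset's authors, and the example sequence is made up.

```python
# Corner convention assumed for illustration; other CGR papers use different layouts.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(sequence):
    """CGR point after each nucleotide; other symbols (e.g. N) are skipped, joining the flanking bases."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        points.append((x, y))
    return points

def cgr_frequency_grid(sequence, k):
    """Count CGR points on a 2^k x 2^k grid; each cell then corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence)[k - 1:]:  # the first k-1 points do not yet end a full k-mer
        col = min(int(x * size), size - 1)     # clamp guards against rounding up to exactly 1.0
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid

# Made-up fragment, not a real SARS-CoV-2 sequence.
grid = cgr_frequency_grid("ATTAAAGGTTTATACCTTCC", k=2)
```

Because the grid is a fixed-size numeric image regardless of genome length, representations of this kind can be fed to image-oriented or signal-processing classifiers without sequence alignment.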

In this section, we discuss the diagnosis of COVID-19 from X-ray and CT scan images because of their high value in COVID-19 screening. Table 6 shows that researchers rely heavily on X-rays and CT scans to develop machine-learning-based COVID-19 diagnosis tools. Guan et al. (2020) and Wong et al. (2020) found that portable chest radiography (CXR) has a sensitivity of 59% for the initial detection of COVID-19-related abnormalities. Radiographic abnormalities, when present, mirror those of CT: bilateral, lower-zone, peripherally predominant consolidation and hazy opacities (Wong et al., 2019). The radiological findings of COVID-19 on CXR are those of atypical or organizing pneumonia (Kooraki et al., 2020). Although chest radiographs are reportedly less sensitive than chest CT scans, chest radiography remains the first-line imaging modality for patients with suspected COVID-19 infection because it is cheap, readily available, and easy to clean. For ease of decontamination, portable radiography units are preferred. Chest radiographs are often normal in early or mild disease. According to a recent study of patients with COVID-19 requiring hospitalization, 69% had an abnormal chest radiograph at the time of admission, and 80% had radiographic abnormalities at some point during hospitalization. The findings are reported to be most extensive about 10-12 days after symptom onset. The most frequent radiographic findings are airspace opacities, described as consolidation or, less commonly, ground-glass opacity (GGO). The distribution is most often bilateral, peripheral, and lower-zone predominant (Rodrigues et al., 2020). Unlike parenchymal abnormalities, pleural effusion is rare (3%). According to the Centers for Disease Control and Prevention (CDC), even if a chest CT or X-ray suggests COVID-19, viral testing is the only specific method of diagnosis.
Among 20 patients seen in South Korea, the sensitivity of radiography for detecting COVID-19-related lung opacities was reported at only 25%, with a specificity of 90% (Wen et al., 2020). Nevertheless, X-ray imaging should be considered a useful tool for detecting COVID-19 while the influx of patients is straining healthcare systems. As the pandemic grinds on, clinicians on the front lines may increasingly turn to radiography (Casey, 2020). Much of the imaging focus, however, is on CT. In February 2020, Chinese studies reported that chest CT achieved a higher sensitivity for diagnosing COVID-19 than initial RT-PCR tests of pharyngeal swab samples (Fang et al., 2020). Subsequently, the National Health Commission of China briefly accepted chest CT findings of viral pneumonia as a diagnostic tool for COVID-19 infection (Yuen et al., 2020; Zu et al., 2020).
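The sensitivity and specificity figures quoted above come from a 2x2 comparison of imaging reads against a reference test such as RT-PCR. The minimal sketch below shows that arithmetic; the counts are hypothetical and are not the data of Wen et al. (2020) or any other cited study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN) among reference-positive patients;
    specificity = TN / (TN + FP) among reference-negative patients."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for radiograph reads versus RT-PCR (not data from the cited studies).
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

At the 25% sensitivity reported above, the false-negative rate is 1 - 0.25 = 0.75, so three of every four infected patients would have a radiograph read as negative, which is consistent with the CDC position that viral testing remains necessary.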


The typical appearance of COVID-19 on chest CT consists of multilobar, bilateral, rounded GGOs, predominantly in the lower lung zones and mostly in a peripheral distribution, with or without consolidation. However, such findings are nonspecific; the differential diagnosis includes organizing pneumonia, other infections, drug reactions, and other inflammatory processes. Consequently, using CT to screen for COVID-19 may result in false positives. Moreover, abnormalities not typically associated with COVID-19 infection, including pure consolidation, cavitation, thoracic lymphadenopathy, and nodules, suggest a different etiology. COVID-19-related chest CT abnormalities are more likely to appear after symptom onset, but they may also precede clinical symptoms. In a retrospective study by Bernheim et al. (2020), 44% of patients presenting within two days of symptom onset had an abnormal chest CT, compared with 91% of those presenting within 3-5 days and 96% of those presenting after six days. Shi et al. (2020) found GGOs in 14 of 15 asymptomatic healthcare workers with confirmed COVID-19. Similarly, 54% of 82 asymptomatic passengers with COVID-19 on the Diamond Princess cruise ship had CT findings of viral pneumonia (Inui et al., 2020).

In a prospective study by Wang et al., pure GGOs were the only abnormalities seen before symptom onset; subsequently, 28% of patients developed superimposed septal thickening 6-11 days after symptom onset. Architectural distortion evolving from GGOs appeared later in the disease course, likely reflecting organizing pneumonia and early fibrosis. Long-term follow-up imaging is also needed to determine the sequelae of SARS-CoV-2 infection. In a retrospective study by Das et al., 33% of patients who recovered from MERS-CoV developed pulmonary fibrosis; a similar outcome following COVID-19 is likely (Das et al., 2020). Lung ultrasound offers a low-cost, point-of-care evaluation of the lung parenchyma without ionizing radiation and is especially useful in resource-limited settings (Stewart et al., 2020). Peng et al. (2020) found that sonographic findings in patients with COVID-19 correlated with typical CT abnormalities; the predominantly peripheral distribution of lung involvement facilitated sonographic visibility. Characteristic findings included thickened, irregular pleural lines, B lines (edema), and the eventual appearance of A lines (air) during recovery. Peng et al. (2020) suggested that ultrasound may be useful for monitoring recruitment maneuvers and guiding prone positioning.

Previous studies confirmed that most patients infected with COVID-19 exhibited common chest CT characteristics, including GGOs and consolidation, reflecting lesions in multiple lobes or infection of the bilateral lung parenchyma. Increasing evidence suggests that these chest CT characteristics can be used to screen suspected patients and serve as a diagnostic tool for COVID-19-related acute respiratory disease. These findings led to a modification of the diagnosis and treatment protocols for SARS-CoV-2 pneumonia to include patients with characteristic pneumonia features on chest CT but negative RT-PCR results in severely affected areas such as Wuhan City and Hubei Province. Patients with negative RT-PCR but positive CT findings should be isolated or quarantined to prevent clustered or widespread infections. The critical role of CT in the early detection and diagnosis of COVID-19 has therefore become more widely accepted. However, several studies also reported that a proportion of RT-PCR-positive patients, including several severe cases, initially had normal CXR or CT findings. According to the diagnostic criteria for COVID-19, patients might have no or atypical radiological manifestations even at mild or moderate stages, because some lesions are easily missed at the low density resolution of CXR; this suggests that chest CT, with its lower false-negative rate, may be the better modality. Another possible explanation is that in some patients the target organ of COVID-19 may not be the lung: multiple-organ dysfunction, including acute respiratory distress syndrome (ARDS), acute cardiac injury, hepatic injury, and kidney injury, has been reported during COVID-19 infection. Studies have also reported chest CT appearances in patients with COVID-19 after treatment, suggesting a critical role for CT in treatment evaluation and follow-up.
For example, Pan et al. (2020) investigated how chest CT findings associated with COVID-19 changed at different time points during the infection course. Most apparent abnormalities on chest CT were still observable at 10 days but had disappeared by 14 days after the initial onset of symptoms. Unexpectedly, a case report presented pre- and post-treatment chest CT findings of a 46-year-old woman whose RT-PCR result became negative, while her pulmonary lesions showed reversal (Duan et al., 2020).

Singh et al. (2020) developed a deep CNN for the automated diagnosis and analysis of COVID-19 in infected patients, to save the time and energy of medical professionals. They tuned the hyperparameters of the CNN using multi-objective adaptive differential evolution (MADE). In the course of their extensive experiments, they used several benchmark COVID-19 datasets, which were divided into training and testing sets to evaluate the proposed model. The training sets were used to build the COVID-19 classification model, and the hyperparameters of the CNN were optimized on them with the MADE-based optimization approach. A comparative analysis showed that the proposed method outperformed existing machine learning models, such as CNN, GA-based CNN, and PSO-based CNN, on several metrics (including F-measure, sensitivity, specificity, and Kappa statistics).

Jaiswal et al. (2020) applied a DenseNet201-based deep transfer learning (DTL) model to the diagnosis and detection of COVID-19. The authors used this pre-trained deep learning architecture as an automated tool to detect and diagnose COVID-19 in chest CT scans, classifying patients as COVID-19 positive (+ve) or COVID-19 negative (−ve). The model extracted features using the weights it had learned on the ImageNet dataset together with its convolutional neural structure. Extensive experiments showed that the proposed DTL-based COVID-19 model was superior to competing methods: the DenseNet201 model achieved 97% accuracy, higher than the other models compared, and could serve as an alternative to other COVID-19 testing kits. developed a fully automated AI system to quantitatively assess the severity and progression of COVID-19 using thick-section chest CT images. The AI system automatically segmented and quantified the COVID-19-infected lung regions on thick-section chest CT images.
The automatically segmented lung abnormalities were compared with those manually segmented by two professional radiologists, using the Dice coefficient on a randomly selected subset of 30 CT scans. In addition, two imaging biomarkers were automatically computed, namely, the portion of infection (POI) and the average infection Hounsfield unit (iHU), and were used to assess the severity and progression of the disease. These assessments were then compared with the patients' diagnostic status and with key phrases extracted from the radiology reports, using the area under the receiver operating characteristic curve (AUC) and Cohen's kappa statistic. In their study, POI was the only computed imaging biomarker with high sensitivity and specificity for differentiating severe from non-severe COVID-19. The iHU reflected the rate of infection progression but was affected by irrelevant factors such as the reconstruction slice thickness and the respiration status. The analysis showed that the proposed deep-learning-based AI system accurately quantified the COVID-19-associated lung abnormalities and assessed the severity and progression of the disease. Their results also showed that the deep-learning-based tool can help radiologists in the diagnosis and follow-up treatment of patients with COVID-19 based on CT scans. used a CNN to classify patients as COVID-19 +ve or COVID-19 −ve. The initial parameters of the CNN were tuned using multi-objective differential evolution (MODE), adopting the mutation, crossover, and selection operations of the differential evolution (DE) algorithm. The authors extracted a chest CT dataset of COVID-19-infected patients and split it into training and testing sets.
The proposed MODE-based CNN and competing classification models were then trained on the training dataset, and the models were compared across different fractions of training and testing data. The analysis showed that the proposed model classified chest CT images with reasonable accuracy compared with competing models such as ANN, ANFIS, and CNN, making it useful for classifying COVID-19 from chest CT images.

Asif and Yi (2020) implemented a model that automatically detected COVID-19 pneumonia from digital CXR images, maximizing detection accuracy with deep convolutional neural networks (DCNN). Their model, an Inception V3 DCNN with transfer learning, detected COVID-19 infection from CXR radiographs and provided insights into how deep transfer learning can be used for early detection of the disease. The experimental results showed that the proposed DCNN model achieved high accuracy and performed excellently in classifying COVID-19 pneumonia despite being trained on a comparatively small set of images.

Hu et al. (2020) implemented a weakly supervised deep learning model for detecting and classifying COVID-19 infection from CT images. The model minimized the need for manual labelling of CT images while still accurately detecting the disease. Using retrospectively extracted CT images from multiple scanners and centers, the model distinguished COVID-19-positive from COVID-19-negative cases. It accurately localized the lesions (inflammation) caused by COVID-19 and could potentially help assess patient severity to guide triage and treatment. The experimental results indicated that the model achieved high accuracy, precision, and classification performance, as well as good qualitative visualization of the detected lesions. conducted a study to predict the incidence of COVID-19 in Iran. The authors obtained data from the Google Trends website and used linear regression and LSTM models to estimate the number of positive COVID-19 cases from the extracted data, with root mean square error (RMSE) and 10-fold cross-validation used for evaluation. The predictions obtained from Google Trends data were not very precise but could provide a basis for more accurate models built on more aggregated data. Their study showed that the Iranian population focused on hand sanitizer use and handwashing with antiseptics as preventive measures against the disease. The authors used specific COVID-19-related keywords to extract Google search frequencies and used the extracted data to predict the extent of the COVID-19 outbreak in Iran. They suggested future research using other data sources, such as social media, contacts with the special COVID-19 call center, mass media, environmental and climate factors, and screening registries.
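The evaluation protocol described for the Iranian Google Trends study, a linear regression model scored by root mean square error under 10-fold cross-validation, can be sketched as follows. This is a minimal illustration assuming a single predictor (a weekly search-interest index) and synthetic, made-up numbers; it does not reproduce the authors' data, keywords, or LSTM model, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_rmse(xs, ys, k=10):
    """Mean out-of-fold RMSE over k contiguous, non-shuffled folds."""
    n = len(xs)
    if n < k:
        raise ValueError("need at least k observations")
    fold_rmses = []
    for f in range(k):
        held_out = range(f * n // k, (f + 1) * n // k)
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_rmses.append(math.sqrt(sum(errors) / len(errors)))
    return sum(fold_rmses) / k

# Synthetic weekly series: search-interest index versus new cases (made-up numbers).
search_index = [12, 15, 20, 22, 30, 35, 41, 48, 52, 60, 63, 70, 72, 75, 80, 82, 85, 90, 93, 95]
new_cases = [40 + 3 * s + (5 if w % 2 else -5) for w, s in enumerate(search_index)]
print(round(k_fold_rmse(search_index, new_cases, k=10), 2))
```

Plain k-fold cross-validation lets a model train on weeks that come after the weeks it is tested on; for forecasting, a forward-chaining split (train on earlier weeks, test on the next block) is often the safer design choice.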
integrated supervised machine learning with digital signal processing, an approach called MLDSP, for genome analyses, augmented by a decision tree (DT) approach to the machine learning component and a Spearman's rank correlation coefficient analysis for result validation. The authors identified an intrinsic genomic signature of the COVID-19 virus and used it, together with a machine-learning-based alignment-free approach, for ultra-fast, scalable, and highly accurate classification of COVID-19 virus genomes. They also demonstrated how machine learning can use intrinsic genomic signatures to provide rapid, alignment-free taxonomic classification of novel pathogens. The model accurately classified the COVID-19 virus without a priori knowledge by simultaneously processing the geometric space of all relevant viral genomes. Their analysis supported the hypothesis of a bat origin and classified the COVID-19 virus as a Sarbecovirus within Betacoronavirus. These results were obtained through a comprehensive alignment-free analysis of the 2D genomic signatures of over 5,000 unique viral sequences, combined with a decision-tree use of supervised machine learning, and confirmed by Spearman's rank correlation coefficient analyses.

Farhat et al. (2020) reviewed developments in deep learning applications for medical image analysis targeting pulmonary imaging and provided insights into contributions to COVID-19. The review surveyed contributions from diverse fields over about three years and covered deep learning tasks such as classification, segmentation, and detection, as well as pulmonary pathologies such as airway diseases, lung cancer, COVID-19, and other infections. It summarized and discussed current state-of-the-art approaches in the domain and highlighted the challenges, especially given the COVID-19 situation.
First, the authors provided an overview of medical image modalities, deep learning, and surveys on deep learning in medical imaging, in addition to available datasets of pulmonary medical images. Second, they summarized deep-learning-based applications and methods for pulmonary medical images. Third, they described COVID-19 and related medical imaging concerns, summarized reviews of deep learning applications to COVID-19 medical image analysis, and listed and described contributions to this domain. Finally, they discussed the challenges in the research domain and made suggestions for future research.
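Both the MADE-tuned and MODE-tuned CNNs summarized earlier in this section rely on the mutation, crossover, and selection operations of differential evolution (DE). The following is a minimal single-objective DE/rand/1/bin sketch, not the multi-objective variants used by the cited authors. The objective is a hypothetical stand-in for a CNN's validation error as a function of two hyperparameters (log10 learning rate and dropout rate), because training a real CNN is out of scope here.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9, generations=200, seed=0):
    """Minimise `objective` over a box using classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: base vector plus a scaled difference of two others, all distinct from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover: take each gene from the mutant with probability cr;
            # j_rand guarantees that at least one gene comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d] for d in range(dim)]
            # Keep the trial inside the search box.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Selection: greedy replacement if the trial is at least as good.
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

def surrogate_validation_error(params):
    """Hypothetical smooth surrogate with its minimum at log10(lr) = -3, dropout = 0.3."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2

best, err = differential_evolution(surrogate_validation_error, bounds=[(-5.0, -1.0), (0.0, 0.6)])
```

In the cited studies each objective evaluation would mean training and validating a CNN (and, for the multi-objective variants, scoring several metrics at once), so evaluations are expensive; the control parameters F = 0.8 and CR = 0.9 used here are common defaults, not values reported by those authors.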

In this survey, we review projects that used machine learning to fight COVID-19 from different perspectives. We only considered papers published in reputable journals and conferences; no preprints uploaded to preprint servers were included. We appraised 30 studies that described a machine learning approach to fighting COVID-19. We found that machine learning has made inroads into fighting COVID-19 from different aspects, with potential for real-life applications to curtail the negative effects of COVID-19. Machine learning algorithms such as CNN, LSTM, and ANN used in fighting COVID-19 mostly reported excellent performance compared with baseline approaches. Many of the studies noted that, because the COVID-19 pandemic is so recent, there was not enough data to carry out large-scale studies.

We found that different studies used different COVID-19 data. Figure 4 depicts the types of data used by studies that applied machine learning algorithms to develop models for fighting the COVID-19 pandemic; the data used to plot Figure 4 were extracted from machine learning research on COVID-19 (refer to Table 6). The longest bars show that X-rays and CT scans were used most often. Many of the studies used deep learning algorithms, e.g., CNN and LSTM, to diagnose COVID-19 from X-rays and CT scans, and their evaluations indicated excellent performance in detecting COVID-19 on X-ray and CT scan images. CT scanning has great value in the screening, diagnosis, and follow-up of patients with COVID-19 and has been added as a criterion for diagnosing COVID-19. The COVID-19 X-ray data project by Cohen, hosted on GitHub, is receiving unprecedented attention from the research community as a source of freely available data. Figure 5 presents how often each machine learning algorithm was adopted to fight COVID-19. The longest bar indicates that CNN received the most attention from researchers in this domain. The likely reason is that most of the data used to detect COVID-19 infection in patients are images (see Figure 4). CNNs are well known for their robustness, effectiveness, and efficiency in image processing compared with conventional machine learning algorithms, because they learn features automatically and perform well. One CNN variant suited to diagnosing COVID-19 from X-ray and CT scan images is ResNet. However, many of the studies did not specify which type of CNN they adopted for diagnosing COVID-19 from X-ray and CT scan images (see Table 3).
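ResNet, singled out above as a CNN variant suited to chest X-ray and CT classification, is defined by its residual (skip) connection: a block outputs ReLU(F(x) + x) instead of ReLU(F(x)), so the block only has to learn a correction to its input. The dependency-free sketch below shows a same-size 2D convolution and one residual block on a single-channel image. It is a simplified illustration under stated assumptions: real ResNets use many channels, batch normalization, and learned weights, and the kernels here are arbitrary.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation (what CNN layers compute) with zero padding, so output size equals input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # odd kernel sizes assumed
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:  # positions outside the image count as zero
                        acc += image[r][c] * kernel[u][v]
            out[i][j] = acc
    return out

def relu(image):
    return [[max(0.0, v) for v in row] for row in image]

def residual_block(image, kernel1, kernel2):
    """y = ReLU(F(x) + x), where F = conv -> ReLU -> conv; the '+ x' is the skip connection."""
    f = conv2d_same(relu(conv2d_same(image, kernel1)), kernel2)
    return relu([[fv + xv for fv, xv in zip(frow, xrow)] for frow, xrow in zip(f, image)])

img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
zero_kernel = [[0.0] * 3 for _ in range(3)]
# With F(x) = 0 the block passes its (non-negative) input through unchanged.
passthrough = residual_block(img, zero_kernel, zero_kernel)
```

The pass-through case illustrates why residual connections ease the training of very deep networks: a block whose weights are near zero behaves like an identity mapping rather than blocking the signal.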

Figure 5: Machine learning algorithms adopted in fighting COVID-19

Figure 6 shows the different aspects of fighting COVID-19 to which machine learning algorithms were applied. The studies mainly adopted machine learning algorithms to develop COVID-19 diagnosis tools, decision support systems, drug development, and detection from protein sequences. The largest portion of the pie chart indicates that diagnostic tools attracted the most attention. This reflects the need for diagnostic tools in the fight against the COVID-19 pandemic: treatment starts with a diagnosis, and an incorrect diagnosis can lead to inappropriate medication and further health complications. Most of the studies that used machine learning to develop diagnostic tools aimed to reduce the workload of radiologists, speed up diagnosis, automate the COVID-19 diagnostic process, reduce costs compared with traditional laboratory tests, and help healthcare workers make critical decisions.


The studies argued that diagnostic tools could reduce the exposure of healthcare workers to patients with COVID-19, thus decreasing the risk of infection among healthcare workers. The second-largest portion of the pie chart is the decision support system for estimating the rate of spread of the virus, confirmed cases, mortalities, and recovered cases. Information from such decision support systems can help government functionaries, policymakers, decision makers, and other stakeholders formulate policies to fight the COVID-19 pandemic.

The primary purpose of the bibliometric analysis in this study is to reflect the trend of rapidly emerging topics in COVID-19 research, in which substantial research activity began during the early stage of the outbreak. The bibliometric analysis also helps map the state of research on coronavirus disease as reported in the scientific literature. In this section, we present the bibliographic coupling among different items in the literature on machine learning for fighting COVID-19. A link between items on the constructed map corresponds to the weight between them, in terms of the number of publications, common references, or co-citations. Items may belong to a group, or cluster. In the visualization, items in the same cluster share the same color, and the colors indicate the cluster to which each item was assigned by the clustering technique implemented in the VOSviewer software. Items are represented as circular nodes whose sizes vary with their weight.
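Bibliographic coupling, as used in the maps in this section, links two items through the references they share. The sketch below computes the coupling strength of two publications as the number of cited references they have in common, then aggregates paper-level links to a higher unit (here, journals) by summing. The reference lists and journal names are hypothetical, each paper is assumed to belong to exactly one unit, and VOSviewer's own counting options (full versus fractional counting, normalization) are not reproduced.

```python
from collections import defaultdict
from itertools import combinations

def coupling_strength(refs_a, refs_b):
    """Number of cited references two publications share."""
    return len(set(refs_a) & set(refs_b))

def coupling_network(papers):
    """Edge weights {(p, q): shared references} for every pair sharing at least one reference."""
    edges = {}
    for p, q in combinations(sorted(papers), 2):
        w = coupling_strength(papers[p], papers[q])
        if w:
            edges[(p, q)] = w
    return edges

def aggregate_by_unit(edges, unit_of):
    """Sum paper-level coupling into unit-level links (e.g. journals); links within one unit are dropped."""
    totals = defaultdict(int)
    for (p, q), w in edges.items():
        u, v = sorted((unit_of[p], unit_of[q]))
        if u != v:
            totals[(u, v)] += w
    return dict(totals)

# Hypothetical reference lists and journal assignments.
papers = {"P1": ["R1", "R2", "R3"], "P2": ["R2", "R3", "R4"], "P3": ["R5"], "P4": ["R1", "R5"]}
journal_of = {"P1": "Journal A", "P2": "Journal B", "P3": "Journal B", "P4": "Journal C"}
edges = coupling_network(papers)
journal_links = aggregate_by_unit(edges, journal_of)
```

The same aggregation applies to authors or countries, except that a paper with several authors or countries contributes to several units, which is where full and fractional counting differ.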

Figure 8 shows the bibliographic coupling between the top 25 authors. The two clusters, red and green, group authors working in the similar research field "COVID-19" and citing the same sources in their reference lists. Authors sharing a cluster color have a higher degree of overlap between the reference lists of their publications. Figure 8 shows only the visible names; other names may not be represented in the constructed map.


Figure 9 shows the bibliographic coupling of the most productive countries. Here, bibliographic coupling indicates references shared by the papers published in these countries. The five clusters are represented by five colors. Red represents China and the USA, which have the highest strength in terms of contributions, followed by India and Iran within the red cluster. Green represents Hong Kong, which appears to have the highest strength in its cluster, whereas in the blue cluster the United Kingdom and Saudi Arabia have the highest strength. Yellow denotes Japan, Singapore, Thailand, and Taiwan as the highest contributors. Purple refers to Italy and Canada as the two contributing countries. The link between the red and green clusters is thicker than that between the blue and red clusters, or between the blue and purple clusters. The thickness of a link depicts the degree of overlap in the literature between the countries.

This article has been accepted for publication in PeerJ Computer Science.

Bibliographic coupling between journals implies that the papers published in these journals share more references. Three clusters are depicted on the map in red, blue, and green. The strongest links occur between Emerging Microbes Journal, Journal of Virology, and Journal of Infection. These are closely followed by the links between Eurosurveillance and Journal of Infection, Archive of Academic Emergency Medicine, Chinese Medical Journal, and The Lancet. The Journal of Infection Control and Hospital and the Journal of Hospital Infection form the weakest links within their cluster. Figure 11 illustrates the bibliographic coupling between the considered journals.


Figure 11: Bibliographic coupling among journals

Figure 12 illustrates the co-authorship map of authors. This analysis visualizes the major authors who publish together or work in similar research fields. The analysis type is co-authorship, and the unit of analysis is authors. The threshold for the minimum number of papers by an author is 25. Network construction and analysis showed that, of 2,381 authors, only 9 met the threshold. However, the largest set of connected authors consists of only 8 authors, shown in Figure 12, where a single cluster appears in red. A link indicates that the connected authors have collaborated on the same project or worked on the same research with a similar focus. The thicker links between three of these authors indicate more joint publications.
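The co-authorship map above follows a two-step procedure: keep only the authors whose document count reaches a threshold, then display the largest set of connected authors among them (in the text, 9 of 2,381 authors met the threshold, and the largest connected set had 8). A minimal sketch of that procedure, using a breadth-first search over the co-authorship graph, is shown below. The author records are hypothetical, the threshold is lowered to 2 to suit the toy data, and VOSviewer's internal implementation may differ.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations

def largest_connected_authors(papers, min_docs):
    """papers: list of author lists. Returns (authors meeting the threshold, largest connected set among them)."""
    counts = Counter(a for authors in papers for a in set(authors))
    kept = {a for a, n in counts.items() if n >= min_docs}
    # Co-authorship graph restricted to authors that passed the threshold.
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors) & kept), 2):
            graph[a].add(b)
            graph[b].add(a)
    seen, best = set(), set()
    for start in sorted(kept):
        if start in seen:
            continue
        component, queue = {start}, deque([start])  # breadth-first search from an unvisited author
        while queue:
            for nbr in graph[queue.popleft()]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return kept, best

# Hypothetical author lists, one per paper.
papers = [["Ana", "Ben"], ["Ana", "Ben", "Cy"], ["Cy", "Dee"], ["Dee", "Eve"],
          ["Eve", "Dee"], ["Fay"], ["Gus"], ["Gus"]]
kept, largest = largest_connected_authors(papers, min_docs=2)
```

Here six authors pass the threshold but one of them (Gus) never co-authors with the others, so the largest connected set has five members, mirroring the 9-versus-8 situation reported for Figure 12.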


The copyright holder for this preprint this version posted November 5, 2020. ; https://doi.org/10. 1101 This article has been accepted for publication in PeerJ Computer Science Figure 12 : Co-authorship and authors' analysis Figure 13 illustrates the citation analysis among authors' institutions. Six clusters are represented using different colors. The red cluster has the highest number of author citations from two institutions, namely, the Huazhong University of Science and Technology, Wuhan University (State Key Laboratory of Virology), and the Department of Microbiology, University of Hong Kong. Figure 14 shows the bibliometric analyses of author citations by journal sources. A link between two journal sources indicates the citation connectivity between the two sources.

The links between the Journal of Virology and the New England Journal of Medicine in Figure 14 show that publications in the Journal of Virology have cited publications in the New England Journal of Medicine, or vice versa. Thicker, stronger links signify more citations between the connected sources. Among the clusters identified in the analysis, the Journal of Virology is the source most cited by publications from other journal sources.
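The sketch below shows one way to tally such citation links and identify a top-cited source. It assumes a hypothetical list of (citing source, cited source) pairs. Links are treated as undirected, matching the "or vice versa" reading above, while citations received are counted per cited source. Ignoring self-citations within a source is an assumption, not a rule stated in the study.

```python
from collections import Counter


def citation_links(citations):
    """citations: iterable of (citing_source, cited_source) pairs.

    Returns undirected link strengths (citations in either direction)
    and the number of citations each source receives from other sources.
    """
    links, received = Counter(), Counter()
    for citing, cited in citations:
        if citing == cited:
            continue  # ignore self-citations within one source (assumption)
        links[tuple(sorted((citing, cited)))] += 1
        received[cited] += 1
    return links, received


# Hypothetical citation records between journal sources.
records = [
    ("J. Virology", "NEJM"), ("NEJM", "J. Virology"),
    ("Lancet", "J. Virology"), ("Lancet", "J. Virology"),
    ("NEJM", "Lancet"), ("J. Virology", "J. Virology"),
]
links, received = citation_links(records)
top_cited, n = received.most_common(1)[0]
print(dict(links))
print("top-cited source:", top_cited, n)
```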

In this section, we present the challenges and future research prospects. Figure 15 outlines how the literature survey was conducted and the opportunities for future research that could address these challenges, to help expert researchers easily identify areas that need development. The challenges and future research opportunities are as follows:

Lack of sufficient COVID-19 data: The primary barrier to COVID-19 research is the lack of adequate COVID-19 clinical data (Alimadadi et al., 2020; Mei et al., 2020; Fong et al., 2020; Oh et al., 2020; Togaçar et al., 2020; Ucara and Korkmazb, 2020; Belfiore et al., 2020; Oyelade and Ezugwu, 2020). An in-depth analysis of patients with COVID-19 requires much more data (Apostolopoulos and Mpesiana, 2020). Data is the key component of machine learning, and the efficiency and effectiveness of machine learning approaches are typically limited when data are insufficient. Insufficient COVID-19 clinical data can therefore limit the performance of specific algorithms, such as deep learning algorithms that require large-scale data. As a result, machine-learning-based COVID-19 diagnostic and prognostic tools, therapeutic approaches to curtail COVID-19, and models for predicting a future pandemic can all face severe performance challenges. Alimadadi et al. (2020) suggested global collaboration among stakeholders to build a COVID-19 clinical database and mitigate the shortage of COVID-19 clinical data. Existing biobanks containing the data of patients with COVID-19 are integrated with COVID-19 clinical data. We suggest that researchers use generative adversarial networks (GANs) to generate additional X-ray and CT scan images of COVID-19 to obtain sufficient data for building COVID-19 diagnostic tools. For example, Loey et al. (2020), motivated by insufficient data, used a GAN to generate more X-ray images and develop a COVID-19 diagnostic tool. Figure 4 shows that X-ray and CT scans are the two primary types of clinical data for detecting COVID-19 infection in patients.
Distinguishing patients with COVID-19 and mild symptoms from patients with pneumonia on X-ray images is difficult, because the relevant findings may be visualized inaccurately or not at all (Apostolopoulos and Mpesiana, 2020). We suggest that researchers propose machine learning strategies that can accurately differentiate patients with COVID-19 and mild symptoms from patients with pneumonia based on X-ray images. COVID-19, which is caused by a coronavirus, might also show CT scan characteristics similar to those of pneumonia caused by other viruses. In the future, the performance of CNNs in classifying COVID-19 and viral pneumonia should be evaluated against RT-PCR results.

Uncertainties: When a new pandemic breaks out, it comes with limited information and very high uncertainty, unlike the commonly known influenza. Knowledge of the new epidemic is therefore insufficient, because no prior case is the same as the current pandemic. In the case of COVID-19, many decision makers relied on SARS as a reference because of the similarity, even though SARS differs considerably from COVID-19. A new pandemic typically challenges data analytics because information is limited and the epidemic evolves geographically and over time. Building an accurate model to predict the future behavior of a pandemic therefore becomes challenging because of this uncertainty. We suggest that researchers propose a new pandemic forecasting model based on active learning to reduce the uncertainty that typically accompanies new pandemics such as COVID-19. A previous study applied the Susceptible, Exposed, Infectious, Recovered (SEIR) model to COVID-19. However, the SEIR model could not capture the complete number of infected cases, and the study ignored imported confirmed cases. SEIR assumes the natural distribution of people in a population and cannot be applied to populations with a different distribution, such as welfare institutions.
The SEIR model did not accurately predict the epidemiological trend of COVID-19 under scenarios of viral mutation and the development of specific antiviral therapies.

The SEIR model was also unable to simulate non-uniform patterns, such as increases in the number of medical professionals and in bed capacity. We suggest that researchers propose a machine-learning-based strategy for handling such non-uniform patterns that also accounts for the factors that study did not consider.
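For readers unfamiliar with the compartmental model discussed above, the basic SEIR equations are dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, and dR/dt = γI. The sketch below integrates them with a simple forward-Euler scheme. The parameter values are illustrative assumptions, not estimates fitted to COVID-19 data. Because β, σ, and γ are fixed constants here, the model cannot represent time-varying factors such as growing bed capacity, which is the kind of limitation described above.

```python
def seir(n, e0, i0, beta, sigma, gamma, days, dt=0.1):
    """Forward-Euler integration of the basic SEIR model.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day, illustrative only).
    Returns a list of daily (S, E, I, R) tuples, starting at day 0.
    """
    s, e, i, r = float(n - e0 - i0), float(e0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt    # S -> E
            new_infectious = sigma * e * dt        # E -> I
            new_recovered = gamma * i * dt         # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative parameters (not fitted): R0 = beta / gamma = 2.5.
traj = seir(n=1_000_000, e0=100, i0=10, beta=0.5, sigma=1 / 5.2, gamma=0.2, days=180)
peak_day = max(range(len(traj)), key=lambda d: traj[d][2])
print("peak infectious on day", peak_day, "->", round(traj[peak_day][2]))
```

The flows only move people between compartments, so S + E + I + R stays equal to N up to floating-point rounding.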

Adequate COVID-19 data are lacking for some regions because the capacity to gather reliable data is not uniform across the world. This situation poses a challenge for regions without available COVID-19 data. We suggest that researchers apply cross-population train/test models, because a model trained in one region can be used to detect COVID-19 in another. For example, a model trained to detect the new virus in Wuhan, China, can be used in Italy (Santosh, 2020).
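One way to evaluate the cross-population idea is a leave-one-region-out split: train on every region except one, then test on the held-out region. The sketch below only produces the splits. The record schema (a "region" key) and the region names are hypothetical, and any classifier could be trained on each split.

```python
def leave_one_region_out(records):
    """Yield (held_out_region, train, test) splits.

    records: list of dicts with a "region" key (hypothetical schema).
    Each split trains on all other regions and tests on one unseen
    region, mimicking a model built in one place and used in another.
    """
    regions = sorted({r["region"] for r in records})
    for held_out in regions:
        train = [r for r in records if r["region"] != held_out]
        test = [r for r in records if r["region"] == held_out]
        yield held_out, train, test


# Hypothetical records; in practice each would carry image features and a label.
data = [
    {"region": "Wuhan", "id": 1}, {"region": "Wuhan", "id": 2},
    {"region": "Italy", "id": 3}, {"region": "Iran", "id": 4},
]
for region, train, test in leave_one_region_out(data):
    print(region, "train:", [r["id"] for r in train], "test:", [r["id"] for r in test])
```

Because no region appears in both the training and test sets of a split, the reported score reflects how well a model transfers to a population it has never seen.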

Image resolution: The resolution of X-ray images affects the performance of machine learning algorithms, and low-resolution images typically pose a challenge. Variable image dimensions also have a negative effect: successful performance cannot be achieved if the input images have different sizes. The original images, the structured images, and the images produced by the stacking technique need to have the same resolution (Togaçar et al., 2020). We suggest using high-resolution X-ray images to develop COVID-19 diagnostic and prognostic systems that can also work with low-resolution X-ray images.
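Because the inputs must share one size, a common preprocessing step is to resize every image to a fixed shape before training. For brevity, the sketch below uses nearest-neighbour resizing in NumPy. The 224 x 224 target size is an assumption (a size often used by CNN backbones). Libraries such as Pillow or OpenCV provide interpolating resizers that usually preserve more detail. Resizing makes shapes consistent but does not restore detail missing from low-resolution scans.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale array to (out_h, out_w)."""
    in_h, in_w = image.shape
    # Map each output pixel centre back to the input pixel that contains it.
    rows = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return image[np.ix_(rows, cols)]


# Hypothetical scans of different sizes, standardised to one assumed input shape.
rng = np.random.default_rng(0)
scans = [rng.random((512, 480)), rng.random((1024, 1024))]
batch = np.stack([resize_nearest(s, 224, 224) for s in scans])
print(batch.shape)  # (2, 224, 224)
```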

Outliers and noise: In the early phase of the pandemic, COVID-19 data contained many outliers and much noise (Tuli et al., 2020). An outlier is an observation, or a subset of observations, that is inconsistent with the remainder of the data. Outliers typically lower the fit of a statistical model (Bellazzi et al., 1998). The presence of outliers and noise in COVID-19 data makes it challenging to predict the correct number of COVID-19 cases (Tuli et al., 2020). Dealing with outliers and noise also increases data engineering effort and expense. We suggest that researchers propose robust machine learning approaches that can effectively handle outliers and noise in COVID-19 data.
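One robust technique is a Hampel-style filter. It flags points that lie far from a moving median, with distance measured in units of the median absolute deviation (MAD), so a single spike barely shifts the reference values. The sketch below is a generic illustration, not the method of Tuli et al. (2020). The daily counts, window size, and three-MAD threshold are hypothetical choices.

```python
import statistics


def hampel_flags(series, half_window=3, n_sigmas=3.0):
    """Flag points that deviate strongly from their local median.

    The MAD is scaled by 1.4826 so that it estimates the standard
    deviation under Gaussian noise. Returns one boolean per point,
    True where the value looks like an outlier.
    """
    flags = []
    for t in range(len(series)):
        window = series[max(0, t - half_window): t + half_window + 1]
        med = statistics.median(window)
        mad = 1.4826 * statistics.median([abs(x - med) for x in window])
        # A zero MAD (flat window) gives no scale, so nothing is flagged there.
        flags.append(mad > 0 and abs(series[t] - med) > n_sigmas * mad)
    return flags


# Hypothetical daily new-case counts with one reporting spike (e.g. a backlog).
daily = [120, 130, 125, 140, 900, 150, 145, 160, 155, 170]
flags = hampel_flags(daily)
print([day for day, is_outlier in enumerate(flags) if is_outlier])  # [4]
```

A full Hampel filter would also replace each flagged value with its local median before the series is passed to a forecasting model.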

A limitation of deep learning algorithms is their lack of transparency and interpretability. For instance, it is not possible to know which image features the algorithm used to decide its output. Heatmaps are used to visualize the image regions that led to the algorithm's output, but they cannot sufficiently visualize the unique features that a deep learning algorithm uses to differentiate COVID-19 from community-acquired pneumonia (CAP). Because COVID-19 detection relies heavily on images, especially X-rays and CT scans, we suggest that researchers propose explainable deep learning algorithms for COVID-19 detection to bring transparency and interpretability to these models.
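Occlusion sensitivity is one simple, model-agnostic way to produce such heatmaps: mask one patch at a time and record how much the model's score drops. The sketch below assumes a hypothetical `predict` function that returns a class score for a 2-D image, and a toy function stands in for a trained CNN. Like other heatmaps, it shows where the model looks, not which features it uses there, so it does not resolve the limitation described above.

```python
import numpy as np


def occlusion_map(predict, image, patch=8, stride=8, fill=0.0):
    """Model-agnostic occlusion sensitivity map.

    predict: callable mapping a 2-D image to a scalar class score
    (e.g. a classifier's COVID-19 probability; hypothetical here).
    Each cell holds the drop in score when that patch is masked out;
    larger drops mark regions the model relied on more.
    """
    h, w = image.shape
    base = predict(image)
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    heat = np.zeros((len(rows), len(cols)))
    for i, y in enumerate(rows):
        for j, x in enumerate(cols):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict(occluded)
    return heat


def toy_model(img):
    """Stand-in 'model': score = mean intensity of the top-left quadrant."""
    return float(img[:16, :16].mean())


img = np.ones((32, 32))
heat = occlusion_map(toy_model, img)
print(heat.round(3))
```

Only patches inside the top-left quadrant lower the toy score, so only those cells of the map are non-zero.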

Applying a deep learning algorithm to detect COVID-19 on chest CT scans carries a risk of misdiagnosis, because COVID-19 findings are similar to those of other types of pneumonia (Belfiore et al., 2020). An incorrect diagnosis can mislead health professionals' decisions and lead to inappropriate medication, further complicating the patient's condition. We suggest that researchers combine deep-learning-based CT diagnosis with clinical information, such as nucleic acid detection results, clinical symptoms, epidemiology, and laboratory indicators, to avoid misdiagnosis.
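One simple and transparent way to combine an image-based estimate with a nucleic acid test is Bayes' rule with likelihood ratios. The CT model's probability serves as the prior, and the RT-PCR result updates it. The sketch below assumes that the CT output is a calibrated probability and that the two results are conditionally independent given disease status. The sensitivity and specificity values are illustrative, not taken from the cited studies.

```python
def update_with_test(prior, positive, sensitivity, specificity):
    """Update a disease probability with a binary test result (Bayes' rule).

    Assumes 0 < prior < 1, 0 < specificity < 1, and that the test result is
    conditionally independent of the CT-based estimate given disease status.
    """
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers: CT model estimate 0.70; RT-PCR sensitivity 0.70 and
# specificity 0.95 (illustrative values only).
p = update_with_test(0.70, positive=False, sensitivity=0.70, specificity=0.95)
print(round(p, 3))  # 0.424
```

With these illustrative numbers, a negative RT-PCR lowers but does not eliminate suspicion, so the case would still warrant clinical follow-up rather than a confident negative diagnosis.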

Resource allocation becomes a challenge as the COVID-19 pandemic keeps spreading, because more patients require more resources for their care. Allocating limited resources in a rapidly expanding pandemic entails difficult decisions about the distribution of scarce resources. COVID-19 epicenters face shortages of beds, gowns, masks, medical staff, and ventilators (Ahuja et al., 2020; Taiwo and Ezugwu, 2020). We propose the development of machine learning decision support systems to help with crucial resource allocation decisions.
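The function below is a minimal sketch of the allocation step in such a system. It splits a fixed number of units (for example, ventilators) across sites in proportion to forecast demand, using the largest-remainder method so that only whole units are assigned. The forecasts are assumed to come from a separate machine learning model and to be whole numbers. The hospital names are hypothetical, and proportional sharing is only one possible policy.

```python
import math


def allocate(supply, demand):
    """Split `supply` whole units across sites in proportion to demand.

    demand: dict site -> integer forecast need (hypothetical ML forecasts).
    Uses the largest-remainder method; no site gets more than its forecast.
    """
    total = sum(demand.values())
    if total <= supply:
        return dict(demand)  # enough units to cover every forecast
    quotas = {site: supply * need / total for site, need in demand.items()}
    alloc = {site: math.floor(q) for site, q in quotas.items()}
    leftover = supply - sum(alloc.values())
    # Hand the remaining units to the sites with the largest fractional parts.
    ranked = sorted(quotas, key=lambda site: quotas[site] - alloc[site], reverse=True)
    for site in ranked[:leftover]:
        alloc[site] += 1
    return alloc


forecast = {"Hospital A": 14, "Hospital B": 4, "Hospital C": 2}
print(allocate(9, forecast))  # {'Hospital A': 6, 'Hospital B': 2, 'Hospital C': 1}
```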

In this study, we present a survey, including a bibliometric analysis, of the adoption of machine learning to fight COVID-19. We present a concise summary of the projects that adopted machine learning to fight COVID-19, sources of COVID-19 datasets, a new comprehensive taxonomy, a synthesis and analysis, and a bibliometric analysis. The results reveal that COVID-19 diagnostic tools received the most attention from researchers, and that energy and resources have been directed mostly toward automated COVID-19 diagnostic tools. By contrast, COVID-19 drug and vaccine development remain grossly underexploited. The algorithm predominantly used to develop diagnostic tools is the CNN, applied mainly to X-ray and CT scan images, and the most suitable CNN architecture for detecting COVID-19 from these images is ResNet. The study also presents the challenges hindering practical machine learning work against COVID-19 and a new perspective for solving them. We believe that our survey with bibliometric analysis can enable researchers to determine areas that need further development and to identify potential collaborators at the author, country, and institutional levels.

The bibliometric analysis of global scientific research output on COVID-19 spread and preventive measures reveals that most of the research outputs were published in prestigious journals with high impact factors, including The Lancet, Journal of Medical Virology, and Eurosurveillance. The bibliometric analysis also shows the subjects of focus across various aspects of COVID-19: infection transmission, diagnosis, treatment, prevention, and complications. Other prominent features include strong collaboration among research institutions and universities, and co-authorship among researchers across the globe.

Machine learning algorithms have many practical applications in medicine, and novel contributions from different researchers continue to evolve and grow rapidly to meet the essential clinical needs of individual patients, as is the case with machine learning applications for fighting the COVID-19 pandemic. As a way forward, we suggest an in-depth review of machine learning applications focused on the critical analysis of the novel coronavirus disease and other global pandemics.

Artificial Intelligence and COVID-19: A Multidisciplinary Approach

Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology

Artificial intelligence and machine learning to fight COVID-19

Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks

Automatic Detection of COVID-19 Using X-ray Images with Deep Convolutional Neural Networks and Machine Learning

Predicting COVID-19 incidence through analysis of google trends data in iran: data mining and deep learning pilot study

Chaos game representation dataset of SARS-CoV-2 genome

Artificial neural networks: fundamentals, computing, design, and application

Artificial intelligence to codify lung CT in Covid-19 patients

Qualitative and Fuzzy Reasoning for identifying non-linear physiological systems: an application to intracellular thiamine kinetics

Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection

Deep learning system to screen coronavirus disease 2019 pneumonia. Applied Intelligence

How good is radiography for COVID-19 detection?

A bibliometric analysis of Covid-19 research activity: A call for increased output

Time series forecasting of covid-19 transmission in canada using lstm networks

Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR

Artificial Intelligence during a pandemic: The COVID-19 example

Followup chest radiographic findings in patients with MERS-CoV after recovery

A tutorial survey of architectures, algorithms, and applications for deep learning

The role of imaging in the detection and management of COVID-19: a review

Pre- and posttreatment chest CT findings: 2019 novel coronavirus (2019-nCoV) pneumonia

Restructured society and environment: A review on potential technological strategies to control the COVID-19 pandemic

Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature

Sensitivity of chest CT for COVID-19: comparison to RT-PCR

Deep learning applications in pulmonary medical imaging: recent updates and insights on COVID-19. Machine Vision and Applications

Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction

Methods for predicting vaccine immunogenicity and reactogenicity

Severe acute respiratory syndrome-related coronavirus-The species and its viruses, a statement of the Coronavirus Study Group

LSTM: A Search Space Odyssey

Clinical characteristics of coronavirus disease 2019 in China

Deep learning approaches to biomedical image segmentation

First case of 2019 novel coronavirus in the United States

Hossain MM. Current status of global research on novel coronavirus disease (COVID-19): a bibliometric analysis and knowledge mapping (version 1)

Weakly supervised deep learning for covid-19 infection detection and classification from ct images

Clinical features of patients infected with 2019 novel coronavirus in Wuhan

Serial quantitative chest ct assessment of covid-19: Deep-learning approach

The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-The latest 2019 novel coronavirus outbreak in Wuhan

Deep Learning Localization of Pneumonia: 2019 Coronavirus (COVID-19) Outbreak

Report 2: Estimating the potential total number of novel Coronavirus cases in Wuhan City

Chest CT findings in cases from the cruise ship "Diamond Princess"

Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning

Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity

LSTM Fully Convolutional Networks for Time Series Classification

Artificial intelligence approach fighting COVID-19 with repurposing drugs

The Xs and Y of immune responses to viral vaccines. The Lancet Infectious Diseases

Coronavirus (COVID-19) outbreak: what the department of radiology should know

A review of modern technologies for tackling COVID-19 pandemic

Deep learning

False-negative results of real-time reverse-transcriptase polymerase chain reaction for severe acute respiratory syndrome coronavirus 2: role of deep-learning-based CT diagnosis and insights from two cases

Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT

From Community Acquired Pneumonia to COVID-19: A Deep Learning Based Method for Quantitative Analysis of COVID-19 on thick-section CT Scans

CT quantification of pneumonia lesions in early days predicts progression to severe illness in a cohort of COVID-19 patients

Deep Learning-Based Channel Prediction for Edge Computing Networks Toward Intelligent Connected Vehicles

The indispensable role of chest CT in the detection of coronavirus disease 2019 (COVID-19)

Modeling the trend of coronavirus disease 2019 and restoration of operational capability of metropolitan medical service in China: a machine learning and mathematical model-based analysis

Artificial neural networks: methods and applications

Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based on

Coronavirus disease 2019: a bibliometric analysis and review

Artificial intelligence-enabled rapid diagnosis of patients with COVID-19

Deep learning for IoT big data and streaming analytics: A survey

Deep learning covid-19 features on cxr using limited training data sets

A case-based reasoning framework for early detection and diagnosis of novel coronavirus

Automated detection of COVID-19 cases using deep neural networks with X-ray images

Emergence of new disease-how can artificial intelligence help?

Initial public health response and interim clinical guidance for the 2019 novel coronavirus outbreak-United States

Findings of lung ultrasonography of novel corona virus pneumonia during the 2019-2020 epidemic

A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR)

Using the spike protein feature to predict infection risk and monitor the evolutionary dynamic of coronavirus

Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches

Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study

Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone-based survey when cities and towns are under quarantine

Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions

Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil

Clinical, laboratory and imaging features of COVID-19: A systematic review and meta-analysis. Travel Medicine and Infectious Disease

Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. Fifteenth Annual Conference of the International Speech Communication Association

Covid-19 detection using artificial intelligence

AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data

Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19

Classification of COVID-19 patients from chest CT images using multiobjective differential evolution-based convolutional neural networks

Deep Convolutional Neural Networks based Classification model for COVID-19 Infected Patients using Chest X-ray Images

Smart healthcare support for remote patient monitoring during covid-19 quarantine

Digital technology and COVID-19

Outbreak Trends of Coronavirus Disease-2019 in India: A Prediction. Disaster Medicine and Public Health Preparedness

COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches

Predicting the Growth and Trend of COVID-19 Pandemic using Machine Learning and Cloud Computing

Coronaviruses and people with intellectual disability: an exploratory data analysis

COVIDiagnosis-Net: Deep Bayes-SqueezeNet based Diagnostic of the Coronavirus Disease 2019 (COVID-19) from X-Ray Images

Using Machine Learning to Estimate Unobserved COVID-19 Infections in North America

Artificial Intelligence (AI) applications for COVID-19 pandemic

Detection of SARS-CoV-2 in different types of clinical specimens

Temporal changes of CT findings in 90 patients with COVID-19 pneumonia: a longitudinal study

Systematic literature review in computer science-a practical guide

Coronavirus disease 2019: initial detection on chest CT in a retrospective multicenter study of 103 Chinese subjects

Frequency and distribution of chest radiographic findings in COVID-19 positive patients

Deep learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: a multicentre study

Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal

Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing

Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2

Deep learning for detecting corona virus disease 2019 (COVID-19) on high-resolution computed tomography: a pilot study

Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions

SARS-CoV-2 and COVID-19: The most important research questions

Deep learning and its applications to machine health monitoring

LSTM network: a deep learning approach for short-term traffic forecast

Coronavirus disease 2019 (COVID-19): a perspective from China. Radiology

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.