key: cord-0897659-30l210qn authors: Orimoloye, Israel R.; Ololade, Olusola O.; Ekundayo, Olapeju Y.; Busayo, Emmanuel T.; Afuye, Gbenga A.; Kalumba, Ahmed M.; Ekundayo, Temitope C. title: Assessment of global research trends in the application of data science and deep and machine learning to the COVID-19 pandemic date: 2022-01-14 journal: Data Science for COVID-19 DOI: 10.1016/b978-0-323-90769-9.00030-x sha: b3786853815a16e7a96118cc1ef706d04d55f7c5 doc_id: 897659 cord_uid: 30l210qn

Researchers around the world have recently used data science and deep and machine learning to assess and combat coronavirus disease 2019 (COVID-19). This study presents COVID-19 research that applied data science, big data, machine learning, deep learning, artificial intelligence, and mathematical and statistical modeling. The research was published between January 2020 and April 2020 by researchers across disciplines and countries. Between January and April 2020, the most prominent terms and keywords in COVID-19-related studies included 2019-nCoV, China, COVID-19, epidemic, remdesivir, SARS-CoV-2, coronavirus, epidemiology, infection 2019-nCoV, SARS coronavirus, angiotensin-converting enzyme 2, animal reservoir, cross-species transmission, and human-to-human transmission. The results show the relevance, percentage, and distribution of data science and modeling techniques used in COVID-19 research before 2020 and during 2020 (January to April). The study also identified author keywords, both keywords (author and index keywords combined), total articles, total citations, and h-index. "Model" is the most frequent author keyword, with 19 papers, 11 total citations, and an h-index of 2. It also ranks first among both keywords, with 37 papers, 47 citations, and an h-index of 4.
Bibliometrics generally applies Price's law to estimate authors' influence and output in a particular field of study, which can also determine the highest and lowest occurrence of important key terms.

Since December 2019, the coronavirus disease 2019 (COVID-19) pandemic has changed our daily lives in ways we could not have imagined. Scientists and researchers have been using various techniques to combat the pandemic, including data science (DS), artificial intelligence, big data science, machine learning, deep learning, and other mathematical models. Institutes around the world have turned their skills to helping clinicians, researchers, and the medical community analyze COVID-19 and its spread and look for possible solutions. Machine learning and other tools can potentially help predict risks associated with the pandemic. Machine learning studies on COVID-19 have already been conducted and published, and early experiments with the technique are promising [1-3]. Further work can examine how machine learning can be used in related areas and how it could help with risk prediction for COVID-19. Technology has developed rapidly, and programming (a set of instructions that produces various kinds of output) is now widely applied in the medical field. Applications include organ segmentation and image enhancement and repair, which support subsequent medical diagnosis [3-7]. Deep learning architectures such as VGG-16, AlexNet, and ResNet have a strong capacity for nonlinear modeling and have been used extensively in the medical field [8]. Studies on the COVID-19 pandemic, including pulmonary analysis and disease prognosis, have been conducted worldwide. Several institutions across the globe are combining efforts to combat the pandemic.
For instance, researchers at Rensselaer Polytechnic Institute (RPI) are using machine learning to analyze the effects of social distancing at a more granular level [9]. Using county data from the New York State Department of Health and Mental Hygiene, RPI researchers have developed machine learning models that can predict local elements of the pandemic. These models show that projections vary enormously from one city to another, and this knowledge could reduce some of the uncertainty in policy development. Combating and preventing the pandemic requires treatment and preventive measures against the virus. It also requires knowledge gathered from COVID-19-related scholarly and clinical research. A large body of clinical and observational evidence has been published in high-ranked outlets to assess and predict the course of the pandemic. For this study, COVID-19-related publications were retrieved from scientific databases so that research trends on the pandemic could be mapped using scientometric techniques. Scientometric analysis investigates the published scientific information in a particular field through secondary analysis of that knowledge. It therefore helps researchers appreciate earlier and existing knowledge on a specific area of interest, such as COVID-19, and identify research hotspots and future research areas [10-13]. Accordingly, this study aimed to evaluate global research trends in the use of DS, big data, machine learning, deep learning, artificial intelligence, and mathematical and statistical modeling to combat or predict the COVID-19 pandemic between January and April 2020. To assess the application of DS in mitigating the pandemic, COVID-19-related data were mined from the Web of Science (WoS) Core Collection and Scopus databases on April 18, 2020 (23:33:22 GMT+1).
First, the two databases were searched with the query "COVID-19 OR coronavir* OR SARS-CoV-2" in the title field, with the time span set to >2019. Second, a within-result search was conducted with "model* OR machine learn* OR deep learn*". The resulting DS-related COVID-19 datasets from WoS and Scopus were downloaded in tab-delimited (Win, UTF-8) and comma-separated formats for analysis. The data were analyzed in a Python programming environment using the ScientoPy package, as described by Ruiz-Rosero et al. [14]. A preprocessing step normalized authors' names by removing dots, commas, and special accents. Records duplicated across the two databases were then de-duplicated; in every case, the WoS version was kept and the Scopus version was removed. Country names were also standardized for countries with at least two naming variants: Republic of China: China; USA: United States; England, Scotland, and Wales: United Kingdom; U Arab Emirates: United Arab Emirates; Russia: Russian Federation; Viet Nam: Vietnam; and Trinid & Tobago: Trinidad and Tobago. Following the procedural steps outlined by Ruiz-Rosero et al. [14], the processed database was analyzed to answer the following questions. What are the most cited titles related to the application of DS and modeling techniques to the COVID-19 pandemic? Where, in terms of countries and institutions, is knowledge of DS and modeling techniques applied to COVID-19 being produced? What are the high-frequency keywords or topics related to the application of DS and modeling techniques to the COVID-19 pandemic? Who are the most productive authors in the application of DS and modeling techniques to the COVID-19 pandemic? What is the distribution of document types related to the application of DS and modeling techniques to COVID-19?
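The two-stage title search and the preprocessing steps described above can be sketched in Python. This is an illustrative sketch, not ScientoPy's code or API. The record layout (dicts with hypothetical `source`, `title`, `doi`, `authors`, and `countries` fields) is an assumption. So are the DOI-or-title duplicate key and the matching of the second-stage terms against titles.

```python
import re
import unicodedata

# Query terms from the two search stages; "*" is a truncation wildcard.
COVID_TERMS = ["COVID-19", "coronavir*", "SARS-CoV-2"]
DS_TERMS = ["model*", "machine learn*", "deep learn*"]

# Country-name variants standardized in the study.
COUNTRY_ALIASES = {
    "Republic of China": "China",
    "USA": "United States",
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
    "Wales": "United Kingdom",
    "U Arab Emirates": "United Arab Emirates",
    "Russia": "Russian Federation",
    "Viet Nam": "Vietnam",
    "Trinid & Tobago": "Trinidad and Tobago",
}


def term_regex(term):
    """Compile a case-insensitive regex; a trailing '*' matches any word ending."""
    body = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else r"\b"
    return re.compile(r"\b" + body + suffix, re.IGNORECASE)


def matches_any(title, terms):
    """True if the title matches at least one of the OR-ed terms."""
    return any(term_regex(t).search(title) for t in terms)


def normalize_author(name):
    """Drop accents, dots, and commas, then collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.replace(".", "").replace(",", "").split())


def dedup_key(rec):
    """Hypothetical duplicate key: lower-cased DOI, else normalized title."""
    doi = (rec.get("doi") or "").strip().lower()
    return doi or " ".join(rec["title"].lower().split())


def preprocess(records):
    # Stage 1 AND stage 2: keep titles that match both term groups.
    selected = [r for r in records
                if matches_any(r["title"], COVID_TERMS) and matches_any(r["title"], DS_TERMS)]
    # Stable sort puts WoS records first, so the WoS copy of a duplicate is kept.
    selected.sort(key=lambda r: r["source"] != "WoS")
    seen, kept = set(), []
    for r in selected:
        key = dedup_key(r)
        if key in seen:
            continue
        seen.add(key)
        kept.append(dict(
            r,
            authors=[normalize_author(a) for a in r["authors"]],
            countries=[COUNTRY_ALIASES.get(c, c) for c in r["countries"]],
        ))
    return kept
```

For example, `normalize_author("Müller, A.")` yields `"Muller A"`. If the same DOI appears in both a Scopus and a WoS record, only the WoS record survives `preprocess`.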
What journal titles are the most productive sources related to the application of DS and modeling techniques to COVID-19? How are different disciplines involved in the application of DS and modeling techniques to COVID-19? What is the distribution of DS and modeling key terms in COVID-19 papers? Only three performance indicators were considered throughout: the number of papers, total citations, and h-index. All analyses rest on the average growth rate of Eq. (27.1), because the focal topics of a domain tend to have a higher average growth rate [14]:

$$\mathrm{AGR} = \frac{\sum_{i=j_1}^{k_4}\left(DS_i - DS_{i-1}\right)}{(k_4 - j_1) + 1} \qquad (27.1)$$

where AGR = average growth rate; $j_1$ = start month (January 2020); $k_4$ = end month (April 2020); and $DS_i$ = total count of DS COVID-19-related documents in month $i$, beginning in January 2020.

Hundreds of research teams around the world are combining efforts to collect and analyze data and to develop solutions for the COVID-19 pandemic. This work includes identifying where and who is most at risk, diagnosing patients, developing drugs, predicting the spread of the disease, mapping where viruses come from, and understanding the virus better. The results of this study present the tools and techniques used in COVID-19 research between January 2020 and April 2020 by researchers across disciplines and countries. The empirical evidence presented here shows that the utmost care must be taken when interpreting the results in a comparative evaluation of global research on COVID-19. The study analyzed 82 research articles (417 citations, h-index 6), 16 reviews (71 citations, h-index 4), and 1 book chapter (no citations, no h-index). Among the top ten countries, Italy ranks sixth with about 6 articles (10 citations) and India seventh with 5 articles (3 citations). South Korea ranks eighth with 5 articles (0 citations), Germany ninth with 4 articles (4 citations), and Belgium tenth with 3 articles (6 citations).
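The average growth rate in Eq. (27.1) can be computed directly from monthly document counts, as in this minimal Python sketch. It makes two assumptions. First, the counts are stored in a dict keyed by a hypothetical month index (1 = January 2020, ..., 4 = April 2020). Second, the month before the window (December 2019, outside the >2019 time span) contributes zero documents.

```python
def average_growth_rate(counts, start, end):
    """Average growth rate of Eq. (27.1) over months start..end (inclusive).

    counts maps a month index to the number of DS COVID-19 documents; a month
    absent from the mapping (e.g., the one before `start`) counts as zero.
    """
    if end < start:
        raise ValueError("end month must not precede start month")
    total_change = sum(counts.get(i, 0) - counts.get(i - 1, 0)
                       for i in range(start, end + 1))
    return total_change / ((end - start) + 1)
```

Because the sum telescopes, the result equals $(DS_{k_4} - DS_{j_1 - 1})$ divided by the number of months. With made-up counts `{1: 5, 2: 12, 3: 30, 4: 52}` (not the study's data), `average_growth_rate(counts, 1, 4)` returns `13.0`.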
The institutional information in Table 27.3 gives the top ten institutions in rank order:

- Chinese Acad Sci, China (4 articles, 278 citations)
- Hong Kong Polytech Univ, China (3 articles, 23 citations)
- Univ Hong Kong, China (3 articles, 23 citations)
- Univ Oxford, United Kingdom (3 articles, 12 citations)
- Wuhan Univ, China (3 articles, 11 citations)
- Capital Med Univ, China (2 articles, 0 citations)
- Chinese Univ Hong Kong, China (2 articles, 23 citations)
- Georgia State Univ, United States (2 articles, 5 citations)
- Harbin Engn Univ, China (2 articles, 0 citations)
- Hokkaido Univ, Japan (2 articles, 4 citations)

The study also shows that the most productive nations and institutions were in the countries most affected by the COVID-19 pandemic between January and April 2020. Table 27.4 presents the high-frequency author keywords and Keywords Plus terms related to the application of DS and modeling techniques in COVID-19 research during the study period. It was noted that "COVID-19" and "coronavirus" ranked first [...] learning, artificial intelligence, and mathematical and statistical modeling between January and April 2020. The results also show the journals or sources with the most published research articles on COVID-19, and these sources cover a range of subjects. Table 27.6 shows considerable diversity across scholarly fields, including health and mathematical biosciences as well as engineering and surveillance. The diversity of outlets is also evident in the results for disciplines, as can be seen in Table 27.7. The findings indicate that COVID-19 research using these techniques is still largely preliminary. Achieving significant insights will require a mix of domain expertise from multiple fields, and there is already a push for better international collaboration and tracking of COVID-19 [17].
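Keyword rankings like those in Table 27.4 come down to counting, for each keyword, the papers that use it and the citations those papers have received. The Python sketch below illustrates that counting under its own assumptions. The field names `author_keywords`, `keywords_plus`, and `citations` are hypothetical, and keywords are lower-cased. A keyword listed in both fields of one paper is counted once. This approximates a combined "both keywords" count and is not ScientoPy's exact behavior.

```python
from collections import defaultdict


def keyword_table(records, fields=("author_keywords", "keywords_plus")):
    """Rank keywords by paper count; return (keyword, papers, citations) rows.

    Keywords are lower-cased and counted once per paper, even when they
    appear in more than one of the selected fields.
    """
    papers = defaultdict(int)
    citations = defaultdict(int)
    for rec in records:
        terms = {kw.strip().lower()
                 for field in fields
                 for kw in rec.get(field, [])
                 if kw.strip()}
        for kw in terms:
            papers[kw] += 1
            citations[kw] += rec.get("citations", 0)
    # Most papers first; ties broken alphabetically for a stable order.
    return sorted(((kw, papers[kw], citations[kw]) for kw in papers),
                  key=lambda row: (-row[1], row[0]))
```

Passing `fields=("author_keywords",)` gives a ranking based on author keywords only. The default combines both fields.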
For instance, big data, machine learning, deep learning, artificial intelligence, and mathematical and statistical modeling might yield a superficially practical solution. Such a solution could still prove ineffective without interpretation by (international) medical and biotechnology experts. This also has implications for emerging innovations, because healthcare practitioners are unlikely to engage with technologies built without medical expertise. It is therefore necessary to assemble teams with complementary expertise quickly. Doing so brings its own challenges, such as ensuring that a team interprets ethics, benefits, and risks consistently. Table 27.7 lists the top nine disciplines applying DS and modeling techniques to COVID-19. General and internal medicine ranks first, with 11 articles, 297 total citations, and an h-index of 4. Infectious diseases follows, with 9 articles, 28 total citations, and an h-index of 3. Academic outputs applying DS and modeling techniques to COVID-19 focused mainly on health science subject areas, especially those concerned with restoring and maintaining human health through the treatment of disease and injury. These disciplines fundamentally address how knowledge is applied to the prevention and cure of diseases in the body's systems. Fig. 27.2 expands on the contribution of the top subjects that applied DS and modeling to COVID-19. The analysis shows that demography, the science of populations, is the only social science subject area represented. This implies that researchers approached DS applications to COVID-19 mainly from the perspectives of the health and natural sciences.
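The h-index reported for each discipline is the largest h such that h of its papers have at least h citations each. The short Python sketch below follows that definition. The citation list in the usage note is invented to match the reported totals for general and internal medicine (11 articles, 297 citations). It is not the actual per-paper data.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    # Walk papers from most to least cited; rank r needs at least r citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```

For example, `h_index([250, 20, 10, 5, 3, 3, 2, 2, 1, 1, 0])` returns `4`. The fourth most-cited paper has 5 citations, which is at least 4, but the fifth has only 3, which is fewer than 5.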
Other subjects also published numerous papers using DS: public, environmental, and occupational health; virology; research and experimental medicine; and life sciences. These fields are especially sensitive to COVID-19, and the pandemic's major impacts have fallen mainly within them. It is essential at this time for health researchers to work with social scientists to inform effective policies [17]. Synergy among research areas would help address salient issues in the COVID-19 outbreak and perhaps help us understand future events of this kind. There is a shortage of studies in subject areas related to human behavior, which are crucial for understanding person-to-person transmission of the virus. Researchers in these areas should therefore do more to identify and understand possible barriers to flattening the curve of the pandemic [18-20]. Future research should critically address potential variation in social issues, because the ripple effects of the outbreak fall largely on people's social lives. These effects have led to the popular preventive measure for COVID-19 known as "social distancing," i.e., avoiding social gatherings, self-isolating, etc. [21]. The outbreak and its future implications require special attention and may bring a paradigm shift in different walks of life, especially human movement across territories [22]. Keywords reflect the authors' aims and objectives and summarize the key intentions and interests of a paper. The distribution of DS and modeling key terms in COVID-19-related papers is therefore key to investigating hot topics and the state of research knowledge in this subject area (Fig. 27.3).

Fig. 27.3 shows the number, percentage, and distribution of DS and modeling techniques applied to COVID-19 before 2020 and during 2020 (January to April). Table 27.8 summarizes the top nine keywords, listing total articles, total citations, and h-index for author keywords and for both keywords. "Model" is the most frequent author keyword, with 19 papers, 11 total citations, and an h-index of 2. It also ranks first among both keywords, with 37 papers, 47 citations, and an h-index of 4. Bibliometrics generally applies Price's law to estimate authors' influence and output in a particular field of study, which can also determine the highest and lowest occurrence of important key terms [23]. Table 27.8 shows that "Model" is the keyword category with the highest occurrence, which implies that modeling has become increasingly significant in DS applications to COVID-19 studies. For example, a stochastic transmission model adapted to COVID-19 was used to quantify how effectively contact tracing and isolation of infected people could control a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-like pathogen [24]. In another instance, phenomenological models validated in past outbreaks were used to generate and assess short-term forecasts of the cumulative number of confirmed reported cases. The forecasts covered Hubei province, the epicenter of the pandemic, and China excluding Hubei. They relied on three models: the generalized logistic model, the Richards growth model, and a subepidemic wave model [25]. Modeling is the act of creating physical, philosophical, and conceptual structures and the relationships between data elements [26]. It establishes goals and provides information to end users. Modeling COVID-19 requires sufficient historical data.
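Among the phenomenological models in [25], the generalized logistic model describes cumulative cases $C(t)$ by $dC/dt = r\,C^{p}\,(1 - C/K)$. Here $r$ is the growth rate, $K$ is the final epidemic size, and $p \in [0, 1]$ sets whether early growth is sub-exponential ($p < 1$) or exponential ($p = 1$). The Python sketch below integrates that equation with a simple forward-Euler scheme. The parameter values are hypothetical, and the sketch does not fit case data as [25] does.

```python
def generalized_logistic(c0, r, p, k, days, steps_per_day=10):
    """Forward-Euler solution of dC/dt = r * C**p * (1 - C/k).

    Returns cumulative cases at the end of each whole day (index 0 = day 0).
    """
    h = 1.0 / steps_per_day  # integration step, in days
    c = float(c0)
    trajectory = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += h * r * c ** p * (1.0 - c / k)
        trajectory.append(c)
    return trajectory


# Hypothetical parameters: 10 initial cases, sub-exponential growth (p = 0.8),
# final size of 10,000 cases, simulated for 60 days.
curve = generalized_logistic(c0=10, r=0.5, p=0.8, k=10_000, days=60)
print(f"day 30: {curve[30]:.0f} cases, day 60: {curve[60]:.0f} cases")
```

Forward Euler keeps the sketch short. With a small step such as 0.1 day, the simulated curve rises monotonically toward $K$ without overshooting. A fitted forecast would instead estimate $r$, $p$, and $K$ from reported cases and use a more robust ODE solver.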
However, we should be mindful that history tends to repeat itself. A collective effort is therefore needed to model future pandemics similar to COVID-19. Answering questions about the global impact of the novel coronavirus (COVID-19) requires accurate analysis and modeling of recorded cases, deaths, and recoveries [27, 28]. Machine learning focuses on accessing data and automating processes based on inputs without explicit programming. Its value in DS applications to COVID-19 cannot be overemphasized, given its ability to access data and apply it accordingly. Rao and Vazquez [29] proposed using machine learning algorithms with a mobile phone-based web survey to identify possible COVID-19 cases more quickly. Their aim was to flatten the curve of the virus in vulnerable populations. Artificial intelligence is the simulation of human intelligence in programmed machines. The term also applies to machines that solve problems and show traits associated with human actions. Artificial intelligence using deep learning has shown great success, with the potential to detect COVID-19 accurately and to distinguish it from community-acquired pneumonia and other lung infections [30]. According to Table 27.8, other hot keywords in the application of DS to COVID-19 include mathematical, deep learning, regression, active learning, and computational and statistical models. In general, mathematical and computational statistical models are vital data analytics approaches for navigating the nebulous nature of big data, and this area has become a forte of many researchers. A researcher's geographic location is not a limitation in DS-based COVID-19 studies. As long as the required data are available, studies can be conducted anywhere in the world.
This study shows that researchers have focused strongly on applying DS to the COVID-19 outbreak, and the studies analyzed focused mainly on health science subject areas. The study also identified and presented country and institutional participation in applying DS and modeling techniques to COVID-19 research. China, the United States, the United Kingdom, Australia, and Canada ranked first to fifth, in that order. Italy, India, South Korea, Germany, and Belgium ranked sixth to tenth. This ranking was based on the number of published articles, the number of citations, and h-index during the survey period. The key terms classify the major DS approaches used to address COVID-19. These fall mainly into nine categories: model, machine learning, artificial intelligence, mathematical, deep learning, regression, computational, active learning, and statistical model. "Model" is the major keyword in research published in 2020 applying DS to COVID-19. Modeling appears to be the most promising approach for developing physical, theoretical, or computer-based simplifications to address COVID-19.
[1] Artificial intelligence and machine learning to fight COVID-19
[2] Coronavirus (Covid-19) Classification Using CT Images by Machine Learning Methods, arXiv
[3] Deep learning system to screen coronavirus disease 2019 pneumonia
[4] Covid-Resnet: A Deep Learning Framework for Screening of Covid19 From Radiographs
[5] A Novel AI-Enabled Framework to Diagnose Coronavirus Covid 19 Using Smartphone Embedded Sensors: Design Study, arXiv
[6] Automatic Detection of Coronavirus Disease (Covid-19) Using X-Ray Images and Deep Convolutional Neural Networks, arXiv
[7] Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for Covid-19
[8] Transfer learning and fusion model for classification of epileptic PET images
[9] Xtelligent Healthcare Media
[10] Bibliometric analysis of global environmental assessment research in a 20-year period
[11] Past, current and future of biomass energy research: a bibliometric analysis
[12] A global bibliometric analysis of Plesiomonas-related research
[13] Global trends assessment of environmental health degradation studies from
[14] Software survey: ScientoPy, a scientometric tool for topics trend analysis in scientific publications
[15] Bibliometric keyword analysis across seventeen years (2000-2016) of intelligence articles
[16] Potential implications of gold-mining activities on some environmental components: a global assessment
[17] Functional fear predicts public health compliance in the COVID-19 pandemic
[18] The effectiveness of moral messages on public health behavioral intentions during the COVID-19 pandemic
[19] Multidisciplinary research priorities for the COVID-19 pandemic: a call for action for mental health science
[20] The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study
[21] Does Social Distancing Matter?
[22] AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data
[23] Collaboration in an invisible college
[24] Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
[25] Real-time forecasts of the COVID-19 epidemic in China from
[26] Macro-BIM adoption: conceptual structures
[27] Can we contain the COVID-19 outbreak with the same measures as for SARS?
[28] How simulation modelling can help reduce the impact of COVID-19
[29] Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone-based survey when cities and towns are under quarantine
[30] Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT