key: cord-0926818-ggk2s8bf authors: Bansal, Agam; Padappayil, Rana Prathap; Garg, Chandan; Singal, Anjali; Gupta, Mohak; Klein, Allan title: Utility of Artificial Intelligence Amidst the COVID 19 Pandemic: A Review date: 2020-08-01 journal: J Med Syst DOI: 10.1007/s10916-020-01617-3 sha: 549bb61a6143cb932df18ec220bd9397d2a08897 doc_id: 926818 cord_uid: ggk2s8bf

The term machine learning refers to a collection of tools used for identifying patterns in data. Unlike traditional methods of pattern identification, machine learning tools rely on artificial intelligence to map out patterns from large amounts of data, can self-improve as new data become available, and are quicker at accomplishing these tasks. This review describes various machine learning techniques that have been used in the past for the prediction, detection, and management of infectious diseases, and how these tools are being brought into the battle against COVID-19. We also discuss their applications in the various stages of the pandemic, along with their advantages, disadvantages, and possible pitfalls.

As the global population grows, we are still battling pathogens that have been with us since the beginning of recorded history, such as tuberculosis, while also witnessing the increasing emergence of novel pathogens from non-human hosts, which poses a major threat to public health [1]. Within the last ten years, we have witnessed the emergence of new viruses that could spread across international borders and wreak global havoc, the latest being the novel coronavirus. The recent development of machine learning-based tools for healthcare providers offers novel ways to combat such global pandemics. The term machine learning encompasses the collection of tools and techniques for identifying patterns in data [2].
In traditional methods of identifying patterns in data, we approach the system with presumptions about which components of the data (e.g., age, sex, pre-existing conditions) affect the outcome of interest (e.g., patient survival). In machine learning, by contrast, we provide the data and the machine identifies the trends and patterns, from which we can formulate a model to predict patient outcomes. In this narrative review, we describe such tools, how they are useful in healthcare, and how they are being used in the prediction, prevention, and management of COVID-19. We also discuss how such tools were used against past infectious diseases, such as those caused by SARS-CoV-1 and MERS-CoV, and how they may translate to COVID-19. The discussion is organized into techniques for outbreak detection, models predicting spread, prevention and vaccine development, early case detection and tracking, prognosis prediction in affected patients, and drug development (Fig. 1).

Biosurveillance is the science of the early detection and prevention of disease outbreaks in the community [3]. Analytics, machine learning, and natural language processing (NLP) are increasingly being used in biosurveillance [4]. Scanning social media, news reports, and other online data can reveal localized disease outbreaks before they reach pandemic scale [5]. The Canadian company Blue Dot used machine learning algorithms to detect the early COVID-19 outbreak in Wuhan, China, by the end of December 2019 [6, 7]. Big data analysis of medical records and of satellite imaging (e.g., cars crowding around a hospital) has also been used to detect localized outbreaks [8, 9]. Google Trends data, combined with dynamic forecasting models, have been used to detect Zika virus outbreaks in populations [10].
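The cited systems do not share one published algorithm, but the basic idea of flagging an unusual rise in outbreak-related online activity can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not Blue Dot's or any cited group's method: it assumes a hypothetical series of daily counts of symptom-related posts and flags a day when its count exceeds the mean of the preceding `window` days by more than `threshold` standard deviations. The function name, its parameters, and the data are all hypothetical.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=7, threshold=3.0, min_sd=1.0):
    """Return indices of days whose count rises well above a rolling baseline.

    counts    -- daily counts of outbreak-related posts (hypothetical input)
    window    -- number of preceding days that form the baseline
    threshold -- how many baseline standard deviations count as a spike
    min_sd    -- floor on the standard deviation, so a nearly flat baseline
                 does not turn every small uptick into an alert
    """
    flagged = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]  # the `window` days before `day`
        mu = mean(baseline)
        sd = max(stdev(baseline), min_sd)
        if (counts[day] - mu) / sd > threshold:
            flagged.append(day)
    return flagged


# Synthetic illustration only: a flat background of mentions, then a jump on day 10.
daily_mentions = [5, 6, 4, 5, 7, 5, 6, 5, 4, 6, 30, 42]
print(flag_spikes(daily_mentions))  # → [10, 11]
```

Once a spike enters the baseline window it inflates the standard deviation for the following days (day 11 above clears the threshold only narrowly), which is why some detectors leave a gap of a few days between the baseline and the day being tested.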
Sentiment analysis is the use of natural language processing on social media content to gauge the positive and negative emotions of a population [11]. Unsupervised sentiment analysis has been proposed as a method for the early detection of infectious diseases in the population [12]. Sentiment analysis can also help in understanding the public's reactions, or overreactions, to disease outbreaks, and can give governments valuable insight for directing public education efforts [13] [14] [15]. Such sentinel biosurveillance techniques could help detect outbreaks before they become pandemics, buying the health system valuable time to prepare for prevention and management.

Fig. 1 Machine learning tools, data sources, and the interventions that are helpful in different stages of a pandemic.
Phase 1: No animal influenza virus is known to have caused disease in humans.
Phase 2: An animal influenza virus is known to have caused disease in humans and is hence a potential pandemic threat.
Phase 3: An animal influenza virus has caused sporadic cases or a cluster of disease in humans but has not caused human-to-human transmission.
Phase 4: Human-to-human transmission sufficient to sustain a community-level outbreak has been identified.
Phase 5: The virus has caused sustained community-level outbreaks in two countries within the same WHO region.
Phase 6: In addition to Phase 5, the virus has caused sustained community-level outbreaks in at least one country in another WHO region.
Post-Peak Period: Pandemic levels in most countries have dropped below peak levels.
Possible New Wave: Pandemic levels in most countries are rising again.
Post-Pandemic Period: Pandemic levels have returned to seasonal influenza levels in most countries.

Various statistical, mathematical, and dynamic predictive models have been used successfully to predict the extent and spread of infectious diseases through populations [16] [17] [18] [19] [20] [21]. Compared with traditional epidemiological predictive models, big-data-driven models have the added advantages of adaptive learning, trend-based recalibration, flexibility, scope to improve as understanding of the disease process evolves, and the ability to estimate how much interventions such as social distancing curb spread [22]. The most common of these is the compartmental Susceptible-Exposed-Infectious-Recovered (SEIR) model, which is now being used to predict where and how far COVID-19 will spread [23, 24]. These techniques can also be used to estimate other parameters of an epidemic, such as the degree of under-reporting of cases, the effectiveness of interventions, and the accuracy of testing methods [25, 26]. For example, one modeling study simulated the conditions under which Ebola could spread in Chinese society and evaluated the effectiveness of four levels of government intervention under those conditions [27]. Similar models have been used to predict the outbreak and expansion of Zika virus in the Americas in real time, achieving close to 85% accuracy in quantitative evaluations [28]. A comparison of several machine learning algorithms found that a backpropagation neural network (BPNN) had the highest predictive accuracy in modeling Zika virus transmission [29]. Scientists at Johns Hopkins University developed a COVID-19 prediction model based on a previously published stochastic metapopulation epidemic model [30]. Comparing this model's predictions with real-world data revealed gaps in the understanding of the virus's dynamics, as well as the model's limitations [31, 32].
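For readers unfamiliar with the SEIR approach, the sketch below integrates the standard compartmental equations dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, and dR/dt = γI with simple forward Euler steps. It assumes a closed population and uses illustrative parameter values rather than COVID-19 estimates. It is not the Johns Hopkins model or any other model cited here; a stochastic metapopulation model such as [30] adds randomness and movement between subpopulations. The function name `simulate_seir` and its parameters are hypothetical.

```python
def simulate_seir(n, e0, i0, beta, sigma, gamma, days, dt=0.1):
    """Integrate a deterministic SEIR model with forward Euler steps.

    n      -- total population (assumed closed: no births, deaths, or travel)
    e0, i0 -- initial exposed and infectious counts
    beta   -- transmission rate per day
    sigma  -- 1 / mean latent period (per day)
    gamma  -- 1 / mean infectious period (per day)
    dt     -- step size in days; should divide 1 evenly
    Returns one (day, S, E, I, R) tuple per whole day.
    """
    s, e, i, r = float(n - e0 - i0), float(e0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, e, i, r)]
    for step in range(1, days * steps_per_day + 1):
        # Flows during this step, computed from the current state
        # before any compartment is updated.
        infections = beta * s * i / n * dt  # S -> E
        onsets = sigma * e * dt             # E -> I
        recoveries = gamma * i * dt         # I -> R
        s -= infections
        e += infections - onsets
        i += onsets - recoveries
        r += recoveries
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, s, e, i, r))
    return history


# Illustrative parameters only (R0 = beta / gamma = 2.5), not fitted to COVID-19 data.
history = simulate_seir(n=1_000_000, e0=0, i0=1, beta=0.5, sigma=0.2, gamma=0.2, days=365)
peak_day, _, _, peak_infectious, _ = max(history, key=lambda row: row[3])
print(f"infectious peak on day {peak_day}: about {peak_infectious:,.0f} people")
```

Fitting β, σ, and γ to observed case counts, rather than fixing them by hand, is where the data-driven recalibration described above comes in.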
However, a predictive model is only as good as the data it is based on, and in a global pandemic, data sharing across communities is of paramount importance. The lack of such sharing was one of the major obstacles to learning about and modeling the 2013-2016 Ebola virus outbreak [33]. The World Health Organization (WHO) has proposed a consensus on expedited data sharing during the COVID-19 outbreak to promote intercommunity learning and analytics.

Artificial neural networks (ANN) were used to predict antigenic regions with a high density of binders (antigenic hotspots) in the viral membrane protein of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) [34, 35]. This information is critical to vaccine development. Using machine learning for this purpose enables rapid scanning of the entire viral proteome, making vaccine development faster and cheaper. Reverse vaccinology and machine learning were successfully employed to identify six potential vaccine target proteins in the SARS-CoV-2 proteome [36]. Machine learning has also been used to predict which influenza virus strains are most likely to cause infection in a population in the upcoming year and, therefore, should be included in that year's seasonal influenza vaccine. Using reconstructed, timed phylogenetic trees, a model trained on H3N2 and tested on H1N1 successfully predicted the future expansion of small subtrees of hemagglutinin (HA), part of the viral antigenic set [37]. Machine learning can also predict the hosts of newly discovered viruses from nucleoprotein and spike gene sequences, and can be a useful additional tool for tracing viral origins, especially when the data set is large and comparative analysis is difficult or time-consuming [38].

Early case identification, quarantine, and prevention of community exposure are crucial pillars in managing an epidemic such as COVID-19.
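Returning to the host-prediction idea above, an alignment-free approach can be illustrated by representing each viral sequence as its k-mer frequencies and assigning a new sequence to the host whose average profile it most resembles. This is a minimal sketch under stated assumptions: it uses a nearest-centroid rule with cosine similarity rather than the support vector machines compared in [38], and the sequences and host labels are made-up toy strings, not real nucleoprotein or spike genes. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Normalized k-mer frequency vector over the ACGT alphabet (4**k entries)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # k-mers with other letters are ignored
    return [counts[km] / total for km in kmers]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_seqs, k=3):
    """Average the k-mer profiles of all training sequences sharing a host label."""
    grouped = {}
    for seq, host in labeled_seqs:
        grouped.setdefault(host, []).append(kmer_profile(seq, k))
    return {
        host: [sum(col) / len(profiles) for col in zip(*profiles)]
        for host, profiles in grouped.items()
    }


def predict_host(seq, centroids, k=3):
    """Pick the host whose centroid profile is most similar to the query's profile."""
    profile = kmer_profile(seq, k)
    return max(centroids, key=lambda host: cosine(profile, centroids[host]))


# Toy, made-up training data: one "host" with AT-rich sequences, one with GC-rich ones.
training = [
    ("ATATTAATATAATTATAT", "host_A"),
    ("TTATAATATTATAAT", "host_A"),
    ("GCGCCGGCGCGGCCGC", "host_B"),
    ("CCGCGGCGCCGCGG", "host_B"),
]
centroids = train_centroids(training)
print(predict_host("AATATTATATTAAT", centroids))  # → host_A
```

Because the profiles need no alignment, the cost grows linearly with sequence length, which is the practical appeal when the data set is large.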
Mobile phone-based surveys can be useful for the early identification of cases, especially in quarantined populations [39, 40]. Similar methods have been used successfully in Italy to identify influenza patients through a web-based survey [41]. Unlike traditional survey and analysis methods, artificial intelligence tools can collect and analyze large amounts of data, identify trends, stratify patients by risk, and propose solutions at the level of the population rather than the individual. Digital phenotyping is the novel concept of collecting active (surveys) and passive (text, voice, location, screen use) smartphone data to produce an individual phenotype [42, 43]. This technique yields multiple data points per person, allowing individuals to be stratified by risk. The government of India recently launched a mobile application called "Aarogya Setu", which tracks its users' exposure to potentially infected COVID-19 patients by using Bluetooth to scan the surrounding area for other smartphone users. If a user tests positive, the app's data can be used to trace every app user whom the patient encountered within the previous 30 days [44]. Such digital phenotyping techniques can run even on entry-level smartphones and, given the ubiquity of smartphones, would be especially useful as a cost-effective method of risk stratification in low- and middle-income countries [45].

Given its close physical and economic proximity to China, Taiwan was expected to suffer high COVID-19 morbidity and mortality. However, with the help of machine learning, it kept the number of infected patients far below initial predictions. Authorities identified the threat early and combined the national health insurance database with the customs and immigration database to generate big data for analytics.
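The retrospective lookup described above for Aarogya Setu can be sketched as a filter over an encounter log. The Python below is a hypothetical illustration, not the app's actual implementation: it assumes encounters have already been recorded as (user, user, date) tuples, for example from Bluetooth proximity detection, and returns everyone who met a positive user within the 30-day lookback the app is described as using. The function name, user IDs, and dates are invented for the example.

```python
from datetime import date, timedelta


def contacts_within(encounters, patient_id, positive_on, lookback_days=30):
    """Return IDs of users with a logged encounter with `patient_id`
    between `lookback_days` days before `positive_on` and `positive_on` itself."""
    window_start = positive_on - timedelta(days=lookback_days)
    contacts = set()
    for user_a, user_b, seen_on in encounters:
        if not (window_start <= seen_on <= positive_on):
            continue  # outside the lookback window
        if user_a == patient_id:
            contacts.add(user_b)
        elif user_b == patient_id:
            contacts.add(user_a)
    return contacts


# Hypothetical encounter log: (user, user, date the two devices were near each other).
encounter_log = [
    ("u1", "u2", date(2020, 4, 2)),
    ("u3", "u1", date(2020, 4, 20)),
    ("u1", "u4", date(2020, 2, 1)),   # older than 30 days: ignored
    ("u5", "u6", date(2020, 4, 21)),  # does not involve u1
]
print(sorted(contacts_within(encounter_log, "u1", date(2020, 4, 25))))  # → ['u2', 'u3']
```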
Machine learning applied to these data helped stratify Taiwan's population into lower- and higher-risk groups based on several factors, including travel history. Higher-risk persons were quarantined at home and tracked through their mobile phones to ensure that they remained in quarantine. This application of big data, together with active case finding, helped keep case numbers far lower than initially anticipated [46].

Deep learning algorithms have also been used to identify patterns of infectious disease involvement in imaging studies such as CT and MRI. Because chest CT findings correlate strongly with RT-PCR-confirmed COVID-19, such algorithms have shown great promise in detecting findings consistent with COVID-19 on patients' CT images [47] [48] [49].

Machine learning algorithms have previously been used to predict prognosis in patients with MERS-CoV infection [50]. The patient's age, disease severity on presentation to the healthcare facility, whether the patient was a healthcare worker, and the presence of pre-existing comorbidities were identified as the four major predictors of recovery. These findings are consistent with the trends currently observed in COVID-19 [51, 52]. Using the data visualization tool Mirador, researchers developed a mobile application, Ebola CARE (Computational Assignment of Risk Estimates), to predict the outcome of patients infected with Ebola virus [53, 54]. The tool identified 24 clinical and laboratory parameters that may affect a patient's prognosis. These algorithms need to be adapted to assist physicians in decision-making while managing COVID-19. Recovery prediction tools can inform resource allocation, triage, treatment decisions, and health system preparedness.

Machine learning tools have been used in drug development, drug testing, and drug repurposing.
These tools enable us to interpret large gene expression profile data sets and suggest new uses for currently available medications. Deep generative models, sometimes called "AI imagination", can design novel therapeutic agents with potentially desirable activity [55]. Such tools can reduce the cost and time of drug development, help generate novel therapeutic agents, and predict possible off-label uses for existing ones [56]. Bayesian machine learning models have been used to identify drug leads against Ebola virus in vitro, and the findings translated well to in vivo settings [57].

Machine learning provides an exciting array of tools flexible enough to be deployed at any stage of a pandemic. Given the large amount of data generated while studying a disease process, machine learning enables rapid identification of patterns that traditional mathematical and statistical tools would take much longer to derive. Its flexibility, its ability to adapt to new understanding of the disease process, its self-improvement as new data become available, and the absence of human preconceptions in its analytical approach make machine learning a highly versatile tool for managing novel infections. However, this enhanced ability to derive meaning from large amounts of data raises the demand for quality control during data collection, storage, and processing. In addition, standardizing data structures across populations would allow these systems to learn from data across the globe, which was not possible in the past and is especially important for understanding and managing a global pandemic like COVID-19. Finally, a new pandemic tends to produce much more "noise" in the data, so blindly feeding such immature, outlier-ridden data into an AI algorithm should always be approached with caution.
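One simple precaution against the outliers mentioned above is to screen each input series with a robust rule before training. The sketch below uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation. The 3.5 cutoff is a common rule of thumb rather than something this review prescribes, the daily case counts are synthetic, and the function name is hypothetical.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return indices whose modified z-score exceeds `cutoff` in absolute value.

    The median and median absolute deviation (MAD) are used instead of the
    mean and standard deviation because they are barely moved by the very
    outliers being searched for.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined;
        # this sketch simply returns no flags in that case.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]


# Synthetic daily case counts: a batch-reporting dump on day 4, a missing report on day 7.
daily_cases = [120, 130, 125, 118, 2400, 127, 122, 0]
print(flag_outliers(daily_cases))  # → [4, 7]
```

Flagged points are better reviewed than silently deleted: a sudden jump may be a reporting backlog, or it may be a genuine surge.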
Conflict of interest The authors declare no conflicts of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

References
Global trends in emerging infectious diseases
Machine Learning for Healthcare: On the verge of a major shift in healthcare epidemiology
Biosurveillance, biodefense, and biotechnology
Machine Learning & NLP - use in BioSurveillance and Public Health practice
Using online social networks to track a pandemic: A systematic review
COVID-19 and artificial intelligence: protecting healthcare workers and curbing the spread
Pneumonia of unknown aetiology in Wuhan, China: potential for international spread via commercial air travel
Realtime forecasting of infectious disease dynamics with a stochastic semi-mechanistic model
An open challenge to advance probabilistic forecasting for dengue epidemics
Dynamic Forecasting of Zika Epidemics Using Google Trends
Sentiment analysis of health care tweets: review of the methods used
An unsupervised machine learning model for discovering latent infectious diseases using social media data
Large-scale machine learning of media outlets for understanding public reactions to nation-wide viral infection outbreaks
Rethinking social amplification of risk: Social media and Zika in three languages
Understanding the Patterns of Health Information Dissemination on Social Media during the Zika Outbreak
Predicting Infectious disease using deep learning and big data
A Novel Data-Driven Model for Real-Time Influenza Forecasting
Risk assessment of Ebola virus disease spreading in Uganda using a two-layer temporal network
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Infectious disease modeling methods as tools for informing response to novel influenza viruses of unknown pandemic potential
Using results from infectious disease modeling to improve the response to a potential H7N9 influenza pandemic
Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study
Worldwide COVID-19 Outbreak Data Analysis and Prediction
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak
Analysis of COVID-19 infection spread in Japan based on stochastic transition model
The large scale machine learning in an artificial society: Prediction of the ebola outbreak in beijing
A dynamic neural network model for predicting risk of Zika in real time
Mapping the transmission risk of Zika virus using machine learning models
A decision-support framework to optimize border control for global outbreak mitigation
An interactive web-based dashboard to track COVID-19 in real time
Developing Global Norms for Sharing Data and Results during Public Health Emergencies
Neural models for predicting viral vaccine targets
Improved prediction of MHC class I binders/non-binders peptides through artificial neural network using variable learning rate: SARS corona virus, a case study
COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning. bioRxiv
Predicting the short-term success of human influenza virus variants with machine learning
Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
Identification of COVID-19 Can be quicker through artificial intelligence framework using a mobile phone-based survey in the populations when cities/towns are under quarantine
Mobile research methods
Web-based participatory surveillance of infectious diseases: the Influenzanet participatory surveillance experience
Digital phenotyping: Technology for a new science of behavior
Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health
Mobile health use in low- and high-income countries: an overview of the peer-reviewed literature
Response to COVID-19 in Taiwan: Big data analytics, new technology, and proactive testing
A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)
Correlation of chest CT and RT-PCR Testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 Cases
Rapid ai development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis. arXiv:2003.05037 [cs]
Main factors influencing recovery in MERS Co-V patients using machine learning
Rapid progression to acute respiratory distress syndrome: Review of current understanding of critical illness from COVID-19 Infection
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Transforming clinical data into actionable prognosis models: Machine-learning framework and field-deployable app to predict outcome of ebola patients
Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry
Machine learning applications in drug development
Ebola Virus Bayesian Machine Learning Models Enable New in Vitro Leads