key: cord-154587-qbmm5st9
authors: Nguyen, Thanh Thi
title: Artificial Intelligence in the Battle against Coronavirus (COVID-19): A Survey and Future Research Directions
date: 2020-07-30
journal: nan
DOI: nan
sha: 
doc_id: 154587
cord_uid: qbmm5st9

Artificial intelligence (AI) has been applied widely in our daily lives in a variety of ways with numerous successful stories. AI has also contributed to dealing with the coronavirus disease (COVID-19) pandemic, which has been happening around the globe. This paper presents a survey of AI methods being used in various applications in the fight against the COVID-19 outbreak and outlines the crucial roles of AI research in this unprecedented battle. We touch on a number of areas where AI plays as an essential component, from medical image processing, data analytics, text mining and natural language processing, the Internet of Things, to computational biology and medicine. A summary of COVID-19 related data sources that are available for research purposes is also presented. Research directions on exploring the potentials of AI and enhancing its capabilities and power in the battle are thoroughly discussed. We highlight 13 groups of problems related to the COVID-19 pandemic and point out promising AI methods and tools that can be used to solve those problems. It is envisaged that this study will provide AI researchers and the wider community an overview of the current status of AI applications and motivate researchers in harnessing AI potentials in the fight against COVID-19.

The novel coronavirus disease (COVID-19) has created tremendous chaos around the world, affecting people's lives and causing a large number of deaths. Since the first cases were detected, the disease has spread to almost every country, causing the deaths of over 580,000 people among nearly 13,379,000 confirmed cases, based on World Health Organization statistics from the middle of July 2020 [1]. Governments of many countries have proposed intervention policies to mitigate the impacts of the COVID-19 pandemic. Science and technology have contributed significantly to the implementation of these policies during this unprecedented and chaotic time. For example, robots are used in hospitals to deliver food and medicine to coronavirus patients, and drones are used to disinfect streets and public spaces. Many medical researchers are rushing to investigate drugs and medicines to treat infected patients whilst others are attempting to develop vaccines to prevent the virus. Computer science researchers, on the other hand, have managed to detect infectious patients early using techniques that can process and understand medical imaging data such as X-ray images and computed tomography (CT) scans. These computational techniques are part of artificial intelligence (AI), which has been applied successfully in various fields. This paper focuses on the roles of AI technologies in the battle against the COVID-19 pandemic. We provide a comprehensive survey of AI applications that support humans to reduce and suppress the substantial impacts of the outbreak. Recent advances in AI have contributed significantly to improving humans' lives and thus there is a strong belief that proper AI research plans will fully exploit the power of AI in helping humans to win this challenging battle. We discuss these possible plans and highlight AI research areas that could bring great benefits and contributions to overcoming the pandemic. In addition, we present a summary of COVID-19 related data sources to facilitate future studies using AI methods to deal with the pandemic.

T. T. Nguyen is with the School of Information Technology, Deakin University, Victoria, 3216, Australia. E-mail: thanh.nguyen@deakin.edu.au.

An overview of common AI methods is presented in Fig. 1 where recent AI development is highlighted. Machine learning, especially deep learning, has made great advances and substantial progress in long-standing fields such as computer vision, natural language processing (NLP), speech recognition, and video games. A significant advantage of deep learning over traditional machine learning techniques is its ability to deal with and make sense of different types of data, especially big and unstructured data, e.g. text, image, video and audio data. A number of industries, e.g. electronics, automotive, security, retail, agriculture, healthcare and medical research, have achieved better outcomes and benefits by using deep learning and AI methods. It is thus expected that AI technologies can contribute to the fight against the COVID-19 pandemic, such as those surveyed in the next section.

We separate surveyed papers into different groups that include: deep learning algorithms for medical image processing, data science methods for pandemic modelling, AI and the Internet of Things (IoT), AI for text mining and NLP, and AI in computational biology and medicine.

Fig. 1. An overview of common AI methods where machine learning constitutes a great proportion. The development of deep learning, a subset of machine learning, has contributed significantly to improving the power and capabilities of recent AI applications. A number of deep learning-based convolutional neural network (CNN) architectures, e.g. LeNet [2], AlexNet [3], GoogLeNet [4], Visual Geometry Group (VGG) Net [5] and ResNet [6], have been proposed and applied successfully in different areas, especially in the computer vision domain. Other techniques such as autoencoders and recurrent neural networks are crucial components of many prominent natural language processing tools. The deep learning methods in particular and AI in general may thus be employed to create useful applications to deal with various aspects of the COVID-19 pandemic.

While radiologists and clinical doctors can learn to detect COVID-19 cases based on chest CT examinations, their tasks are manual and time consuming, especially when they are required to examine a lot of cases. Bai et al. [7] convene three Chinese and four United States radiologists to differentiate COVID-19 from other viral pneumonia based on chest CT images obtained from a cohort of 424 cases, in which 205 cases are from the United States with non-COVID-19 pneumonia whilst 219 cases are from China positive with COVID-19. Results obtained show that radiologists can achieve high specificity (which refers to the proportion of actual negatives that are correctly identified as such) in distinguishing COVID-19 from other causes of viral pneumonia using chest CT imaging data. However, their performance in terms of sensitivity (which refers to the proportion of actual positives that are correctly identified as such) is just moderate for the same task. AI methods, especially deep learning, have been used to process and analyse medical imaging data to support radiologists and doctors to improve diagnosis performance. Likewise, the current COVID-19 pandemic has witnessed a number of studies focusing on automatic detection of COVID-19 using deep learning systems.
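To make the two diagnostic measures concrete, the following minimal Python sketch computes them from binary labels, taking sensitivity as the fraction of actual positives that are detected and specificity as the fraction of actual negatives that are correctly ruled out. The function name and the eight example readings are hypothetical and are not taken from [7].

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = COVID-19."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # proportion of actual positives found
    specificity = tn / (tn + fp)  # proportion of actual negatives found
    return sensitivity, specificity


# Hypothetical readings: 4 COVID-19 cases followed by 4 non-COVID-19 cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example the reader is specific (three of four negatives are ruled out) but only moderately sensitive (two of four positives are found), which mirrors the pattern reported for the radiologists.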

A three-dimensional deep learning method, namely COVID-19 detection neural network (COVNet), is introduced in [8] to detect COVID-19 based on volumetric chest CT images. Three kinds of CT images, including COVID-19, community acquired pneumonia (CAP) and other non-pneumonia cases, are mixed to test the robustness of the proposed model, which is illustrated in Fig. 2 . These images are collected from 6 hospitals in China and the detection method is evaluated by the area under the receiver operating characteristic curve (AUC). COVNet is a convolutional ResNet-50 model [6] that takes a series of CT slices as inputs and predicts the class labels of the CT images via its outputs. The AUC value obtained is at 0.96, which shows a great ability of the proposed model for detecting COVID-19 cases.
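The slice-level aggregation described for COVNet (per-slice features combined by max pooling, then a fully connected layer that yields probabilities for three classes) can be sketched in plain Python as follows. This is only an illustration of the pooling and classification steps: the 4-dimensional feature vectors, the weight matrix and all function names are hypothetical, whereas in [8] the features come from ResNet-50 and the weights are learned.

```python
import math


def max_pool_slices(slice_features):
    """Element-wise maximum over the feature vectors of all CT slices."""
    return [max(column) for column in zip(*slice_features)]


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_scan(slice_features, weights, biases, class_names):
    """Pool slice features, apply one fully connected layer, pick the top class."""
    pooled = max_pool_slices(slice_features)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


CLASSES = ["non-pneumonia", "CAP", "COVID-19"]
# Hypothetical features for three CT slices of one scan.
features = [[0.1, 0.9, 0.0, 0.2],
            [0.4, 0.2, 0.1, 0.8],
            [0.3, 0.5, 0.7, 0.1]]
# Hypothetical, untrained fully connected layer: one weight row per class.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]

label, probs = classify_scan(features, W, b, CLASSES)
print(label, [round(p, 3) for p in probs])
```

Max pooling makes the prediction independent of the number and order of slices, which is one plausible reason for choosing it when scans contain varying numbers of slices.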

Another deep learning method based on the concatenation between the location-attention mechanism and the three-dimensional CNN ResNet-18 network [6] is proposed in [9] to detect coronavirus cases using pulmonary CT images. Distinct manifestations of CT images of COVID-19 found in previous studies [10], [11] and their differences with those of other types of viral pneumonia such as influenza-A are exploited through the proposed deep learning system. A dataset comprising CT images of COVID-19 cases, influenza-A viral pneumonia patients and healthy cases is used to validate the performance of the proposed method. The method achieves an overall accuracy of approximately 86% on this dataset, which exhibits its ability to help clinical doctors screen COVID-19 patients early using chest CT images.

Fig. 2. The COVNet architecture of [8] for COVID-19 detection using CT images. Max pooling operation is used to combine features extracted by ResNet-50 CNNs whose inputs are CT slices. The combined features are fed into a fully connected layer to compute probabilities for three classes, i.e. non-pneumonia, community acquired pneumonia (CAP) and COVID-19. The predicted class is the one that has the highest probability among the three classes.

In line with the studies described above, we have found a number of papers also applying deep learning for COVID-19 diagnosis using radiology images. They are summarized in Table I for comparisons.

A modified stacked autoencoder deep learning model is used in [26] to forecast in real-time the COVID-19 confirmed cases across China. This modified autoencoder network includes four layers, i.e. input, first latent layer, second latent layer and output layer, with 8, 32, 4 and 1 nodes respectively. A series of 8 data points (8 days) is used as the input of the network. The latent variables obtained from the second latent layer of the autoencoder model are processed by the singular value decomposition method before being fed into clustering algorithms in order to group the cases into provinces or cities to investigate the transmission dynamics of the pandemic. The resultant errors of the model are low, which gives confidence that it can be applied to accurately forecast the transmission dynamics of the virus as a helpful tool for public health planning and policy-making.
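As a small illustration of how a case-count series can be turned into 8-day inputs for such a network, the sketch below builds sliding windows in plain Python. It assumes, for illustration only, that each window of 8 consecutive days is paired with the value of the following day as the single output; the function name and the case counts are hypothetical and the exact target definition used in [26] is not reproduced here.

```python
def make_windows(series, window=8):
    """Pair each run of `window` consecutive days with the next day's value."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]  # 8 input nodes
        target = series[start + window]        # 1 output node (assumed: next day)
        pairs.append((inputs, target))
    return pairs


# Hypothetical cumulative confirmed cases for 11 consecutive days.
cases = [5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
for inputs, target in make_windows(cases):
    print(inputs, "->", target)
```

With 11 days of data and a window of 8, this yields three training pairs; longer series give proportionally more.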

On the other hand, a prototype of an AI-based system, namely α-Satellite, is proposed in [27] to assess the infectious risk of a given geographical area at community levels. The system collects various types of large-scale and real-time data from heterogeneous sources, such as number of cases and deaths, demographic data, traffic density and social media data, e.g., Reddit posts. The social media data available for a given area may be limited, so they are enriched by conditional generative adversarial networks (GANs) [28] to learn the public awareness of COVID-19. A heterogeneous graph autoencoder model is then devised to aggregate information from neighbourhood areas of the given area in order to estimate its risk indexes. This risk information enables residents to select appropriate actions to protect themselves from infection with minimum disruptions to their daily lives. It is also useful for authorities to implement appropriate mitigation strategies to combat the fast evolving pandemic.

Chang et al. [29] modify a discrete-time and stochastic agent-based model, namely ACEMod (Australian Census-based Epidemic Model), previously used for influenza pandemic simulation [30], [31], for modelling the COVID-19 pandemic across Australia over time. Each agent exemplifies an individual characterized by a number of attributes such as age, occupation, gender, susceptibility and immunity to diseases and contact rates. The ACEMod is calibrated to model specifics of the COVID-19 pandemic based on key disease transmission parameters. Several intervention strategies including social distancing, school closures, travel bans, and case isolation are then evaluated using this tuned model. Results obtained from the experiments show that a combination of several strategies is needed to mitigate and suppress the COVID-19 pandemic. The best approach suggested by the model is to combine international arrival restrictions, case isolation and social distancing for at least 13 weeks with a compliance level of 80% or above.

A framework for COVID-19 detection using data obtained from smartphones' onboard sensors such as cameras, microphones, temperature and inertial sensors is proposed in [32]. Machine learning methods are employed for learning and acquiring knowledge about the disease symptoms based on the collected data. This offers a low-cost and quick approach to coronavirus detection compared to medical kits or CT scan methods. The approach is plausible because data obtained from smartphone sensors have been utilized effectively in different individual applications, and the proposed approach integrates these applications in a single framework. For instance, data obtained from the temperature-fingerprint sensor can be used for fever level prediction [33]. Images and videos taken by a smartphone's camera or data collected by the onboard inertial sensors can be used for human fatigue detection [34], [35]. Likewise, Story et al. [36] use smartphone videos for nausea prediction whilst Lawanont et al. [37] use camera images and inertial sensors' measurements for neck posture monitoring and human headache level prediction. Alternatively, audio data obtained from a smartphone's microphone are used for cough type detection in [38], [39].

An approach to collecting individuals' basic travel history and their common manifestations using a phone-based online survey is proposed in [40] . These data are valuable for machine learning algorithms to learn and predict the infection risk of each individual, thus help to early identify high-risk cases for quarantine purpose. This contributes to reducing the spread of the virus to the susceptible populations. In another work, Allam and Jones [41] suggest the use of AI and data sharing standardization protocols for better global understanding and management of urban health during the COVID-19 pandemic. For example, added benefits can be obtained if AI is integrated with thermal cameras, which might have been installed in many smart cities, for early detection of the outbreak. AI methods can also demonstrate their great effectiveness in supporting managers to make better decisions for virus containment when loads of urban health data are collected by data sharing across and between smart cities using the proposed protocols.

A hybrid AI model for COVID-19 infection rate forecasting is proposed in [42], which combines the epidemic susceptible infected (SI) model, NLP and deep learning tools. The SI model and its extension, i.e. susceptible infected recovered (SIR), are traditional epidemic models for modelling and predicting the development of infectious diseases where S represents the number of susceptible people, I denotes the number of infected people and R specifies the recovered cases. Using differential equations to characterize the relationship between I, S and R, these models have been used to successfully predict SARS and Ebola infected cases, as reported in [43] and [44] respectively. NLP is employed to extract semantic features from related news such as epidemic control measures of governments or residents' disease prevention awareness. These features then serve as inputs to the long short-term memory (LSTM) deep learning model [45] to revise the infection rate predictions of the SI model (detailed in Fig. 3). Epidemic data of Wuhan, Beijing, Shanghai and the whole of China are used for experiments, which demonstrate the great accuracy of the proposed hybrid model. It can be applied to predict the COVID-19 transmission law and development trend, and is thus useful for establishing prevention and control measures for future pandemics. That study also shows the importance of public awareness of governmental epidemic prevention policies and the significant role of transparency and openness of epidemic reports and news in containing the development of infectious diseases.

Fig. 3. An AI-based approach to COVID-19 prediction that combines traditional epidemic SI model, NLP and machine learning tools as introduced in [42]. A pre-trained NLP model is used to extract NLP features from text data, i.e. the pandemic news, reports, prevention and control measures. These features are integrated with infection rate features obtained from the SI model via multilayer perceptron (MLP) networks before being fed into the LSTM model for COVID-19 case modelling and prediction.
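The differential-equation view of the SIR model can be illustrated with a short simulation. The sketch below uses the standard textbook form dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, integrated with a simple forward Euler scheme. The transmission rate `beta`, the recovery rate `gamma`, the population size and the step size are assumed values chosen for illustration; they are not parameters from [42], and the NLP and LSTM corrections of the hybrid model are not reproduced.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR equations; returns daily (S, I, R)."""
    n = s0 + i0 + r0                     # total population, assumed constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # S -> I flow
            new_recoveries = gamma * i * dt          # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Assumed illustrative parameters: 1000 people, 10 initially infected.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=30)
for day in (0, 10, 20, 30):
    s, i, r = history[day]
    print(f"day {day:2d}: S={s:7.1f} I={i:7.1f} R={r:7.1f}")
```

Setting `gamma` to zero removes the recovery flow and reduces the sketch to the SI model, in which every susceptible person who becomes infected stays infected.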

In another work, Lopez et al. [46] recommend the use of network analysis techniques as well as NLP and text mining to analyse a multilanguage Twitter dataset to understand changing policies and common responses to the COVID-19 outbreak across time and countries. Since the beginning of the pandemic, governments of many countries have tried to implement policies to mitigate the spread of the virus. Responses of people to the pandemic and to the governmental policies may be collected from social media platforms such as Twitter. Much information and misinformation is posted through these platforms. When stricter policies such as social distancing and country lockdowns are applied, people's lives are changed considerably, and part of that can be observed and captured via people's reflections on social media platforms as well. Analysis results of these data can be helpful for governmental decision makers to mitigate the impacts of the current pandemic and prepare better policies for possible future pandemics.

Likewise, three machine learning methods including support vector machine (SVM), naive Bayes and random forest are used in [47] to classify 3,000 COVID-19 related posts collected from Sina Weibo, which is the Chinese equivalent of Twitter, into seven types of situational information. Identifying situational information is important for authorities because it helps them to predict its propagation scale, sense the mood of the public and understand the situation during the crisis. This contributes to creating proper response strategies throughout the COVID-19 pandemic.
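To illustrate one of the classifier families mentioned above, the following is a minimal multinomial naive Bayes text classifier with Laplace smoothing, written in plain Python. It is a sketch only: the class name, the four example posts and the two labels (`seeking_help`, `donation`) are hypothetical stand-ins and do not reproduce the dataset, the seven categories or the feature engineering of [47].

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)        # log prior
            for word in text.lower().split():
                if word in self.vocab:                   # skip unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical posts and labels, for illustration only.
train_texts = [
    "need masks and medical supplies urgently",
    "hospital needs help with supplies",
    "donated money to the hospital fund",
    "company donated masks to wuhan",
]
train_labels = ["seeking_help", "seeking_help", "donation", "donation"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("we urgently need help"))
print(model.predict("donated funds to wuhan"))
```

Real posts would need language-appropriate tokenization (for example word segmentation for Chinese text) in place of the whitespace split used here.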

Being able to predict structures of a protein will help understand its functions. Google DeepMind is using the latest version of their protein structure prediction system, namely AlphaFold [48] , to predict structures of several proteins associated with COVID-19 based on their corresponding amino acid sequences. They have released the predicted structures in [49] , but these structures still need to be experimentally verified. Nevertheless, it is expected that these predictions will help understand how the coronavirus functions and potentially lead to future development of therapeutics against COVID-19.

An AI-based generative chemistry approach to design novel molecules that can inhibit COVID-19 is proposed in [50] . Several generative machine learning models, e.g. generative autoencoders, GANs, genetic algorithms and language models, are used to exploit molecular representations to generate structures, which are then optimized using reinforcement learning methods. This is an ongoing work as the authors are synthesising and testing the obtained molecules. However, it is a promising approach because these AI methods can exploit the large drug-like chemical space and automatically extract useful information from high-dimensional data. It is thus able to construct molecules without manually designing features and learning the relationships between molecular structures and their pharmacological properties. The proposed approach is cost-effective and time-efficient and has a great potential to generate novel drug compounds in the COVID-19 fight.

On the other hand, Randhawa et al. [51] aim to predict the taxonomy of COVID-19 based on an alignment-free machine learning method [52] using genomic signatures and a decision tree approach. The alignment-free method is a computationally inexpensive approach that can give rapid taxonomic classification of novel pathogens by processing only raw DNA sequence data. By analysing over 5000 unique viral sequences, the authors are able to confirm the taxonomy of COVID-19 as belonging to the subgenus Sarbecovirus of the genus Betacoronavirus, as previously found in [53]. The proposed method also provides quantitative evidence that supports a hypothesis about a bat origin for COVID-19 as indicated in [53], [54]. Recently, Nguyen et al. [55] propose the use of AI-based clustering methods and more than 300 genome sequences to search for the origin of the COVID-19 virus. Numerous clustering experiments are performed on datasets that combine sequences of the COVID-19 virus and those of reference viruses of various types. Results obtained show that COVID-19 virus genomes consistently form a cluster with those of bat and pangolin coronaviruses. That provides quantitative evidence to support the hypotheses that bats and pangolins may have served as the hosts for the COVID-19 virus. Their findings also suggest that bats are a more probable origin of the virus than pangolins. AI methods thus have demonstrated their capabilities and power for mining big biological datasets in an efficient and intelligent manner, which contributes to the progress of finding vaccines, therapeutics or medicines for COVID-19.
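The idea of comparing genomes without alignment can be illustrated with k-mer frequency vectors, a simple and widely used form of sequence signature. Note that this is a swapped-in simplification for illustration: it is not the specific signature or classifier used in [51], [52], and the three short sequences below are made up rather than real viral genomes.

```python
import math
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=3):
    """Normalized frequency of every k-mer over the alphabet ACGT."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up toy sequences standing in for two reference genomes and a query.
ref_a = "ACGTACGTACGTACGT"
ref_b = "GGGGCCCCGGGGCCCC"
query = "ACGTACGAACGTACGT"

d_a = euclidean(kmer_signature(query), kmer_signature(ref_a))
d_b = euclidean(kmer_signature(query), kmer_signature(ref_b))
print(f"distance to ref_a={d_a:.3f}, distance to ref_b={d_b:.3f}")
print("closest reference:", "ref_a" if d_a < d_b else "ref_b")
```

Because only raw sequences are scanned once, the cost grows linearly with genome length, which is what makes alignment-free signatures attractive for rapid classification or as inputs to clustering.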

This section summarises available data sources relevant to COVID-19, ranging from numerical data of infection cases, radiology images [56] , Twitter, text, natural language to biological sequence data (Table II) , and highlights potential AI methods for modelling different types of data. The data are helpful for research purposes to exploit the capabilities and power of AI technologies in the battle against COVID-19. Different data types have different characteristics and thus require different AI methods to handle. For example, numerical time series data of infection cases can be dealt with by traditional machine learning methods such as naive Bayes, logistic regression, k-nearest neighbors (KNN), SVM, MLP, fuzzy logic system [57] , nonparametric Gaussian process [58] , decision tree, random forest, and ensemble learning algorithms [59] . Deep learning recurrent neural networks such as LSTM [45] can be used for regression prediction problems if a large amount of training data are available. The deeper the models, the more data are needed to enable the models to learn effectively from data. Based on their ability to characterize temporal dynamic behaviours, recurrent networks are well suited for modelling infection case time series data.

Radiology images such as chest X-ray and CT scans are high-dimensional data that require processing capabilities of deep learning methods in which CNN-based models are common and most suitable (e.g. LeNet [2], AlexNet [3], GoogLeNet [4], VGG Net [5] and ResNet [6]). CNNs were inspired by biological processes of the visual cortex of human and animal brains where each cortical neuron is activated within its receptive field when stimulated. A receptive field of a neuron covers a specific subarea of the visual field and thus the entire visual field can be captured by a partial overlap of receptive fields. A CNN consists of multiple layers where each neuron of a subsequent (higher) layer connects to a subset of neurons in the previous (lower) layer. This allows the receptive field of a neuron of a higher layer to cover a larger portion of images compared to that of a lower layer. The higher layer is able to learn more abstract features of images than the lower layer by taking into account the spatial relationships between different receptive fields. This use of receptive fields enables CNNs to recognize visual patterns and capture features from images without prior knowledge or making hand-crafted features as in traditional machine learning approaches. This principle is applied to different CNN architectures although they may differ in the number of layers, number of neurons in each layer, the use of activation and loss functions as well as regularization and learning algorithms [60]. Transfer learning methods can be used to customize CNN models, which have been pretrained on large medical image datasets, for the COVID-19 diagnosis problem. This would avoid training a CNN from scratch and thus reduce training time and the need for COVID-19 radiology images, which may not be sufficiently available in the early stage of the pandemic.
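The growth of the receptive field with depth can be computed directly. The sketch below applies the usual recurrence for stacked convolution or pooling layers: each layer enlarges the receptive field by (kernel size - 1) times the cumulative stride of the layers below it. The function name and the five-layer stack are hypothetical and do not correspond to any specific architecture cited above.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) after each (kernel, stride) layer."""
    size, jump = 1, 1          # a single input pixel; jump = cumulative stride
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical stack: three 3x3 convolutions, a 2x2 stride-2 pooling layer,
# then one more 3x3 convolution.
stack = [(3, 1), (3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_field(stack))  # [3, 5, 7, 8, 12]
```

Each neuron in the last layer of this toy stack therefore sees a 12-pixel-wide region of the input, compared with 3 pixels for a neuron in the first layer.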

Table II (excerpt).
Johns Hopkins University [78]. Data type: web-based mapping of global cases. Link: https://systems.jhu.edu/research/public-health/ncov/
C. R. Wells's GitHub [79]. Data type: daily incidence data and airport connectivity from China.

Alternatively, unstructured natural language data need text mining tools, e.g. Natural Language ToolKit (NLTK) [61], and advanced NLP and natural language generation (NLG) ... [68], Text-to-Text Transfer Transformer (T5) [69], Binary-Partitioning Transformer (BPT) [70] and OpenAI's Generative Pretrained Transformer 2 (GPT-2) [71].

The core components of these tools are deep learning and transfer learning methods. For example, ELMo and ULMFiT are built using LSTM-based language models while Transformer utilizes an encoder-decoder structure. Likewise, BERT and ERNIE use multi-layer Transformer as a basic encoder while XLNet is a generalized autoregressive pretraining method inherited from Transformer-XL. Transformer also serves as a basic model for T5, BPT and GPT-2. These are excellent tools for many NLP and NLG tasks to handle text and natural language data related to COVID-19.

Analysing biological sequence data such as viral genomic and proteomic sequences requires either traditional machine learning or advanced deep learning or a combination of both depending on problems being addressed and data pipelines used. As an example, traditional clustering methods, e.g. hierarchical clustering and density-based spatial clustering of applications with noise (DBSCAN) [72] , can be employed to find the virus origin using genomic sequences [55] . Alternatively, a fuzzy logic system can be used to predict protein secondary structures based on quantitative properties of amino acids, which are used to encode the twenty common amino acids [73] . A combination between principal component analysis and lasso (least absolute shrinkage and selection operator) can be used as a supervised approach for analysing single-nucleotide polymorphism genetic variation data [74] .

Advances in deep learning may be utilized for protein structure prediction using protein amino acid sequences as in [48], [75]. An overview of the use of various types of machine learning and deep learning methods for analysing genetic and genomic data can be found in [76], [77]. Typical applications may include, for example, recognizing the locations of transcription start sites, identifying splice sites, promoters, enhancers, or positioned nucleosomes in a genome sequence, analysing gene expression data for finding disease biomarkers, assigning functional annotations to genes, predicting the expression of a gene [76], identifying splicing junctions at the DNA level, predicting the sequence specificities of DNA- and RNA-binding proteins, modelling structural features of RNA-binding protein targets, predicting DNA-protein binding, or annotating the pathogenicity of genetic variants [77]. These applications can be utilized for analysing genomic and genetic data of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the highly pathogenic virus that has caused the global COVID-19 pandemic.

The COVID-19 pandemic has considerably affected lives of people around the globe and the number of deaths related to the disease keeps increasing worldwide. While AI technologies have penetrated into our daily lives with many successes, they have also contributed to helping humans in the tough fight against COVID-19. This paper has presented a survey of AI applications so far in the literature relevant to the COVID-19 crisis's responses and control strategies. These applications range from medical diagnosis based on chest radiology images, virus transmission modelling and forecasting based on number of cases time series and IoT data, text mining and NLP to capture the public awareness of virus prevention measures, to biological data analysis for drug discovery. Although various studies have been published, we observe that there are still relatively limited applications and contributions of AI in this battle. This is partly due to the limited availability of data about COVID-19 whilst AI methods normally require large amounts of data for computational models to learn and acquire knowledge. However, we expect that the number of AI studies related to COVID-19 will increase significantly in the months to come when more COVID-19 data such as medical images and biological sequences are available. Current available datasets as summarized in Table II are stored in various formats and standards that would hinder the development of COVID-19 related AI research. A future work on creating, hosting and benchmarking COVID-19 related datasets is essential because it will help to accelerate discoveries useful for tackling the disease. Repositories for this goal should be created following standardized protocols and allow researchers and scientists across the world to contribute to and utilize them freely for research purposes.

Among the published works, the use of deep learning techniques for COVID-19 diagnosis based on radiology imaging data appears to be dominant. As summarized in Table I, numerous studies have used various deep learning methods, applied to different datasets and assessed with a number of evaluation criteria. This creates an immediate concern about difficulties when applying these approaches in real-world clinical practice. Accordingly, there is a demand for future work on developing a benchmark framework to validate and compare the existing methods. This framework should provide the same computing hardware infrastructure, (universal) datasets covering the same patient cohorts, and the same data pre-processing procedures and evaluation criteria across the AI methods being evaluated.

Furthermore, as Li et al. [8] pointed out, although their model obtained great accuracy in distinguishing COVID-19 from other types of viral pneumonia using radiology images, it still lacks transparency and interpretability. For example, they do not know which imaging features have unique effects on the output computation. The benefit that black box deep learning methods can provide to clinical doctors is therefore questionable. A future study on explainable AI to explain the deep learning models' performance as well as the features of images that contribute to the distinction between COVID-19 and other types of pneumonia is necessary. This would help radiologists and clinical doctors to gain insights about the virus and examine future coronavirus CT and X-ray images more effectively.

In the field of computational biology and medicine, AI has been used to partially understand COVID-19 or discover novel drug compounds against the virus [49], [50]. These are just initial results and thus there is a great demand for AI research in this field, e.g., to investigate the genetics and chemistry of the virus and suggest ways to quickly produce vaccines and treatment drugs. With strong computational power that is able to deal with large amounts of data, AI can help scientists to gain knowledge about the coronavirus quickly. For example, by exploring and analyzing protein structures of the virus, medical researchers would be able to find components necessary for a vaccine or drug more effectively. This process would be very time consuming and expensive with conventional methods [85]. The recent astonishing success of deep learning in identifying powerful new kinds of antibiotic from a pool of more than 100 million molecules as published in [86] gives strong hope to this line of research in the battle against COVID-19.

Compared to the 1918 Spanish flu pandemic [87], we are now fortunately living in the age of exponential technology. When every individual, organization and government is trying their best in the battle against the pandemic, the power of AI should be fully exploited and employed to support humans in this battle. AI can be utilized for the preparedness and response activities against the unprecedented national and global crisis. For example, AI can be used to create more effective robots and autonomous machines for disinfection, working in hospitals, and delivering food and medicine to patients. AI-based NLP tools can be used to create systems that help understand the public responses to intervention strategies, e.g. lockdown and physical distancing, to detect problems such as mental health and social anxiety, and to aid governments in making better public policies. NLP technologies can also be employed to develop chatbot systems that are able to remotely communicate and provide consultations to people and patients about the coronavirus. AI can be used to eradicate fake news on social media platforms to ensure clear, responsible and reliable information about the pandemic, such as scientific evidence relevant to the virus, governmental social distancing policies or other pandemic prevention and control measures. In Table III, we point out 13 groups of problems related to COVID-19 along with types of data needed and potential AI methods that can be used to solve those problems. We do not aim to cover all possible AI applications but emphasize realistic applications that can be achieved along with their technical challenges. Those challenges need to be addressed effectively for AI methods to bring satisfactory results.

It is great to observe the increasing number of AI applications against the COVID-19 pandemic. AI methods, however, are not silver bullets: they have limitations and challenges such as inadequate training and validation data or, when data are abundantly available, data that are normally of poor quality. Huge efforts are needed for an AI system to be effective and useful. These may include appropriate data processing pipelines, model selection, efficient algorithm development, remodelling and retraining, continuous performance monitoring and validation to facilitate continuous deployment, and so on. There are AI ethics principles and guidelines [88], [89] that each phase of the AI system lifecycle, i.e. design, development, implementation and ongoing maintenance, may need to adhere to, especially when most AI applications against COVID-19 involve or affect human beings. The more AI applications are proposed, the more these applications need to ensure fairness, safety, explainability, accountability, privacy protection and data security, be aligned with human values, and have positive impacts on societal and environmental wellbeing.

Table III

Challenges:
- Challenging to collect physiological characteristics and therapeutic outcomes of patients.
- Low-quality data would lead to biased and inaccurate predictions.
- Uncertainty of AI models' outcomes.
- Privacy and confidentiality issues.
References: [90]-[96]
Related AI methods: Machine learning techniques, e.g. naive Bayes, logistic regression, KNN, SVM, MLP, fuzzy logic system, ElasticNet regression [97], decision tree, random forest, nonparametric Gaussian process [58], deep learning techniques such as LSTM [45] and other recurrent networks, and optimization methods.

Problem: Predict number of infected cases, infection rate and spreading trend.
Types of data: Time series case data, population density, demographic data.
Challenges:
- Insufficient time series data, leading to unreliable results.
- Complex models may not be more reliable than simple models [98].
References: [26], [99], [100]

Problem: COVID-19 early diagnosis using medical images.
Types of data: Radiology images, e.g. chest X-ray and CT scans.
Challenges:
- Imbalanced datasets due to insufficient COVID-19 medical image data.
- Long training time and unable to explain the results.
- Generalisation problem and vulnerable to false negatives.
References: [101]-[119] and works in Table I.
Related AI methods: Deep learning CNN-based models (e.g. AlexNet [3], GoogLeNet [4], VGG network [5], ResNet [6], DenseNet [23], ResNeXt [24], and ZFNet [120]), AI-based computer vision camera systems, and facial recognition systems.

Problem: Scan crowds for people with high temperature, and monitor people for social distancing and mask-wearing or during lockdown.
Types of data: Infrared camera images, thermal scans.
Challenges:
- Cannot measure inner-body temperature and a proportion of patients are asymptomatic, leading to imprecise results.
- Privacy invasion issues.
References: [121]-[123]

Problem: Analyse viral genomes, create evolutionary (phylogenetic) tree, find virus origin, track physiological and genetic changes, predict protein secondary and tertiary structures.
Types of data: Viral genome and protein sequence data.
Challenges:
- Computational expenses are huge for aligning a large dataset of genomic or proteomic sequences.
- Deep learning models take long training time, especially for large datasets, and are normally unexplainable.
References: [55], [75], DeepMind's AlphaFold [48], [49]
Related AI methods:
- Sequence alignment, e.g. dynamic programming, heuristic and probabilistic methods.
- Clustering algorithms, e.g. hierarchical clustering, k-means, DBSCAN [72] and supervised deep learning.

Problem: Discover vaccine and drug biochemical compounds and candidates, and optimize clinical trials.
Types of data: Viral genome and protein sequences, transcriptome data, drug-target interactions, protein-protein interactions, crystal structure of protein, co-crystallized ligands, homology model of proteins, and clinical data.
Challenges:
- Dealing with big genomic and proteomic data.
- Results need to be verified with experimental studies.
- It can take a long time for a promising candidate to become a viable vaccine or treatment method.
References: [50], [124]-[132]
Related AI methods: Heuristic algorithm, graph theory, combinatorics, and machine learning such as adversarial autoencoders [50], multitask CNN [124], GAN [50], [125], deep reinforcement learning [50], [126], [127].

Problem: Making drones and robots for disinfection, cleaning, obtaining patients' vital signs, distance treatment, and delivering medication.
Types of data: Simulation environments and demonstration data for training autonomous agents.
Challenges:
- Safety must be guaranteed at the highest level.
- Trust in autonomous systems.
- Huge efforts from training agents to implementing them on real machines.
References: [133]-[136]
Related AI methods: Deep learning, computer vision, optimization and control, transfer learning, deep reinforcement learning [137], learning from demonstrations.

Problem: Track and predict economic recovery via, e.g. detection of solar panel installations, counting cars in parking lots.
Types of data: Satellite images, GPS data (e.g. daily anonymized data from mobile phone users to count the number of commuters in cities).
Challenges:
- Difficult to obtain satellite data in some regions.
- Noise in satellite images.
- Anonymized mobile phone data security.
References: [138], [139]
Related AI methods: Deep learning, e.g. autoencoder models for feature extraction and dimensionality reduction, and CNN-based models for object detection.

Problem: Real-time spread tracking, surveillance, early warning and alerts for particular geographical locations, like the global Zika virus spread model BlueDot [140].
Types of data: Anonymized location data from cellphones, flight itinerary data, ecological data, animal and plant disease networks, temperature profiles, foreign-language news reports, public announcements, and population distribution data, e.g. LandScan datasets [141].
Challenges:
- Insufficient data in some regions of the world, leading to skewed results.
- Inaccurate predictions may lead to mass hysteria in public health.
- Privacy issues to ensure cellphone data remain anonymous.
References: BlueDot [142], Metabiota Epidemic Tracker [143], HealthMap [144]
Related AI methods: Deep learning (e.g. autoencoders and recurrent networks), transfer learning, and NLG and NLP tools (e.g. NLTK [61], ELMo [62], ULMFiT [63], Transformer [64], Google's BERT [65], Transformer-XL [66], XLNet [67], ERNIE [68], T5 [69], BPT [70] and OpenAI's GPT-2 [71]) for various natural language related tasks such as terminology and information extraction, automatic summarization, relationship extraction, text classification, text and semantic annotation, sentiment analysis, named entity recognition, topic segmentation and modelling, machine translation, speech recognition and synthesis, automated question and answering.

Problem: Understand communities' responses to intervention strategies, e.g. physical distancing or lockdown, to aid public policy makers and detect problems such as mental health.
Types of data: News outlets, forums, healthcare reports, travel data, and social media posts in multiple languages across the world.
Challenges:
- Social media data and news reports may be low-quality, multidimensional, and highly unstructured.
- Issues related to language translation.
- Data cannot be collected from populations with limited internet access.
References: [145]-[147]

Problem: Mining text to obtain knowledge about COVID-19 transmission modes, incubation, risk factors, nonpharmaceutical interventions, medical care, virus genetics, origin, and evolution.
Types of data: Text data on COVID-19 virus such as scholarly articles in CORD-19 dataset [84].
Challenges:
- Dealing with inaccurate and ambiguous information in the text data.
- Large volume of data from heterogeneous sources.
- Excessive amount of data makes it difficult to extract important pieces of information.
References: [148]-[150]

Problem: Mining text to discover candidates for vaccines, antiviral drugs, therapeutics, and drug repurposing through searching for elements similar to COVID-19 virus.
Types of data: Text data about treatment effectiveness, therapeutics and vaccines on scholarly articles, e.g. CORD-19 dataset [84] and libraries of drug compounds.
Challenges:
- Need to involve medical experts' knowledge.
- Typographical errors in text data need to be rectified carefully.
References: [132], [151]-[155]

Problem: Making chatbots to consult patients and communities, and combat misinformation (fake news) about COVID-19.
Types of data: Medical expert guidelines and information.
Challenges:
- Unable to deal with unsaved query.
- Require a large amount of data and information from medical experts.
- Users are uncomfortable with chatbots being machines.
- Irregularities in language expression such as accents and mistakes.
References: [156]-[164]
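Table III lists dynamic programming among the sequence alignment methods for analysing viral genomes and protein sequences. As an illustration of that technique, the following is a minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty. The function name and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example, not parameters taken from the surveyed works, and the short strings stand in for real genomic sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two symbols
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


s, top, bottom = needleman_wunsch("ACGT", "AGT")
print(s)       # 1
print(top)     # ACGT
print(bottom)  # A-GT
```

The table is filled in time and memory proportional to the product of the two sequence lengths, which is one reason Table III notes that aligning large genomic or proteomic datasets is computationally expensive and why heuristic methods are listed alongside dynamic programming.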

WHO coronavirus disease (COVID-19) dashboard

Gradient-based learning applied to document recognition

Imagenet classification with deep convolutional neural networks

Going deeper with convolutions

Very deep convolutional networks for large-scale image recognition

Deep residual learning for image recognition

Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT. Radiology

Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology

Deep learning system to screen coronavirus disease 2019 pneumonia

Chest CT findings in 2019 novel coronavirus (2019-nCoV) infections from Wuhan, China: key points for the radiologist. Radiology

CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology

Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection

A deep learning algorithm using CT images to screen for corona virus disease (COVID-19). medRxiv

Predicting COVID-19 malignant progression with AI techniques. medRxiv

Development and evaluation of an AI system for COVID-19. medRxiv

AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks. medRxiv

Unet++: A nested u-net architecture for medical image segmentation

Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks

COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images

Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection and patient monitoring using deep learning CT image analysis

Can AI help in screening viral and COVID-19 pneumonia

Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms

Densely connected convolutional networks

Aggregated residual transformations for deep neural networks

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Artificial intelligence forecasting of Covid-19 in China

α-Satellite: An AI-driven system and benchmark datasets for hierarchical community-level risk assessment to help combat COVID-19

Conditional generative adversarial nets

Modelling transmission and control of the COVID-19 pandemic in Australia

Urbanization affects peak timing, prevalence, and bimodality of influenza pandemics in Australia: Results of a census-calibrated model

Investigating spatiotemporal dynamics and synchrony of influenza epidemics in Australia: An agent-based modelling approach. Simulation Modelling Practice and Theory

A novel AI-enabled framework to diagnose coronavirus COVID-19 using smartphone embedded sensors: design study

Use of a smartphone thermometer to monitor thermal conductivity changes in diabetic foot ulcers: a pilot study

Smartphone-based human fatigue detection in an industrial environment using gait analysis

Fatigue detection during sit-to-stand test based on surface electromyography and acceleration: a case study

Smartphone-enabled video-observed versus directly observed treatment for tuberculosis: a multicentre, analyst-blinded, randomised, controlled superiority trial

Neck posture monitoring system based on image detection and smartphone sensors using the prolonged usage classification concept

A comprehensive approach for cough type detection

Nocturnal cough and snore detection in noisy environments using smartphone-microphones

Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone-based survey in the populations when cities/towns are under quarantine

On the coronavirus (COVID-19) outbreak and the smart city network: universal data sharing standards coupled with artificial intelligence (AI) to benefit urban health monitoring and management

Predicting COVID-19 using hybrid AI model

A double epidemic model for the SARS propagation

A simple mathematical model for Ebola in Africa

Long short-term memory

Understanding the perception of COVID-19 policies by mining a

Characterizing the propagation of situational information in social media during COVID-19 epidemic: A case study on Weibo

Improved protein structure prediction using potentials from deep learning

Computational predictions of protein structures associated with COVID-19, DeepMind website

Potential COVID-2019 3C-like protease inhibitors designed using generative deep learning approaches

Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study

MLDSP-GUI: an alignment-free standalone tool with an interactive graphical user interface for DNA sequence comparison and analysis

Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Origin of novel coronavirus (COVID-19): a computational biology study using artificial intelligence. bioRxiv

COVID-19: a survey on public medical imaging data resources

Epidemiological dynamics modeling by fusion of soft computing techniques

Gaussian Processes for Machine Learning

Neural network ensemble operators for time series forecasting

A survey of the recent architectures of deep convolutional neural networks

Natural Language Processing with Python

Deep contextualized word representations

Universal language model fine-tuning for text classification

Attention is all you need

BERT: Pre-training of deep bidirectional transformers for language understanding

Transformer-XL: attentive language models beyond a fixed-length context

XLNet: Generalized autoregressive pretraining for language understanding

ERNIE: Enhanced Language Representation with Informative Entities

Exploring the limits of transfer learning with a unified text-to-text transformer

BP-Transformer: modelling long-range context via binary partitioning

Language models are unsupervised multitask learners

A density-based algorithm for discovering clusters in large spatial databases with noise

Multi-output interval type-2 fuzzy logic system for protein secondary structure prediction

A hybrid supervised approach to human population identification using genomics data

Genomic mutations and changes in protein secondary structure and solvent accessibility of SARS-CoV-2 (COVID-19 virus), bioRxiv

Machine learning applications in genetics and genomics

Applications of deep learning and reinforcement learning to biological data

An interactive web-based dashboard to track COVID-19 in real time

Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak

COVID-19 image data collection

COVID-CT-Dataset: a CT scan dataset about COVID-19

POCOVID-Net: automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS)

A Twitter dataset of 70+ million tweets related to COVID-19 (Version 2.0) [Data set]

CORD-19). (2020) Version 2020-03-20

AI can help scientists find a Covid-19 vaccine

A deep learning approach to antibiotic discovery

Reassessing the global mortality burden of the 1918 influenza pandemic

Ethics guidelines for trustworthy AI

A novel high specificity COVID-19 screening method based on simple blood exams and artificial intelligence. medRxiv

Predicting mortality risk in patients with COVID-19 using artificial intelligence to help medical decision-making. medRxiv

A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected COVID-19 pneumonia in fever clinics

The role of artificial intelligence in management of critical COVID-19 patients

Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity

Artificial intelligence in the intensive care unit

Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan. medRxiv

Regularization and variable selection via the elastic net

Why is it difficult to accurately predict the COVID-19 epidemic?

Predicting Covid-19 in China using hybrid AI model

Modified SEIR and AI prediction of the epidemics trend of Covid-19 in China under public health interventions

Artificial intelligence-enabled rapid diagnosis of patients with COVID-19

Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of Covid-19 pneumonia using computed tomography

Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study

COVID-19 pneumonia: what has CT taught us

Diagnosis of coronavirus disease 2019 (Covid-19) with structured latent multi-view representation learning

Automated detection of COVID-19 cases using deep neural networks with X-ray images

Medical image analysis using wavelet transform and deep belief networks

CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images

Coronavirus (COVID-19) outbreak: what the department of radiology should know

Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: results of 10 convolutional neural networks

Deep learning covid-19 features on CXR using limited training data sets

COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios

Inf-Net: automatic COVID-19 lung infection segmentation from CT images

Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays

CovidGAN: data augmentation using auxiliary classifier GAN for improved covid-19 detection

A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy

Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19

Sample-efficient deep learning for COVID-19 diagnosis based on CT scans

Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study

Visualizing and understanding convolutional networks

Coronavirus France: cameras to monitor masks and social distancing

As coronavirus surveillance escalates, personal privacy plummets. The New York Times

How Russia is using facial recognition to police its coronavirus lockdown

Potentially highly potent drugs for 2019-nCoV. bioRxiv

MathDL: mathematical deep learning for D3R Grand Challenge 4

AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2. bioRxiv

De novo design of new chemical entities (NCEs) for SARS-CoV-2 using artificial intelligence

Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus

Network bioinformatics analysis provides insight into drug repurposing for COVID-2019

COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning

How artificial intelligence is changing drug discovery

A data-driven drug repositioning framework discovered a potential therapeutic agent targeting COVID-19

Combating COVID-19-The role of robotics in managing public health and infectious diseases

The uses of drones in case of massive epidemics contagious diseases relief humanitarian aid: Wuhan-COVID-19 crisis

From high-touch to high-tech: COVID-19 drives robotics adoption

Robotics, smart wearable technologies, and autonomous intelligent systems for healthcare during the COVID-19 pandemic: An analysis of the state of the art and future vision

Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications

SolarNet: a deep learning framework to map solar power plants in China from satellite imagery

Satellites and AI monitor Chinese economy's reaction to coronavirus

Anticipating the international spread of Zika virus from Brazil

LandScan Global Population Database

Pneumonia of unknown aetiology in Wuhan, China: potential for international spread via commercial air travel

Metabiota Epidemic Tracker

Contagious Disease Surveillance

Top concerns of tweeters during the COVID-19 pandemic: infoveillance study

The impact of COVID-19 epidemic declaration on psychological consequences: a study on active Weibo users

The outbreak of COVID-19 coronavirus and its impact on global mental health

deepMINE-Natural language processing based automatic literature mining and research summarization for early stage comprehension in pandemic situations specifically for COVID-19. bioRxiv

CovidNLP: A Web application for distilling systemic implications of COVID-19 pandemic with natural language processing

Understand research hotspots surrounding COVID-19 and other coronavirus infections using topic modeling. medRxiv

Word embedding mining for SARS-CoV-2 and COVID-19 drug repurposing

Text and network-mining for COVID-19 intervention studies. chemRxiv

Information mining for covid-19 research from a large volume of scientific literature

Identification of pulmonary comorbid diseases network based repurposing effective drugs for COVID-19

Bioactivity profile similarities to expand the repertoire of COVID-19 drugs

An artificial intelligence-based first-line defence against COVID-19: digitally screening citizens for risks via a chatbot

WHO Health Alerts Facebook Messenger Chatbot

WHO Viber Interactive Chatbot

WHO's Health Alert on WhatsApp

UNICEF's Europe and Central Asia Regional Office and the WHO Regional Office for Europe (2020)

IBM Watson Assistant for Citizens

COVID-19 Risk Assessment Chatbot

Rona (COVID-19 Bot)

COVID 19 Chat Bot

Dr. Nguyen has been recognized as a leading researcher in Australia in the field of Artificial Intelligence by The Australian Newspaper in a report published in 2018. He is currently a Senior Lecturer in the School of Information Technology, Deakin University, Victoria, Australia.