key: cord-0718243-3mxdiapo
authors: Khanday, Nadeem Yousuf; Sofi, Shabir Ahmad
title: Deep Insight: Convolutional Neural Network and its Applications for COVID-19 Prognosis
date: 2021-05-28
journal: Biomed Signal Process Control
DOI: 10.1016/j.bspc.2021.102814
sha: adb139121612fa06e6aeb551464f6187d52a7116
doc_id: 718243
cord_uid: 3mxdiapo

BACKGROUND AND OBJECTIVE: SARS-CoV-2, a novel strain of coronavirus and the cause of coronavirus disease 2019 (COVID-19), is a highly contagious pathogenic respiratory virus that emerged in December 2019 in Wuhan, a city in China's Hubei province, without an obvious cause. It spread very rapidly across the globe (over 200 countries and territories), and on 11 March 2020 the World Health Organisation characterized it as a "pandemic". Although its mortality is low, around 3%, as of 18 May 2021 it had already infected 164,316,270 people, with 3,406,027 unfortunate deaths. The world was undoubtedly rocked by the COVID-19 pandemic, but researchers rose to all manner of challenges, building on the evidence of ML and AI from previous epidemics to develop novel models, methods, and strategies. We aim to provide deeper insight into the convolutional neural network, the most notable and extensively adopted technique for radiographic visual imagery, to help expert medical practitioners and researchers design and fine-tune state-of-the-art models applicable to COVID-19.

METHOD: In this study, the deep convolutional neural network, its layers, activation and loss functions, regularization techniques, tools, methods, variants, and recent developments are explored to identify its applications for COVID-19 prognosis.

RESULT: This paper highlights recent studies of deep CNNs and their applications for better prognosis, detection, classification, and screening of COVID-19, to guide researchers and the expert medical community in multiple directions. It also addresses challenges, limitations, and outlooks in using such methods for COVID-19 prognosis.

CONCLUSION: Recent and ongoing developments in AI, ML, and deep learning (deep CNNs) have shown promising results and significantly improved performance metrics for screening, prediction, detection, classification, forecasting, medication, treatment, contact tracing, etc., curtailing manual intervention in medical practice. However, the community of medical experts has yet to recognize and label a benchmark deep learning framework for effective detection of COVID-19 positive cases from radiology imagery.

Machine learning is a field of study in which computers acquire the capability of acting without being explicitly programmed. Science and engineering are professionally fused in machine learning. As a science, it is concerned with achieving high-level interpretation from video sequences, digital imagery, multi-view camera geometry, or multidimensional data [1]. As an engineering and technological discipline, it is concerned with constructing learning systems and applications by applying its theories and models. The main aim, therefore, is to develop computer models that learn independently after being fed supervised training labels, generalizing and recognizing signatures and relations. Increased chip computation such as GPU units, the improvement and development of highly efficient learning algorithms, and cheaper computer hardware are the main reasons machine learning is blooming today [2].
Over the years, thanks to the work of enthusiastic researchers, a number of overview articles have explored aspects of learning models and their learning algorithms. Continuous improvements and modifications have been carried out to boost the performance metrics of different machine learning models and algorithms. However, the dynamic phenomena of nature raise new problems from time to time that challenge human intelligence and experience. Each time, without breaking down, humans accept the challenge and come up with the best possible solutions. The Chinese city of Wuhan gave rise to such a problem in late 2019, when a new coronavirus disease, COVID-19, emerged: a contagious respiratory infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It spread very rapidly across the globe, and on 11 March 2020 the WHO characterized it as a pandemic [3], the first pandemic ever caused by any class of coronavirus, owing to its alarming levels of spread, severity, and inaction. After the outbreak, case counts surged exponentially in all countries, shocking the world with daily deaths and newly infected cases. By 18 May 2021, it had already infected 164,316,270 individuals across the globe, with 3,406,027 unfortunate deaths. Not only has it crushed human lives, rattled markets, and exposed the competence of governments, it could be the straw that breaks the camel's back of economic globalization, resulting in permanent shifts in economic and political power. The World Health Organisation responded to the best of its capabilities across the world on prevention, surveillance, containment, treatment, and coordination to combat COVID-19. Beyond the WHO, every country, along with NGOs, adapted its national health missions and policies and deployed its research laboratories and resources to tackle the situation. At the same time, enthusiastic researchers left no stone unturned in the search for new strategies, ideas, techniques, and system models to combat COVID-19. Recent and ongoing progress in AI and ML has significantly improved performance metrics for detecting coronavirus from CT scans or X-ray images at early stages. It has also improved analysis, medication, prediction, contact tracing, and the vaccine/drug development process, curtailing human intervention in medical practice. Although machine learning and artificial intelligence models play a vital role in this global emergency, helping medical practitioners and researchers detect and predict coronavirus, they are still far from eliminating it completely. Researchers nevertheless continue to work tirelessly, investing effort, time, and energy to combat COVID-19. Different state-of-the-art machine learning algorithms, such as support vector machines (SVM) [4][5][6], relevance vector machines, random forests [7,8], extreme learning machines, AdaBoost [9], different variants of regression [4], LSTM [10,11], and deep neural networks [5,10,[12][13][14][15][16] including deep convolutional neural networks and deep belief networks, have been adopted for automatic diagnosis, detection, and classification (SARS, MERS, COVID-19); they have gained popularity by creating end-to-end models that achieve promising results. But COVID-19's alarming levels of spread and severity have made such expertise indispensable.
AI-based detection systems are in great demand to assist radiologists in obtaining an accurate diagnosis. Efforts are continuously being made to improve the performance metrics of learning models for COVID-19. It is hard to cover the full contribution of machine learning and AI to COVID-19 in a single paper; this work therefore provides a comprehensive overview of convolutional neural networks alone and their applications for COVID-19. A substantial explanation of the CNN and its modules is provided in the following sections, leading to how it can be used for automatic prognosis of COVID-19. The rest of this paper is organised as follows. Section 2 expounds the convolutional neural network and its three layers, specifically the convolutional layer (Section 2.1) and its types, the pooling layer (Section 2.2) and its types, and finally the fully connected layer (Section 2.3). Different activation functions (Section 2.4), loss functions (Section 2.5), and regularization techniques (Section 2.6) for CNNs are also illustrated. Section 3 briefly explores alternative deep learning techniques. In Section 4, a pipeline of a general architecture for COVID-19 prognosis is proposed. Section 5 highlights applications of deep CNNs for COVID-19. Finally, Section 6 covers challenges, limitations, and outlooks to be explored in the future. In diverse computer vision applications, the convolutional neural network is the most notable visual learning algorithm, with strong performance in processing 2D data with grid-like topology such as images and videos. The field of CNNs was inspired by the receptive fields in the visual cortex of animals [17] and later by the multilayer artificial neural network LeNet-5 [18][19][20]. The CNN differs from Fukushima's neocognitron [21] by sharing weights in temporal dimensions to reduce computation and complexity, as in time-delay neural networks. There are three main layers in any CNN model, namely the convolutional layer, the pooling layer, and the fully connected layer, each with a different function. A CNN model is trained like any standard neural network using backpropagation and involves two steps, a forward step and a backward step. In the forward step, the input image is represented in each layer with parameters such as biases and weights to finally compute the loss cost; in the backward step, the chain rule is applied to compute the gradients of each parameter to fine-tune the network. Various works have been proposed since 2006 to solve the problems of training deep CNNs [22][23][24][25], and finally [26] proposed the CNN architecture AlexNet [25,27], similar to LeCun's LeNet-5 but with a deeper structure, which reduced the training problems and has been applied strongly to diverse computer vision tasks [26,[28][29][30][31][32][33][34]. Later, improved and advanced developments such as VGGNet [23], ZFNet [35], GoogleNet [24], ResNet [36], Inception [37], and SENet [38] raised the performance of CNNs to new heights. The general architecture of the convolutional neural network is shown in Figure 1. Computer vision has achieved success with the improvements and variants of CNNs. The recursive CNN (RCNN), introduced by Eigen et al. [39] in 2013, ties filter weights across layers and uses the same number of feature maps in all layers. Jarrett et al. [40] proposed a novel model fusing convolution with an autoassociator for feature extraction, which is useful for object recognition.
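To make the three-layer structure just described concrete, the following is a minimal PyTorch sketch, not taken from any of the surveyed works: one convolutional layer, one pooling layer, and one fully connected layer, trained with the forward and backward steps described above. All sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal CNN mirroring the three-layer structure described above:
# convolution -> non-linearity -> pooling -> fully connected classifier.
class MinimalCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)          # pooling layer (dimensional reduction)
        self.fc = nn.Linear(8 * 112 * 112, num_classes)  # fully connected layer

    def forward(self, x):
        x = torch.relu(self.conv(x))   # forward step: convolution + ReLU activation
        x = self.pool(x)
        x = x.flatten(start_dim=1)     # 2D feature maps -> 1D feature vector
        return self.fc(x)

model = MinimalCNN()
images = torch.randn(4, 1, 224, 224)           # e.g. a batch of grey-scale chest X-rays
labels = torch.randint(0, 2, (4,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()                                 # backward step: gradients via the chain rule
```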
An advanced stacked convolutional autoencoder was developed by Masci et al. [41] for unsupervised feature learning, which achieved satisfying CNN initializations by averting the local minima of highly non-convex objective functions. Desjardins and Bengio [42] used an RBM as a kernel to compute convolutions in a CNN, composing the convolutional restricted Boltzmann machine (CRBM). They achieved a higher convergence rate, and unlike RBMs, a CRBM's complexity depends only on the number of features to be extracted and the size of the receptive fields. For hierarchical representations, convolutional deep belief networks (CDBN) were developed [43] for scaled unsupervised learning. To avoid over-fitting, a model must be trained with more and more training labels; to deal with such large datasets, Mathieu and Henaff [44] proposed an algorithm using the Fast Fourier Transform (FFT) that speeds up the training process by a significant factor by evaluating convolutions as products in the Fourier domain. cuDNN [45] and fbfft [46] are GPU-based libraries introduced to increase speed in training and testing. Using data management, data mining, data evaluation, and data dispersion components, a novel dynamic graph convolutional network (GCN) provides a never-ending learning platform called CUImage [47]. The convolution operator is a generalized linear model and is considered the central building block of convolutional neural networks; it enables networks to construct informative features within local receptive fields at each layer by combining both spatial and channel-wise information [38]. It involves multiplication between a 2D array of weights, called a kernel or filter, and an array of input data, convolving the full image as well as the intervening feature maps to generate various complete feature maps with different kernels. Mathematically, in the $i$-th feature map of the $l$-th layer, the feature value $V^{l}_{x,y,i}$ at location $(x,y)$ is calculated by

$$V^{l}_{x,y,i} = {W^{l}_{i}}^{T} Y^{l}_{x,y} + b^{l}_{i} \qquad (1)$$

where $b^{l}_{i}$ and $W^{l}_{i}$ are the bias term and weight vector of the $i$-th filter of the $l$-th layer, respectively, and $Y^{l}_{x,y}$ is the input patch centred at location $(x,y)$ of the $l$-th layer. Activation functions (typically sigmoid, tanh [48], ReLU [49]) are applied to the convolutional feature $V^{l}_{x,y,i}$ to obtain the activation value $z^{l}_{x,y,i}$, introducing the non-linearity to the CNN that lets multi-layered networks detect non-linear features. The kernel is kept intentionally smaller than the input data. This allows the same set of weights to be multiplied with the input array a number of times at different points on the input, giving the model translational invariance, which would not be possible if the dimensionality of the filter output equalled that of the input. The convolutional operation has many advantages [50]: i) the parameter-sharing mechanism learns one set of parameters over the entire image instead of an isolated set at each location, scaling down the number of parameters per response, which reduces the overall complexity of the model and makes training easier; ii) sparse interaction reduces the computational burden by learning correlations between neighboring pixels; iii) translation invariance, or equivariance, is a useful property for caring about the presence of some feature rather than exactly where it is, or for shifting the responses according to the shifting of an object in the input image.
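As a worked illustration of Eq. (1), the NumPy sketch below computes each feature value as the inner product of a filter's weights with the input patch centred at (x, y), plus a bias; the names and sizes are illustrative only, not taken from any cited work.

```python
import numpy as np

def conv_feature_value(Y_patch, W, b):
    """Eq. (1): V[x, y, i] = W_i . Y_patch(x, y) + b_i for one filter i."""
    return float(np.sum(W * Y_patch) + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))        # 3x3 kernel (kept smaller than the input)
b = 0.1
image = rng.standard_normal((5, 5))

# Slide the shared kernel over every valid location: parameter sharing means the
# same W and b are reused at each (x, y), which keeps the parameter count
# independent of image size.
H, W_cols = image.shape
feature_map = np.array([[conv_feature_value(image[x:x + 3, y:y + 3], W, b)
                         for y in range(W_cols - 2)] for x in range(H - 2)])
print(feature_map.shape)  # (3, 3): valid-convolution output of the 5x5 input
```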
Some supporting works that enhance the representational ability of the convolutional layer have also been introduced, as shown in Figure 2. The tiled convolutional neural network [51] is a variation of CNN which, with multiple maps, offers the advantage of representing complex invariances such as rotational and scale-invariant features by pooling over untied weights, and it decreases the number of learnable parameters. Switching the backward and forward passes of a convolution constitutes the transposed convolution, also called fractionally strided convolution [52] or deconvolution [35,[53][54][55], which associates a single input activation with numerous output activations. For the input feature map, the dilation factor is given by the stride of the deconvolution; with padding, the input is upsampled and the convolution operation is then performed on it. Application areas of deconvolution are visualization [35], recognition [56,57], super-resolution [58], semantic segmentation [59], and localization [60]. By introducing one or more hyper-parameters to the convolutional layer, a new variant called the dilated convolutional neural network [61] has been developed. The receptive field of the output units is increased cheaply, without increasing the kernel size, which allows the network to gather more relevant information for making better predictions. Scene segmentation [61], speech synthesis [62], recognition [63], and machine translation [64] are some highlighted tasks achieved with dilated convolution. Lin et al. [65] introduced a general network structure that increases the representational power of the neural network by replacing the linear filter of the convolutional layer with a micro-network, a multi-layer perceptron convolution (mlpconv), to boost model discriminability for local patches within the receptive fields. Subsampling layers can be added between mlpconv layers, and stacking such mlpconv layers gives the overall structure of the Network in Network (NIN), on top of which lie the objective cost layer and global average pooling. Using the Network in Network structure [65], Szegedy et al. [24] proposed the Inception module, which uses 1x1 convolutions to increase both the depth and the width of the network without a significant performance penalty. Using the Inception module, a drastic dimensional reduction is achieved by bringing the number of network parameters down to 5 million, much less than those of AlexNet (60M) and ZFNet (75M), hence reducing the computational requirements. The architectures of [28] and [66] adopt the Inception module as the base network, which helps them in effective object detection and localization. Generally, the pooling layer is added after the convolutional layer and is successfully used for dimensional reduction of feature maps and network parameters. As the pooling operation involves neighboring pixels in its computation, the pooling layer is translation-invariant like the convolutional layer. Boureau et al. [68] and Scherer et al. [69] provide details about the most commonly used pooling strategies, average pooling and max pooling. Their analysis concluded that max pooling can lead to faster convergence, improves generalization, and is utilized by most GPU implementations of various CNN variants [24,70]. There are several variations of the pooling layer, each with a different purpose; some are mentioned below. Inspired by [71] and theoretically analysed by [72], it has been suggested that $L_p$ pooling contributes better generalization than max pooling.
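Stepping back to the convolution variants just described, the PyTorch fragment below contrasts ordinary, dilated, and transposed convolutions by their output sizes; the framework and the toy shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)                                    # toy 8x8 input

standard   = nn.Conv2d(1, 1, kernel_size=3)                    # receptive field 3x3
dilated    = nn.Conv2d(1, 1, kernel_size=3, dilation=2)        # same 9 weights, field spans 5x5
transposed = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2) # swapped passes: upsampling

print(standard(x).shape)    # torch.Size([1, 1, 6, 6])
print(dilated(x).shape)     # torch.Size([1, 1, 4, 4]): effective kernel covers 5x5
print(transposed(x).shape)  # torch.Size([1, 1, 17, 17]): one activation feeds many outputs
```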
$L_p$ pooling is given by

$$Z_{p,q,i} = \Big( \sum_{(a,b)\in R_{p,q}} (f_{a,b,i})^{n} \Big)^{1/n} \qquad (2)$$

where $Z_{p,q,i}$ is the result of the pooling operator at position $(p,q)$ in the $i$-th feature map, and $f_{a,b,i}$ is the feature value at position $(a,b)$ within the pooling region $R_{p,q}$ in the $i$-th feature map. Max and average pooling can be derived from $L_p$ pooling: with $n = 1$, $L_p$ corresponds to average pooling, and with $n = \infty$, $L_p$ reduces to max pooling. A new variant of the $L_p$ unit, called the $L_p$-norm, was proposed by [73]; it replaces the max operator in a maxout unit and is more capable of depicting complex, non-linear separating boundaries. Stochastic pooling [74], proposed by Zeiler et al., is a dropout-inspired pooling method that replaces the traditional deterministic pooling operations with a stochastic procedure, selecting the activation randomly according to a multinomial distribution within each pooling region. Unlike max pooling, stochastic pooling avoids over-fitting because of its stochastic component. Spectral pooling [75] achieves dimensionality reduction by truncating the representation of the input in the frequency basis. For the same number of parameters, spectral pooling preserves considerably more information and structural features for network discriminability than other pooling strategies such as max pooling, exploiting the non-uniformity of particular inputs with a linear low-pass filtering operation in the frequency domain. As spectral pooling only requires matrix truncation, it can be implemented at negligible additional computational cost in CNNs that use Fast Fourier Transforms for their convolution filters. A newer approach, Hartley spectral pooling [76], was introduced to be less lossy in dimensionality reduction and reduces computation by avoiding complex arithmetic for the frequency representation. He et al. [77] introduced spatial pyramid pooling (SPP), which takes an image of any size as input, providing flexibility by allowing not only arbitrary aspect ratios but also arbitrary scales [78,79], to generate a fixed-length representation. Unlike sliding-window pooling, SPP improves on Bag-of-Words (BoW) by pooling input feature maps in local spatial bins with sizes proportional to the image size; regardless of the image size, the number of bins is therefore fixed. Computer vision tasks like object detection and recognition face the big challenge of handling deformations. Although max pooling and average pooling handle deformations to some extent, they fail to capture the geometrical model of object parts and deformation constraints. Ouyang et al. [80] offer a new deformation-constrained pooling layer, called def-pooling, which handles deformation more efficiently by enabling the deep model to learn the deformation of visual signatures. Deformation stability at each layer changes significantly throughout training and often decreases; filter smoothness contributes significantly to achieving deformation stability in CNNs [81], and the level of learned deformation stability is determined by the joint distribution of inputs and outputs. The pooling approaches are designed for different purposes and involve different procedures, so diverse pooling methods could be fused to enhance the performance metrics of a convolutional neural network; a numerical sketch of the $L_p$ rule follows below. Some fully connected layers follow the final pooling layer of the network, as shown in Figure 1.
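Before moving on to the fully connected layer, here is the promised minimal NumPy sketch of the $L_p$ pooling rule of Eq. (2), showing how a large exponent approaches max pooling; the exponent name n follows the text, and the values are illustrative.

```python
import numpy as np

def lp_pool(region, n):
    """Lp pooling over one pooling region R (Eq. (2)); values assumed
    non-negative, as they are after a ReLU."""
    return np.sum(region ** n) ** (1.0 / n)

region = np.array([0.2, 0.5, 0.1, 0.9])   # activations inside one pooling window
print(lp_pool(region, 1))                  # n = 1: sum pooling (proportional to the average)
print(lp_pool(region, 64))                 # large n: approaches max pooling -> ~0.9
print(region.max())                        # exact max, for comparison: 0.9
```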
In the FCL, all the inputs from one layer are connected to every activation unit of the next layer by converting the 2D feature maps into a 1D feature vector, which can be fed forward into a certain number of classes for classification [26]. The 1D feature vector can also be kept for further processing [28], as a compilation of the data extracted from previous layers used to produce the final output. A few fully connected layers are compulsory in shallow CNN models, where the final convolutional layer generates features corresponding to only a portion of the input image. Fully connected layers behave like a traditional neural network and comprise most of the parameters (about 90%) of the CNN, fitting complex non-linear discriminant functions in the feature domain into which the input data are mapped. The operation of the fully connected layer is illustrated in Figure 3. AlexNet [27] has 60M parameters, of which 58M are in fully connected layers. Similarly, VGGNet [26] has 138M parameters in total, of which 123M are in FC layers. The existence of such a large number of parameters can end in over-fitting the CNN model; techniques like stochastic pooling, dropout, and data augmentation have been introduced to avoid this. Moreover, [82] introduces a CNN architecture called SparseConnect, which sparsifies the links between FC layers to reduce the problem of over-fitting. Normally the structure and design of the fully connected layer are not changed; however, in the transfer-learning method of image representation [83], the FC layers of the CNN model provide a means to learn rich mid-level image features transferable to diverse visual recognition tasks by preserving the learned parameters. The performance of a CNN on a particular vision task can be significantly improved by a proper activation function. Some recently introduced activation functions in CNNs are described below. The Rectified Linear Unit (ReLU) [84] is the most prominent unsaturated activation function, used successfully [85,86] in various CNN tasks including image classification [24,26,87], speech recognition, natural language processing, and game intelligence [88], to name a few. For $P_{x,y,i}$ as the input to the activation function on the $i$-th channel at location $(x,y)$ (see Figure 4(a)), ReLU is calculated as

$$z_{x,y,i} = \max(P_{x,y,i}, 0) \qquad (3)$$

ReLU is linear for all positive values and zero for all negative values. Compared to sigmoid and tanh activations, ReLU empirically works better [89,90] and is much faster to compute; it allows the model to train well even without pre-training [25], runs in less time, and lets the network converge very quickly due to its linearity. Since ReLU prunes the negative part to zero, the network is easily sparsely activated. The problem with ReLU neurons is that they become inactive, with zero gradient, whenever the unit is not active; this is the dying-ReLU problem, which drastically slows down the training process of the model. Once a ReLU neuron becomes inactive, it remains dead for the whole procedure, because gradient-based optimization can no longer adjust its weights. A deep ReLU network eventually dies in probability as the depth goes to infinity. To alleviate this problem, L. Lu et al. [91] recently proposed a randomized asymmetric initialization procedure with suitably designed parameters that effectively prevents dying ReLU.
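A tiny NumPy illustration of the dying-ReLU effect just described, under illustrative values: once a unit's pre-activations are all on the negative side, both its output and its gradient are identically zero, so gradient-based optimization can never revive it.

```python
import numpy as np

def relu(p):
    return np.maximum(p, 0)            # Eq. (3)

def relu_grad(p):
    return (p > 0).astype(float)       # derivative: 1 for positive inputs, 0 otherwise

pre_activations = np.array([-3.2, -0.7, -1.1])   # a unit stuck on the negative side
print(relu(pre_activations))       # [0. 0. 0.] -> the unit never fires
print(relu_grad(pre_activations))  # [0. 0. 0.] -> zero gradient: the weights never update
```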
Two other variants, Leaky ReLU (LReLU) and Parametric ReLU (PReLU), introduced by Maas et al. [89] and He et al. [92] respectively (see Figure 4(b)), were proposed to resolve the dying-ReLU problem. Leaky ReLU is defined as

$$z_{x,y,i} = \max(P_{x,y,i}, 0) + \lambda \min(P_{x,y,i}, 0) \qquad (4)$$

where $\lambda$ is predefined and lies in (0,1). Leaky ReLU has a small gradient for negative values instead of mapping them to a constant 0, which enables backpropagation. This not only fixes the dying problem but makes the activation function more balanced, speeding up the training process. Randomized ReLU (RReLU) [93] is a variant of Leaky ReLU, proposed and successfully used in the Kaggle NDSB competition, where a random number $\lambda_{x,y}$ is drawn from a uniform distribution [94,95] during training and then applied in testing to avoid zero gradients in the negative part. Mathematically,

$$z_{x,y,i} = \max(P_{x,y,i}, 0) + \lambda_{x,y} \min(P_{x,y,i}, 0) \qquad (5)$$

where $P_{x,y,i}$ denotes the input to the activation function at location $(x,y)$ on the $i$-th channel (see Figure 4(c)). To get deterministic and stable outputs, the method of dropout [96] can be used by taking the average of all $\lambda_{x,y}$ in the test phase only. Apart from improving performance, over-fitting can also be reduced due to the randomized sampling of parameters. The Parametric Rectified Linear Unit (PReLU) boosts accuracy at negligible extra computational cost by adaptively learning the parameters of the rectifiers. PReLU is computed as

$$z_{x,y,i} = \max(P_{x,y,i}, 0) + \alpha \min(P_{x,y,i}, 0) \qquad (6)$$

where $\alpha$ is a learned parameter common to all channels of one layer. The risk of over-fitting is small for PReLU, as it introduces very few parameters, proportional to the total number of channels. Backpropagation is used for training [19], and $\alpha$ is optimized simultaneously with the other layers. The Exponential Linear Unit (ELU) was proposed by Clevert et al. [97] for faster and more accurate learning in CNNs. Apart from fast learning, ELUs also lead to significantly better generalization performance than other ReLU variants. ELUs avoid the vanishing-gradient problem, as ReLU, Leaky ReLU, and Parametric ReLU do, by setting the positive part to the identity, so its derivative is one and not contractive. Unlike ReLU, a negative part is present in ELUs, which is responsible for faster learning, as it decreases the gap between the normal gradient and the unit natural gradient, much as batch normalization does. ELUs saturate to a negative value as the argument gets smaller, hence decreasing the information propagated and the variation passed to the next layer, unlike Leaky ReLU and Parametric ReLU. Mathematically, the ELU is computed as

$$z_{x,y,i} = \max(P_{x,y,i}, 0) + \min(\lambda\,(e^{P_{x,y,i}} - 1), 0) \qquad (7)$$

where $\lambda$ is a predefined hyperparameter regulating the value to which an ELU saturates for negative inputs (see Figure 4(d)). The Parametric ELU (PELU) [98], a variant of ELU, shows that learning a parameterization of the ELU improves its performance. Using a gradient-based framework on ImageNet [26] with different network architectures, PELU achieves relative error improvements over ELU. The maxout model is a feed-forward architecture using the maxout unit as activation function [99], which returns the maximum across K affine feature maps; it is designed to be used in conjunction with dropout, which eases training and makes it robust enough to achieve excellent performance. The maxout activation is computed as

$$z_{x,y,i} = \max_{k \in [1,K]} P_{x,y,i,k} \qquad (8)$$

where $P_{x,y,i,k}$ is the input on the $i$-th channel of the $k$-th feature map. ReLU and its variants like Leaky ReLU are special cases of maxout, so maxout enjoys all the benefits of the ReLU family without facing the dying-ReLU problem; however, the number of parameters per neuron increases.
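A NumPy sketch of the variants defined in Eqs. (4) through (8), with illustrative slope values; PReLU, Eq. (6), shares the leaky form but with $\alpha$ learned during training rather than fixed.

```python
import numpy as np

def leaky_relu(p, lam=0.01):
    """Eq. (4): small negative slope lam keeps gradients alive for p < 0."""
    return np.maximum(p, 0) + lam * np.minimum(p, 0)

def elu(p, lam=1.0):
    """Eq. (7): identity for p > 0, smooth saturation toward -lam for p < 0."""
    return np.where(p > 0, p, lam * (np.exp(p) - 1))

def maxout(ps):
    """Eq. (8): maximum across K affine feature maps stacked on axis 0."""
    return np.max(ps, axis=0)

p = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(p))              # [-0.02  -0.005  0.  1.5]
print(elu(p))                     # [-0.8647 -0.3935 0.  1.5]: saturates toward -1
print(maxout(np.stack([p, -p])))  # [2.  0.5  0.  1.5]: max over two feature maps
```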
[100] proposed a new type of maxout, called probout, replacing the max operation with a probabilistic sampling procedure that is partially invariant to changes in its input. Probout successfully balances retaining the appealing properties of maxout units with boosting their invariance properties. Models learn through a loss function. It calculates a penalty by comparing the actual and predicted values, hence helping to optimize the parameters of a neural network. Selecting an appropriate loss function for a specific task is important. For regression problems, Mean Squared Error (MSE) is the commonly adopted loss function, computed as the mean of the squared differences between the actual and predicted values [101]. Mean Squared Logarithmic Error (MSLE) is used when the model forecasts unscaled quantities directly. Another variant, Mean Absolute Error (MAE), is used when the target variable contains outliers. For binary classification problems, cross-entropy is the default loss function. It computes a value that summarizes the mean difference between the actual and expected probability distributions for predicting the particular class. A variant for binary classification is hinge loss, primarily utilized for training large-margin classifiers, most notably support vector machines [102,103]. Hinge loss is a convex function and has many extensions. For an expected result $t = \pm 1$ and a classifier score $Z = Y^{T} X_i + b$, where $Y$ is the weight vector of the classifier, the hinge loss can be written as

$$L(Z, t) = \big[\max(0,\, 1 - t \cdot Z)\big]^{n} \qquad (9)$$

If $n = 1$, this is the simple hinge loss ($L_1$-loss); if $n = 2$, it is the squared hinge loss ($L_2$-loss) [104], which on investigation [105] proves to be more effective than softmax on MNIST [106]. Robustness is also significantly improved, as demonstrated by [107]. Cross-entropy, also called logarithmic loss, is likewise used for multiclass classification problems, which are fundamental in computer vision; the expected values lie in the set {0, 1, 2, 3, ..., n}, where each class is assigned an exclusive integer value. Multiclass classification may suffer from space complexity because of the one-hot encoded vector for the expected element of each training example. However, sparse cross-entropy, a variant of cross-entropy, resolves this problem by computing the loss without requiring one-hot encoding of the target variable prior to training. Kullback-Leibler divergence is a non-symmetric measure of how a particular probability distribution $P(x)$ varies from a baseline distribution $q(x)$ over the same values of a discrete variable $x$. KL divergence is most notably used in autoencoders [108][109][110]. The symmetric form of KLD is the Jensen-Shannon divergence (JSD), which measures the similarity between $P(x)$ and $q(x)$; by minimizing it, two distributions can be brought as close as possible, as used in Generative Adversarial Networks (GANs) [111][112][113]. Softmax loss: due to the probabilistic interpretation and simplicity of the softmax function, the softmax loss fuses it with a multinomial logistic loss to form arguably one of the most commonly used components in CNN architectures. Recently, [114] proposed the Large-Margin Softmax (L-Softmax) loss, which defines a flexible learning task with a hyperparameter to adjust the margin and can effectively avoid the problem of over-fitting.
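The NumPy sketch below evaluates the hinge loss of Eq. (9) alongside a binary cross-entropy, for comparison; all scores and labels are illustrative.

```python
import numpy as np

def hinge_loss(Z, t, n=1):
    """Eq. (9): [max(0, 1 - t*Z)]^n with t in {-1, +1};
    n=1 is the L1 hinge loss, n=2 the squared (L2) hinge loss."""
    return np.maximum(0.0, 1.0 - t * Z) ** n

def binary_cross_entropy(p, y):
    """Penalty for predicted probability p against a true label y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

Z, t = 0.4, 1.0                       # classifier score and true label
print(hinge_loss(Z, t))               # 0.6: correct side, but inside the margin
print(hinge_loss(Z, t, n=2))          # 0.36: squared hinge penalizes violations more smoothly
print(binary_cross_entropy(0.7, 1))   # ~0.357
```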
Contrastive loss: to train Siamese networks [115][116][117][118], the contrastive loss function is most commonly used; it is a distance-based loss for learning a similarity measure from pairs of data-point representations tagged as matching or non-matching. Lin et al. [119] proposed a double-margin loss function with an additional margin parameter, which can be interpreted as learning local large-margin classifiers to differentiate matching and non-matching elements. Triplet loss: the triplet loss [120] considers three instances, namely an anchor instance $x^a$, a positive instance $x^p$ from the same class as $x^a$, and a negative instance $x^n$. Mathematically, the triplet loss is computed as

$$L = \sum_{i}^{N} \Big[ \big\| f(x_i^a) - f(x_i^p) \big\|_2^2 - \big\| f(x_i^a) - f(x_i^n) \big\|_2^2 + \lambda \Big]_{+} \qquad (10)$$

where $f(x)$ takes $x$ as input, $\lambda$ is a margin enforced between positive and negative pairs, $i$ denotes the $i$-th input, and $a$, $p$, $n$ denote the anchor, positive, and negative instances respectively. The primary goal of triplet loss is to decrease the distance between an anchor and a positive of the same identity and to maximize the distance between the anchor and a negative of a different identity. Liu et al. [121] proposed the Coupled Cluster (CC) loss, which is characterized over the negative set and the positive set. CC also works well in backpropagation: by replacing the randomly selected anchor with the cluster center, it aggregates the samples in the negative set and the samples in the positive set to increase reliability. A new loss function for CNN classifiers was introduced by Zhu et al. [122], inspired by predefined evenly-distributed class centroids (PEDCC) and called the PEDCC loss; it maximizes the inter-class distance and keeps the intra-class distance short enough in the hidden feature space by replacing the weight of the final classification linear layer with PEDCC weights, achieving the best results in image classification tasks while improving training stability and producing fast convergence. Regularization is one of the key elements of machine learning and a crucial ingredient of CNNs; it makes the model generalize better, as it aims to reduce the test error rather than the training error. The introduction of a large number of parameters often leads to over-fitting, which regularization can effectively reduce. Various regularization techniques have emerged in defense against over-fitting, e.g. sparse pooling, Large-Margin softmax, the $L_p$-norm, Dropout, DropConnect, data augmentation, transfer learning, batch normalization, and Shakeout, to name the notable ones. Normally, regularization adds a term $\lambda R(\theta)$ to the objective function, which keeps and considers all the features but reduces the magnitude of the parameters $\theta_j$ to penalize model complexity. Mathematically, if the loss function is $J(\theta, x, y)$, then the regularized loss is

$$J_{Reg}(\theta, x, y) = J(\theta, x, y) + \lambda R(\theta) \qquad (11)$$

where $R(\theta)$ is the regularization term and $\lambda$ is the regularization parameter (to be chosen carefully). For the $L_p$-norm, the regularization function is computed as

$$R(\theta) = \sum_{j} \big\| \theta_j \big\|_p^p \qquad (12)$$

When $p \geq 1$, the $L_p$-norm is convex and has been widely used to induce structured sparsity in the solutions to various optimization problems [123,124]. For $p = 2$, it is referred to as weight decay. Hinton et al. [125] introduced Dropout, expounded in depth by Baldi et al. [126], which effectively reduces over-fitting by randomly dropping units (hidden and visible). The corresponding links are also removed from the neural network during training, which avoids complicated co-adaptations [125] on the training data and boosts the generalization ability.
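A NumPy sketch of the triplet objective of Eq. (10) for a single triplet, together with the $L_p$ regularizer of Eq. (12); embeddings, margin, and weights are illustrative values.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Eq. (10), one triplet: max(0, ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + margin)."""
    d_pos = np.sum((f_a - f_p) ** 2)   # anchor-positive distance (same identity)
    d_neg = np.sum((f_a - f_n) ** 2)   # anchor-negative distance (different identity)
    return max(0.0, d_pos - d_neg + margin)

def lp_regularizer(theta, p=2):
    """Eq. (12): R(theta) = sum_j |theta_j|^p; p = 2 is weight decay."""
    return np.sum(np.abs(theta) ** p)

f_a = np.array([0.1, 0.9])            # anchor embedding
f_p = np.array([0.2, 0.8])            # close to the anchor -> small d_pos
f_n = np.array([0.9, 0.1])            # far from the anchor -> large d_neg
print(triplet_loss(f_a, f_p, f_n))    # 0.0: the margin is already satisfied
print(lp_regularizer(np.array([0.5, -1.0, 2.0])))  # 5.25: penalty grows with weight magnitude
```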
In Dropout, each element of a layer's output is retained with probability $P$ and otherwise set to zero with probability $(1-P)$. Like standard neural networks, Dropout networks are trained using stochastic gradient descent, and Dropout provides a means of effectively combining exponentially many heterogeneous neural network architectures. Dropout holds up in accuracy even though some information is missing, and it makes the network independent of any one neuron or any small fusion of neurons. Several variants of Dropout have been introduced for further improvement, including the fast Dropout method by Wang et al. [127], adaptive Dropout by Ba et al. [128], and their detailed treatment in [129][130][131][132]. Spatial Dropout [133] extends the Dropout values across the feature map and works efficiently when the size of the training data is limited. Inspired by Dropout, DropConnect generalizes it by randomly setting the elements of the weight matrix W to zero, rather than randomly setting each output unit to zero; in other words, it randomly drops the weights instead of the activations. Although relatively slower, DropConnect achieves satisfactory results on a variety of standard benchmarks. DropConnect also introduces dynamic sparsity in the weights W instead of in the output vectors of a layer. Additionally, biases are also masked out during training. Recently, Guoliang Kang et al. [134] introduced a new regularization variant, called Shakeout, by slightly modifying the concept of Dropout. Unlike Dropout, which randomly discards units, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer during training. The Shakeout regularizer enjoys the benefits of $L_0$, $L_1$, and $L_2$ regularization terms adaptively. Shakeout performs better than Dropout when data are deficient. Much sparser weights [135] are obtained by Shakeout, which also leads to better generalization, and it reduces the instability of the training process for deeper architectures. Data augmentation, a data-space solution to the problem of deficient data that transforms existing data into new data without changing their nature, is applied notably in various computer vision tasks. It prevents over-fitting by enlarging a limited training dataset [136]. Popular data augmentation methods include geometric transformations such as mirroring [137], sampling [25], shifting [138], and rotating [139], color-space augmentations, color transformations, random erasing, kernel filters, mixing, feature-space augmentation, adversarial training, GANs, meta-learning, and various photometric transformations [140]. Selecting the most suitable transformation from a bag of candidates can be done with the greedy strategy proposed by Paulin et al. [141]. Dosovitskiy et al. [142] proposed data-augmentation-based unsupervised feature learning, and [143] and [144] introduce ways of gathering images from online sources to improve learning in different visual recognition tasks. Apart from the regularization methods described above, there are others, such as weight decay and weight tying, and many more in [145], which avoid over-fitting by decreasing the number of parameters, by producing better representations of the input data [51], or by otherwise helping generalization [26]. The important point about these regularization techniques is that they are not mutually exclusive and can therefore be fused to boost performance.
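An inverted-dropout sketch in NumPy; this specific formulation is a common convention assumed here rather than taken from the cited papers. Each unit is kept with probability P during training, and the survivors are rescaled so that no change is needed at test time.

```python
import numpy as np

def dropout(activations, keep_prob=0.5, training=True,
            rng=np.random.default_rng(0)):
    """Randomly zero units with probability 1 - keep_prob during training
    (inverted dropout); return activations unchanged at test time."""
    if not training:
        return activations                     # test phase: use the full network
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob      # rescale so the expected activation is unchanged

h = np.ones((2, 4))
print(dropout(h))                      # roughly half the units zeroed, survivors scaled to 2.0
print(dropout(h, training=False))      # unchanged at test time
```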
Although the CNN is the prominent deep learning technique used for different computer vision applications like detection and classification, other state-of-the-art techniques also deliver prominent results for these applications in different domains. Some of these techniques are briefly described below.

• Deep Restricted Boltzmann Machine: Hinton et al. [146,147] proposed a generative stochastic neural network, an energy-based model and the primary variant of the Boltzmann machine [148], called the Restricted Boltzmann Machine (RBM) [149,150]. The RBM is a special variant of the BM with the restriction of forming a bipartite graph between hidden and visible units. Due to their flexibility, RBMs play a vital role in applications such as dimensional reduction, topic modelling, collaborative filtering, classification, and feature learning. Using RBMs as a basic building block, novel models can be built, including Deep Belief Networks (DBN) [151,152], the Deep Boltzmann Machine (DBM) [153], and Deep Energy Models (DEM) [154]. Although RBMs do not promise better performance than CNNs on computer vision tasks, the developments and variants that adopt RBMs as building blocks lead to better performance.
• Autoencoder: An autoencoder (AE, also called an autoassociator) is a typical unsupervised learning algorithm of artificial neural networks that learns a non-linear mapping function between data and its feature space. The AE is used for learning efficient encodings [155], mainly for the purpose of dimensional reduction [156]. The output vector's dimensionality is identical to that of the input vector, because an autoencoder is trained to rebuild its own input X rather than to predict some target value Y.
• Recurrent Neural Network: The basic essence of RNNs [157] is to consider the influence of past information when generating the output. LSTM, BiLSTM, and GRU are prominent and powerful RNN models that are efficient for time-dependent patterns in time-series data.
• Sparse coding: Sparse coding [135,158] is a powerful and effective data representation method. Its representations are considered a strong technique for modelling high-dimensional data. Due to its applications in image classification, image restoration, image retrieval, image denoising, image clustering, recognition tasks, and image and video processing, it has attracted considerable attention in recent years.
• Transfer learning: Transfer learning aims to provide a framework for utilizing previously acquired knowledge to solve new but similar problems much more quickly and effectively [159], e.g. taking advantage of a model (CNN) pre-trained on a huge database to help learn a target task (prognosis of COVID-19) that has limited training data; a sketch of this is given in Section 4 below.
• Random forest algorithms [7,8], ensemble techniques (stacking, bagging, boosting), and GANs are other prominent techniques that have been adopted in different segments of COVID-19 prognosis.

Although the world was rocked by the COVID-19 pandemic, researchers rose to all manner of challenges to tackle it, adopting the evidence of ML and AI from previous epidemics to develop novel models, methods, and strategies. Enthusiastic researchers left no stone unturned in their search for new strategies, ideas, techniques, and system models to combat COVID-19. But due to the dynamic behaviour and mutations of SARS-CoV-2, different variants have emerged, and more are expected to occur over time.
Moreover, asymptomatic cases and patients with newly observed syndromes like joint pain, gastrointestinal symptoms, and eyeball stress have left medical practitioners baffled. Because new strains develop through mutations of the virus, the predominance of organ involvement can change, making it even more challenging for researchers to combat COVID-19. Despite the fact that ML, AI, and deep learning have shown promising results and significantly improved performance metrics for COVID-19 prognosis, the community of medical experts has yet to recognize and label a benchmark deep learning framework for effective detection of COVID-19 positive cases from radiology imagery. There is therefore a need to enhance the architecture of learning models so that they are flexible enough to detect newly emerging variants and can boost performance metrics as well. A pipeline of such a general architecture, which can be adopted for COVID-19 prognosis, is proposed and shown in Figure 5. The architecture illustrates the general approach to be adopted for COVID-19 prognosis; the techniques, tools, and methods may vary from one domain to another. Preprocessing involves noise filtering, region-of-interest extraction, resizing, and conversion from grey to RGB or vice versa. Normalization helps the model learn faster and improves convergence stability. Although methods like rotation, flipping, scaling, and Gaussian noise provide data augmentation to increase the volume of data, learning models still need more data to train efficiently. Hence, the transfer learning technique takes advantage of a model pre-trained on a huge database to help learn a target task that has limited training data (a sketch of these stages is given below). The layers of the adopted learning model also need to be adjusted, with proper activation functions, loss functions, and regularization techniques, for optimized results. Lastly, in the evaluation phase, the detected part needs to be classified, segmented, and analysed for better prognosis. In this section, we survey some recent works that leverage deep learning methods, particularly the convolutional neural network and its types, to achieve state-of-the-art performance in diverse tasks to combat COVID-19, such as CT scan/X-ray image classification (SARS, MERS, COVID-19), detection and recognition of coronavirus from imagery (CT or X-ray), instance segmentation of the infected part in medical images, forecasting, prediction, and contact tracing, to name a few. These deep learning techniques adopt different types of radiographic images for training and testing, hence all fall into the supervised learning class. Much valuable state-of-the-art machine learning and artificial intelligence, with novel models and algorithms, has been proposed by different researchers, showing promising results in many applications regarding COVID-19 [160] and propelling research from multiple angles in the fight against the novel coronavirus outbreak. In the earlier stages of the pandemic, [3,[161][162][163][164][165][166][167] contributed much of the early insight into COVID-19; with time and deeper insight into the coronavirus, its symptoms, possible anatomy, and behavior, and with more visual chest CT/X-ray imagery of COVID-infected patients, novel algorithms and models were introduced by noble researchers.
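Before turning to the individual works, here is the sketch promised above of two stages of the Figure 5 pipeline: preprocessing/augmentation with torchvision transforms, and transfer learning from an ImageNet-pretrained ResNet-50 with a new classification head. The framework, layer sizes, and 3-class setup are illustrative assumptions, not the method of any single surveyed paper.

```python
import torch.nn as nn
from torchvision import models, transforms

# Stage 1: preprocessing + augmentation (grey->RGB, resize, rotation/flip, normalization).
train_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),       # grey X-ray -> 3-channel input
    transforms.Resize((224, 224)),                     # resize to the backbone's input size
    transforms.RandomRotation(10),                     # augmentation: small rotations
    transforms.RandomHorizontalFlip(),                 # augmentation: flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics, matching
                         std=[0.229, 0.224, 0.225]),   # the pre-trained backbone
])

# Stage 2: transfer learning: reuse ImageNet features, retrain only a small head
# on the limited COVID-19 training data (torchvision >= 0.13 weights API assumed).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False                        # freeze the learned features
model.fc = nn.Linear(model.fc.in_features, 3)          # e.g. COVID-19 / pneumonia / normal

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)                                       # 6,147 parameters (2048*3 + 3)
```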
Using AI techniques, [14] distinguished COVID-19 from non-COVID pneumonia by adopting 10 well-known CNNs: AlexNet, SqueezeNet, GoogleNet, VGG-19, VGG-16, ResNet-101, ResNet-50, ResNet-18, MobileNet-V2, and Xception. Among all networks, the highest performance in classifying COVID-19 versus non-COVID-19 infections was obtained by Xception, though it did not have the best sensitivity, while ResNet-101 had the highest sensitivity but lower specificity. [13] introduced a model called DarkCovidNet, adopting the DarkNet and YOLO models for automated detection of COVID-19 from chest X-ray images using a deep neural network. It provides binary classification (COVID vs. no findings) with 98.08% accuracy and multi-class classification (COVID vs. no findings vs. pneumonia) with 87.02% accuracy. Lin Li et al. [168] developed a 3-dimensional deep learning framework called COVNet, with ResNet-50 as the backbone, for the detection of COVID-19 from chest CT scans. It effectively extracts both 3D global and 2D local representative features from chest CT scans to differentiate COVID-19 from community-acquired pneumonia (CAP), with a high sensitivity of 90% and specificity of 96%. Based on SqueezeNet, a light CNN design was proposed by [12] for efficient detection of COVID-19 from CT scan images. From a pool of deep CNNs (MobileNet, Xception, DenseNet, ResNet, Inception V3, InceptionResNet V2, NASNet, VGGNet) and different machine learning algorithms, the best combinations of feature extractor and high-performance metric learners were obtained for computer-aided prognosis of COVID-19 pneumonia in [169]. In their work, the DenseNet121 feature extractor equipped with a bagging-tree classifier attains high performance with 99% classification accuracy, followed by a hybrid of the ResNet50 feature extractor trained with LightGBM, with 98% accuracy on their dataset. On chest X-ray imagery, [170] proposed five pre-trained CNN-based models (ResNet152, ResNet101, ResNet50, Inception V3, and Inception-ResNet V2) for effective detection of coronavirus pneumonia patients. In their work, the ResNet50 model achieves the highest classification performance across different datasets. The integrated stacking model InstaCovNet-19 was proposed by [171]; it utilizes different pre-trained models, such as Xception and ResNet101, to compensate for a relatively small amount of training data. It achieves binary classification (COVID-19 vs. non-COVID) with 99.53% accuracy and 3-class classification (COVID-19, normal, pneumonia) with 99.08% accuracy. Inspired by AlexNet, a deep Bayes-SqueezeNet decision-making system is used as the backbone model by [172] for COVID-19 prognosis from X-ray images. A Bayesian-optimization-augmented dataset and fine-tuned hyperparameters boost the network's diagnostic performance, outperforming its competitors. CoroNet, a deep CNN model, was proposed by [173]; it detects COVID-19 viral infection from chest X-ray images without any human intervention. CoroNet adopts the Xception architecture pre-trained on the ImageNet dataset and attains an overall accuracy of 89.6% for 4-class classification (COVID vs. bacterial pneumonia vs. viral pneumonia vs. normal) and 95% accuracy for 3-class classification (COVID vs. normal vs. pneumonia).
Making use of a lightweight residual projection-expansion-projection-extension (PEPX) design pattern, [174] proposed a diverse architecture called COVID-Net, which enhances representational capacity while reducing computational complexity, producing a network for the prognosis of COVID-19 cases from chest X-ray images. Inspired by capsule networks, a new framework called COVID-CAPS was proposed by [175]; it handles transformations of chest X-ray imagery to preserve spatial information for effective identification of COVID-19 cases, achieving an accuracy of 95.7%, a specificity of 95.8%, and a sensitivity of 90%. An evaluation of state-of-the-art CNN architectures was carried out in [176] using transfer learning [177], achieving remarkable results even on small datasets, with an accuracy of 96.78%, a sensitivity of 98.66%, and a specificity of 96.46%. Likewise, [10] proposed two architectures: (i) mAlexNet and (ii) a hybrid of mAlexNet and Bidirectional Long Short-Term Memory (BiLSTM). They performed ANN-based automatic lung segmentation [178] from X-ray images to obtain robust features, and then, for early detection of COVID-19 infection, developed a CNN-based transfer learning-BiLSTM network that achieved 98.7% accuracy. Transfer learning and image augmentation for the training and validation of various pre-trained deep CNNs were utilized by [179] to introduce a robust technique for automatic prognosis of COVID-19 infection. For 2-class classification (COVID and normal) and 3-class classification (normal, COVID-19 pneumonia, and viral pneumonia), they achieve classification accuracy, precision, sensitivity, and specificity of 99.7%, 99.7%, 99.7%, 99.55% and 97.9%, 97.95%, 97.9%, 98.8% respectively. A deep learning framework called COVID-ResNet was proposed by [180]; it utilizes state-of-the-art training techniques like progressive resizing, discriminative learning rates, and cyclical learning-rate finding to train residual neural networks accurately and quickly, fine-tuning the pre-trained ResNet-50 model for better performance metrics and reduced training time. A new deep learning framework called COVIDX-Net was proposed by [181], which includes seven different deep CNN architectures (including DenseNet, MobileNet, and VGG19). Among them, VGG19 and DenseNet perform best for the classification of COVID-19, with 89% and 91% accuracy on their dataset. In a promising attempt, [16] proposed an OxfordNet-based faster regions-with-CNN framework, COVID Faster R-CNN, to diagnose the novel coronavirus disease from chest X-ray imagery. They achieved an accuracy of 97.36%, a sensitivity of 97.65%, and a precision of 99.28%. A deep CNN model called CVDNet, adopting a residual neural network, was proposed by [182] to classify COVID-19 against healthy and other pneumonia-infected cases using chest X-ray images. Although the dataset used was small, it achieves an accuracy of 97.20% for the prognosis of COVID-19, and for 3-class classification (COVID vs. viral pneumonia vs. normal) it achieves 96.69% accuracy. An ensemble model fusing StackNet meta-modeling with deep CNNs, called CovStackNet, was proposed by [183] for fast prognosis of COVID-19 using chest X-ray images; CovStackNet reached an accuracy score of 98%.
Although a lot more has been suggested, proposed, and implemented regarding COVID-19, as shown in Table 1, there still exists a huge space for researchers to work in, so that state-of-the-art deep learning models can be officially recognized by the medical research community for COVID-19 positive case detection from radiology images. Although Remdesivir, Lopinavir, Ritonavir, oseltamivir, and the Pfizer vaccine significantly block COVID-19 infection, researchers are continuously and tirelessly working to develop efficient therapeutic strategies and trials to cope with, and completely eradicate, the human coronavirus. Apart from the recent developments and the promising performance deep CNNs have achieved in COVID-19 prognosis, the research literature indicates several vital challenges and problems to be explored, some of which are described below.

• Theoretical understanding lags behind in defining which architecture should perform better than others. The architectures are defined by context from one study to another, so it is a challenge to define a general architecture across domains in terms of the parameters of each layer.
• It is difficult to inform the choice of architecture with the statistics of the data.
• Training the models is computationally expensive.
• Training a noise-tolerant model remains a challenge.
• Determining hyperparameters and fine-tuning the model is a big challenge, and retraining models to track changes in data distribution is challenging.
• Scaling computations, reducing the overhead of optimizing parameters, and avoiding expensive inference and sampling are open problems.
• In the absence of labeled data, selecting features in an unsupervised way is very challenging, although essential.
• Constructing optimal codebooks and low-complexity, high-accuracy coding and decoding algorithms is demanding.
• Rescaling input images to fixed dimensions is needed when pre-trained networks are employed as feature extractors, which may discard valuable information.
• Obtaining and maintaining large-scale datasets to improve the efficiency of learning models is a challenge. Too little data leads to over-fitting, while more data makes it harder to train the model in a feasible amount of time.
• Datasets are not only essential for a fair comparison of different algorithms; they also bring more challenges and complexity through their expansion and improvement.
• Handling high-dimensional datasets is a challenging task.
• Big data and new data modalities bring technical challenges.
• The limited availability of COVID-19 imagery datasets restricts the performance metrics of most AI/ML-based models. Some of the most common open-source datasets compiled by the research works [184][185][186] included in this paper are listed in Table 2.
• Although data augmentation helps to generate more training data, developing generalizable augmentation policies is still a challenge.
• Chest X-ray radiography is mostly used for COVID-19 research because of its low cost, low radiation, and wide accessibility; however, it is less sensitive than CT imagery. CT scans, magnetic resonance imaging (MRI), and ultrasound imaging need to be explored to build well-annotated datasets.
• Selecting a worthy classifier, resampling algorithm, feature representation, fusion strategy, labels, etc. is a big challenge for a particular task like classification or detection.
• Large-scale classification is hard, and obtaining class labels is expensive.
• Achieving efficiency in both feature extraction and classifier training without compromising performance is a challenge.
• Training traditional models over large-scale datasets, such as large ImageNet classification, is difficult and leads to poor and slow convergence.
• Fine-grained image classification is a big challenge.
• Image classification of non-rigid instances is also a challenge.
• Intra-class variation in appearance, computational complexity with a huge number of object categories, lighting and backgrounds, deformation, non-rigid instances, and large ranges of scales and aspect ratios are some of the main challenges in object detection.
• Weakly supervised, multi-domain, and 3D object detection are still challenging problems in the detection domain.
• Adopting specific pre-processing strategies, with or without GPU acceleration, can be used to increase performance metrics.
• Apart from X-ray and CT scan imagery, magnetic resonance imaging (MRI) and ultrasound imaging can be explored to make potential use of novel deep CNN approaches in the prediction and diagnosis of COVID-19 infection.
• If more laboratory parameters are collected, ML algorithms may extract more valuable candidate markers to identify COVID-19.
• Age, gender, genetics, patients' medical history (chronic liver or kidney disease, diabetes, etc.), and geolocation data can be utilized to boost the performance metrics of diagnosis, classification, detection, prediction, spread modelling, and forecasting of COVID-19.
• Detecting and tracing transmissibility from asymptomatic individuals with COVID-19 is still a challenge.
• Although most of the models are up to the mark in tackling the pandemic, there is still no recognition from the medical research community of a deep learning framework for COVID-19 positive case detection from radiology imagery.

This paper presents a deeper insight into deep convolutional neural networks, with their latest developments, and adopts a categorization scheme to analyze the existing literature. Some widely used techniques and variants of the CNN are investigated. More specifically, state-of-the-art approaches to the convolutional neural network are analyzed, expounded, and illustrated in detail, because it is the most extensively utilized technique for radiographic visual imagery applications. Selected applications of deep CNNs to COVID-19 are also highlighted, including image classification over different classes of pneumonia, recognition and detection of different classes of coronaviruses, and instance segmentation of the region of interest (the infected lung). A pipeline of a general architecture for COVID-19 prognosis is also proposed and illustrated. Finally, some important challenges, limitations, and outlooks are discussed for better design, modeling, and training of learning modules, along with several directions that may be explored further in the future.

Table 1 (fragment): the COVIDx dataset [174] has been used.
Table 2 (fragment): Open-source COVID-19 imaging datasets.

Ref. | Imaging modality | Task | Link
[189] | X-ray and CT scan | COVID-19 data set augmentation | https://data.mendeley.com/datasets/8h65ywd2jr/3
[190] | Ultrasound | COVID-19 diagnosis | https://tinyurl.com/yckfqrcg
[191] | Segmented CT scans | COVID-19 infected area segmentation | NA
[192] | Segmented CT scans | COVID-19 infected area segmentation | NA

References

[1] Negative results in computer vision: A perspective.
[2] Deep learning for visual understanding: A review.
[3] World Health Organization updates: Coronavirus disease (COVID-19) - events as they happen.
[4] COVID-19 future forecasting using supervised machine learning models.
[5] Deep learning approaches for COVID-19 detection based on chest X-ray images.
[6] Combination of four clinical indicators predicts the severe/critical symptom of patients infected COVID-19.
[7] Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review.
[8] Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results.
[9] Machine learning based approaches for detecting COVID-19 using clinical text data.
[10] CNN-based transfer learning-BiLSTM network: A novel approach for COVID-19 infection detection.
[11] An interpretable mortality prediction model for COVID-19 patients.
[12] A light CNN for detecting COVID-19 from CT scans of the chest.
[13] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[14] Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks.
[15] CVDNet: A novel deep learning architecture for detection of coronavirus (COVID-19) from chest X-ray images.
[16] COVID Faster R-CNN: A novel framework to diagnose novel coronavirus disease (COVID-19) in X-ray images.
[17] Receptive fields of single neurones in the cat's striate cortex.
[18] Handwritten zip code recognition with multilayer networks.
[19] Backpropagation applied to handwritten zip code recognition.
[20] Gradient-based learning applied to document recognition.
[21] Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.
[22] A novel hybrid CNN-SVM classifier for recognizing handwritten digits.
[23] Very deep convolutional networks for large-scale image recognition.
[24] Going deeper with convolutions.
[25] ImageNet large scale visual recognition challenge.
[26] ImageNet classification with deep convolutional neural networks.
[27] The history began from AlexNet: A comprehensive survey on deep learning approaches.
[28] Rich feature hierarchies for accurate object detection and semantic segmentation.
[29] Large-scale video classification with convolutional neural networks.
[30] Learning a deep convolutional network for image super-resolution.
[31] Learning a deep compact image representation for visual tracking.
[32] DeepPose: Human pose estimation via deep neural networks.
[33] Fully convolutional networks for semantic segmentation.
[34] Faster R-CNN: Towards real-time object detection with region proposal networks.
[35] Visualizing and understanding convolutional networks.
[36] Deep residual learning for image recognition.
[37] Rethinking the inception architecture for computer vision.
[38] Squeeze-and-excitation networks.
[39] Understanding deep architectures using a recursive convolutional network.
[40] What is the best multistage architecture for object recognition?
[41] Stacked convolutional autoencoders for hierarchical feature extraction.
[42] Empirical evaluation of convolutional RBMs for vision.
[43] Convolutional deep belief networks on CIFAR-10.
[44] Fast training of convolutional networks through FFTs.
[45] cuDNN: Efficient primitives for deep learning.
[46] Fast convolutional nets with fbfft: A GPU performance evaluation.
[47] CUImage: A neverending learning platform on a convolutional knowledge graph of billion web images.
[48] Learning invariant feature hierarchies.
[49] Saturating auto-encoders.
[50] Hierarchical convolutional deep learning in computer vision.
[51] Tiled convolutional neural networks.
[52] ReSeg: A recurrent neural network for object segmentation.
[53] Deconvolutional networks.
[54] Adaptive deconvolutional networks for mid and high level feature learning.
[55] Fully convolutional networks for semantic segmentation.
[56] Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks.
[57] Augmenting supervised neural networks with unsupervised objectives for large-scale image classification.
[58] Image super-resolution using deep convolutional networks.
[59] Learning deconvolution network for semantic segmentation.
[60] Learning deep features for discriminative localization.
[61] Multi-scale context aggregation by dilated convolutions.
[62] WaveNet: A generative model for raw audio.
[63] Dense prediction on sequences with time-dilated convolutions for speech recognition.
[64] Neural machine translation in linear time.
[65] Network in network.
[66] Scalable object detection using deep neural networks.
[67] Recent advances in convolutional neural networks.
[68] A theoretical analysis of feature pooling in visual recognition.
[69] Evaluation of pooling operations in convolutional architectures for object recognition.
[70] High-performance neural networks for visual object classification.
[71] Complex cell pooling and the statistics of natural images.
[72] Signal recovery from pooling representations.
[73] Learned-norm pooling for deep feedforward and recurrent neural networks.
[74] Stochastic pooling for regularization of deep convolutional neural networks.
[75] Spectral representations for convolutional neural networks.
[76] Hartley spectral pooling for deep learning.
[77] Spatial pyramid pooling in deep convolutional networks for visual recognition.
[78] Distinctive image features from scale-invariant keypoints.
[79] The devil is in the details: An evaluation of recent feature encoding methods.
[80] DeepID-Net: Multi-stage and deformable deep convolutional neural networks for object detection.
[81] Learned deformation stability in convolutional neural networks.
[82] Overfitting remedy by sparsifying regularization on fully-connected layers of CNNs.
[83] Learning and transferring mid-level image representations using convolutional neural networks.
[84] Rectified linear units improve restricted Boltzmann machines.
[85] Searching for activation functions.
[86] Rectified linear units improve restricted Boltzmann machines.
[87] Deep learning using rectified linear units (ReLU).
[88] Mastering the game of Go with deep neural networks and tree search.
[89] Rectifier nonlinearities improve neural network acoustic models.
[90] On rectified linear units for speech processing.
[91] Dying ReLU and initialization: Theory and numerical examples.
[92] Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.
[93] Empirical evaluation of rectified activations in convolutional network.
[94] On a generalisation of uniform distribution and its properties.
[95] Uniform distributions on curves and optimal quantization.
[96] Dropout: A simple way to prevent neural networks from overfitting.
[97] Fast and accurate deep network learning by exponential linear units (ELUs).
[98] Parametric exponential linear unit for deep convolutional neural networks.
[99] Maxout networks.
[100] Improving deep neural networks with probabilistic maxout units.
[101] Deep learning for minimum mean-square error approaches to speech enhancement.
[102] The nature of statistical learning theory.
[103] An overview of statistical learning theory.
[104] Solving large scale linear prediction problems using stochastic gradient descent algorithms.
[105] Deep learning using linear support vector machines.
[106] The MNIST database of handwritten digit images for machine learning research.
[107] Robust one-class support vector machine with rescaled hinge loss function.
[108] Extracting and composing robust features with denoising autoencoders.
[109] Dual autoencoders features for imbalance classification problem.
[110] RODEO: Robust de-aliasing autoencoder for real-time medical image reconstruction.
[111] Generative adversarial nets.
[112] Generative adversarial text to image synthesis.
[113] Deep generative image models using a Laplacian pyramid of adversarial networks.
[114] Large-margin softmax loss for convolutional neural networks.
[115] Signature verification using a "Siamese" time delay neural network.
[116] Learning a similarity metric discriminatively, with application to face verification.
[117] Dimensionality reduction by learning an invariant mapping.
[118] Learning by coincidence: Siamese networks and common variable learning.
[119] DeepHash for image instance retrieval: Getting regularization, depth and fine-tuning right.
[120] FaceNet: A unified embedding for face recognition and clustering.
[121] Deep relative distance learning: Tell the difference between similar vehicles.
[122] A new loss function for CNN classifier based on predefined evenly-distributed class centroids.
[123] Lp-norm regularization algorithms for optimization over permutation matrices.
[124] Modified Lp-norm regularization minimization for sparse signal recovery.
[125] Improving neural networks by preventing co-adaptation of feature detectors.
[126] Understanding dropout.
[127] Fast dropout training.
[128] Adaptive dropout for training deep neural networks.
[129] A PAC-Bayesian tutorial with a dropout bound.
[130] Dropout training as adaptive regularization.
[131] Dropout: A simple way to prevent neural networks from overfitting.
[132] An empirical analysis of dropout in piecewise linear networks.
[133] Efficient object localization using convolutional networks.
[134] Shakeout: A new approach to regularized deep neural network training.
[135] Sparse coding with an overcomplete basis set: A strategy employed by V1?
[136] A survey on image data augmentation for deep learning.
[137] Mirror, mirror on the wall, tell me, is the error small?
[138] Deep convolutional neural networks and data augmentation for environmental sound classification.
[139] Holistically-nested edge detection.
[140] Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture.
[141] Transformation pursuit for image classification.
[142] Unsupervised feature learning by augmenting single images.
[143] Hyper-class augmented and regularized deep learning for fine-grained image classification.
[144] Augmenting strong supervision using web data for fine-grained categorization.
[145] Deep learning in neural networks: An overview.
[146] Learning and relearning in Boltzmann machines.
[147] Training products of experts by minimizing contrastive divergence.
[148] A learning algorithm for Boltzmann machines.
[149] An overview on restricted Boltzmann machines.
[150] An overview of restricted Boltzmann machines.
[151] Reducing the dimensionality of data with neural networks.
[152] Greedy layer-wise training of deep networks.
[153] Deep Boltzmann machines.
[154] Learning deep energy models.
[155] Autoencoder for words.
[156] Autoencoders, minimum description length and Helmholtz free energy.
[157] A critical review of recurrent neural networks for sequence learning.
[158] Efficient sparse coding algorithms.
[159] Transfer learning using computational intelligence: A survey.
[160] Applications of machine learning and artificial intelligence for COVID-19 (SARS-CoV-2) pandemic: A review.
[161] Radiological Society of North America: RSNA pneumonia detection challenge.
[162] Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images.
[163] A deep learning algorithm using CT images to screen for corona virus disease.
[164] A deep learning system to screen novel coronavirus disease 2019 pneumonia.
[165] Deep learning-based detection for COVID-19 from chest CT using weak label.
[166] A critic evaluation of methods for COVID-19 automatic detection from X-ray images.
[167] Prediction models for diagnosis and prognosis of COVID-19: Systematic review and critical appraisal.
[168] Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: Evaluation of the diagnostic accuracy.
[169] Automatic detection of coronavirus disease (COVID-19) in X-ray and CT images: A machine learning-based approach.
[170] Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks.
[171] InstaCovNet-19: A deep learning classification model for the detection of COVID-19 patients using chest X-ray.
[172] COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnostic of the coronavirus disease 2019 (COVID-19) from X-ray images.
[173] CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images.
[174] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[175] COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[176] COVID-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
[177] What makes ImageNet good for transfer learning?
[178] Enhancement and segmentation of lung CT images for efficient identification of cancerous cells.
[179] Can AI help in screening viral and COVID-19 pneumonia?
[180] COVID-ResNet: A deep learning framework for screening of COVID-19 from radiographs.
[181] COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images.
[182] CVDNet: A novel deep learning architecture for detection of coronavirus (COVID-19) from chest X-ray images.
[183] Automated detection of COVID-19 using ensemble of transfer learning with deep convolutional neural network based on CT scans.
[184] COVID-19 open source data sets: A comprehensive survey.
[185] COVID-19 datasets: A survey and future challenges.
[186] COVID-19 image data collection: Prospective predictions are the future.
[187] COVID-CT-Dataset: A CT scan dataset about COVID-19.
[188] COVID-19 CT lung and infection segmentation dataset.
[189] El-Shafai, Walid; Abd El-Samie, Fathi (2020). Extensive and augmented COVID-19 X-ray and CT chest images dataset.
[190] POCOVID-Net: Automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS).
[191] Lung infection quantification of COVID-19 in CT images with deep learning.
[192] COVID-19 CT segmentation dataset.