key: cord-0829670-5ip3yzm2
authors: Hu, Kai; Huang, Yingjie; Huang, Wei; Tan, Hui; Chen, Zhineng; Zhong, Zheng; Li, Xuanya; Zhang, Yuan; Gao, Xieping
title: Deep Supervised Learning Using Self-Adaptive Auxiliary Loss for COVID-19 Diagnosis from Imbalanced CT Images
date: 2021-06-07
journal: Neurocomputing
DOI: 10.1016/j.neucom.2021.06.012
sha: f91bad5cb8267f07e5ceb2d7babc49c1223d98b0
doc_id: 829670
cord_uid: 5ip3yzm2

The outbreak and rapid spread of coronavirus disease 2019 (COVID-19) have had a huge impact on the lives and safety of people around the world. Chest CT is considered an effective tool for the diagnosis and follow-up of COVID-19. For faster examination, automatic COVID-19 diagnostic techniques that apply deep learning to CT images have received increasing attention. However, the number and class diversity of existing datasets for COVID-19 diagnosis that can be used for training are limited, and the number of initial COVID-19 samples is much smaller than that of normal samples, which leads to class imbalance. This makes it difficult for classification algorithms to learn discriminative boundaries, since some classes have abundant data while others are scarce. Therefore, training robust deep neural networks on imbalanced data is a fundamental, challenging, and important task in the diagnosis of COVID-19. In this paper, we create a challenging clinical dataset (named COVID19-Diag) with class diversity and propose a novel imbalanced-data classification method using deep supervised learning with a self-adaptive auxiliary loss (DSN-SAAL) for COVID-19 diagnosis. Within a multi-scale, deep supervised network framework, the loss function accounts for both data overlap between CT slices and possibly noisy labels in clinical datasets by integrating the effective number of samples and a weighted regularization term. The learning process jointly and automatically optimizes all parameters of the deep supervised network, making our model applicable to a wide range of datasets. Extensive experiments are conducted on COVID19-Diag and three public COVID-19 diagnosis datasets. The results show that DSN-SAAL outperforms state-of-the-art methods and is effective for COVID-19 diagnosis under varying degrees of data imbalance.

The coronavirus disease 2019 (COVID-19) is spreading all over the world and is a serious threat to human life and health. By 2 May 2021, the cumulative number of confirmed cases worldwide was close to 153 million, with nearly 3.2 million deaths. The reverse transcription-polymerase chain reaction (RT-PCR) test is considered the gold standard for confirming COVID-19. However, it takes 4-6 hours to return results and tends to be in short supply in many areas where the disease is severe [1]. In clinical diagnosis, chest CT is widely available imaging equipment. It greatly assists clinicians when characteristic manifestations such as ground-glass opacity (GGO) or bilateral patchy shadows are observed in CT scans [2]. However, the rapidly increasing demand for reading medical images places a heavy burden on clinicians. Moreover, because medical images are complex, long and tedious reading sessions may lead to misinterpretation and misjudgment.

In recent years, the explosion of data of all kinds has enabled convolutional neural networks (CNNs) to achieve great success in many fields, such as computer vision [3, 4]. CNNs have likewise been shown to be effective in assisting the diagnosis of COVID-19. However, because CNNs are data-driven, they require large amounts of training data. On the one hand, the number and size of datasets available for COVID-19 diagnosis are limited. Moreover, most datasets only distinguish COVID-19 from non-COVID-19. That is, class diversity is limited, so the trained models do not transfer well. On the other hand, in clinical practice, as in most medical image datasets, the original COVID-19 samples are far fewer than the normal ones. As a result, there are varying degrees of imbalance between target and non-target samples. Data imbalance means that some classes in the training set have many more samples than others [5], which makes it challenging to build a well-performing classifier. Classification algorithms tend to be biased toward the majority classes and neglect the minority classes, resulting in low classification accuracy on the latter. Therefore, it is essential to study COVID-19 diagnosis based on imbalanced data.

In general, two categories of approaches have been proposed to tackle the data imbalance problem, i.e., data-level methods and algorithm-level methods [5]. To balance the distribution at the data level, the prior distribution is modified by under-sampling the majority classes, over-sampling the minority classes, or a combination of both [6]. It is well known that under-sampling may discard useful information. Because medical image datasets are usually small, this side effect is more pronounced, so most methods prefer over-sampling. For example, Xu et al. expanded the minority classes threefold to balance the majority class and thus weaken the influence of the imbalance among image types [7]. Gozes et al. used image rotations, horizontal flips, and cropping to compensate for the limited number of COVID-19 cases [8]. However, over-sampling often makes training computationally burdensome by enlarging the training data. Besides, simple forms of over-sampling such as random replication only increase the number of images without increasing feature diversity, and models trained with over-sampling are susceptible to overfitting. Thus, this paper focuses on algorithm-level methods. Most existing algorithm-level approaches modify the loss function by incorporating more prior information about the minority classes, such as class-to-class separability [9] and the sample distribution of the raw dataset. However, the prior distribution of the different classes is not learned automatically, and the model parameters must be tuned manually. Furthermore, data overlap caused by the similarity between CT slices and possible mislabeling by clinicians may further degrade model performance.

To address the above problems, in this paper we construct a challenging clinical dataset for COVID-19 diagnosis (named COVID19-Diag). We also propose a novel Deep Supervised Network with a Self-Adaptive Auxiliary Loss (DSN-SAAL) for COVID-19 diagnosis from imbalanced CT images. COVID19-Diag is collected from different CT scanner models at one hospital and includes three categories: normal, bacterial pneumonia, and COVID-19. Bacterial pneumonia and COVID-19 each have their own imaging characteristics, but they also share many overlapping ones. The dataset therefore has a certain diversity, which makes it more challenging than a binary dataset. Moreover, the dataset is relatively large in terms of both the number of cases and the number of CT slices. DSN-SAAL is a deep learning method for COVID-19 diagnosis from imbalanced CT images. With several newly proposed techniques, it can effectively learn features for samples of both majority and minority classes. Specifically, we first present a novel deep supervised learning framework for multi-scale feature learning in image classification. Then, we introduce an effective self-adaptive auxiliary loss that automatically learns the data distributions of both majority and minority classes. The loss function accounts for data overlap by using the effective number of samples to reweight the cross entropy (CE) loss. In addition, a reverse cross entropy (RCE) term is used as a regularizer to handle incorrect labels. DSN-SAAL is self-adaptive because all model parameters are learned automatically through the network, and thus it can be applied to various datasets. To verify the effectiveness of the proposed method, we conduct extensive experiments on COVID19-Diag and three public COVID-19 diagnosis datasets.
The experimental results demonstrate that our method outperforms the state-of-the-art approaches on both balanced and imbalanced datasets and is effective for the diagnosis of COVID-19. Overall, the key contributions of this paper can be summarized as follows:

• We propose a novel deep supervised learning model with a self-adaptive auxiliary loss, called DSN-SAAL, for diagnosing COVID-19 from CT scans with imbalanced data.

• We design a self-adaptive auxiliary loss, which considers both data overlap between CT slices and possible noisy labels during data collection for imbalanced data classification.

• We create a new COVID-19 diagnosis dataset (named COVID19-Diag) consisting of 6982 CT slices from 225 clinical cases in three categories: COVID-19, normal, and bacterial pneumonia. Extensive experiments are conducted on this dataset to verify the effectiveness of our DSN-SAAL in varying degrees of imbalance.

• We evaluate DSN-SAAL on three additional publicly available COVID-19 datasets. The results show that DSN-SAAL outperforms the state-of-the-art methods and generalizes well.

Since the outbreak of COVID-19, there have been many studies on diagnosing it from medical images (such as CT scans) using traditional machine learning or deep learning methods. Shi et al. first used a VB-Net to segment COVID-19 infection regions and then trained a random forest model on hand-crafted features for diagnosis [10]. He et al. combined self-supervised learning with transfer learning and proposed a Self-Trans approach, which achieves high COVID-19 diagnosis accuracy with limited training data [11]. Ying et al. proposed a deep learning-based CT diagnosis system called DeepPneumonia to identify patients with COVID-19 [12]. Li et al. proposed Transfer-CheXNet, which uses the pre-trained CheXNet network for COVID-19 classification to better support parameter learning on small and medium-sized datasets in the target task [13]. Gunraj et al. introduced COVIDNet-CT, a deep convolutional neural network architecture obtained through a machine-driven design exploration approach, for diagnosing COVID-19 from CT images [14]. Besides, some methods diagnose COVID-19 with deep learning based on extracted regions of interest (ROIs). For example, Chen et al. trained a UNet++ to segment COVID-19-related lesions to aid diagnosis [15]. Jin et al. proposed a CNN that segments the lungs and then identifies slices of COVID-19 cases [16]. In [7], a V-Net-based deep learning model is first designed to segment the infection regions, and a ResNet-18 network is then used to diagnose COVID-19.

Three-dimensional (3D) models have also been applied to COVID-19 diagnosis. Considering the high cost of manually labeling COVID-19 data, Wang et al. proposed a 3D deep CNN (DeCoVNet) to detect COVID-19 from weakly labeled data [17]. However, the numbers of positive and negative samples used in the studies mentioned above are approximately equal. That is, the data are relatively balanced. Similarly, Wang et al. collected 1065 CT images, including 325 images of COVID-19 cases and 740 images of typical viral pneumonia, to train their deep learning model [18]. The raw data were somewhat imbalanced, but they used 160 images from each class for training, so the training data were perfectly balanced. The data compositions of some of the studies mentioned above are shown in Table 1. These studies achieved good results on their own datasets, but they did not consider the impact of imbalanced data in the clinical diagnosis of COVID-19. In most cases, the number of publicly available COVID-19 CT samples is much smaller than that of normal samples. The COVID-19 characteristics learned by models trained on such imbalanced samples are often inadequate, which easily leads to poor COVID-19 diagnosis. Moreover, such models may not be robust enough to be applied to other COVID-19 diagnosis scenarios. Therefore, it is necessary to study COVID-19 diagnosis based on imbalanced data, which better reflects clinical practice.

Since data-level methods consume additional resources and are unstable in execution, this paper focuses on algorithm-level methods. The main embodiment of these methods is the design of the loss function.

Generally, there are two main categories of loss functions for the data imbalance problem. The first consists of single-form losses, such as the hinge loss, softmax loss, Euclidean loss, and contrastive loss [20]. The second improves on these and includes the class-balanced loss [21], focal loss [22], and cost-sensitive CE loss [9]. These methods add a weighting term to CE [23], which shifts the classifier's decision boundary in favor of the minority classes.

Because these loss functions have limited ability to make the feature space discriminative, recent studies have begun to explore better combinations of multiple loss functions to solve the data imbalance problem. Inspired by the symmetric KL-divergence, Wang et al. [24] proposed a symmetric cross entropy to address the under-learning of minority classes and overfitting to noisy labels. Zhang et al. [25] and Deng et al. [26] combined CE with the center loss or range loss to simultaneously enforce intra-class compactness and inter-class separability.

Nevertheless, existing combined loss functions mostly rely on manually tuned hyper-parameters. They also generalize poorly, because they cannot determine the relative contribution of each term. Recently, Shu et al. proposed a meta-learning method called Meta-Weight-Net [27], which adaptively learns sample weights to ensure robust deep learning under training data biases. In particular, different depths of the network learn different feature representations. The supervisory role of the loss function should therefore differ across network stages, and this is reflected in how strongly each stage favors minority classes when handling data imbalance. Our approach considers these issues simultaneously. It constructs an adaptive auxiliary loss within a deep supervised network that balances the learning of different types of features at each stage, thereby promoting feature learning for minority classes.

In this section, we present our method in detail, including the image preprocessing, the DSN-SAAL architecture, self-adaptive auxiliary loss, and the parameter optimization algorithm. The overall flowchart of the proposed method is shown in Fig. 1 . 

It is well known that images acquired by different types of CT machines differ. For example, in some cases the CT scans do not have cylindrical scan boundaries. Besides, regions of the CT image other than the lung parenchyma are not useful for diagnosing COVID-19. Therefore, we apply several preprocessing operations before training our model. First, we set the window level (WL) and window width (WW) of all clinical CT images to the same values (WL: -400, WW: 1500). Then, we normalize them linearly to [0, 255] to fit the digital image format. After that, several morphological operations are used to extract the lung parenchyma as follows. (1) We binarize the images with a threshold pixel value of 170. (2) Unrelated regions are removed by erosion, dilation, flood fill, and other operations to obtain the largest connected regions. (3) We apply the convex hull operation to create the lung-parenchyma mask. We then multiply the mask by the images before binarization to obtain the final lung-parenchyma regions, which are used as the input images for model training [28].
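The preprocessing pipeline can be sketched as follows. This is a minimal sketch, not the authors' code. Several details are assumptions: pixels below the threshold count as air or lung, components touching the image border count as air outside the body, the two largest remaining components count as the lungs, and the convex hull is taken per lung. The names window_ct, lung_mask, fill_convex_hull, and extract_lungs are hypothetical.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import ConvexHull, Delaunay


def window_ct(hu, level=-400.0, width=1500.0):
    """Clip Hounsfield units to [level - width/2, level + width/2], rescale to 8-bit [0, 255]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(np.asarray(hu, dtype=np.float64), lo, hi)
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)


def fill_convex_hull(region):
    """Filled convex hull of a boolean 2D region (assumes >= 3 non-collinear pixels)."""
    pts = np.argwhere(region).astype(np.float64)
    if len(pts) < 3:
        return region.copy()
    tri = Delaunay(pts[ConvexHull(pts).vertices])
    rr, cc = np.indices(region.shape)
    grid = np.column_stack([rr.ravel(), cc.ravel()]).astype(np.float64)
    inside = tri.find_simplex(grid) >= 0          # pixel centres inside the hull
    return inside.reshape(region.shape) | region


def lung_mask(img, threshold=170, keep=2, iterations=2):
    """Hypothetical lung-parenchyma mask from a windowed 8-bit slice."""
    binary = img < threshold                      # (1) air and lung are dark after windowing
    binary = ndimage.binary_opening(binary, iterations=iterations)  # (2) erosion then dilation
    labels, n = ndimage.label(binary)
    mask = np.zeros_like(binary)
    if n == 0:
        return mask
    # Components touching the image border are assumed to be air outside the body.
    border = set(np.unique(np.concatenate(
        [labels[0], labels[-1], labels[:, 0], labels[:, -1]])).tolist())
    sizes = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    candidates = sorted(
        ((size, lab) for lab, size in zip(range(1, n + 1), sizes) if lab not in border),
        reverse=True)
    for _, lab in candidates[:keep]:              # keep the largest connected regions
        region = ndimage.binary_fill_holes(labels == lab)
        mask |= fill_convex_hull(region)          # (3) convex hull of each region
    return mask


def extract_lungs(hu):
    """Windowed slice multiplied by its lung mask."""
    img = window_ct(hu)
    return img * lung_mask(img)
```

The threshold of 170 on the windowed scale corresponds to roughly 0 HU (170 / 255 × 1500 − 1150 ≈ 0). This is why it plausibly separates soft tissue from air and aerated lung.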

In general, the shallow layers of CNNs capture more local information, while the deep layers contribute more abstract information. As the number of layers increases, the influence of the shallow layers on the deep layers gradually decreases. Thus, in deep-learning-based image analysis, effectively combining shallow and deep information helps the network fully learn the features of both minority and majority classes. To comprehensively extract multi-scale feature information from an image, we propose a deep supervised network that attaches supervision to several stages of the architecture. Fig. 2 illustrates the proposed network, which extends the VGG-16 network [29] with five stages. Each stage contains several convolutional layers with 3×3 filters, and consecutive stages are connected by a 2×2 max pooling layer. To obtain richer convolutional features [30], we add a 1×1 convolution with 21 output channels after each intermediate convolutional layer. We then fuse these features with a 1×1 convolution in each stage [31]. In our architecture, each convolutional layer is immediately followed by batch normalization (BN), which accelerates convergence and improves training stability. Finally, an adaptive average pooling layer resizes the feature map of each stage to 7×7, and classification is performed by a fully connected layer. The overall loss for deep supervised learning is the sum of all six losses, i.e., the five stage-wise auxiliary losses plus the final classification loss of the VGG-16 network.
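Each stage head sees a 7×7 map regardless of the stage resolution. The adaptive average pooling step that makes this possible can be sketched in NumPy as below. The binning rule mirrors PyTorch's AdaptiveAvgPool2d. The 21-channel stage output used in the example is a hypothetical choice for illustration.

```python
import numpy as np


def adaptive_avg_pool2d(x, out_size=7):
    """Average-pool a (C, H, W) feature map to (C, out_size, out_size).

    Output cell i averages input rows floor(i*H/out) .. ceil((i+1)*H/out) - 1
    (likewise for columns), the same binning rule as PyTorch's AdaptiveAvgPool2d."""
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, ((i + 1) * h + out_size - 1) // out_size
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, ((j + 1) * w + out_size - 1) // out_size
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


# Spatial size of each VGG-16 stage for a 224x224 input; 21 channels is hypothetical
for size in (224, 112, 56, 28, 14):
    fmap = np.ones((21, size, size))
    pooled = adaptive_avg_pool2d(fmap)   # always (21, 7, 7)
    head_input = pooled.reshape(-1)      # 21 * 7 * 7 = 1029 features for this stage's FC head
```

Pooling every stage to the same 7×7 grid lets each auxiliary classifier use a fully connected layer of fixed input size. This holds even though the stages operate at resolutions from 224×224 down to 14×14.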

In particular, given a sample vector x with class label y, where y ∈ {1, · · · , n, · · · , C} and C is the number of classes, the loss of the architecture can be written as

$$\mathcal{L}(x, y) = \sum_{i=1}^{N} \ell_i(x, y) + \ell_{final}(x, y), \tag{1}$$

where $\ell_i$ is the loss of the i-th auxiliary classifier, $\ell_{final}$ is the final loss of the network, and N is the number of auxiliary losses. Since $\ell_i$ and $\ell_{final}$ take the same form, Equation (1) can be rewritten as

$$\mathcal{L}(x, y) = \sum_{i=1}^{N+1} \ell_i(x, y), \tag{2}$$

which measures the difference between the network prediction and the ground truth. Its optimization objective is

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{N+1} \ell_i(x, y; \theta). \tag{3}$$

As shown in Equation (3), $\ell_i$ can be any suitable surrogate loss, such as the cross entropy. In this study, we design a self-adaptive auxiliary loss to serve as $\ell(x, y)$.
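Equations (1)-(2) reduce to summing one loss per supervised head. The minimal NumPy sketch below uses plain cross entropy as the per-head surrogate, as the text permits. The function names and the 5 + 1 head layout in the example are illustrative assumptions.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross entropy over a batch: logits (B, C), integer labels (B,)."""
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    return float(-logp[np.arange(len(labels)), labels].mean())


def deep_supervised_loss(head_logits, labels, loss_fn=cross_entropy):
    """Equation (2): the same loss summed over the N auxiliary heads and the final head."""
    return sum(loss_fn(logits, labels) for logits in head_logits)


# Hypothetical batch: 5 stage-wise auxiliary heads + 1 final head, 3 classes
rng = np.random.default_rng(0)
labels = np.array([0, 1, 2, 0])
heads = [rng.standard_normal((4, 3)) for _ in range(6)]
total = deep_supervised_loss(heads, labels)
```

Swapping loss_fn for a self-adaptive loss leaves the deep-supervision wiring unchanged, which matches the statement that $\ell_i$ and $\ell_{final}$ share the same form.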

A novel self-adaptive auxiliary loss is proposed to aid training on imbalanced data. It introduces a self-adaptive factor that reflects the feature distribution and emphasizes minority classes. This factor is measured as the ratio of the effective number of samples to the total number of samples. The loss also includes a weighted regularization term that addresses noisy labels. Both components are based on the cross entropy, as introduced below.

The cross entropy, which measures the divergence between the output and the ground truth in the label space, is computed by

$$\ell_{CE} = -\sum_{n=1}^{C} q(n|x) \log p(n|x). \tag{4}$$

Assuming {x, y} is an input-label pair, q(n|x) denotes the ground-truth distribution over labels for sample x, which satisfies $\sum_{n=1}^{C} q(n|x) = 1$.

The model transforms the input into a feature-space representation and produces a set of class probabilities p(n|x) via the softmax function.

The cross entropy alone cannot solve the data imbalance problem in classification. It treats minority classes in the same way as majority ones, which biases the trained model toward the majority classes. We therefore use the following techniques to address the problem.

We first leverage the idea proposed in [21] that the total information carried by all samples in a dataset is not measured by the raw number of samples. This is especially relevant for medical images such as CT scans, where a single CT scan of a patient yields multiple images. On the one hand, the slices are very similar to each other, so the features provided by each image overlap heavily. This is especially true in thin-slice scanning, where the slice thickness is only 1 mm. On the other hand, no slice can simply be discarded without losing information. Thus, the data overlap is measured by the effective number of samples as follows:

$$E_n = \frac{1 - \alpha^{k_n}}{1 - \alpha}, \tag{5}$$

where α is the effective sample factor that controls the ratio of the effective number of samples, and $k_n$ is the number of samples in the n-th class. In effect, α controls how fast the effective number of samples grows as $k_n$ increases. In our model, α is a learnable parameter updated with the network. After random initialization, it is passed through the sigmoid function to obtain the effective sample factor. The detailed parameter optimization process is described in Section 3.4. Equation (5) gives $E_n = 1$ when α = 0. This means that the class has only one valid sample and all other samples of the class provide the same features. Conversely,

$$\lim_{\alpha \to 1} E_n = k_n, \tag{6}$$

so the number of valid samples of the class approximately equals the total number of samples of the class. That is, every sample provides different and valid features. We skip the detailed proof here, as it is given in [21]. Previous methods weight the loss by the inverse of the number of samples of each class. Instead, we set the weight to the inverse of $E_n$, which yields better performance. Considering the effective number of samples, the loss is defined as

$$\ell_{DCE} = -\sum_{n=1}^{C} \frac{1}{E_n} q(n|x) \log p(n|x). \tag{7}$$
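Equation (5) and its limits can be checked numerically. The minimal sketch below computes $E_n$ and the $1/E_n$ class weights. The COVID19-Diag slice totals are used only as illustrative class sizes; the actual training split holds about 70% of the cases. The helper names are hypothetical.

```python
import numpy as np


def effective_number(counts, alpha):
    """Equation (5): E_n = (1 - alpha**k_n) / (1 - alpha), for 0 <= alpha < 1."""
    counts = np.asarray(counts, dtype=np.float64)
    return (1.0 - alpha ** counts) / (1.0 - alpha)


def inverse_effective_weights(counts, alpha):
    """Per-class weights 1 / E_n used to reweight the cross entropy in Equation (7)."""
    return 1.0 / effective_number(counts, alpha)


# Illustrative class sizes: slice totals for COVID-19, normal, and bacterial pneumonia
counts = [1769, 3824, 1389]
for alpha in (0.0, 0.99, 0.999, 0.9999):
    print(alpha, np.round(effective_number(counts, alpha), 1))
```

For α = 0 every class has $E_n = 1$ (equal weights). As α approaches 1, $E_n$ approaches $k_n$ and the weights approach plain inverse-frequency weighting. Intermediate values give the smallest class the largest weight while damping the effect of redundant, overlapping slices.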

In clinical datasets, mislabeling can easily occur. For example, other pneumonia may be labeled as COVID-19, mainly because COVID-19 shares some characteristics with pneumonia caused by similar viruses. Meanwhile, not every slice in a group of CT slices exhibits the discriminative characteristics of COVID-19, so models learned from such mislabeled data generalize poorly. With noisy labels, some samples' labels are incorrect. In that case q(n|x) does not represent the true class distribution, whereas p(n|x) can reflect the true distribution to some extent. Inspired by [24], we use p(n|x) as the ground truth and q(n|x) as the class probability of the outputs, which is referred to as the reverse cross entropy. The reverse cross entropy for sample x is computed as

$$\ell_{RCE} = -\sum_{n=1}^{C} p(n|x) \log q(n|x). \tag{8}$$

Same as the ℓ DCE , we use the inverse of E n as the weight by

When labels are one-hot, a computational problem arises because q(n|x) = 0 for every n ≠ y, so log q(n|x) is undefined. To solve this, we set log 0 = A, where A is a constant satisfying A < 0 [24]. This approach introduces only a small bias at the finite set of points where q(n|x) = 0 and no bias where q(n|x) = 1.
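With the log 0 = A convention and one-hot labels, the reverse cross entropy of Equations (8)-(9) has a simple form. The minimal sketch below shows it. A = -4 is a hypothetical default, and passing weights=1/E_n is our reading of Equation (9) with the weights inside the class sum.

```python
import numpy as np


def reverse_cross_entropy(probs, labels, weights=None, A=-4.0):
    """Equations (8)/(9): -sum_n w_n * p(n|x) * log q(n|x) for one-hot q, with log 0 := A.

    weights=None gives the plain RCE; passing 1 / E_n gives the weighted variant."""
    probs = np.asarray(probs, dtype=np.float64)
    log_q = np.full_like(probs, A)                   # log q(n|x) = log 0 := A for n != y
    log_q[np.arange(len(labels)), labels] = 0.0      # log q(y|x) = log 1 = 0
    w = np.ones(probs.shape[1]) if weights is None else np.asarray(weights, dtype=np.float64)
    return -(w * probs * log_q).sum(axis=1)          # one value per sample


p = np.array([[0.7, 0.2, 0.1], [1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])
y = np.array([0, 0, 0])
rce = reverse_cross_entropy(p, y)   # unweighted: -A * (1 - p(y|x)) per sample
```

Unweighted, the term equals −A·(1 − p(y|x)), so it is bounded by −A even when the prediction completely disagrees with the label. This boundedness is what makes it tolerant of mislabeled samples, unlike the unbounded −log p(y|x) of the ordinary cross entropy.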

$\ell_{DRCE}$ and $\ell_{DCE}$ play different roles in measuring the difference between the prediction and the ground truth, so a hyper-parameter β is used to balance them. Because the two terms treat different distributions as the ground truth, the per-class sample sizes associated with that ground truth also differ, just as with the effective number of samples. Reconstructing the RCE labels is not feasible, but β lets the network adjust for this discrepancy.

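Based on the above, the proposed self-adaptive auxiliary loss is formulated by combining Equations (7) and (9) as follows:

$$\ell_{SAAL}(x, y) = \ell_{DCE} + \beta\,\ell_{DRCE} = -\sum_{n=1}^{C} \frac{1}{E_n} q(n|x) \log p(n|x) - \beta \sum_{n=1}^{C} \frac{1}{E_n} p(n|x) \log q(n|x). \tag{10}$$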

The first term on the right-hand side of Equation (10) is the cross entropy weighted by the effective number of samples. The second term is the reverse cross entropy, weighted in the same way, which serves as a regularization term. α and β are hyper-parameters that are learned automatically through the network. By reweighting the cross entropy with the effective number of samples and adding a reverse cross entropy regularizer to handle incorrect labels, the proposed loss can learn the data distributions of both majority and minority classes.
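Putting Equations (5), (7), (9), and (10) together gives the minimal NumPy sketch of the self-adaptive auxiliary loss below. Several details are assumptions: α and β are passed in already squashed into (0, 1), the 1/E_n weights sit inside the class sum, A = -4, and the loss is averaged over the batch. It is an illustration rather than the authors' implementation.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def saal_loss(logits, labels, class_counts, alpha, beta, A=-4.0):
    """Sketch of Equation (10), averaged over a batch.

    alpha, beta: assumed already in (0, 1); class_counts: samples per class k_n."""
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    p = np.exp(logp)                                      # softmax probabilities p(n|x)
    counts = np.asarray(class_counts, dtype=np.float64)
    w = (1.0 - alpha) / (1.0 - alpha ** counts)           # 1 / E_n from Equation (5)
    q = np.zeros_like(p)
    q[np.arange(len(labels)), labels] = 1.0               # one-hot ground truth q(n|x)
    l_dce = -(w * q * logp).sum(axis=1)                   # Equation (7)
    log_q = np.where(q > 0, 0.0, A)                       # log 1 = 0, log 0 := A
    l_drce = -(w * p * log_q).sum(axis=1)                 # Equation (9)
    return float((l_dce + beta * l_drce).mean())
```

With uniform logits over three classes and α near 0 (all weights about 1), the first term is log 3 and the second is −2A/3 per sample. With β = 0 and α close to 1, a sample from the smallest class incurs a larger loss than one from the largest class, which is the intended emphasis on minority classes.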

Our goal is to jointly learn the network weights θ and the hyper-parameters α and β. As shown in Equation (10), α and β are the effective sample factor and the weight of $\ell_{DRCE}$, respectively. $\ell_{DRCE}$ is the cross entropy with the positions of p(n|x) and q(n|x) swapped. With noisy labels, the network prediction p(n|x) can reflect the distribution of the original data better than the ground truth q(n|x). Since the outputs of different stages reflect the data distribution to different degrees, we assign a separate weight β to each stage. Unlike β, the value of α is independent of the training process. It reflects the effective-number distribution of the original data, so we use only a single α. Both types of parameters are updated by stochastic gradient descent with error back-propagation. The whole iterative optimization process is shown in Algorithm 1.

Algorithm 1: Joint optimization of θ, α, and β. Inside nested training loops, update θ, α, β ← θ*, α*, β*; after both loops end, return θ*, α*, β*.

In the experiments, we map the parameters α and β to 1/(1 + e^{−α}) and 1/(1 + e^{−β}), respectively. Unconstrained values could make the corresponding losses very large, which would destabilize training and prevent the loss function from converging. The sigmoid function therefore compresses the weights into the interval (0, 1).
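The reparameterization can be sketched as follows. Unconstrained raw parameters are squashed by a numerically stable sigmoid, and gradients reach the raw values through the chain rule. This is a minimal sketch assuming one shared α and one β per supervised head (5 auxiliary + 1 final, as described above). The helper names are hypothetical, and the update is plain SGD with the paper's momentum term omitted for brevity.

```python
import numpy as np


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z}); exp(-|z|) never overflows."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


def sgd_step_raw(raw, grad_wrt_constrained, lr=0.001):
    """One plain SGD step on an unconstrained parameter, chaining through the sigmoid:
    dL/draw = dL/ds * s * (1 - s), where s = sigmoid(raw)."""
    s = sigmoid(raw)
    return raw - lr * grad_wrt_constrained * s * (1.0 - s)


# Hypothetical layout: one shared raw alpha, one raw beta per supervised head (5 + 1)
rng = np.random.default_rng(0)
raw_alpha = rng.standard_normal()       # random initialization
raw_betas = rng.standard_normal(6)
alpha, betas = sigmoid(raw_alpha), sigmoid(raw_betas)   # both lie in (0, 1)
```

Because the optimizer works on the raw values, no explicit clipping is needed. The squashed α and β stay in (0, 1) after every update, which keeps 1/E_n and the DRCE weights bounded.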

In this section, we describe our COVID19-Diag dataset and three publicly available datasets, namely COVIDx-CT [14], COVID19-CT [11], and SARS-CoV-2 CT-scan [19], as well as the evaluation metrics and implementation details.

In this study, we create a new COVID-19 dataset, named COVID19-Diag. It consists of 69 CT volumes of COVID-19, 95 CT volumes of normal cases, and 62 CT volumes of bacterial pneumonia from the First Hospital of Changsha. All CT volumes were acquired on SIEMENS or GE MEDICAL SYSTEMS CT scanners with a 5 mm slice thickness and a 512×512 matrix. We extract 1769, 3824, and 1389 two-dimensional (2D) axial CT slices from the COVID-19, normal, and bacterial pneumonia cases, respectively, selecting between 2 and 54 slices per CT scan. The data are randomly divided by case into a training set and a test set with a ratio of 7:3. The images are resized to 1×224×224 to match the input of our model. The statistics of the dataset are shown in Table 2. It is worth mentioning that we did not deliberately enlarge the imbalance ratio between categories when collecting and constructing COVID19-Diag. Nevertheless, data imbalance has a great impact on applying data-driven deep learning methods to COVID-19 diagnosis. To further verify the performance of our method on imbalanced data, we experiment with different proportions of COVID-19 images, as described in Section 5.3. Fig. 3 shows samples of the three classes from the dataset. The collected images come from various positions in lung CT scans, and each slice contains lung regions of different sizes. Some images, such as those in the last column of the bacterial pneumonia and COVID-19 samples, contain lesion areas that are not obvious. Bacterial pneumonia and COVID-19 also share similar features, as in the first and second columns, both of which show GGO. Therefore, our dataset is challenging and representative for COVID-19 diagnosis and is well suited to verifying the classification performance of the algorithm.
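Splitting "by cases" means that all slices of one patient end up on the same side of the split, so near-duplicate neighbouring slices cannot leak from training into testing. The minimal sketch below implements such a case-level 7:3 split. The function name, seed, and rounding rule are assumptions, and calling it once per class would additionally preserve class proportions.

```python
import numpy as np


def split_by_case(case_ids, train_ratio=0.7, seed=0):
    """Randomly split slice indices so that all slices of a case land in one subset."""
    case_ids = np.asarray(case_ids)
    cases = np.unique(case_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(cases)                                   # random order of whole cases
    n_train = int(round(train_ratio * len(cases)))
    train_cases = set(cases[:n_train].tolist())
    is_train = np.array([c in train_cases for c in case_ids.tolist()])
    return np.flatnonzero(is_train), np.flatnonzero(~is_train)


# Hypothetical example: 10 cases contributing different numbers of slices
ids = np.repeat(np.arange(10), [2, 3, 4, 5, 6, 2, 3, 4, 5, 6])
train_idx, test_idx = split_by_case(ids)
```

The ratio applies to cases, not slices. The slice-level proportion therefore varies with how many slices each selected case contributes (2 to 54 per scan in COVID19-Diag).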

To verify the generalization ability of the proposed method in COVID-19 diagnosis, we use three additional challenging public datasets. The first is the COVIDx-CT dataset [14]. It is derived from CT imaging data collected by the China National Center for Bioinformation and comprises 104,009 images from 1,489 patient cases. The COVIDx-CT dataset contains 21395 COVID-19 (NCP) slices (12520 for training, 4529 for validation, and 4346 for testing), 36856 common pneumonia (CP) slices (22061 for training, 7400 for validation, and 7395 for testing), and 45758 normal slices (27201 for training, 9107 for validation, and 9450 for testing). COVIDx-CT is currently the largest dataset of 2D slices for COVID-19 diagnosis, and its three categories give it a certain diversity.

The second is the COVID19-CT dataset [11], which contains 349 COVID-19 images (191 for training, 60 for validation, and 98 for testing) and 397 non-COVID-19 images (234 for training, 58 for validation, and 105 for testing). The COVID19-CT dataset was collected from COVID-19-related preprints on medRxiv and bioRxiv. The data volume is relatively small, and the images vary widely because they come from different sources. Therefore, this dataset is challenging and of research value.

The last is the SARS-CoV-2 CT-scan dataset [19], a clinical dataset from a hospital containing 1252 CT scans of patients positive for SARS-CoV-2 infection and 1229 CT scans of non-infected patients. 80% of the images are used for training and the remainder for validation. For a fair comparison, we report results on the SARS-CoV-2 CT-scan dataset using five-fold cross-validation.

We use overall classification accuracy (ACC), F1-score, and G-mean as the main evaluation metrics. The F1-score and G-mean are particularly important for evaluating performance on imbalanced data. In addition, the area under the ROC curve (AUC), sensitivity (SEN), specificity (SPE), and precision (PRE) are reported on the corresponding datasets for consistency with other methods. ACC, F1-score, G-mean, SEN, SPE, and PRE are defined by Equations (11)-(16), respectively.

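$$\mathrm{ACC} = \frac{\mathrm{Right}}{\mathrm{All}}, \tag{11}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{PRE} \times \mathrm{SEN}}{\mathrm{PRE} + \mathrm{SEN}}, \tag{12}$$

$$\text{G-mean} = \sqrt{\mathrm{SEN} \times \mathrm{SPE}}, \tag{13}$$

$$\mathrm{SEN} = \frac{TP}{TP + FN}, \tag{14}$$

$$\mathrm{SPE} = \frac{TN}{TN + FP}, \tag{15}$$

$$\mathrm{PRE} = \frac{TP}{TP + FP}, \tag{16}$$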

where Right and All represent the number of correctly classified samples and the total number of samples, respectively. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
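For a binary (one-class-versus-rest) evaluation, the metrics of Equations (11)-(16) follow directly from the four confusion-matrix counts. The minimal sketch below computes them. It assumes the standard binary definitions. In the three-class setting, ACC would use Right/All over all classes, and the per-class metrics would be computed one-vs-rest, which is our assumption rather than a detail stated in the text.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Equations (11)-(16) for one positive class (e.g., COVID-19 vs. the rest)."""
    sen = tp / (tp + fn)                      # sensitivity (recall)
    spe = tn / (tn + fp)                      # specificity
    pre = tp / (tp + fp)                      # precision
    acc = (tp + tn) / (tp + tn + fp + fn)     # Right / All
    f1 = 2 * pre * sen / (pre + sen)          # harmonic mean of PRE and SEN
    g_mean = math.sqrt(sen * spe)             # geometric mean of SEN and SPE
    return {"ACC": acc, "F1": f1, "G-mean": g_mean, "SEN": sen, "SPE": spe, "PRE": pre}


m = binary_metrics(tp=8, tn=9, fp=1, fn=2)
```

G-mean collapses toward zero whenever either class is poorly recognized, even if overall accuracy stays high. This is why it, together with F1, is emphasized for imbalanced data.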

We use the proposed deep supervised network to learn discriminative feature representations for the image classification task (see Fig. 2 for details). The backbone of the network is a VGG-16 network with batch normalization layers. We initialize the backbone with models pre-trained on ImageNet [32]. Reusing parameters trained on a large dataset significantly improves the convergence and performance of the model [33]. The weights and biases of the other layers of the network are randomly initialized.

In the experiments, we construct a baseline CNN for comparison using VGG-16 with batch normalization but without the adaptive auxiliary loss. Similarly, we initialize the weights and biases of the baseline with the model pre-trained on ImageNet and train it with the cross entropy loss. On our COVID19-Diag dataset, we report each result as the mean and standard deviation over 10 experiments.

For the COVIDx-CT, COVID19-CT, and SARS-CoV-2 CT-scan datasets, the images vary in size and do not fit the network input, so we resize them to 3×224×224 using bilinear interpolation. The number of network outputs equals the number of categories in each dataset. Stochastic gradient descent (SGD) with momentum 0.9 and weight decay 5 × 10^−4 is used as the optimizer. The learning rate is initialized to 0.001 and divided by 10 every 40 epochs (120 epochs in total). We implement our model in PyTorch 1.3.0, and all experiments are performed on an NVIDIA GeForce RTX 2080 Ti (11 GB).
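The step schedule described above can be written as a one-line function. This is a minimal sketch assuming 0-based epoch indexing. The function name is hypothetical.

```python
def step_lr(epoch, base_lr=1e-3, step_size=40, gamma=0.1):
    """Learning rate at a 0-based epoch: start at base_lr, multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Epochs 0-39 -> 1e-3, 40-79 -> 1e-4, 80-119 -> 1e-5
schedule = {epoch: step_lr(epoch) for epoch in (0, 39, 40, 79, 80, 119)}
```

In PyTorch, the same configuration would typically be expressed as torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4) combined with torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1), calling scheduler.step() once per epoch.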

To verify the performance of DSN-SAAL on our COVID19-Diag dataset, we compare it with several commonly used classification models trained with CE: VGG-16 [29], ResNet-50 [34], DenseNet-169 [35], MobileNet-V2 [36], and ResNeXt-50 [37]. Table 3 shows that DSN-SAAL outperforms all competing methods in every evaluation metric. Relative to the other models, it improves ACC, F1-score, and G-mean by 5.3%-8.4%, 7.6%-12.5%, and 5.2%-8.7%, respectively. Fig. 4 shows the ROC curves of all models, and the red curve of DSN-SAAL lies clearly above all the others.
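The ROC curves in Fig. 4 come from sweeping a decision threshold over each model's scores. The area under such a curve (AUC) equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The minimal sketch below shows that rank-based computation for a binary, one-vs-rest setting. For clarity it uses an O(P×N) pairwise loop rather than the sort-based version a library would use.

```python
def roc_auc(scores, labels):
    """Binary ROC AUC via the Mann-Whitney formulation.

    Counts, over all (positive, negative) pairs, how often the positive sample
    has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```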

To further evaluate DSN-SAAL, we also reproduce three methods and apply them to the automatic diagnosis of COVID-19: Self-Trans [11], Transfer-CheXNet [13], and Meta-Weight-Net [27]. Their results appear in Table 3, and code for all three is publicly available. For a fair comparison, we keep the parameter settings of the official implementations. The only change is to accept single-channel input images, which our COVID19-Diag dataset requires. For Meta-Weight-Net, we keep its main architecture but add convolutional and downsampling layers to accommodate the 224×224 input, so the final output feature map is 7×7. Table 3 shows that DSN-SAAL achieves high diagnostic performance and outperforms the competing methods in all metrics.
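The standard output-size formula ⌊(n + 2p − k)/s⌋ + 1 determines how many downsampling stages turn a 224×224 input into a 7×7 map. The exact layers added to Meta-Weight-Net are not specified. The sketch therefore assumes identical 2×2, stride-2 pooling layers, a hypothetical choice. Because 224 / 7 = 32 = 2⁵, five such stages are needed. VGG-16 uses the same number of pooling stages to reduce a 224×224 input to 7×7.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution or pooling layer along one spatial axis (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_chain(size: int, n_layers: int, kernel: int = 2, stride: int = 2) -> list:
    """Spatial sizes after each of n_layers identical downsampling layers (hypothetical layout)."""
    sizes = [size]
    for _ in range(n_layers):
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes


print(downsample_chain(224, 5))  # [224, 112, 56, 28, 14, 7]
```

A 3×3 convolution with stride 1 and padding 1 leaves the size unchanged (`conv_out(224, 3, 1, 1)` is 224), so only the stride-2 layers change the resolution.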

To better validate the role of each component of our model in COVID-19 diagnosis, we design a set of ablation experiments covering the deep supervised network (DSN) and the self-adaptive auxiliary loss (SAAL).

We consider four commonly used CNNs: VGG-16, ResNet-50, DenseNet-169, and ResNeXt-50. Their staged or modular structure makes it easy to attach the auxiliary supervision chain. Table 4 compares each baseline with its DSN counterpart. Adding the auxiliary supervision chain significantly improves ACC, F1-score, and G-mean over the baseline. Meanwhile, the number of parameters grows only slightly, by no more than 0.1 M (measured with flops-counter.pytorch). We consider this small increase acceptable given the large gain in diagnostic performance. We conclude that adding auxiliary supervision can effectively improve the performance of deep neural networks.
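Deep supervision attaches classifiers to intermediate stages and adds their losses to the loss of the main head, so every stage receives a direct training signal. The sketch below shows only this summation, under two explicit assumptions. First, each head's loss is plain cross-entropy rather than SAAL. Second, `aux_weights` is a hypothetical fixed weighting. Where the heads sit and how they are weighted in DSN-SAAL is described with Fig. 2, outside this excerpt.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample given raw logits and an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def deeply_supervised_loss(main_logits, aux_logits_list, target, aux_weights=None):
    """Main-head loss plus a weighted sum of auxiliary-head losses.

    aux_weights is a hypothetical per-head weighting (1.0 for every head by default).
    """
    if aux_weights is None:
        aux_weights = [1.0] * len(aux_logits_list)
    if len(aux_weights) != len(aux_logits_list):
        raise ValueError("one weight per auxiliary head is required")
    loss = cross_entropy(main_logits, target)
    for w, aux_logits in zip(aux_weights, aux_logits_list):
        loss += w * cross_entropy(aux_logits, target)
    return loss
```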

Moreover, deep supervision brings larger gains to shallower, simpler networks than to deeper ones, because complex architectures already learn features at different levels to some extent. Table 5 compares SAAL with several other loss functions. SAAL outperforms all of them and gives more accurate diagnoses on the COVID19-Diag dataset. The improvement is most pronounced for the F1-score, which shows that SAAL helps the model learn the features of each category. When DSN and SAAL are combined in VGG-16, the ACC is 0.920±0.004, the F1-score is 0.873±0.007, and the G-mean is 0.925±0.006. All three values exceed those obtained with either component alone, which reflects the advantage of DSN-SAAL. In summary, DSN-SAAL integrates shallow and deep information through auxiliary supervision, which promotes the feature learning of the model. SAAL additionally makes the model pay more attention to the minority classes, which effectively mitigates the data imbalance problem.

To further demonstrate the effectiveness of our model on the data imbalance problem, we increase the imbalance ratio of our COVID19-Diag dataset. Since our goal is to obtain good results with only a few COVID-19 samples, we reduce the training samples of the target class (COVID-19) to 25%, 10%, 5%, and 1% of the original and evaluate our model against the baseline. Table 6 shows the results. To better evaluate the classification of the target and non-target classes, we also report sensitivity and specificity. The values in brackets give the change in sensitivity at each reduction level relative to the original distribution. All indicators decline as the number of target-class samples decreases. The decline is largest for the F1-score and G-mean, the composite indicators that reflect how well each class is learned. Nevertheless, DSN-SAAL retains an advantage over the baseline. As the imbalance ratio increases, its sensitivity declines more slowly than that of the baseline, and its specificity is slightly improved. Fig. 5 shows the confusion matrices for the COVID-19 samples at each imbalance ratio. Overall, the results show that DSN-SAAL effectively maintains the classification accuracy of the minority class without affecting the feature learning of the majority classes.
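The reduced-sample experiments amount to subsampling only the target class in the training split and then measuring the resulting imbalance ratio (largest class size over smallest). The paper does not say how the retained COVID-19 slices were chosen. Random selection with a fixed seed is therefore an assumption here, and the helpers `subsample_class` and `imbalance_ratio` are hypothetical names.

```python
import random
from collections import Counter


def subsample_class(samples, labels, target, fraction, seed=0):
    """Keep only `fraction` of the samples labelled `target`; other classes are untouched.

    Hypothetical helper: random selection with a fixed seed is an assumption.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target_idx = [i for i, y in enumerate(labels) if y == target]
    if not target_idx:
        raise ValueError("no samples of the target class")
    keep_n = max(1, round(len(target_idx) * fraction))  # keep at least one sample
    kept = set(random.Random(seed).sample(target_idx, keep_n))
    idx = [i for i, y in enumerate(labels) if y != target or i in kept]  # original order
    return [samples[i] for i in idx], [labels[i] for i in idx]


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

Only the training split would be passed through `subsample_class`. The test set stays fixed, so the different imbalance ratios are evaluated on the same test data.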

Previous studies have reported that the pulmonary abnormalities on COVID-19 CT scans include bilateral and subpleural GGO, bronchovascular thickening, air space consolidation, traction bronchiectasis, pleural effusion, and a crazy paving appearance. However, the CT manifestations of COVID-19 and other pneumonias overlap. GGO, air space consolidation, and crazy paving, for example, are common CT findings of both COVID-19 and bacterial pneumonia. In addition, the CT slices of some patients with mild, early-stage disease show no obvious lesion area. Such cases easily lead to false negatives, which can prevent timely intervention against disease progression. These conditions complicate the diagnosis of COVID-19. Fig. 6 shows some samples diagnosed by our model. Figs. 6a and 6b are bacterial pneumonia slices that are diagnosed as normal. Neither shows an obvious lesion area, particularly Fig. 6b, which lies at the top or bottom of the CT scan. In such slices the pulmonary parenchyma itself is very small, yet its lesion features cannot be ignored, which makes these samples difficult for the model to learn. The results also show that our model has higher sensitivity for COVID-19, which means fewer false negatives, so COVID-19 cases are less likely to be missed. Deep learning models can distinguish COVID-19 from bacterial pneumonia and normal cases to a certain extent. However, they are limited by the diversity and class imbalance of the training data, and our work aims to address these problems.

Figure 6: Examples of the diagnostic results obtained using our DSN-SAAL.

As shown in Fig. 7, class activation mapping (CAM) [38] visualizes the regions attended to by VGG-16 with CE and by DSN-SAAL on our COVID19-Diag dataset. The maps are computed from the last convolutional layer of each model. In the first and fifth columns, the lesion areas are not obvious in the raw images. There, VGG-16 cannot accurately distinguish pulmonary vessels from lesion areas, so a large red region covers the lungs, whereas our method attends to the lesions more accurately. In the second and fourth columns, our method better separates the lung lobes from the background and locates the GGO more precisely. For the challenging samples in the third and sixth columns, DSN-SAAL still distinguishes the lung parenchyma from the lesion areas and obtains more accurate results than VGG-16.
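The attention maps in Fig. 7 are computed from the last convolutional layer. Reference [38] in the list below is Grad-CAM, so the sketch shows the Grad-CAM combination step. Each channel is weighted by the spatial average of the class-score gradient, the weighted channels are summed, and a ReLU keeps only positive evidence. The gradients are passed in as inputs; in practice they come from backpropagating the class score. Upsampling the map to the 224×224 image for overlay is omitted. For a network that ends in global average pooling and a linear classifier, these weights are proportional to the classifier weights used by the original CAM.

```python
def grad_cam(feature_maps, gradients):
    """Combine last-conv feature maps into a class activation map (Grad-CAM style).

    feature_maps, gradients: K channels, each an H x W nested list, where
    gradients[k][i][j] is d(class score) / d(feature_maps[k][i][j]).
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight = spatial average of its gradient (global average pooling).
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for weight, fmap in zip(weights, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    # ReLU keeps positive evidence for the class; then scale to [0, 1] for display.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```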

To verify the generalization of DSN-SAAL, we conduct comparative experiments on three other publicly available COVID-19 datasets, using the data splits described in Section 4.1. The results are shown in Tables 7, 8, and 9. For each dataset, we compare our method with state-of-the-art approaches, and the results of the compared methods are taken from their original papers.

For the COVIDx-CT dataset, we first conduct experiments under the original data distribution and compare DSN-SAAL with relevant methods. DSN-SAAL outperforms the other two approaches in overall accuracy and in the SEN and PRE of all three categories. COVIDNet-CT [14] was proposed together with the original COVIDx-CT dataset. Cognex VisionPro [39] is deep learning software that has been widely used in fields ranging from factory automation to life science. Because COVIDx-CT contains a large number of slices, all three categories are well represented, so all three methods achieve good results on this dataset, as shown in the first part of Table 7.
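Per-class sensitivity (SEN) and precision (PRE), as reported for the three COVIDx-CT categories, can be read directly off a confusion matrix. SEN divides a diagonal entry by its row sum (all samples of that true class), and PRE divides it by its column sum (all samples predicted as that class). The matrix in the test is illustrative and is not taken from Table 7 or Fig. 5. The row/column convention is an assumption.

```python
def per_class_metrics(confusion):
    """Overall accuracy plus per-class SEN and PRE from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    acc = sum(confusion[i][i] for i in range(n)) / total
    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        actual = sum(confusion[c])                           # row sum: true class c
        predicted = sum(confusion[r][c] for r in range(n))   # column sum: predicted c
        per_class.append({
            "SEN": tp / actual if actual else 0.0,
            "PRE": tp / predicted if predicted else 0.0,
        })
    return acc, per_class
```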

To further verify the effectiveness of DSN-SAAL, we extract 1000 CT slices from each of the three categories of the training set. Only about 5% of the original training set is therefore used for training, and the test set is kept unchanged for a fair evaluation. The selected samples are the same as those used in [40]. As shown in the second part of Table 7, COVID-CT-MaskNet [40], Two Stage Model [41], Lightweight Model [42], and One Shot Model [43] all adopt a two-stage learning strategy. They first use detection and segmentation methods to extract regions of interest (ROIs) containing GGO and consolidation, and then use a classification network to distinguish the categories. Table 7 shows that DSN-SAAL outperforms these state-of-the-art approaches and still separates the categories well even when the training set is greatly reduced. In addition, DSN-SAAL is a single-stage network, which makes training more convenient than for a multi-stage model.

For the COVID19-CT dataset (Table 8), the AUC of DSN-SAAL is slightly lower than that of Self-Trans but still ranks second, and our ACC and F1-score are higher than those of Self-Trans. Notably, Self-Trans first trains an ImageNet-pretrained model on 1000 additional unlabeled CT slices from the Lung Nodule Analysis (LUNA) database. It then trains the resulting model on COVID19-CT for the final classification task. Our method, by contrast, uses no additional training data. Our ACC is also slightly lower than that of Cross-datasets Analysis [45], but we obtain the highest F1-score.
We further conduct experiments on the SARS-CoV-2 CT-scan dataset. Table 9 compares DSN-SAAL with a series of classical machine learning methods and existing deep learning models. Among them, xDNN combined deep neural networks with prototype learning to provide an interpretable deep learning model for the automatic diagnosis of COVID-19. MAD-DBM [46] used a deep bidirectional long short-term memory network with a mixture density network model as a real-time COVID-19 diagnostic system. As shown in Table 9, DSN-SAAL significantly outperforms all of these methods, including Cross-datasets Analysis [45], in every evaluation metric. This further verifies that DSN-SAAL generalizes better than state-of-the-art methods.

Deep learning has proven to be an effective tool for assisting the diagnosis of COVID-19 because it is fast and accurate. However, data volume and diversity have a profound impact on the performance of deep learning models [49] (see Table 6). On the one hand, relatively few public datasets are available for COVID-19 diagnosis, and most of them only distinguish COVID-19 from normal cases. CT images of COVID-19 differ distinctly from normal images, but they share many manifestations with other types of pneumonia. On the other hand, the target class is originally much smaller than the non-target classes. COVID-19 samples are often scarce, which causes data imbalance, yet clinically they must be identified reliably. Most existing deep learning methods for diagnosing COVID-19, however, either ignore data imbalance or simply augment the data with affine transformations, which does not increase sample diversity. To address these problems, we first collected and created a COVID-19 diagnosis dataset with diverse categories. Second, we proposed a novel method, DSN-SAAL, which better distinguishes COVID-19 from normal cases and bacterial pneumonia under data imbalance.

The image features learned by the network at different stages are diverse and should be exploited effectively. In this paper, we integrated shallow and deep features through auxiliary supervision to promote the simultaneous learning of minority and majority class features. Furthermore, CT slices from the same scan are similar and clinical data may be mislabeled. To account for both, we designed a self-adaptive auxiliary loss and combined it with the deep supervised network to promote the learning of minority class features. Tables 4 and 5 show the advantages of the deep supervised network and the self-adaptive auxiliary loss over the baseline, respectively. The results in Table 3 and Fig. 4 also show that our method has clear advantages over popular deep learning models.

To fully demonstrate the superiority of our method under class imbalance, we designed a series of comparison experiments with increasing imbalance ratios (see Table 6). Our method remains more stable than the baseline as the imbalance ratio increases. We also show the confusion matrices for different proportions of COVID-19 samples (see Fig. 5). As the number of COVID-19 samples decreases, the performance of our model decreases correspondingly. Notably, our model still recognizes some COVID-19 cases in the same test set even when only 1% of the original COVID-19 samples, i.e., 13 samples, are used for training. To verify the generalization ability of our model, we also conducted experiments on three other public COVID-19 diagnosis datasets. The results in Tables 7, 8, and 9 confirm the superiority of the proposed method. In addition, the CAMs show that our model focuses on the pulmonary parenchyma and locates relevant lesion areas more accurately, even when the lesion area is not obvious (see Fig. 7).

Although our model performs well in COVID-19 diagnosis, some limitations remain. First, the COVID19-Diag dataset is limited in size. It compares favorably with existing COVID-19 diagnostic datasets in data volume, but it is still not large enough, which affects the training of deep learning models to some extent. We plan to evaluate our method on additional CT scans from more centers in the future. Second, our method only identifies COVID-19. It does not quantify the lesion area to assess disease severity, which would help clinicians make further diagnoses. We will address this in future research to support monitoring and treatment. We also plan to reimplement our model on the open-source deep learning platform PaddlePaddle.

In this paper, we create a challenging clinical dataset named COVID19-Diag and propose a novel deep supervised learning method with a self-adaptive auxiliary loss for COVID-19 diagnosis from imbalanced CT images. We first present a novel deep supervised network for multi-scale feature learning on imbalanced data (i.e., balanced learning of the majority and minority classes). Then, we propose an efficient self-adaptive auxiliary loss that considers the effective number of samples and a regularization term based on reverse cross-entropy (RCE). Our method can be applied to different datasets because all model parameters are learned automatically during network training. Finally, the results on COVID19-Diag and three publicly available COVID-19 diagnosis datasets show that a convolutional neural network trained without any data augmentation can effectively identify COVID-19 from imbalanced CT images.
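The self-adaptive auxiliary loss is defined in the method section, outside this excerpt. The conclusion states only that it combines the effective number of samples with an RCE regularization term. The sketch below illustrates those two published ingredients. The first is class-balanced weighting based on the effective number of samples, E_n = (1 − βⁿ)/(1 − β) (Cui et al., CVPR 2019). The second is the reverse cross-entropy term from symmetric cross-entropy (listed below as "Symmetric cross entropy for robust learning with noisy labels"). The fixed coefficients `alpha` and `lam`, β = 0.999, and the log(0) clipping value −4 are all assumptions. Unlike SAAL, nothing here is learned adaptively, so this is not the authors' loss.

```python
import math


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples (Cui et al., 2019).

    E_n = (1 - beta**n) / (1 - beta); each weight is proportional to 1 / E_n and
    the weights are rescaled to sum to the number of classes.
    """
    if any(n <= 0 for n in counts):
        raise ValueError("every class needs at least one sample")
    effective = [(1.0 - beta ** n) / (1.0 - beta) for n in counts]
    raw = [1.0 / e for e in effective]
    scale = len(counts) / sum(raw)
    return [r * scale for r in raw]


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_ce(logits, target, weights):
    """Class-weighted cross-entropy for a single sample."""
    return -weights[target] * log_softmax(logits)[target]


def reverse_ce(logits, target, log_zero=-4.0):
    """Reverse cross-entropy for a one-hot label, with log(0) clipped to log_zero.

    Equals -log_zero * (1 - p[target]) for the softmax probabilities p.
    """
    probs = [math.exp(v) for v in log_softmax(logits)]
    return -sum(p * (0.0 if k == target else log_zero) for k, p in enumerate(probs))


def combined_loss(logits, target, counts, alpha=1.0, lam=1.0, beta=0.999):
    """Hypothetical fixed-weight combination: alpha * class-balanced CE + lam * RCE."""
    weights = class_balanced_weights(counts, beta)
    return alpha * weighted_ce(logits, target, weights) + lam * reverse_ce(logits, target)
```

With `counts=[1000, 10]`, the minority class receives a much larger weight than the majority class. The RCE term is bounded by −log_zero, which is what makes it less sensitive to mislabeled samples than unbounded cross-entropy.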

Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19

Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2

Automatic segmentation of retinal layer boundaries in OCT images using multiscale convolutional neural network and graph search

Automatic segmentation of intracerebral hemorrhage in CT images using encoder-decoder convolutional neural network

A systematic study of the class imbalance problem in convolutional neural networks

Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data

A deep learning system to screen novel coronavirus disease 2019 pneumonia

Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis

Cost-sensitive learning of deep feature representations from imbalanced data

Large-scale screening of COVID-19 from community acquired pneumonia using infection size-aware classification

Sample-efficient deep learning for COVID-19 diagnosis based on CT scans

Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images

Transfer learning for establishment of recognition of COVID-19 on CT imaging using small-sized training datasets

COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images

Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: A prospective study, medRxiv

AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks

A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT

A deep learning algorithm using CT images to screen for corona virus disease

SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification

Gaussian affinity for max-margin class imbalanced learning

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Focal loss for dense object detection

A discriminative feature learning approach for deep face recognition

Symmetric cross entropy for robust learning with noisy labels

Range loss for deep face recognition with long-tailed training data

ArcFace: Additive angular margin loss for deep face recognition

Meta-Weight-Net: Learning an explicit mapping for sample weighting

A deep learning system that generates quantitative CT reports for diagnosing pulmonary tuberculosis

Very deep convolutional networks for large-scale image recognition

Richer convolutional features for edge detection

Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function

2009 IEEE Conference on Computer Vision and Pattern Recognition

Rich feature hierarchies for accurate object detection and semantic segmentation

Deep residual learning for image recognition

Densely connected convolutional networks

MobileNetV2: Inverted residuals and linear bottlenecks

Aggregated residual transformations for deep neural networks

Grad-CAM: Visual explanations from deep networks via gradient-based localization

Detection of COVID-19 from chest computed tomography (CT) images using deep learning: Comparing Cognex VisionPro deep learning 1.0 software with open source convolutional neural networks

COVID-CT-Mask-Net: Prediction of COVID-19 from CT scans using regional features, medRxiv

Detection and segmentation of lesion areas in chest CT scans for the prediction of COVID-19, medRxiv

Lightweight model for the prediction of COVID-19 through the detection and segmentation of lesions in chest CT scans

One shot model for COVID-19 classification and lesions segmentation in chest CT scans using LSTM with attention mechanism, medRxiv

Contrastive cross-site learning with redesigned net for COVID-19 CT classification

COVID-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis

Deep bidirectional classification model for COVID-19 disease infected patients

ImageNet classification with deep convolutional neural networks

Going deeper with convolutions

Deep neural network for automatic characterization of lesions on 68Ga-PSMA-11 PET/CT