key: cord-0563343-s2iwhzem authors: Li, Jingxiong; Wang, Yaqi; Wang, Shuai; Wang, Jun; Liu, Jun; Jin, Qun; Sun, Lingling title: Multiscale Attention Guided Network for COVID-19 Detection Using Chest X-ray Images date: 2020-11-11 journal: nan DOI: nan sha: 3531c7d64b513e594e8418a84c9b6e755102c483 doc_id: 563343 cord_uid: s2iwhzem Coronavirus disease 2019 (COVID-19) is one of the most destructive pandemics of the new millennium, forcing the world to tackle a health crisis. Automated classification of lung infections from chest X-ray (CXR) images strengthens traditional healthcare strategies for handling COVID-19. However, classifying COVID-19 against other pneumonia cases using CXR images is challenging because of shared spatial characteristics, high feature variation in infections and contrast diversity between cases. Moreover, massive data collection is impractical for a newly emerged disease, which limits the performance of common deep learning models. To address this challenge, a Multiscale Attention Guided deep network with Soft Distance regularization (MAG-SD) is proposed to automatically classify COVID-19 from pneumonia CXR images. In MAG-SD, MA-Net is used to produce a prediction vector and an attention map from multiscale feature maps. To relieve the shortage of training data, attention guided augmentations along with a soft distance regularization are proposed, which require only a small amount of labeled data to generate meaningful augmentations and reduce noise. Our multiscale attention model achieves better classification performance on our pneumonia CXR image dataset. Extensive experiments demonstrate that MAG-SD has a clear advantage over cutting-edge models in pneumonia classification. The code is available at https://github.com/JasonLeeGHub/MAG-SD. We propose a novel Multiscale Attention Guided deep network with Soft Distance regularization (MAG-SD) for COVID-19 CXR image classification. To balance the quantity of different data, a weakly-supervised method is introduced, which needs only a small amount of labeled data to perform effective augmentations. A multiscale strategy is applied to the attention generator to produce a detailed attention matrix for prediction. Our classification model is motivated by the fact that clinical diagnosis of COVID-19 follows a procedure which first evaluates the regional appearance and then makes a diagnosis by exclusion. Thus, we propose a multiscale attention module which assesses both shallower and deeper layers. Compared with using feature maps from only the highest convolutional layers, utilizing lower-level features may increase accuracy on incidental findings. Moreover, our weakly-supervised system integrates a soft distance regularization method which refines the classification result by adaptively adjusting the classification loss. In a nutshell, the contribution of this paper is threefold: 1) We design a novel deep network, MA-Net, treating COVID-19 detection as a fine-grained image classification problem. Multiscale attention is introduced into the proposed model to assess attention maps on both high-level target features and low-level texture features. The composed attention maps are used as guidance for the following steps. Attention pooling is proposed to utilize the attention maps for classification. 2) We address data shortage by proposing attention guided data augmentation and a multi-shot training phase, as COVID-19 is newly emerged and lacks data compared with the abundant databases of known diseases.
It includes attention mixup, attention patching and attention dimming, which focus on enhancing and searching local features before generating data. Models are trained on an imbalanced COVID-19 dataset and achieve state-of-the-art performance. 3) Without introducing other modules or parameters, we formulate the prediction loss using a soft distance between predictions. Specifically, a new regularization term, soft distance regularization, is proposed to work together with the cross-entropy loss. Soft distance regularization acts as a constraint between predictions, forcing the classifiers to produce similar outputs for the same target. The paper is organized as follows. In Section 2, we review works closely related to our contribution. Section 3 presents the proposed method. In Section 4, the databases and experimental setup are introduced in detail, and the results are presented and discussed. The last section concludes this study and highlights future work. In this section, related works are reviewed, including the X-ray appearance of typical pneumonia, fine-grained visual classification, attention mechanisms for CNNs and multiscale feature fusion in computer vision. Chest X-ray is a widely used imaging modality providing high-resolution pictures to visualize the pathological changes of thoracic diseases. Diagnosis can be made according to the visual patterns demonstrated on CXR images. Clinical research by Katz and Leung [23] demonstrates that the typical image pattern for bacterial pneumonia includes opacification of a single lobe and pleural effusion. Viral pneumonia also has radiological appearances such as pulmonary edema, small areas of effusion, consolidation or lobar masses. Reports from [24] show that the most common pattern on CXR in COVID-19 is consolidation or ground-glass opacity. It is notable that COVID-19 shares some visual features with viral pneumonia, while viral and bacterial pneumonia can hardly be differentiated because of their similar spatial appearance. The wide application of CNNs revealed their advantage in solving large-scale image classification problems [25], which inspired the use of CNNs for FGVC tasks, forcing CNN models to explore inconspicuous local features. Some models rely on local annotations to train part-based detectors, localizing certain parts before prediction [26], [27]. However, local feature annotation requires expensive human labor, which limits its reproducibility in real-world applications. In recent years, approaches requiring only image-level labels have also emerged, whose motivation is to first localize the corresponding parts and then compare their local features [28]. Fu et al. [29] introduce WS-DAN, a weakly supervised deep network handling FGVC by using attention to enhance local features and guide augmentation. FGVC is also a common problem in medical imaging owing to the spatial similarity of infections. Qin et al. [30] propose a fine-grained classification CNN for different types of lung cancer in PET and CT images. For visual tasks, attention usually refers to a scalar matrix representing the relative importance and inner relevance of local features [31]. This nonuniform representation is produced by specially designed modules [32]. Works have shown that generating an attention map for a classification CNN provides an intuitive way to localize the target object, helping to identify visual properties through local representation. Gondal et al.
[33] report that an attention guided method is helpful for Diabetic Retinopathy (DR) localization and recognition. Zhang et al. [34] regulate the attention of a deep model by training self-attention blocks for skin lesion classification and surpass models without attention. Generally, the attention mechanism forces models to analyze global and local features simultaneously to generate credible classification together with localization results. Extracting hybrid feature maps from multi-resolution input images has been a common strategy in computer vision since the era of hand-engineered features. CNNs, which compute a feature hierarchy layer by layer, have an inherent multiscale feature hierarchy of pyramidal shape. Multiscale features can produce semantically strong representations if effective feature fusion is performed. Explorations of multiscale feature fusion include U-Net [35] and V-Net [36], which exploit skip connections to associate feature maps across resolutions, and FPN [37], which leverages the multiscale hierarchy by making predictions at multiple levels. For CXR images, Huang et al. [38] present a weighted concatenation method to combine global and local features. The success of spatial attention inspires extracting attention from multiresolution feature maps. Sedai et al. [39] propose A-CNN for chest pathology localization, which utilizes multiscale attention by calculating a convex combination of the weighted feature maps. In this work, we propose an approach that explores multiscale fine-grained features adaptively. We first give an overview of MAG-SD. Then MA-Net is presented in terms of its network architecture and attention modules. A weakly supervised data augmentation module, Attention Guided Augmentation, is introduced to address the shortage of COVID-19 cases. Finally, Soft Distance Regularization is proposed to suppress the noise introduced by the augmentations. COVID-19 CXR images are less distinctive compared with other pneumonia cases, which requires a model able to extract fine-grained features from the input image. We adopt WS-DAN [29], which is competitive on fine-grained image classification. The architecture of WS-DAN includes a feature extractor (ResNet50 in the original paper), an attention generator operating on the feature map, and an augmentation generator producing locally enhanced and noise-blended images. An overview of our MAG-SD is shown in Fig. 1. In the primary training route, the preprocessed CXR image I_0 is fed into MA-Net to obtain the prediction vector P and attention map A. Attention Guided Augmentation is applied to I_0, using A to produce the augmented data I_1, I_2, I_3. In the auxiliary training routes, I_1, I_2, I_3 are fed into MA-Net to obtain the prediction vectors p_1, p_2, p_3. All the vectors (i.e., P, p_1, p_2, p_3) are utilized by Soft Distance Regularization to form a proper loss. The attention mechanism has been used on natural image benchmarks to guide the feedforward process [40], [41]. Recently, tentative efforts have been made on deep models for image classification [42], person perception [43] and sequential decision tasks [44]. Most attention models aim at gathering top-level information and deciding where to attend in the next learning steps. The proposed attention-generating module operates on multiscale feature maps, aiming at extracting attention at both the texture level and the target level. The last layer before each downsampling is selected as a feature map in order to exploit the information at a single resolution.
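For readers who prefer a concrete picture of the pipeline, a minimal PyTorch-style sketch of one MAG-SD training step follows. The function and argument names (mag_sd_step, augment_fns, reg_weight) are illustrative, the soft distance term is simplified, and the model is assumed to return a prediction vector together with its attention maps; this is a sketch of the described procedure, not the authors' exact implementation.

```python
# Hypothetical sketch of one MAG-SD training step; names and the simplified
# regularizer are illustrative, not the authors' exact implementation.
import torch
import torch.nn.functional as F

def mag_sd_step(model, optimizer, x0, labels, augment_fns, reg_weight=1.0):
    """model(x) is assumed to return (prediction_logits, attention_maps);
    augment_fns is a list of callables (mixup, patching, dimming), each taking
    (x0, attention_maps) and returning an augmented batch I1/I2/I3."""
    model.train()
    optimizer.zero_grad()

    # Primary route: prediction vector P and attention map A from the raw input I0.
    P, A = model(x0)
    loss = F.cross_entropy(P, labels)                      # primary cross-entropy

    # Auxiliary routes: predictions p1..p3 on the attention-guided augmentations.
    with torch.no_grad():
        soft_target = F.softmax(P, dim=1)                  # softened primary prediction
    reg = x0.new_zeros(())
    for aug in augment_fns:
        x_aug = aug(x0, A.detach())
        p, _ = model(x_aug)
        # Simplified soft distance: pull each auxiliary prediction towards the
        # softened primary prediction.  The paper additionally falls back to the
        # ground-truth label when the primary prediction is unreliable.
        reg = reg + F.mse_loss(F.softmax(p, dim=1), soft_target)

    loss = loss + reg_weight * reg / max(len(augment_fns), 1)
    loss.backward()
    optimizer.step()
    return loss.item()
```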
For the ResNet50 we use, feature maps of size 512 * 28 * 28, 1024 * 14 * 14 and 2048 * 7 * 7 are chosen. The number of attention maps to be generated is set to 32. The architecture of the multiscale attention generator is shown in Fig. 3. f_1, f_2 and f_3 are feature maps selected from the feature extractor. Each of them is processed by a 1 * 1 convolutional layer to generate the corresponding attention. All the attention maps are downsampled to 7 * 7 and connected residually. The effect of using different numbers of feature maps is discussed in the experiments. The attention pooling module mimics the structure proposed in [29], which associates the attention output with the feature map. Fig. 4 shows the pipeline of the pooling method. The feature map f_3 (2048 * 7 * 7) is extracted from the output of the CNN encoder. The multiscale attention map A produced by the attention generator has size 32 * 7 * 7. Each attention map focuses on a different location which may contain valuable fine-grained features. Attention-biased features (i.e., part feature maps (PF)) are obtained by multiplying each attention map in A element-wise with the feature map. There are 32 PFs, each of size 2048 * 7 * 7. Global average pooling (GAP) is applied to shrink each PF to 2048 * 1 * 1, describing the activation intensity of the attention on the feature map. The feature matrix M is produced by concatenating the GAP results, giving a vector of size 65536 * 1 * 1. Eq. (1) describes the calculation of PF as PF_j = A_j ⊙ f_3, where ⊙ stands for element-wise multiplication between two tensors, f_3 is the feature map extracted by the CNN, and N represents the number of attention maps, which is 32 in our work. Each PF_j then goes through a downsampling method such as GAP to obtain a compressed description of size 2048 * 1 * 1. The feature matrix M is formed by concatenating all condensed PF_j, as illustrated in Fig. 4. As mentioned above, the attention mechanism emphasizes local features which affect the classification result. Following this idea, the performance of the classification network can be enhanced if attention guided training cases are considered. The weakly supervised method shown in Fig. 5 is proposed to provide effective augmentation of the original image. For each image, one attention map is randomly chosen for individual augmentation. This attention map is normalized as A*. 1) Attention Mixup: Mixup is an augmentation strategy which generates data by mixing the whole image and regional data together. Given an attention map A*_j, a detailed region D_j can be extracted by thresholding. For each element A*_j(l, m), Eq. (2) sets D_j(l, m) to 1 if it is greater than the threshold θ_m ∈ [0, 1], and to 0 otherwise. A bounding box surrounding the extracted region is then derived from the raw region. The region covered by the box is enlarged to the same size as the input image and merged with the original input I_0 to obtain the augmented input I_1, which is defined in Eq. (3), where γ is a parameter in [0, 1] and B stands for the enlarged bounding-box region. By mixing local and global features together, fine-grained features can be extracted and the model can see the target precisely. 2) Attention Patching: The encoder may be sensitive to only a limited part of its receptive field, as valuable spatial features usually occur in similar positions. To encourage the encoder to exploit features globally, attention patching is proposed. Specifically, the region D mentioned in 1) is patched onto the original image I_0 to produce the patched data I_2, as shown in Fig. 5.
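A minimal sketch of the multiscale attention generator and the attention pooling described above is given below, assuming ResNet50 feature maps of the stated sizes and 32 attention maps. The class and function names (MultiscaleAttention, attention_pooling) and the use of average pooling for downsampling are our assumptions, not the authors' exact code.

```python
# Illustrative PyTorch sketch of the multiscale attention generator and the
# attention pooling; a simplification of the scheme above, not the exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleAttention(nn.Module):
    """Generates 32 attention maps from three ResNet50 feature maps
    (512*28*28, 1024*14*14, 2048*7*7) and fuses them at 7*7 resolution."""
    def __init__(self, in_channels=(512, 1024, 2048), num_maps=32):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, num_maps, kernel_size=1) for c in in_channels])

    def forward(self, feats):                 # feats = (f1, f2, f3)
        fused = 0
        for conv, f in zip(self.convs, feats):
            a = F.relu(conv(f))                           # per-scale attention
            fused = fused + F.adaptive_avg_pool2d(a, 7)   # downsample to 7*7 and sum
        return fused                                      # (B, 32, 7, 7)

def attention_pooling(f3, attn):
    """Attention pooling: PF_j = A_j (element-wise) * f3, then GAP and concatenation.
    f3:   (B, 2048, 7, 7) feature map from the last ResNet50 stage.
    attn: (B, 32, 7, 7) multiscale attention maps.
    Returns the feature matrix M of shape (B, 32 * 2048) = (B, 65536)."""
    pf = attn.unsqueeze(2) * f3.unsqueeze(1)              # (B, 32, 2048, 7, 7)
    m = pf.mean(dim=(3, 4))                               # GAP over spatial dims
    return m.flatten(1)                                   # (B, 65536)
```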
Attention patching enlarges the model's region of interest, which forces the model to exploit its input globally. 3) Attention Dimming: When training the attention-generating module, multiple attention maps may respond to similar regions. A reliable fine-grained classification model has to focus on different local features of one target. To stimulate the attention model to exploit the whole receptive field, attention dimming is proposed. We obtain a Dimming Mask (DM) from A* by applying a threshold θ_d ∈ [0, 1], as represented in Eq. (4). The augmented image I_3 is obtained by applying the mask to the input, as illustrated in Fig. 5(c). Compared with the original training images, disturbances are introduced by augmentation (e.g., the infection area is reduced by attention dimming). To address this problem, we formulate the uncertainty of the model via the distance between prediction vectors. Intuitively, the distance d can be modeled as in Eq. (5), where x denotes the augmented image and P(I), p(x) represent the primary and auxiliary prediction vectors respectively. However, the distance between P(I) and p(x) is unstable before the model is well fitted. We therefore reference the ground-truth labels to stabilize the gradients. As shown in Algorithm 1, P(I) is replaced by its soft-label counterpart, filtering out low-confidence inferences. The soft distance d(x) can then be represented as in Eq. (6), where L_ce^prim operates between the labels and the primary prediction. If two vectors give different predictions for one target, L_reg will generate a large value, which reflects the uncertainty of the model on that target. It is also notable that L_reg penalizes the soft distance d, forcing the model to generate consistent prediction vectors. In this section, extensive experiments are conducted to show the effectiveness of MAG-SD. The model is trained on datasets containing different types of pneumonia. Each proposed component is evaluated to demonstrate its effectiveness. The model is then compared with other baseline methods using several metrics. The proposed model is trained and tested on several datasets to evaluate its classification performance and its ability for fine-grained pneumonia localization. Details of each dataset are shown in Tab. 1. Dataset A is a small combined dataset with 90 COVID-19 cases from [14] and 168 other pneumonia cases from [16], which directly assesses the model's fine-grained classification ability. Dataset B is selected from [45] and [16], aiming at assessing the model's performance on a larger scale. Dataset C is the largest dataset we operate on, covering COVID-19 detection and fine-grained pneumonia classification. The quality of pneumonia localization is evaluated on the Localization dataset, which has 13 COVID-19 cases with pixel-wise masks from [46] and 118 non-COVID pneumonia cases with bounding-box annotations from [16]. In the experiments, the classic ResNet50 is adopted as the convolutional feature extractor and the output of its layer4 is chosen as the feature map. Attention is extracted from the outputs of layer2, layer3 and layer4 to ensure multiscale attention; the attention sizes are 28 * 28, 14 * 14 and 7 * 7 respectively. Both training and test sets are divided with roughly the same class proportions, and 5-fold cross validation is applied to obtain reliable results. X-ray images are affected by the varied configurations of imaging equipment, so radiological images of the same tissue can differ.
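The three attention guided augmentations described above can be sketched as follows. The thresholds θ_m and θ_d, the min-max normalization of A*, and the exact patching and dimming semantics (here: patching occludes the attended region, dimming attenuates it) are assumptions for illustration, and the bounding-box handling of attention mixup is simplified.

```python
# Illustrative sketch of the three attention-guided augmentations; thresholds,
# normalisation and the exact patching/dimming semantics are assumptions.
import torch
import torch.nn.functional as F

def _normalize(a):
    """Min-max normalise one attention map to [0, 1] (A*)."""
    a = a - a.min()
    return a / (a.max() + 1e-8)

def _upsample_to(attn_map, hw):
    """Resize a 2-D attention map to the spatial size of the image."""
    return F.interpolate(attn_map[None, None], size=hw,
                         mode='bilinear', align_corners=False)[0, 0]

def attention_mixup(img, attn_map, theta_m=0.5, gamma=0.5):
    """I1: enlarge the highly attended region to full size and blend it with I0."""
    a = _normalize(_upsample_to(attn_map, img.shape[-2:]))
    d = a > theta_m                                        # detailed region D
    ys, xs = torch.nonzero(d, as_tuple=True)
    if ys.numel() == 0:                                    # nothing above threshold
        return img
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    crop = img[..., y0:y1, x0:x1]
    enlarged = F.interpolate(crop[None], size=img.shape[-2:],
                             mode='bilinear', align_corners=False)[0]
    return gamma * img + (1.0 - gamma) * enlarged          # blend local and global

def attention_patching(img, attn_map, theta_m=0.5):
    """I2: occlude the attended region D so the model must look elsewhere."""
    a = _normalize(_upsample_to(attn_map, img.shape[-2:]))
    return img * (a <= theta_m).float()

def attention_dimming(img, attn_map, theta_d=0.5, dim_factor=0.3):
    """I3: attenuate the attended region given by the dimming mask DM."""
    a = _normalize(_upsample_to(attn_map, img.shape[-2:]))
    dm = (a > theta_d).float()
    return img * (1.0 - dm) + img * dm * dim_factor
```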
To ensure that the intensity distribution of one tissue is similar over the dataset, Z-score normalization is employed during model training and testing. Large contrast variation also introduces extra noise to the dataset, impacting the performance of the trained deep model. Contrast limited adaptive histogram equalization (CLAHE) is applied to enhance the contrast between tissues and restrain the noise signal [47]. In image classification, data augmentation has been proven to be an effective method to improve robustness and evaluation performance [48]. Augmented data provide more variety for the classification target, mitigating the impact of overfitting. A random number of transformations is chosen from a sequence of linear transformations and applied to each training sample. The list of transformations is shown in Tab. 2. Several widely adopted metrics are employed. Classification metrics include Accuracy (ACC), True Positive Rate (TPR), True Negative Rate (TNR) and F1 score. Localization quality is quantified by Intersection over Union (IoU). Accuracy describes the proportion of correctly classified targets, expressed in Eq. (8) as ACC = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN stand for the numbers of true positive, true negative, false positive and false negative predictions. TPR, also known as sensitivity, measures the proportion of true positives, defined in Eq. (9) as TPR = TP / (TP + FN). TNR, or specificity, measures true negative (TN) against false positive (FP) predictions, defined in Eq. (10) as TNR = TN / (TN + FP). The F1 score considers performance from both precision and recall, defined in Eq. (11) as F1 = 2 * Precision * Recall / (Precision + Recall). IoU is calculated by dividing the overlap between prediction and ground truth by their union; it is defined straightforwardly in Eq. (12) as IoU = A_o / A_u, where A_o and A_u denote the area of overlap and the area of union respectively. The method proposed in this work can be summarized as the attention modules, attention guided data augmentation and soft distance regularization. Each component is studied by evaluating its improvement in classification performance, quantified by the metrics mentioned above (i.e., Accuracy, TPR, TNR and F1 score). The performance gain is obtained as follows: the proposed method is trained on the dataset and evaluated with all the metrics; then components of the method are removed or substituted and the model is evaluated again on the same dataset. For all the tested models, the mean value of each metric is reported as the final result. Experiments on our model are reported in Tab. 4, 5, 6, 7 and 8 with all metrics. Inter-model comparisons can be found in Tab. 3. Fig. 6 shows the regions of interest proposed by the model. In all the experiments, parameters are kept as unchanged as possible for controlled comparison. The models are trained on training sets of the same size and then evaluated on test sets of the same size. The architecture design is explored in depth through experiments with varied methods. To perform this analysis, we evaluated classic coarse-grained deep neural networks (i.e., VGG16, ResNet18, ResNet50 and InceptionV3), COVID-19 oriented architectures (i.e., [49] (ResNet), [49] (InceptionV3), COVID-Net-Large), high-performance fine-grained structures (i.e., BCNN, BCNN (Attention)) and multiscale feature fusion models (i.e., FPN, U-Net). Choosing these deep structures helps to explain our advantages in fine-grained feature extraction. It can be observed in Tab.
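A short sketch of the preprocessing described above (CLAHE followed by Z-score normalization), using OpenCV's CLAHE, is shown below; the clip limit and tile grid size are illustrative defaults, not values reported in the paper.

```python
# Illustrative preprocessing of a single CXR image: CLAHE for contrast
# enhancement followed by Z-score normalisation.  The clip limit and tile size
# are illustrative defaults, not values reported in the paper.
import cv2
import numpy as np

def preprocess_cxr(gray_uint8, clip_limit=2.0, tile_grid=(8, 8)):
    """gray_uint8: (H, W) uint8 chest X-ray.  Returns a float32 array with
    zero mean and unit variance after contrast-limited histogram equalisation."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    enhanced = clahe.apply(gray_uint8)                      # CLAHE
    x = enhanced.astype(np.float32)
    return (x - x.mean()) / (x.std() + 1e-8)                # Z-score normalisation
```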
3 that our model has noticeably higher performance than the other models. Accuracy on Datasets B and C reaches 0.961 and 0.878 respectively. BCNN with attention obtains the best TPR and TNR on Dataset B. Compared with classic models, our model is specialized for COVID-19 image classification, and the attention guided architecture has an advantage in the fine-grained visual classification task. Most of the other models for COVID-19 show better performance than classic models; however, none of them applies an attention mechanism or considers fine-grained features, which impacts their accuracy on large-scale, multi-class datasets such as Datasets B and C. Comparing FPN and U-Net with classic models shows that models considering multiscale features are clearly ahead of InceptionV3 on Dataset A. The performance of the multiscale models trained on Dataset B is similar to InceptionV3, which reaches 0.938 on TPR and exceeds U-Net in TPR, TNR and F1. On Dataset C, FPN has higher accuracy than VGG16 and U-Net exceeds ResNet50. The results show that multiscale feature fusion models reach high performance with a relatively simple structure compared with classic deep models, hinting that multiscale attention is a promising route to improve the model. Images collected by different devices can be distinct in contrast due to configuration variety. CLAHE is employed to relieve the noise brought by the contrast variation. Tab. 4 shows that CLAHE clearly improves the performance of the proposed model, raising accuracy by over 0.02 on average. Larger datasets such as Datasets B and C are reported to have a larger performance gain. The model trained without CLAHE is worse on all metrics across the three datasets. Normally, state-of-the-art coarse-grained CNN models suffer from similar global features between classes when operating on FGVC, meaning that local features are the key to improvement. Local features are effectively localized by our multiscale attention method, whose performance is evaluated on the COVID-19 datasets. The model trained with the attention module (i.e., MAG-SD (0AUG)) and the baseline model (i.e., ResNet50) are compared in Tab. 3. Networks with attention achieve better performance than the ResNet50 baseline. The metrics show that the proposed model surpasses the baseline on Datasets A and C on all benchmarks, and scores higher in ACC, TPR and F1 on Dataset B. Our model reaches 0.944, 0.930 and 0.838 ACC on Datasets A, B and C, with an average gain of over 0.01 compared with the baseline. Furthermore, the attention module includes two parts, attention generation and attention pooling. These two parts are investigated in two steps. First, models are built to assess the effectiveness of multiscale attention, considering attention generated from 1, 2 or 3 feature-map scales. Results are presented in Tab. 6, which shows that the model achieves the best accuracy on all three datasets when considering 2 feature maps. The model with 3 attention scales performs better on TPR, TNR and F1 on Dataset B. Low-level texture features may be ignored when using a single attention scale, while too many scales introduce noise and suppress high-level target information. Secondly, we compare the attention pooling module with models trained using other commonly used pooling methods such as global average pooling (GAP) and global max pooling (GMP). Results on pooling methods are presented in Tab. 7, which shows that the proposed attention pooling method surpasses GAP and GMP on all three datasets.
Attention emphasizes the local features of interest to the model; with attention, data can be augmented effectively. Attention guided augmentation is shown in Fig. 5. Models are trained with 0, 1, 2 or 3 augmentations to examine their effect on the COVID-19 CXR image classification task. In the case of 1 augmentation, attention mixup is selected; the 2-augmentation model includes attention mixup and attention patching. The results obtained are presented in Tab. 5. On all the datasets, the model with all three augmentations has the best accuracy. On Dataset C, the model with 2 augmentations is slightly better in TPR and F1. The proposed augmentations emphasize data according to the attention map, minimizing the negative effects caused by random augmentations. Soft distance regularization is presented to relieve augmentation variance. To verify its effectiveness, we compare it with L2 distance regularization. Tab. 8 illustrates that our proposed regularization method surpasses L2 on all metrics. The constraint between the auxiliary vectors and the primary vector screens out the false predictions introduced by attention guided augmentations. The regularization is calculated between the ground truth and the auxiliary vector when the primary vector cannot provide a reliable prediction, keeping the final result away from local minima. Technically, attention improves the model by roughly localizing the parts with high activation intensity. This characteristic of attention inspires us to try MAG-SD on localization tasks. The models are trained on the proposed Dataset B and then tested on the Localization dataset. Fig. 6 demonstrates several cases from the Localization dataset. COVID-19 cases have pixel-wise segmentations and non-COVID-19 cases have bounding boxes for the pneumonia infection. Attention maps A are upsampled from 7 * 7 to 224 * 224. Localization masks for COVID-19 cases are extracted by applying a threshold to the attention maps. Bounding boxes for other pneumonia cases are produced by simply enclosing the localization masks with rectangles. IoU is calculated to evaluate the quality of localization. The images show that the proposed attention module can roughly indicate the position of different types of pneumonia with an IoU score over 0.25. The attention map effectively emphasizes the influential parts of the input image, improving the model's performance.
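The localization post-processing just described can be sketched as follows: upsample a 7 * 7 attention map to 224 * 224, threshold it into a mask, enclose the mask with a bounding box, and score the result with IoU. The threshold value and function names are illustrative assumptions.

```python
# Illustrative post-processing for pneumonia localization from an attention map.
import numpy as np
import torch
import torch.nn.functional as F

def localize_from_attention(attn_7x7, threshold=0.5, out_size=224):
    """Upsample a 7*7 attention map to out_size*out_size, threshold it into a
    binary localization mask, and return the mask with an enclosing box."""
    a = F.interpolate(attn_7x7.view(1, 1, 7, 7), size=(out_size, out_size),
                      mode='bilinear', align_corners=False)[0, 0]
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)          # normalise to [0, 1]
    mask = (a > threshold).cpu().numpy().astype(np.uint8)
    ys, xs = np.nonzero(mask)
    box = (xs.min(), ys.min(), xs.max(), ys.max()) if len(xs) else None
    return mask, box

def iou(mask_pred, mask_gt):
    """Intersection over Union between two binary masks (cf. Eq. (12))."""
    inter = np.logical_and(mask_pred, mask_gt).sum()
    union = np.logical_or(mask_pred, mask_gt).sum()
    return float(inter) / float(union + 1e-8)
```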
We have presented MAG-SD for automatic COVID-19 CXR image classification, which reaches the state-of-the-art on our dataset. The proposed method achieved strong performance by treating this topic as a fine-grained image classification task, utilizing local features efficiently under the guidance of the attention mechanism. Attention maps were generated using multiscale features and then used as a reference for data augmentation, helping the model to overcome the lack of COVID-19 cases. The proposed network learned to weight the predictions from both the primary and auxiliary training pathways by calculating soft distances between vectors, gaining improvements by screening out the noise generated by augmentations. The findings of our exploration are presented in Section 4. The results indicate the great potential of applying advanced pattern recognition models to clinical diagnosis and epidemic screening. Trained on the clinical knowledge acquired by physicians, our model is capable of extracting fine-grained spatial features for COVID-19. Attention is applied in both the feature extraction and augmentation stages, which helps to localize the pneumonia infection and enrich the data effectively as part of the weakly supervised method. The attention module also shows its capability in different models. It could be interesting to design more auxiliary training strategies to guide the model to an optimal solution. The positive results on soft distance regularization show that our method considers auxiliary predictions and eliminates label noise simultaneously; however, the hard threshold may limit its adaptability to complicated data. Although deep learning methods seem promising for clinical diagnosis and pandemic screening, the lack of prior knowledge is always the Achilles' heel. Supervised learning methods, such as the proposed MAG-SD, have to be trained on labeled data, which means that new or rare diseases without available data cannot be classified properly. To alleviate this limitation, anomaly detection and clustering models could be proposed as guidance for supervised models, which is part of our future work.
A novel coronavirus outbreak of global health concern
World Health Organization et al. Coronavirus disease 2019 (covid-19): situation report
Extracting possibly representative covid-19 biomarkers from x-ray images with deep learning approach and image data related to pulmonary diseases
Correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in china: a report of 1014 cases
Clinical features of patients infected with 2019 novel coronavirus in wuhan, china. The lancet
Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19
Imaging of pneumonia: trends and algorithms
Clinical Management of Bacterial Pneumonia
The covid-19 epidemic
Padchest: A large chest x-ray image dataset with multi-label annotated reports
Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in wuhan, china: a descriptive study
Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images
Covid-19 image data collection
Figure 1 covid-19 chest x-ray data initiative
Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
Estimating uncertainty and interpretability in deep learning for coronavirus (covid-19) detection
Identifying medical diagnoses and treatable diseases by image-based deep learning
Covid-19 screening on chest x-ray images using deep learning based anomaly detection
Very deep convolutional networks for large-scale image recognition
Deep residual learning for image recognition
Rethinking the inception architecture for computer vision
Radiology of pneumonia
Clinical, laboratory and imaging features of covid-19: A systematic review and meta-analysis. Travel medicine and infectious disease
Deep learning for fine-grained image analysis: A survey
Part-Based R-CNNs for Fine-Grained Category Detection
Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization
Bilinear CNN Models for Fine-Grained Visual Recognition
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition
Fine-Grained Lung Cancer Classification from PET and CT Images Based on Multidimensional Attention Mechanism
Learn to Pay Attention
Prior-attention residual learning for more discriminative covid-19 screening in ct images
Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images
Attention residual learning for skin lesion classification
U-net: Convolutional networks for biomedical image segmentation
V-net: Fully convolutional neural networks for volumetric medical image segmentation
Feature pyramid networks for object detection
Yu Pang, and Teen-Hang Meen. Fusion high-resolution network for diagnosing chestx-ray images. Electronics
Deep multiscale convolutional feature learning for weakly supervised localization of chest pathologies in x-ray images
Spatial transformer networks
Attention to scale: Scale-aware semantic image segmentation
Residual attention network for image classification
Mask-guided contrastive attention model for person re-identification
Learning deconvolution network for semantic segmentation
Covid-19 xray dataset
Bimcv covid-19+: a large annotated dataset of rx and ct images from covid-19 patients
Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms
The effectiveness of data augmentation in image classification using deep learning
Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks