key: cord-0028570-yd9sc56z
authors: Xu, Yi; Han, Kang; Zhou, Yongming; Wu, Jian; Xie, Xin; Xiang, Wei
title: Classification of Diabetic Foot Ulcers Using Class Knowledge Banks
date: 2022-02-28
journal: Front Bioeng Biotechnol
DOI: 10.3389/fbioe.2021.811028
sha: 3563c1d67c3151f139778afda3d9e17b017592ca
doc_id: 28570
cord_uid: yd9sc56z

Diabetic foot ulcers (DFUs) are one of the most common complications of diabetes. Identifying the presence of infection and ischaemia in DFU is important for ulcer examination and treatment planning. Recently, the computerized classification of infection and ischaemia of DFU based on deep learning methods has shown promising performance. Most state-of-the-art DFU image classification methods employ deep neural networks, especially convolutional neural networks, to extract discriminative features, and predict class probabilities from the extracted features by fully connected neural networks. At test time, the prediction depends on an individual input image and the trained parameters, and knowledge in the training data is not explicitly utilized. To better utilize the knowledge in the training data, we propose class knowledge banks (CKBs) consisting of trainable units that can effectively extract and represent class knowledge. Each unit in a CKB is used to compute similarity with a representation extracted from an input image. The averaged similarity between units in the CKB and the representation can be regarded as the logit of the considered input. In this way, the prediction depends not only on input images and trained parameters in networks but also on the class knowledge extracted from the training data and stored in the CKBs. Experimental results show that the proposed method can effectively improve the performance of DFU infection and ischaemia classifications.

The diabetic foot ulcer (DFU) is a complication of diabetes with high incidence Armstrong et al. (2017).
According to the estimation of the International Diabetes Federation Atlas et al. (2015), 9.1 million to 26.1 million people with diabetes develop foot ulcers each year worldwide. For people with diabetes, the presence of a DFU can result in amputation and even increase the risk of death Walsh et al. (2016). Identifying whether a DFU shows infection and ischaemia is important for its assessment, treatment, and management Jeffcoate and Harding (2003), where infection is defined as bacterial soft tissue or bone infection in the DFU and ischaemia means inadequate blood supply Goyal et al. (2020). Classification of DFU infection and ischaemia by computerized methods is thus a critical research problem for automatic DFU assessment. Traditional methods for the diagnosis of DFU employ hand-crafted features followed by a classifier Veredas et al. (2009); Wannous et al. (2010); Wang et al. (2016). However, research has shown that features learned by deep neural networks are more effective than traditional hand-crafted features LeCun et al. (2015). Extensive research has been done to increase the performance of computerized automatic medical image classification Litjens et al. (2017), where methods based on deep learning LeCun et al. (2015) are very popular because they perform significantly better than other techniques Li et al. (2014); Kumar et al. (2016); Goyal et al. (2020); Wang et al. (2020); Cao et al. (2021). The most widely used deep learning method in medical image classification is the convolutional neural network (CNN) Gulshan et al. (2016); Albawi et al. (2017). CNNs can effectively extract useful features for image classification He et al. (2016); Tan and Le (2019), object detection Redmon et al. (2016); Zhao et al. (2019), image segmentation Chen et al. (2018); Hesamian et al. (2019); Xiao et al. (2021), and many other vision tasks LeCun et al. (2015).
With the availability of large-scale training data and high-performance modern GPUs and ASICs, methods based on CNNs have greatly improved the accuracy of image classification. Popular CNNs for general image classification tasks include AlexNet Krizhevsky et al. (2012), VGG Simonyan and Zisserman (2014), ResNet He et al. (2016), and EfficientNet Tan and Le (2019). These networks usually serve as the backbone of a medical image classification network, or are applied directly to medical image classification by transfer learning with parameters pre-trained on large-scale datasets, e.g., ImageNet Deng et al. (2009). In practice, collecting and labeling medical images is costly. Transfer learning is thus an effective way to address the lack of medical training data Shie et al. (2015); Shin et al. (2016); Cheplygina et al. (2019); Chen et al. (2020a). Other emerging techniques for image classification include the vision transformer Dosovitskiy et al. (2020); Touvron et al. (2020); Liu et al. (2021) and contrastive learning Wang et al. (2020); Jaiswal et al. (2021). Vision transformer methods are based on the attention mechanism: an input image is split into small patches, and the vision transformer learns to focus on the regions most important for classification. Contrastive learning is usually performed in an unsupervised way, where the network learns to minimize intra-class distance and maximize inter-class distance. Networks trained by contrastive learning perform well on subsequent tasks such as image segmentation, but their classification accuracies are still inferior to those of state-of-the-art supervised methods. However, existing medical image classification networks do not explicitly consider class knowledge in the training data when performing prediction. The training of existing networks involves the optimization of network parameters, where the class knowledge in the training data is extracted implicitly.
At test time, the trained networks process an input image into a high-dimensional representation through trained parameters, and the class knowledge in the training data is not explicitly involved in the pipeline. To better utilize the class knowledge in the training data, we propose class knowledge banks (CKBs) that can effectively extract class knowledge from the training data, so that the extracted class knowledge can directly participate in the prediction process. A CKB consists of many trainable units that can represent class knowledge from different perspectives. The average similarity between a representation extracted from an input image and the knowledge units in the CKB can be used as the class logit, from which the class probability is computed. In this way, the class knowledge in the training data is explicitly utilized. Besides, the proposed CKB method can handle class imbalance, as each class is given the same importance in the CKBs. As a result, the network with the CKB is able to achieve state-of-the-art classification performance on the DFU image dataset Goyal et al. (2020). In summary, we make the following contributions:
• We propose a class knowledge bank (CKB) method that can explicitly and efficiently extract and utilize class knowledge in the training data.
• We show that the proposed CKB is good at handling class imbalance in the DFU image classification dataset.

The remainder of the paper is organized as follows. We first briefly review the related work in Section 2. Then we describe the proposed method in detail in Section 3. Sections 4 and 5 present experimental results and discussions. We conclude the paper in Section 6.

In this section, we briefly review the related work on image classification, including convolutional neural networks, vision transformers, and contrastive learning.

Convolutional neural networks (CNNs) are the most widely used technique for image classification LeCun et al. (2015).
CNNs utilize multiple convolutional kernels in each layer and multiple layers to extract and process features from low levels to high levels. Since the success of AlexNet Krizhevsky et al. (2012) in image classification in 2012, many CNN-based methods have been proposed to tackle this problem, and the performance of image classification on large datasets, e.g., ImageNet Deng et al. (2009), has improved Radosavovic et al. (2020). These networks follow the structure of deep CNNs (to extract features) followed by fully-connected (FC) layers (to predict classes). After training, the prediction depends on the input image and the parameters in the CNNs and FC layers, without explicit use of the class knowledge in the training data.

Frontiers in Bioengineering and Biotechnology | www.frontiersin.org | February 2022 | Volume 9 | Article 811028

The transformer model was first proposed for natural language processing Vaswani et al. (2017). The model uses an attention mechanism Gao et al. (2021) to capture the correlation between tokens and learns to focus on important tokens. Dosovitskiy et al. (2020) first applied the transformer to image classification and achieved even better performance than CNNs on the ImageNet dataset. Such a model is called a vision transformer: an input image is divided into patches, and these patches are treated as tokens that are fed into the network. The vision transformer can learn to focus on important regions through the attention mechanism to predict class labels. Vision transformers have also been applied to medical image classification Dai et al. (2021). Furthermore, knowledge distillation Hinton et al. (2015); Wei et al. (2020) from a CNN-based model has been shown to be effective in improving the performance of the vision transformer Touvron et al. (2020). Instead of simply regarding image patches as tokens, Yuan et al. proposed a tokens-to-token (T2T) method to better tokenize patches with consideration of the image structure Yuan et al. (2021).
The T2T method achieves better accuracy using fewer parameters compared with the vanilla vision transformer Dosovitskiy et al. (2020) . Contrastive learning aims at learning effective representations by maximizing the similarity between positive pairs and minimizing the similarity between negative pairs Jaiswal et al. (2021) . It usually performs in a self-supervised manner, where positive pairs are from different augmentations of the same sample and negative pairs are simply different samples. SimCLR constructs contrastive loss by a large batch size, e.g., 4,096, to fully explore the similarity in negative pairs Chen et al. (2020b) . Followed by SimCLR, SimCLRv2 achieves a better performance than SimCLR by leveraging bigger models and deeper projection head Chen et al. (2020c) . Since contrastive learning requires a large number of representations of negative pairs, He et al. utilized a queue to store the representations of samples and updated the queue via a momentum mechanism He et al. (2020) , which is shown to be more effective than sampling representations from the last epoch Wu et al. (2018) . Although these contrastive learning methods achieve good performance on image classification by fine-tuning with few labeled samples, their performances are still inferior to those of state-of-the-art supervised methods. Existing image classification networks do not explicitly take the class knowledge in the training data into account when performing prediction. To explicitly leverage class knowledge in the training data, we propose the so-called class knowledge bank method that is able to extract class knowledge from the training data, and the extracted class knowledge can directly participate in the prediction process. Given an input medical image x, the goal of image classification is to produce its class y ∈ {0, 1, . . . , N − 1}, where N is the number of classes. 
Existing deep neural networks for image classification usually extract discriminative features (representations) through a layer-by-layer structure, and directly yield the class from the extracted features by multilayer perceptrons (MLPs). We introduce the class knowledge bank into the traditional pipeline to enable explicit utilization of class knowledge in the training data. As shown in Figure 1, the input image x is first fed to an encoder and then a projection head to extract a high-level representation $r \in \mathbb{R}^{D}$, where D is the dimension of the extracted representation. The projection head is introduced for two main purposes. Firstly, it transfers the representation extracted by the encoder to a space that is suitable for contrastive learning, and thus improves the quality of the representation. Secondly, the representation from the encoder does not contain information specific to diabetic foot images, as the encoder is pre-trained on a large-scale natural image dataset and is frozen when training the proposed network. The projection head can learn useful information specific to the diabetic foot dataset to improve classification performance. The extracted representation r is then used to compute the similarity with units in the CKBs. The computed similarity can be regarded as the logits and used to build a contrastive loss to train the network.

Each class of images has its own properties, such as color and structure, which can be used to distinguish it from other classes of images. The knowledge of a class should contain these properties from different perspectives to comprehensively describe the class. The CKB method is proposed to achieve this goal, and one CKB is designed to represent the knowledge of one class. A CKB consists of a number of units that can represent class knowledge from different perspectives. Each unit in the CKB is of the same size as the extracted representation r, and there are M units in a CKB. The size of a CKB is thus M × D.
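The bank layout described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian initialization, and the tiny sizes are hypothetical, and no training step is shown. It uses cosine similarity, which is the measure the method applies between r and each unit, and averages it over the M units of each bank to obtain one logit per class.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def init_banks(num_classes, units_per_bank, dim, seed=0):
    """Create N banks, each holding M randomly initialised units of size D.

    The Gaussian initialisation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(units_per_bank)]
        for _ in range(num_classes)
    ]


def ckb_logits(r, banks):
    """One logit per class: the mean similarity between r and that bank's units."""
    return [sum(cosine(r, u) for u in bank) / len(bank) for bank in banks]


def softmax(logits):
    """Turn logits into class probabilities (shifted by the max for stability)."""
    top = max(logits)
    exps = [math.exp(s - top) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative sizes only: N = 2 classes, M = 4 units, D = 8 dimensions.
banks = init_banks(num_classes=2, units_per_bank=4, dim=8)
r = banks[1][0]  # stand-in for a projection-head output; here a copy of a class-1 unit
logits = ckb_logits(r, banks)
probs = softmax(logits)
```

In a trainable version the units would be parameters updated by back-propagation together with the projection head; the sketch only shows how a representation is scored against the banks.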
For image classification with N classes, we need N CKBs to store all the class knowledge. A CKB for class i can be represented as

$$C_i = \{u_i^{0}, u_i^{1}, \ldots, u_i^{M-1}\},$$

where $u_i^{j} \in \mathbb{R}^{D}$ denotes a unit in the CKB. The average similarity $s_i$ between $C_i$ and the extracted representation r can be measured by the mean similarity across units,

$$s_i = \frac{1}{M} \sum_{j=0}^{M-1} \cos\left(r, u_i^{j}\right), \tag{3}$$

where $\cos(\cdot, \cdot)$ is the cosine similarity,

$$\cos(a, b) = \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}.$$

A large $s_i$ indicates that the representation is close to class i, which means the input x has a high probability of belonging to class i. The measured similarities between the representation and the CKBs can be seen as the logits of the input image, and thus can be used to compute class probabilities through the softmax function,

$$p_i = \frac{\exp(s_i)}{\sum_{k=0}^{N-1} \exp(s_k)},$$

where $p_i$ is the probability of class i. Based on the similarities, we define a contrastive loss in which the label y serves as the index of the correct class. Minimizing this contrastive loss is equivalent to maximizing the similarity between r and units in the correct CKB, and to minimizing the similarity between r and units in the other CKBs. The final training loss combines the above contrastive loss with the cross-entropy loss $L_{CEL}$ computed on $s = \{s_0, s_1, \ldots, s_{N-1}\}$, the logits of the input (represented by the averaged similarities in (3)). The units in the CKBs are randomly initialized and then optimized through back-propagation. The network will try to extract the class knowledge into the CKBs with the objective of minimizing the designed contrastive loss in (6). In this way, the proposed CKB method is more effective in utilizing knowledge in the training data than existing contrastive learning methods, e.g., the end-to-end mechanism Oord et al. (2018), the memory bank Wu et al. (2018), and momentum contrast He et al. (2020). This effectiveness is mainly derived from two aspects. Firstly, the proposed CKBs do not rely on a large number of specific samples. Instead, CKBs can learn to extract class knowledge and represent it by the units in the CKBs.
Since the CKBs are optimized on the whole training dataset, they contain more comprehensive knowledge than some specific samples. Secondly, a small number of, e.g., 64, units in a CKB can represent the knowledge of one class very well, which can greatly reduce the computational complexity and memory usage compared with existing contrastive learning methods that usually require thousands of samples in one training iteration. Figure 2 compares the proposed method and existing popular image classification methods. In existing image classification networks, parameters are mainly weights that are trained via back-propagation using the training data. The classification is achieved by directly predicting class logits from the discriminative representation extracted from the encoder. In such process, class knowledge in the training data is not explicitly utilized as the network is trained to focus on extracting more discriminative representation from the input. As shown in Figure 2 , the network with the proposed CKBs has a different pipeline of producing class logits. The CKBs learn and represent class knowledge through units parameterized by vectors. Then the learned class knowledge in the CKBs explicitly participates in the classification process by measuring the similarity between the units in the CKBs and the representation of the input. In this way, the class knowledge in the training data not only implicitly functions through network weights but explicitly works in the form of class similarity. The encoder is concerned with extracting a discriminative representation from an input image. Training the network with an encoder from scratch on medical image datasets is not effective since medical datasets are usually comparatively small. Thus, we employ a pre-trained image classification network Touvron et al. (2020) that is trained on ImageNet as the encoder in our proposed network. 
This strategy has been shown to be very effective for many medical image processing tasks when training datasets are small Shin et al. (2016); Goyal et al. (2020); Chen et al. (2021). We further introduce a multilayer perceptron (MLP) projection head as in Chen et al. (2020c) to transform the output representation from the encoder into a space suitable for contrastive learning. As shown in Figure 3, the input of the projection head is the representation r 0 extracted from the encoder, and its output is the transformed representation r used for the subsequent contrastive learning. The MLP projection head includes three linear layers, and the first two linear layers are followed by batch normalization (BN) Ioffe and Szegedy (2015) and ReLU activation Nair and Hinton (2010). The output of the last linear layer is processed only by batch normalization, without ReLU activation.

We use the diabetic foot ulcer (DFU) dataset in Goyal et al. (2020) to evaluate the performance of the proposed method. The DFU dataset includes ischaemia and infection parts that were collected from the Lancashire Teaching Hospitals. There are 628 non-infection and 831 infection cases, and 1,249 non-ischaemia and 210 ischaemia cases in the dataset. It can be observed that class imbalance exists in this dataset. The collected images were labeled by two healthcare professionals and augmented by the natural data augmentation method, which extracts region of interest (ROI) ulcers by a learning-based ROI localization method Goyal et al. (2018). After augmentation, the ischaemia and infection parts include 9,870 and 5,892 augmented image patches, respectively. Figure 4 shows samples of infection and ischaemia images from this dataset. We use 5-fold cross-validation and report the average performance and standard deviation.

The proposed method is implemented with the deep learning library PyTorch Paszke et al. (2017). We utilize the AdamW Loshchilov and Hutter (2017) algorithm as the optimizer to train models.
AdamW improves the generalization performance of the commonly used Adam algorithm Kingma and Ba (2014). The learning rate and weight decay are initialized to 5e-4 and 0.01, respectively. A step learning rate scheduler is employed with a step size of 2 and a decay factor of 0.6. To investigate whether larger models can lead to better performance, we evaluate the comparison models (ResNet, RegNetY, EfficientNet, and DeiT) at different sizes. Small and base DeiT models are denoted as DeiT-S and DeiT-B. For fair comparisons, all the competing methods use three linear layers with dimension 512 (the first and second layers are followed by ReLU) as their classifiers, where the objective of the classifiers is to yield the logits. The objective of the projection head in our method is to produce discriminative representations. The number of units in each CKB is 64. Batch normalization (BN) is not applied in the MLP classifiers of the comparison methods, since BN degrades these networks' performance. For all methods, we use models pre-trained on ImageNet and freeze their parameters except for the parameters in the MLP classifiers, the MLP projection head, and the CKBs. We find that freezing the pre-trained parameters leads to better performance than fine-tuning the whole network. For DeiT with knowledge distillation, there are two classifiers or projection heads that process the class token and the distillation token, and the final prediction is the sum of the two logits. We use accuracy, sensitivity, precision, specificity, F-measure, and area under the ROC curve (AUC) to measure the performance of the classification models.

Table 1 presents the DFU infection classification performances of various methods. As shown in Table 1, larger CNN models usually produce better results. The F-measure and AUC score of ResNet-152 are superior to those of ResNet-101. Similar results are also observed for RegNetY, where RegNetY-16GF achieves better performance than RegNetY-4GF and RegNetY-8GF.
However, the performance differences for EfficientNet with different sizes are not significant, and the large model even performs slightly worse than the small models.

FIGURE 3 | Structure of projection head. Three linear layers are used and the first two linear layers are followed by batch normalization (BN) and ReLU activation. The input of the projection head is the representation r 0 extracted from the encoder, and its output is the transformed representation r used for contrastive learning.

MoCo with the backbone of ResNet-50 performs better than the vanilla ResNet-50 for infection classification, showing that the contrastive learning method helps the network learn more discriminative representations for image classification. Vision transformer-based DeiT models trained with knowledge distillation (denoted as DeiT-S-D and DeiT-B-D) perform better than CNN models. This is reasonable, as DeiT-B-D has been shown to perform better than the comparison CNN models on the ImageNet classification task Touvron et al. (2020). The superior performance of DeiT-B-D when transferred to the task of diabetic foot infection classification demonstrates its robustness. We also observe a similar phenomenon in infection classification on all performance metrics except sensitivity. As can be seen from Table 1, the proposed CKB-DeiT-B-D performs better than the latest vision transformer DeiT-B-D, and significantly better than the other CNN-based comparison methods in terms of all the reported metrics except sensitivity. For instance, the proposed CKB-DeiT-B-D achieves the best F-measure of 78.20 and the best AUC score of 84.78, which are better than the results of 76.72 and 83.26 achieved by the second-best DeiT-B-D, and significantly better than the results of 75.85 and 83.02 achieved by the CNN-based RegNetY-16GF.
The proposed CKB significantly improves the precision and specificity of DeiT-B-D, e.g., improving precision from 73.86 to 77.38 and specificity from 71.88 to 77.00. Also, CKB-DeiT-S-D that combines the CKB with the small DeiT with knowledge distillation performs better than the vanilla DeiT-S-D. Although the proposed CKB-DeiT-B-D performs slightly worse in terms of sensitivity, the performance improvements on all the other metrics demonstrate the superiority of the proposed method. In Figure 5 , we compare the ROC curves of the comparison methods. The methods that achieve the best AUC score over the networks with the same architecture but different layers are selected for comparison. It can be observed from Figure 5 that our proposed CKB-DeiT-B-D produces a better ROC curve than the comparison methods. The proposed method also achieves the best accuracy, sensitivity, F-measure, and AUC score on the DFU ischaemia dataset. As shown in Table 2 , the performances of different methods on the DFU ischaemia dataset are better than their performances on the DFU infection dataset since the characteristics of ischaemia are more discriminative as shown in Figure 4 . The precision and specificity of the proposed method are better than the CNN-based methods (ResNet, RegNetY, and EfficientNet) and contrastive learning method (MoCo) but inferior to the DeiT-B-D. The comparison methods all seem to produce high precision and specificity but significantly lower accuracy, sensitivity, and F-measure. The proposed CKB-DeiT-B-D produces more balanced results across all the reported metrics. The proposed CKB-DeiT-S-D achieves the best AUC score but the improvement of the ROC curve of our method shown in Figure 6 is not significant compared with DeiT-S-D. Overall, the proposed CKB using DeiT Touvron et al. (2020) as the encoder achieves the best infection and ischaemia classification performances in terms of most metrics. 
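For reference, the threshold metrics reported in Tables 1 and 2 follow from the counts of a binary confusion matrix, and AUC can be computed as a rank statistic over prediction scores. The sketch below assumes the F-measure is the harmonic mean of precision and sensitivity (F1); the function names and the counts are hypothetical and are not taken from the paper's tables or code.

```python
def binary_metrics(tp, fp, tn, fn):
    """Threshold metrics from confusion-matrix counts (positive = infection or ischaemia)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)   # recall on the negative class
    # Assumption: F-measure is F1, the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
    }


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly; tied scores count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for illustration only.
m = binary_metrics(tp=80, fp=20, tn=70, fn=30)
```

Because sensitivity and specificity are computed per class, a gap between them is a direct sign of a classifier favouring one class, which is how class imbalance shows up in these tables.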
The main finding of this research is that better utilization of class knowledge in the training data can improve the performance of DFU image classification. We have proposed an approach called the class knowledge bank, which can explicitly and effectively extract class knowledge from the training data and lets this knowledge participate in the prediction process at test time. Experimental results have demonstrated the effectiveness of the proposed method in improving classification performance on both the DFU infection and ischaemia datasets. Examples of classification results by the proposed method on the infection and ischaemia datasets are presented in Figures 7 and 8, respectively. Correctly classified ulcer images (true negative and true positive) are shown to have discriminative visual characteristics, which are useful for image-based classification. For instance, true negative non-infection cases in Figure 7A are clean and dry, in contrast to the true positive infection cases in Figure 7B.

The proposed method is better at handling class imbalance than the comparison methods. As can be observed from Tables 1 and 2, on the infection dataset the specificity of the comparison methods is significantly worse than their sensitivity because of the class imbalance, while the proposed method achieves high sensitivity and specificity simultaneously. Also, the proposed method produces more balanced sensitivity and specificity than the comparison methods on the ischaemia dataset. The advantage of the proposed method in handling imbalanced data is derived from the structure of the class knowledge banks, where different CKBs have the same number of units, which gives the same importance to different classes.

The proposed classification network is based on a powerful pre-trained encoder, as training a network from scratch on a relatively small medical image dataset is not efficient. This is a limitation of the proposed network because its performance relies on the pre-trained encoder.
We believe that one can achieve better DFU classification performance without relying on a pre-trained encoder when more training data are available. Another limitation is that the proposed method does not consider the contrastive idea among samples in the training data and units in the class knowledge banks. Incorporating this idea into the proposed method may further improve DFU classification performance.

This paper verifies the performance of the proposed method on the DFU infection and ischaemia datasets. It will be interesting to extend this research to wider areas such as other medical image classification tasks, including binary or multi-class classification problems. The proposed method also has the potential to work as an incremental learning method, as we can train additional class knowledge banks for incremental classes. Its performance and characteristics for incremental learning remain to be investigated in the future.

In this paper, we proposed a method called class knowledge banks (CKBs), which can effectively extract class knowledge from the training data and explicitly leverage the class knowledge at test time. The proposed method is an alternative means of producing the logits instead of the usual linear classifiers in the literature. The CKBs leverage their units to extract and represent class knowledge from different perspectives, and the similarities between the representation of the input and the corresponding CKBs can be regarded as the logits of the input. The CKB can be trained through back-propagation and be easily embedded into existing image classification models. Experimental results on the DFU infection and ischaemia datasets demonstrate the effectiveness of the proposed CKB in DFU image classification.

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Understanding of a Convolutional Neural Network
Medical Image Analysis Using Convolutional Neural Networks: A Review
Diabetic Foot Ulcers and Their Recurrence
International Diabetes Federation. IDF Diabetes Atlas. 7th edn. Brussels, Belgium: International Diabetes Federation
Multi-Modality Fusion Learning for the Automatic Diagnosis of Optic Neuropathy
DRINet for Medical Image Segmentation
A Transfer Learning Based Super-Resolution Microscopy for Biopsy Slice Images: The Joint Methods Perspective
A Simple Framework for Contrastive Learning of Visual Representations
Big Self-Supervised Models Are Strong Semi-Supervised Learners
Discriminative Cervical Lesion Detection in Colposcopic Images with Global Class Activation and Local Bin Excitation
Not-So-Supervised: A Survey of Semi-Supervised, Multi-Instance, and Transfer Learning in Medical Image Analysis
TransMed: Transformers Advance Multi-Modal Medical Image Classification
ImageNet: A Large-Scale Hierarchical Image Database
An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
A Deep Learning Approach for Colonoscopy Pathology WSI Analysis: Accurate Segmentation and Classification
The Deep Features and Attention Mechanism-Based Method to Dish Healthcare under Social IoT Systems: An Empirical Study with a Hand-Deep Local-Global Net
Region of Interest Detection in Dermoscopic Images for Natural Data-Augmentation
Recognition of Ischaemia and Infection in Diabetic Foot Ulcers: Dataset and Techniques
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs
Deep Residual Learning for Image Recognition
Momentum Contrast for Unsupervised Visual Representation Learning
Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges
Distilling the Knowledge in a Neural Network
Densely Connected Convolutional Networks
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
A Survey on Contrastive Self-Supervised Learning
Diabetic Foot Ulcers
Adam: A Method for Stochastic Optimization
ImageNet Classification with Deep Convolutional Neural Networks
An Ensemble of Fine-Tuned Convolutional Neural Networks for Medical Image Classification
Deep Learning
Medical Image Classification with Convolutional Neural Network
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
Rectified Linear Units Improve Restricted Boltzmann Machines
Representation Learning with Contrastive Predictive Coding
Automatic Differentiation in PyTorch
Designing Network Design Spaces
You Only Look Once: Unified, Real-Time Object Detection
Transfer Representation Learning for Medical Image Analysis
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
Very Deep Convolutional Networks for Large-Scale Image Recognition
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Training Data-Efficient Image Transformers & Distillation through Attention
Attention Is All You Need
Binary Tissue Classification on Wound Images with Neural Networks and Bayesian Classifiers
Association of Diabetic Foot Ulcer and Death in a Population-Based Cohort from the United Kingdom
Residual Attention Network for Image Classification
Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification
Contrastive Cross-Site Learning with Redesigned Net for COVID-19 CT Classification
Enhanced Assessment of the Wound-Healing Process by Accurate Multiview Tissue Classification
Circumventing Outliers of AutoAugment with Knowledge Distillation
Unsupervised Feature Learning via Non-Parametric Instance Discrimination
A Weakly Supervised Semantic Segmentation Network by Aggregating Seed Cues: The Multi-Object Proposal Generation Perspective
Deep Convolutional Neural Network Based Medical Image Classification for Disease Diagnosis
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Object Detection with Deep Learning: A Review

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

This work was supported in part by Shanghai University of TCM under Grants 2019LK049 and RCPY0039, in part by Shanghai TCM-Integrated Hospital under Grants 18-01-04 and RCPY0028, and in part by Shanxi S&T under Grant 2021KW-07.

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2022 Xu, Han, Zhou, Wu, Xie and Xiang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.