LesionPaste: One-Shot Anomaly Detection for Medical Images

Authors: Huang, Weikai; Huang, Yijin; Tang, Xiaoying
Date: 2022-03-12

Due to the high cost of manually annotating medical images, especially for large-scale datasets, anomaly detection has been explored through training models with only normal data. Lacking prior knowledge of true anomalies is the main reason for the limited application of previous anomaly detection methods, especially in the medical image analysis realm. In this work, we propose a one-shot anomaly detection framework, namely LesionPaste, that utilizes true anomalies from a single annotated sample and synthesizes artificial anomalous samples for anomaly detection. First, a lesion bank is constructed by applying augmentation to randomly selected lesion patches. Then, MixUp is adopted to paste patches from the lesion bank at random positions in normal images to synthesize anomalous samples for training. Finally, a classification network is trained using the synthetic abnormal samples and the true normal data. Extensive experiments are conducted on two publicly-available medical image datasets with different types of abnormalities. On both datasets, our proposed LesionPaste largely outperforms several state-of-the-art unsupervised and semi-supervised anomaly detection methods, and is on a par with the fully-supervised counterpart. Notably, LesionPaste is even better than the fully-supervised method in detecting early-stage diabetic retinopathy.

In recent years, deep learning has achieved great success in the field of medical image analysis [12]. However, the effectiveness of deep representation learning techniques, such as convolutional neural networks (CNNs), is severely limited by the availability of the training data.
Most disease detection methods in the medical image field are fully supervised and rely heavily on large-scale annotated datasets [10]. In general, acquiring and manually labeling abnormal data is more challenging and more expensive than doing so for normal data. Thus, a vast majority of anomaly detection methods in computer vision have focused on unsupervised learning, training detection models using only normal data and assuming no access to abnormal data at the training phase [1, 19, 22]. Typically, these methods use normal samples to train models to learn normality patterns and declare anomalies when the models represent specific test samples poorly [15]. For instance, an autoencoder trained to reconstruct normal samples by minimizing the reconstruction error can be used to detect anomalies when the reconstruction error is high in testing [23, 25]. Generative models aim to learn a latent feature space that captures normal patterns well and then define the residual between real and generated instances as the anomaly score to detect anomalies [16, 20].

arXiv:2203.06354v1 [eess.IV] 12 Mar 2022

Recently, some approaches have explored synthesizing anomalous samples. For example, CutPaste performs data augmentation by cutting image patches and pasting them at random locations in normal images to serve as coarse approximations of real anomalies for anomaly detection [9]. DRAEM synthetically generates anomalous samples to serve as the input to its reconstruction network and calculates anomaly scores based on the reconstruction results [23]. Since there is no prior knowledge of the true anomalies, these methods generally synthesize anomalous samples in very simple and coarse ways, although they are highly effective. A major limitation is that the anomaly score, defined as the pixel-wise reconstruction error or the generative residual, relies heavily on assumptions about the anomaly distribution [18].
Therefore, these methods may not be sufficiently robust and generalizable in discriminating anomalies in real-life clinical practice.

In this context, we propose LesionPaste, a one-shot anomaly detection (OSAD) method; OSAD is the extreme case of few-shot anomaly detection [14, 21]. Namely, we train an anomaly detection network with only one annotated anomalous sample. Requiring only a single labeled anomalous sample, LesionPaste is highly flexible and accommodates various settings well, even for rare diseases or other unusual instances. Our goal is to make use of the prior knowledge of true anomalies to synthesize artificial anomalous samples, at the cost of annotating anomalies in only a single anomalous sample. In LesionPaste, we first choose one annotated anomalous image and extract all lesion patches. Then, data augmentation is applied to the extracted lesion patches to construct a lesion bank. Afterwards, MixUp is employed to paste lesion patches from the lesion bank onto normal images to synthesize artificial anomalous images. Finally, we train an anomaly detection network by discriminating synthesized anomalous images from normal ones. The performance of our proposed LesionPaste is evaluated on two publicly-available medical image datasets with different types of abnormalities, namely EyeQ [3] and MosMed [13].

The main contributions of this work are two-fold: (1) We propose a novel OSAD framework, namely LesionPaste, to utilize the prior knowledge of true anomalies from a single sample to synthesize artificial anomalous samples. To the best of our knowledge, this is the first work that synthesizes artificial anomalous samples using real anomalies from a single sample for the purpose of anomaly detection. (2) We comprehensively evaluate our framework on two large-scale publicly-available medical image datasets including fundus images and lung CT images. The experimental results establish the superiority of LesionPaste.
The source code is available at https://***/***.

As depicted in Fig. 1A, we first choose one annotated anomalous image and extract all lesions based on the pixel-wise lesion annotation. For the two anomaly detection tasks investigated in this work, the single annotated anomalous data are illustrated in Fig. 2. After lesion extraction, the Connected Component Labeling algorithm [5] is adopted to identify each isolated lesion region, from which a corresponding lesion patch is extracted. Following that, random resampling with repetition is carried out to select the lesion patches to be pasted. The number $N$ of to-be-pasted lesion patches is also randomly generated, with $N \sim U(1, 1.5 N_L)$, where $N_L$ is the total number of isolated lesion regions. The selected lesion patches are then sent to a subsequent transformation block for lesion patch augmentation to construct a lesion bank, for the purpose of synthesizing more diverse anomalies. Data augmentations are conducted as shown in Fig. 1B, including flipping, rotation, resizing, contrast, and brightness changing, to generalize our anomalies from a single sample to various unseen anomalies during testing.

We randomly sample a number of lesion patches from the lesion bank and paste them at random positions in the normal images to synthesize artificial anomalous images. Each normal image is used to generate one corresponding artificial anomalous image.

The MixUp technique was initially proposed in [24] as a simple data augmentation method that regularizes model complexity and decreases over-fitting in deep neural networks by randomly combining training samples. Extensive experiments have shown that MixUp can lead to better generalization and improved model calibration. As such, to have the pasted lesion fuse more naturally with the normal image, random MixUp is employed when we paste a lesion patch $L$ onto a normal image $I$.
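The lesion extraction, resampling, and pasting steps described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: images are single-channel NumPy arrays, the function names (`label_components`, `extract_patches`, `paste_lesions`) are hypothetical, $N$ is rounded down to an integer, the patch augmentation step is omitted, and the blend inside the lesion mask is assumed to be a convex combination weighted by $\lambda \sim U(0.5, 0.8)$. A small breadth-first labeling routine stands in for a library connected-component implementation.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """4-connected component labeling of a binary mask via breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # pixel already belongs to a labeled region
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = count
                    queue.append((nr, nc))
    return labels, count


def extract_patches(image, mask):
    """Return one (patch, patch_mask) pair per isolated lesion region."""
    labels, count = label_components(mask)
    patches = []
    for k in range(1, count + 1):
        rows, cols = np.nonzero(labels == k)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        patches.append((image[r0:r1, c0:c1].copy(), labels[r0:r1, c0:c1] == k))
    return patches


def paste_lesions(normal, patches, rng):
    """Paste N resampled patches at random positions with a random MixUp weight."""
    out = normal.astype(float).copy()
    n_l = len(patches)
    # N ~ U(1, 1.5 * N_L); rounding down to an integer is an assumption.
    n = int(rng.uniform(1, 1.5 * n_l))
    for idx in rng.integers(0, n_l, size=n):  # resampling with repetition
        patch, pmask = patches[idx]
        h, w = pmask.shape
        r = rng.integers(0, out.shape[0] - h + 1)
        c = rng.integers(0, out.shape[1] - w + 1)
        lam = rng.uniform(0.5, 0.8)
        region = out[r:r + h, c:c + w]  # view into `out`
        # Inside the mask: lam * lesion + (1 - lam) * background; outside: unchanged.
        region[pmask] = lam * patch[pmask] + (1 - lam) * region[pmask]
    return out
```

For example, calling `extract_patches(image, mask)` on the single annotated image and then `paste_lesions(normal, patches, np.random.default_rng(0))` on each normal image would yield one synthetic anomalous image per normal image, in line with the description above.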
The image after MixUp, $I_{MU}$, is defined as

$$I_{MU} = \bar{M} \odot I + (1 - \lambda)(M \odot I) + \lambda (M \odot L), \qquad (1)$$

where $M$ is the binary mask of the lesion patch, $\bar{M}$ is the inverse of $M$, $\odot$ denotes the pixel-wise multiplication operation, and $\lambda \sim U(0.5, 0.8)$.

After generating the artificial anomalous samples, as shown in Fig. 1C, a CNN can be trained on them together with the normal data to detect anomalies. VGG16 initialized with ImageNet parameters is adopted as our backbone model. Let $\mathcal{N}$ denote a set of normal images, $\mathrm{CE}(\cdot, \cdot)$ a cross-entropy loss, $f(\cdot)$ a binary classifier parameterized by VGG16, and $\mathrm{AP}(\cdot)$ the augmentation operation performed by LesionPaste. The training loss function of the proposed LesionPaste framework is defined as

$$\mathcal{L}_{LP} = \mathbb{E}_{x \in \mathcal{N}} \left\{ \mathrm{CE}(f(x), 0) + \mathrm{CE}(f(\mathrm{AP}(x)), 1) \right\}. \qquad (2)$$

In both training and testing phases, images are resized to 256×256 and the batch size is set to 64. We adopt an SGD optimizer with a momentum factor of 0.9, an initial learning rate of 0.001, and the cosine decay strategy to train the network. All networks are trained for 50 epochs on EyeQ and 100 epochs on MosMed, with fixed random seeds. All compared models and the proposed LesionPaste framework are implemented with PyTorch using NVIDIA TITAN RTX GPUs.

EyeQ. EyeQ [3] is a subset of the well-known EyePACS [8] dataset focusing on diabetic retinopathy (DR), consisting of 28792 fundus images with quality grading annotations. The quality of each image is labeled as "good", "usable", or "reject". In our experiments, we remove images labeled as either "usable" or "reject", ending up with 7482/865/8471 fundus images for training/validation/testing. According to the severity of DR, images in EyeQ are classified into five grades: 0 (normal), 1 (mild), 2 (moderate), 3 (severe), and 4 (proliferative) [11]. The class distribution of the training data is shown in Fig. A1 of the appendix. Images of grades 1-4 are all considered as abnormal. All normal images in the training set are used to train LesionPaste and all images in the testing set are used for evaluation.
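Returning briefly to the training setup described earlier in this section, the objective and learning-rate schedule can be illustrated with a small NumPy sketch. It assumes the classifier outputs the probability of the abnormal class, that normal images carry label 0 and synthesized images label 1, and that "cosine decay" means annealing from the initial rate to zero over the training run; the function names are hypothetical and the VGG16 classifier itself is not shown.

```python
import math

import numpy as np


def binary_cross_entropy(prob_abnormal, label):
    """Cross-entropy for a two-class classifier, given P(abnormal) and a 0/1 label."""
    p = np.clip(prob_abnormal, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))


def lesionpaste_loss(prob_on_normal, prob_on_pasted):
    """Mean over normal images x of CE(f(x), 0) + CE(f(AP(x)), 1)."""
    per_sample = (binary_cross_entropy(prob_on_normal, 0)
                  + binary_cross_entropy(prob_on_pasted, 1))
    return float(np.mean(per_sample))


def cosine_lr(epoch, total_epochs, base_lr=0.001):
    """Cosine decay from base_lr at epoch 0 to zero at total_epochs (assumed form)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))
```

Here `prob_on_normal` and `prob_on_pasted` would hold the classifier outputs for a batch of normal images and for their pasted counterparts, so each normal image contributes one term with label 0 and one with label 1.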
All fundus images are preprocessed [4] to reduce heterogeneity as much as possible, as shown in Fig. A1 of the appendix.

IDRiD. IDRiD [17] consists of 81 DR fundus images, with pixel-wise lesion annotations of microaneurysms (MA), hemorrhages (HE), soft exudates (SE) and hard exudates (EX) [7] (see Fig. A2).

MosMed. MosMed [13] contains human lung CT scans with COVID-19 related findings, as well as some healthy samples. A small subset has been annotated with pixel-wise COVID-19 lesions. CT slices containing COVID-19 lesions are considered as abnormal, ending up with 759 slices. A total of 2024 normal CT slices are selected and extracted from 254 healthy samples. For the anomaly detection task on this dataset, 5-fold cross-validation is used for evaluation. All CT images are preprocessed by windowing with a window level of -300 HU and a window width of 1400 HU to focus more on lung tissues [6].

The anomaly detection performance is evaluated using a commonly-employed metric, namely the area under the curve (AUC) of the receiver operating characteristic (ROC), consistent with previous anomaly detection works [9, 14, 23].

Different Annotated Samples. In this experiment, we only use the original lesion patches with no data augmentation, and evaluate the performance of LesionPaste with different annotated fundus images of IDRiD. In Fig. 3 and Table A1 of the appendix, we show the results of randomly selecting five different images from IDRiD for lesion extraction. Evidently, the choice of the single annotated anomalous image does not affect the anomaly detection performance of our LesionPaste, demonstrating the robustness of our proposed pipeline. We choose a representative image, idrid_48, as the single annotated fundus image for all subsequent experiments on EyeQ.

Different Numbers of Annotated Samples. The influence of different numbers of annotated samples is also investigated (see Table A2 of the appendix).
We observe that the more annotated samples are used, the better the anomaly detection performance, although the difference is not large and the performance gradually plateaus. Balancing the anomaly detection performance against the cost of annotating lesions, we still use only a single annotated anomalous image.

Data Augmentation Operations. In this experiment, we fix the randomly resampled lesions and their to-be-pasted positions, and then evaluate the importance of six augmentation operations and their compositions. From the top panel of Table 1, we find that brightness works much better than each of the other five operations. As shown in the bottom panel of Table 1, the composition of the five augmentation operations other than color distortion achieves the highest AUC of 0.8126, which is even higher than that obtained using 10 annotated samples (an AUC of 0.8052). This clearly indicates the importance of data augmentation. We conjecture that excluding color distortion helps because DR lesions are tightly linked to color information, and color distortion may significantly destroy important lesion-related color information. We therefore apply a composition of the five augmentation operations, namely Flip, Contrast, Rotation, Resize, and Brightness, in all subsequent experiments.

MixUp Coefficients. After identifying the optimal data augmentation strategy, the impact of different MixUp coefficients is analyzed. As shown in Table 2, four different MixUp coefficients (three fixed and one random) are tested, and the random MixUp coefficient $\lambda \sim U(0.5, 0.8)$ achieves the best performance.

In Table 3, we compare our LesionPaste method with state-of-the-art anomaly detection works.
As shown in that table, our proposed LesionPaste significantly outperforms all unsupervised learning and semi-supervised learning methods under comparison, and even works better than the fully supervised counterpart in detecting DR of grade 1 with an AUC of 0.7348, grade 2 with an AUC of 0.8528, and all 1-4 grades combined with an AUC of 0.8216. Particularly for detecting DR of grade 1, dramatic improvements of LesionPaste over other methods are observed: an increase of 0.1893 in AUC over the 10-shot anomaly detection method DevNet [14] and an increase of 0.1091 in AUC over the fully supervised method. DR images of grade 1 contain only MA lesions, which are extremely tiny in fundus images, and therefore DR of grade 1 is the most challenging anomaly to detect. However, in our LesionPaste, most of the synthesized DR images (80%) contain only MA lesions, forcing the classification CNN to learn from the most difficult samples and thereby improving the performance on detecting DR images of grade 1. In Table 3, LesionPaste also achieves the best result on MosMed. Statistically significant superiority of LesionPaste has been identified from DeLong tests [2] at a p-value of e−10. Visualization results of the two anomaly detection tasks are shown in Fig. A3. These results clearly demonstrate the applicability of LesionPaste to different anomaly detection tasks involving different types of diseases, different types of lesions, as well as different types of medical images.

In this paper, we propose a novel OSAD framework for medical images, the key of which is to synthesize artificial anomalous samples using only one annotated anomalous sample. Different data augmentation and pasting strategies are examined to identify the optimal setting for our proposed LesionPaste.
Compared with state-of-the-art anomaly detection methods, under either the unsupervised setting or the semi-supervised setting, LesionPaste shows superior performance on two medical image datasets, especially in the detection of early-stage DR, where it even significantly outperforms its fully-supervised counterpart.

[1] Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings
[2] Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
[3] Evaluation of retinal image quality assessment networks in different color-spaces
[4] Kaggle diabetic retinopathy detection competition report
[5] The connected-component labeling problem: A review of state-of-the-art algorithms
[6] Correlation of PET standard uptake value and CT window-level thresholds for target delineation in CT-based radiation treatment planning
[7] Lesion-Based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images
[8] Kaggle diabetic retinopathy detection competition
[9] CutPaste: Self-supervised learning for anomaly detection and localization
[10] Applications of deep learning in fundus images: A review
[11] The SUSTech-SYSU dataset for automated exudate detection and diabetic retinopathy grading
[12] A survey on deep learning in medical image analysis
[13] MosMedData: Chest CT scans with COVID-19 related findings dataset
[14] Explainable Deep Few-shot Anomaly Detection with Deviation Networks
[15] Deep learning for anomaly detection: A review
[16] OCGAN: One-class novelty detection using GANs with constrained latent representations
[17] Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research
[18] Likelihood ratios for out-of-distribution detection
[19] Multiresolution knowledge distillation for anomaly detection
[20] f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks
[21] Few-shot anomaly detection for polyp frames from colonoscopy
[22] Reconstruction by inpainting for visual anomaly detection
[23] DRAEM - A discriminatively trained reconstruction embedding for surface anomaly detection
[24] mixup: Beyond Empirical Risk Minimization
[25] Encoding structure-texture relation with P-Net for anomaly detection in retinal images