key: cord-0055514-oceowrby
authors: Yoo, Tae Keun; Choi, Joon Yul; Kim, Hong Kyu
title: Feasibility study to improve deep learning in OCT diagnosis of rare retinal diseases with few-shot classification
date: 2021-01-25
journal: Med Biol Eng Comput
DOI: 10.1007/s11517-021-02321-1
sha: 847e4f88bf5ae1d4d9c99a94b76fccdf7b355b0e
doc_id: 55514
cord_uid: oceowrby

Deep learning (DL) has been successfully applied to the diagnosis of ophthalmic diseases. However, rare diseases are commonly neglected due to insufficient data. Here, we demonstrate that few-shot learning (FSL) using a generative adversarial network (GAN) can improve the applicability of DL in the optical coherence tomography (OCT) diagnosis of rare diseases. Four major classes with a large number of datasets and five rare disease classes with a few-shot dataset are included in this study. Before training the classifier, we constructed GAN models to generate pathological OCT images of each rare disease from normal OCT images. The Inception-v3 architecture was trained using an augmented training dataset, and the final model was validated using an independent test dataset. The synthetic images helped in the extraction of the characteristic features of each rare disease. The proposed DL model demonstrated a significant improvement in the accuracy of the OCT diagnosis of rare retinal diseases and outperformed the traditional DL models, Siamese network, and prototypical network. By increasing the accuracy of diagnosing rare retinal diseases through FSL, clinicians can avoid neglecting rare diseases with DL assistance, thereby reducing diagnosis delay and patient burden.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11517-021-02321-1.

In the USA, a rare disease is generally defined as a condition with a prevalence of no more than one in 1250 individuals; however, the exact prevalence rate for most of these diseases is currently not available [1].
In primary care, a lack of awareness and cognitive factors are considered to be the main reasons for frequent misdiagnosis because clinicians cannot focus on all rare diseases at the same time [2]. Rare retinal diseases affect a limited number of patients; however, they impose a significant burden on society. Most patients with such retinal diseases often encounter diagnostic delays during the screening stage. However, recent artificial intelligence-based diagnostic or screening tools have targeted diseases that have a high prevalence, including diabetic retinopathy and age-related macular degeneration [3]. Because of the lack of sufficient clinical data, it is necessary to improve the accuracy of diagnosing rare retinal diseases [4].

Optical coherence tomography (OCT) is the most important diagnostic tool for screening rare retinal and optic nerve diseases, and it uses a light wave-based mechanism to provide three-dimensional retinal structural information [5]. Since the introduction of the deep learning (DL) algorithm, automated diagnosis for detecting multiple diseases from OCT imaging has attracted considerable attention [6]. However, previous studies using OCT images have been unable to detect rare diseases.

Machine learning techniques have successfully improved clinical decision support in the field of ophthalmology [7, 8]. In particular, the recent availability of large volumes of retinal image data has enabled DL techniques to make significant contributions to diagnostic tasks [9]. However, conventional DL models are still unable to accurately extract disease characteristics from the insufficient clinical data that is available. The use of limited datasets for conventional deep learning training causes an over-fitting problem and may result in critically low classification performance in the validation set [10, 11].
Because large quantities of data must be labeled by clinicians, the current approaches have been limited to the few retinal diseases that have a high prevalence. These DL models may disregard the rare diseases for which they are not trained due to the lack of sufficient labeled data [12]. However, humans can learn new disease categories using a few characteristic images that are available. To accurately detect rare diseases using an automated system, this gap between humans and DL needs to be bridged. Recently, few-shot learning (FSL), which is a new research area in the field of machine learning, has been receiving increasing attention because, like human experts, it requires only a limited amount of data for pattern extraction [13]. After the introduction of the generative adversarial network (GAN) for data augmentation, the performance of FSL was significantly improved due to the generation of synthetic images [14]. This GAN-based FSL technique provides an intuitive solution for utilizing conventional DL methods that have generally been used for large databases.

Recently, few-shot learning techniques have been adopted to diagnose rare diseases. Prabhu et al. showed that a prototypical network, which is a metric learning technique, is effective for dermatological disease diagnosis using real-world imbalanced datasets [15]. Quellec et al. used a similar metric learning technique based on the K-nearest neighbor to classify fundus photographs with rare diseases [12]. Few-shot metric learning using Siamese networks has been used to detect plant diseases with very small datasets [16]. A gradient-based meta-learning approach has been used to improve diagnostic performance with a few-shot skin disease dataset [17]. Burlina et al. demonstrated the feasibility of using low-shot learning based on automated data augmentation to classify fundus photographs with rare conditions [18].
Several researchers have utilized generative models to enlarge training datasets in order to improve the detection accuracy of diseases using very small datasets [19, 20]. Few-shot learning based on data augmentation has also been used to detect pathological chest images of patients with COVID-19 [21]. These previous studies demonstrated that few-shot learning techniques could achieve reliable performance and outperform classical machine learning models when using small training datasets.

To the best of our knowledge, no study has been conducted on detecting rare diseases using the concept of FSL with OCT. Therefore, the purpose of this study is to build a convolutional neural network (CNN) model to detect rare diseases using OCT images. Because limited training data is available on rare retinal diseases, our approach was based on FSL using GAN-based data augmentation. In particular, the cycle-consistent GAN (CycleGAN) was adopted to generate images without matching paired images. CycleGAN is a type of unsupervised machine learning model used for mapping different image domains, and it has demonstrated reliable performance in various academic fields. We conducted experiments to evaluate the qualitative effectiveness of our method and to validate this technique. We also compared the proposed method with other well-known few-shot learning techniques.

This study was conducted using a publicly accessible OCT image database obtained from a previous study by Kermany [6] and additional anonymized OCT images of rare retinal diseases collected by the authors. Figure 1 illustrates the FSL methods used in our study. Our proposed method (Fig. 1(b)) involves transfer learning with GAN-based augmentation, which comprises two stages: (1) development of CycleGAN models for each rare disease for few-shot OCT image augmentation and (2) fine-tuning training and validation of the DL classification model.
The backbone DL models for transfer learning were pretrained using the ImageNet database. Figure 2 shows the data distribution and typical OCT images of the major and rare diseases considered in this study. The large database obtained from Kermany's previous study (https://data.mendeley.com/datasets/rscbjbr9sj/2) consists of OCT images showing the characteristics of a normal retina as well as those of major retinal diseases [6], including diabetic macular edema [22], drusen [23], and choroidal neovascularization [23], which are considered to be highly prevalent diseases. This database was collected from various eye hospitals and includes labeling data confirmed by expert ophthalmologists. The detailed diagnosis procedure is described in Kermany's original work [6]. Additional retinal image datasets were extracted from Google Images and the Google search engine by searching for keywords such as central serous chorioretinopathy, macular telangiectasia, macular hole, Stargardt disease, and retinitis pigmentosa. These rare diseases were selected based on a previous review on OCT diagnosis [24]. According to the Orphanet database, central serous chorioretinopathy [25], Stargardt disease [26], and retinitis pigmentosa [27] are considered as rare retinal diseases [28]. Because macular telangiectasia [29] and macular hole [30] also have very low prevalence, it is reasonable to consider them as relatively rare diseases. The images showing the characteristics of these rare diseases were manually classified by two board-certified ophthalmologists with prior knowledge about data sources and related documents, and the ambiguous images were isolated to clarify the image domains. Since the OCT images fitted perfectly with the typical characteristics of each disease, OCT examination was sufficient to diagnose rare diseases in the present study. There was no disagreement between the two ophthalmologists.
The OCT images with rare retinal diseases collected by our team are available at the Mendeley Data repository (https://data.mendeley.com/datasets/btv6yrdbmv). The detailed links of the collected OCT image sources are listed in Supplementary Materials. Table 1 shows the OCT characteristics and epidemiologic data of retinal diseases. The initial training dataset contained a total of nine classes, including 26,860 normal retinas, 11,348 diabetic macular edema, 8616 drusen, 37,205 choroidal neovascularization, 25 central serous chorioretinopathy, 20 macular telangiectasia, 25 macular hole, 15 Stargardt disease, and 12 retinitis pigmentosa images. The aim of extracting these extremely imbalanced datasets was to diagnose rare retinal diseases using the FSL framework. For the test dataset, we collected 250 normal retinas (sampled from the original test dataset to balance the major classes), 250 diabetic macular edema, 250 drusen, 250 choroidal neovascularization, 5 central serous chorioretinopathy, 4 macular telangiectasia, 5 macular hole, 4 Stargardt disease, and 4 retinitis pigmentosa datasets. The training and test datasets were split randomly, and they exhibited no overlap. FSL learns new patterns from a limited number of training datasets. There are mainly three popular categories of FSL, namely meta-learning, metric learning, and augment-based [31] . Inspired by previous works using GAN for FSL [32, 33] , we adopted CycleGAN-based augmentation for rare retinal diseases to increase the accuracy of diagnosis. CycleGAN was developed to overcome the limitation of paired data when two generators and two discriminators are used. Figure 3 shows the detailed framework of CycleGAN, which is considered to be a powerful DL technique that performs image domain transfer and face transfer. 
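For orientation, the standard CycleGAN objective from the original CycleGAN formulation can be written as below. This is a reference sketch, not the exact formulation used in this study, which is given only in the Supplementary Materials. The symbols are assumptions introduced here: $X$ denotes the domain of normal OCT images, $Y$ the domain of one rare disease, $G: X \to Y$ and $F: Y \to X$ the two generators, $D_X$ and $D_Y$ the two discriminators, and $\lambda$ the weight of the cycle-consistency term.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big]
   + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F)
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y)
  &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
   + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
   + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

The cycle-consistency term is what removes the need for paired data: a normal image translated to the disease domain and back should return to itself, so no matched normal and pathological image of the same eye is required.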
Because there is no database that includes both pathological OCT images and matched normal OCT images, supervised GAN techniques, such as conditional GAN and Pix2Pix, are not applicable in this study. CycleGAN is a type of unsupervised machine learning technique used for mapping different domains, and several researchers have already used it for few-shot and small data domain transfer [32-34]. The detailed mathematical implementation of CycleGAN is described in Supplementary Materials.

We developed CycleGAN augmentation models for each rare retinal disease (central serous chorioretinopathy, macular telangiectasia, macular hole, Stargardt disease, and retinitis pigmentosa). The major classes did not require data augmentation because they had sufficient OCT images to train conventional DL models. Each CycleGAN model was trained based on two domains, including normal retina and one specific rare disease. The few-shot OCT images with rare diseases were augmented using both linear and elastic transformations. Linear transformation included left and right flip, width and height translation from −5 to +5%, random rotation from −30° to +30°, zooming from 0 to 20%, and random brightness change from −10 to +10%. Elastic transformation was achieved using a Gaussian kernel [35]. We defined this transformation as "the basic augmentation step." In our experience, 40% of the original images with basic augmentation should be retained for training the classifier. In this training step, 2000 normal retinal OCT images were randomly sampled from Kermany's study, and 2000 pathological images were generated by basic augmentation with few-shot samples. The five trained CycleGAN models translated normal OCT images to match the pathological images with each rare disease. Expert ophthalmologists reviewed the generated images and removed images possessing severe artifacts.
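The linear part of the basic augmentation step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the study used MATLAB functions for this step); the function names `sample_params` and `augment` are hypothetical, zooming is assumed to mean a zoom-in factor between 1.0 and 1.2, the brightness change is assumed to be multiplicative, and nearest-neighbour resampling is used for brevity. Elastic transformation is not included.

```python
import numpy as np

def sample_params(rng):
    """Draw one set of linear-augmentation parameters in the ranges stated in the text."""
    return {
        "flip": bool(rng.integers(0, 2)),        # left-right flip
        "tx": rng.uniform(-0.05, 0.05),          # width translation (fraction of width)
        "ty": rng.uniform(-0.05, 0.05),          # height translation (fraction of height)
        "angle_deg": rng.uniform(-30.0, 30.0),   # rotation in degrees
        "zoom": rng.uniform(1.0, 1.2),           # assumed: zoom-in factor for "0 to 20%"
        "brightness": rng.uniform(-0.10, 0.10),  # assumed: multiplicative change
    }

def augment(image, p):
    """Apply flip, affine warp, and brightness change to a 2-D image with values in [0, 1]."""
    h, w = image.shape
    img = image[:, ::-1] if p["flip"] else image
    theta = np.deg2rad(p["angle_deg"])
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: for every output pixel, locate the source pixel.
    y0 = ys - cy - p["ty"] * h
    x0 = xs - cx - p["tx"] * w
    src_x = (cos_t * x0 + sin_t * y0) / p["zoom"] + cx
    src_y = (-sin_t * x0 + cos_t * y0) / p["zoom"] + cy
    xi = np.rint(src_x).astype(int)
    yi = np.rint(src_y).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros((h, w), dtype=float)          # pixels mapped from outside stay black
    out[inside] = img[yi[inside], xi[inside]]
    return np.clip(out * (1.0 + p["brightness"]), 0.0, 1.0)

rng = np.random.default_rng(0)
scan = rng.random((64, 64))                      # stand-in for a grayscale OCT B-scan
augmented = augment(scan, sample_params(rng))
```

Repeatedly calling `augment` with freshly sampled parameters on each few-shot image yields the enlarged pathological set that is then passed to GAN training.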
A total of 5000 pathological OCT images, including 3000 CycleGAN-based and 2000 basic augmented images, were prepared for each rare disease to train the diagnostic classifier model. To use a verified and pre-designed image generator, all the input images needed to be resized to a pixel resolution of 256 × 256 × 3, which is the basic setup of a CycleGAN. Therefore, we used the default parameter settings, that is, the ADAM optimizer with a batch size of 1, to optimize the GAN networks. To visualize the effect of CycleGAN-based augmentation, the t-distributed stochastic neighbor embedding (t-SNE) algorithm was executed using sampled instances. The feature vectors from the last layer of the pre-trained Inception-v3 model were extracted to train the t-SNE.

After data augmentation for rare retinal diseases, we trained the deep CNN using the Inception-v3 model, a widely used DL network developed by Google, to build a multi-class diagnosis model. The Inception-v3 model has been used successfully in many previous studies, demonstrating state-of-the-art performance with a saliency map [6, 9]. Figure 4 shows the training and validation processes. The first validation scheme involved fivefold cross-validation using the entire dataset including training and test datasets (Fig. 4(a)).
Table 1 (fragment) OCT characteristics and epidemiologic data of retinal diseases
Disease | OCT characteristics | Prevalence | Category
Macular hole | Retinal layer break and tissue defect involving the fovea | 0.11% [30] | Relatively rare disease
MacTel | Temporal foveal cystic pit enlargement secondary to loss of outer nuclear layer and ellipsoid zone | ~0.022% [29] | Relatively rare disease
Retinitis pigmentosa | Mild inner retinal layer thinning and severe outer retinal layer thinning, cystic macular lesions | 0.17% [27] | Definitely rare disease*
Stargardt disease | Disruption or complete loss of both outer retinal layers at fovea, thinning of whole retinal layers | 0.01% [26] | Definitely rare disease*
CNV, choroidal neovascularization; CSC, central serous chorioretinopathy; DME, diabetic macular edema; MacTel, macular telangiectasia; OCT, optical coherence tomography
* Included in the Orphanet rare disease database [28]

In this scheme, even during GAN training, the verification datasets were thoroughly separated from the training sets so that the GAN models could maintain full independence of the verification sets. Because the independent test dataset for the major classes was selected from Kermany's previous work [6], the second scheme involved training the CNN model using the training set and validating it with the test dataset (Fig. 4(b)). The final training dataset for the diagnostic FSL models contained a total of nine classes (Fig. 2), including 26,860 normal retina, 11,348 diabetic macular edema, 8616 drusen, 37,205 choroidal neovascularization, 5000 central serous chorioretinopathy, 5000 macular telangiectasia, 5000 macular hole, 5000 Stargardt disease, and 5000 retinitis pigmentosa images (Fig. 3). A tenth of the training dataset was used as the validation set to estimate how well the model had been trained. We downloaded the Inception-v3 model, which was pre-trained on the ImageNet database, and performed fine-tuning of the weights of the pre-trained networks (Fig. 1(b)).
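The effect of augmentation on class balance can be checked with simple arithmetic on the image counts reported for the initial and final training datasets. The snippet below is only an illustrative calculation; the dictionary and function names are hypothetical, and the class abbreviations follow Table 1.

```python
# Image counts for the four major classes (unchanged by augmentation).
major = {"normal": 26860, "DME": 11348, "drusen": 8616, "CNV": 37205}

# Rare-disease counts before augmentation and after it (5000 images per class).
rare_initial = {"CSC": 25, "MacTel": 20, "macular hole": 25, "Stargardt": 15, "RP": 12}
rare_augmented = {name: 5000 for name in rare_initial}

def rare_fraction(major_counts, rare_counts):
    """Share of rare-disease images in the whole training dataset."""
    n_major = sum(major_counts.values())
    n_rare = sum(rare_counts.values())
    return n_rare / (n_major + n_rare)

before = rare_fraction(major, rare_initial)
after = rare_fraction(major, rare_augmented)
print(f"rare-disease share before augmentation: {before:.3%}")
print(f"rare-disease share after augmentation:  {after:.1%}")
```

With these counts, the rare classes together make up roughly 0.1% of the initial training images and roughly 23% after augmentation, which is the imbalance the augmentation is meant to reduce.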
Fine-tuning generally keeps the weights of some bottom layers to avoid over-fitting and performs delicate modification of the high-level features. To use the images generated by CycleGAN for the CNN model, the input images for the Inception-v3 model were resized to a pixel resolution of 299 × 299 × 3. The model was trained for 250 epochs with a batch size of 10. The ADAM optimizer was also used with a categorical cross-entropy loss. In our experiments based on transfer learning, only a fully connected layer of the CNN was tuned: the backbone convolutional layers of Inception-v3 were left frozen, and the last fully connected layer was trained using the ADAM optimizer. Because there is a growing demand for explainable artificial intelligence methods [36], we adopted the Grad-CAM technique to generate saliency maps.

Google CoLab Pro, which is a cloud service for disseminating DL research, was adopted to implement the CycleGAN and Inception-v3 models. Google CoLab Pro provides a development environment with TensorFlow-based DL libraries and a robust graphic processing unit (GPU). This enables rapid processing of a heavy DL network without the need for a personal GPU.

For comparison, FSL techniques based on metric learning were also implemented. A convolutional Siamese neural network was developed to find the relationship between two comparable classes [37]. Recently, researchers have reported that Siamese networks perform well in complicated FSL tasks with shared weights of the backbone CNN model [16]. We used Inception-v3 as identical subnetworks for the classes, and the Siamese network was designed as described in the MATLAB 2020b (MathWorks Inc., Natick, MA, USA) example (Fig. 2(c)). In this study, both the prototypical network and K-nearest neighbor learn an embedding based on the Euclidean distance to classify a new instance.
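The distance-based decision rule shared by the two metric-learning baselines can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the study: embeddings are assumed to be precomputed feature vectors from a backbone CNN, class labels are assumed to be non-negative integers, and the function names are hypothetical.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Prototype of each class = mean embedding of its support examples."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def predict_prototype(queries, classes, protos):
    """Assign each query to the class with the nearest prototype (Euclidean)."""
    dists = np.linalg.norm(protos[None, :, :] - queries[:, None, :], axis=2)
    return classes[dists.argmin(axis=1)]

def predict_knn(queries, embeddings, labels, k=3):
    """Majority vote among the k nearest support embeddings (Euclidean)."""
    dists = np.linalg.norm(embeddings[None, :, :] - queries[:, None, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = labels[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy 2-D embeddings for two classes; real embeddings would come from the CNN.
support = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
support_labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.1, 0.1], [4.8, 5.0]])

classes, protos = class_prototypes(support, support_labels)
proto_pred = predict_prototype(queries, classes, protos)
knn_pred = predict_knn(queries, support, support_labels, k=3)
```

The prototype rule compares a query with one averaged vector per class, whereas the K-nearest neighbor rule compares it with individual training examples, which matters when a rare class has only a handful of images.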
To reduce the feature space dimension, we used the Inception-v3 model trained without data augmentation as a backbone CNN model for both prototypical network and K-nearest neighbor techniques. The prototypical network learns a metric space by computing the distance to the prototype representations of each class (Fig. 2(d)) [15]. We set the K value as 3 for the K-nearest neighbor model according to Quellec's work [12].

To verify that the segmentation of pathological lesions with few-shot rare disease data is possible, we built an additional segmentation CycleGAN model. The training process was based on a total of 72 ground-truth images, including the images of sampled major diseases and few-shot rare diseases. In these images, the sub-retinal fluid, intra-retinal cyst, and pigmented epithelial detachment were manually labeled by two board-certified ophthalmologists. We performed basic augmentation of these ground-truth images into 1000 images. Finally, 1000 augmented ground-truth segmentation images and 1000 randomly sampled pathological OCT images were used to train the segmentation CycleGAN model.

The main focus of this study was the accuracy of the classification model. The performance of the Inception-v3 model was evaluated based on the accuracies of the whole classes and sub-group of rare diseases. The assessment of diagnostic performance for each class was based on the area under the receiver operating characteristic curve (AUC). To establish the performance of the imbalanced classification, we calculated the unweighted Cohen's κ values, relative classifier information (RCI), and Matthews correlation coefficient from all the classes [38, 39]. To evaluate our FSL from a clinical perspective, all the OCT images in the test dataset were reviewed by an independent expert ophthalmologist who did not have any prior information about the disease names, distribution, and sources.
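Two of the multiclass metrics named above, together with the per-class true positive rate, can be computed directly from a confusion matrix. The sketch below uses the standard definitions of unweighted Cohen's κ and the multiclass Matthews correlation coefficient; it is an illustration rather than the evaluation code of this study, RCI is not included, the function names are hypothetical, and the example matrix is a toy, not a result from the paper. Rows are assumed to be true classes and columns predicted classes.

```python
import numpy as np

def cohen_kappa(cm):
    """Unweighted Cohen's kappa from a confusion matrix (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    s = cm.sum()            # total number of samples
    c = np.trace(cm)        # correctly classified samples
    t = cm.sum(axis=1)      # true count per class
    p = cm.sum(axis=0)      # predicted count per class
    denom = np.sqrt((s**2 - (p**2).sum()) * (s**2 - (t**2).sum()))
    return (c * s - (p * t).sum()) / denom if denom > 0 else 0.0

def per_class_tpr(cm):
    """True positive rate per class; assumes every class has at least one sample."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

toy_cm = [[40, 10],
          [5, 45]]
kappa = cohen_kappa(toy_cm)
mcc = multiclass_mcc(toy_cm)
tpr = per_class_tpr(toy_cm)
```

Because the rare classes in the test dataset contain only four or five images each, a per-class true positive rate can change by 0.20 to 0.25 with a single image, which is worth keeping in mind when reading the per-class results.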
The basic augmentation step before training the GAN and Inception-v3 models was performed using the imageDataAugmenter and imgaussfilt functions with a Gaussian kernel (with σ = 10 and α = 2) in MATLAB 2020b. We used CoLab's CycleGAN tutorial page to develop and validate the CycleGAN model. All the code is available on the TensorFlow webpage (https://www.tensorflow.org/tutorials/generative/cyclegan). We modified the data input pipeline of the CycleGAN and Inception-v3 codes to import our dataset.

We developed our DL model using CycleGAN-based augmentation in the challenging context of few-shot OCT images for rare diseases. First, the CycleGAN models generated OCT images with rare diseases, including central serous chorioretinopathy, macular telangiectasia, macular hole, Stargardt disease, and retinitis pigmentosa, using the initial training dataset. The final CycleGAN model for each rare disease was trained for 100 epochs, which required approximately 20 h in the CoLab Pro environment. After training, randomly sampled normal OCT images were translated into pathological images for augmentation while maintaining the structures of the choroid and peripheral retina. In the initial exploratory experiment, the number of CycleGAN-based augmented data was increased, and it yielded the highest performance at 5000 OCT images per rare disease class (2000 original images with basic augmentation and 3000 CycleGAN-based augmented images) as shown in Fig. 5. Additionally, Fig. 6 shows the acceptance rate for the synthetic OCT images used to train the deep learning model after review by the ophthalmologist. Stargardt disease and retinitis pigmentosa showed higher rejection rates than the other rare diseases. The main reasons for the rejection of the synthetic images were overlapped features, low quality, and mode collapse.
The results of the t-SNE algorithm show that, with the initial data without augmentation, the minor groups with rare diseases are not visibly separated (Fig. 7(a)). After the CycleGAN-based augmentation for rare diseases, the minor groups were easily clustered with improved generalizability (Fig. 7(b)). The use of CycleGAN-based synthetic images helped in the accurate extraction of the characteristic features of each rare disease, such as the sub-retinal fluid of central serous chorioretinopathy and cavitation of the inner retina in macular telangiectasia. During the image generation process, each case requires approximately 0.2 s for execution. Figure 8 shows examples of the pathological OCT images with rare diseases generated using the CycleGAN model. This feature generation based on normal OCT images can be effective for generating new samples to increase the intra-class variation of the rare disease classes.

Fig. 5 Exploratory experiment for optimal data augmentation. Inception-v3 with augmentation using 5000 additional images for each rare disease (2000 original images with basic augmentation and 3000 CycleGAN-based augmented images) yielded the highest performance

Fig. 6 The acceptance rate for synthetic OCT images to train the deep learning model. The images were generated by the highly tuned CycleGAN models for each rare disease. For each group, 100 synthesized samples were extracted randomly for evaluation by two ophthalmologists

The overall classification performance of the deep learning models for the first validation scheme of the five-fold cross validation using the whole dataset is shown in Table 2, and the best performance was observed in the transfer learning with GAN-based data augmentation (proposed DL model). The multiclass metrics of overall accuracy, Cohen's κ, RCI, and Matthews correlation coefficient pertaining to the best model were 93.9%, 0.910, 0.969, and 0.911, respectively.

In the second validation scheme, the Inception-v3 model was trained using the final training dataset and validated using the test dataset. The training process required approximately 150 h for 250 epochs with fine-tuning for the proposed model. In our CycleGAN-based DL model, the accuracy of diagnosis, Cohen's κ, RCI, and Matthews correlation coefficient were 92.1%, 0.896, 0.983, and 0.897 for the test dataset, respectively (Table 3). Our proposed model demonstrated superior performance in comparison with the other FSL techniques. Regarding accuracy, the Siamese network and prototypical network showed lower classification performance than the transfer learning methods. A similar tendency was observed for Cohen's κ, RCI, and Matthews correlation coefficient values, demonstrating that our proposed model outperforms the other models in terms of multi-class classification.

The accuracy of diagnosis, Cohen's κ, RCI, and Matthews correlation coefficient of the ophthalmologist without prior knowledge were 97.5%, 0.967, 0.956, and 0.968, respectively, and the diagnostic performance of the human expert was better overall. However, Table 4 shows that the human expert frequently misclassified rare diseases, considering the true positive rates per class. The ophthalmologist's true positive rates per class for diagnosing central serous chorioretinopathy, macular hole, macular telangiectasia, retinitis pigmentosa, and Stargardt disease were 1.00, 1.00, 0.50, 0.25, and 0.50, respectively, whereas those of our proposed model were 1.00, 1.00, 1.00, 0.75, and 0.75, respectively.

The detection performance of each disease was evaluated using the receiver operating characteristic curves (Fig. 9). The AUCs of the DL models without augmentation, with only basic augmentation, and with the proposed GAN-based augmentation are not distinguishable in the major classes. In the detection of rare diseases, the individual performance of the DL models showed a significant improvement with our proposed GAN-based augmentation.
We also generated a saliency map using the Grad-CAM technique, which successfully visualized the characteristic pathological features as evidence for the prediction (Fig. 10). Additionally, we performed experiments to evaluate the dataset imbalance using the test dataset. After GAN-based data augmentation, under-sampling was performed by random selection to control the data distribution. Figure 11 shows that controlling the distribution of the dataset did not have a significant impact on the classification results after data augmentation. The MobileNet-v2 and ResNet models demonstrated similar classification performance to that of the Inception-v3 model, which is used in this study (Supplementary Materials).

Because our method requires a limited amount of data to train the CycleGAN model, it is expected to be highly applicable in the segmentation of OCT images of rare diseases. To determine the feasibility of our approach in a segmentation task, we also trained the CycleGAN model using 72 manually segmented OCT images and 1000 normal images (Fig. 12(a)). Data augmentation using 50 ground-truth segmentation images could generate enhanced OCT images highlighting the pathological features, with a mean Dice score of 0.784 (Fig. 12(b)). Although the training dataset includes few-shot ground-truth segmentation images of rare diseases, the results indicate that the pathological features, such as sub-retinal fluid, intra-retinal cyst, and pigmented epithelial detachment, were segmented (Fig. 12(c)).

In this study, we investigated the feasibility of DL with a GAN technique for accurately detecting rare retinal diseases using OCT images. We found that CycleGAN-based augmentation could improve the diagnostic accuracy of rare diseases using a conventional DL model with an interpretable explanation via Grad-CAM. In addition, this GAN technique can be extended to segmentation tasks using small datasets.
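The Dice score used to assess the segmentation results measures the overlap between a predicted mask and a ground-truth mask. A minimal NumPy sketch of the standard definition is given below; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, and the masks are toy data.

```python
import numpy as np

def dice_score(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    intersection = np.logical_and(pred, true).sum()
    return 2.0 * intersection / total

# Toy masks: one of the two predicted pixels overlaps the two labeled pixels.
predicted = np.array([[1, 1, 0, 0]])
labeled = np.array([[0, 1, 1, 0]])
score = dice_score(predicted, labeled)

# A mean Dice score is the average of the per-image scores.
mean_score = np.mean([dice_score(predicted, labeled), dice_score(labeled, labeled)])
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.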
To the best of our knowledge, this is the first experimental study to construct a few-shot DL model for OCT images considering rare disease diagnosis using GAN-based augmentation. A recent study emphasized the large amount of OCT data required to train a DL model but did not investigate the feasibility of FSL in OCT imaging [40]. To address the limitations of traditional DL models, we first performed an experiment to explore the feasibility of FSL in the OCT imaging domain. We found that FSL could be a valuable tool for detecting rare retinal diseases. Our FSL model using GAN-based data augmentation performed better than an expert without prior knowledge in diagnosing rare diseases considering the true positive rate per class. This result illustrates the feasibility of applying FSL to improve the diagnostic accuracy of rare diseases. Because there are fewer noisy features compared to other image domains such as skin [15] and fundus photographs [18], OCT appears to be more suitable for image synthesis and few-shot learning. However, it is important to note that not all of the synthetic images generated by the GAN models were acceptable for use. Therefore, considerable effort and time are needed to select acceptable images to build an accurate DL model. Moreover, it will be a huge challenge to improve the diagnostic accuracy of both major and rare diseases to the level required for real clinical application.

This study aimed to increase the accuracy of DL in diagnosing rare retinal diseases while maintaining the diagnostic performance for major diseases. Several previous studies have focused on building DL models for the diagnosis of rare retinal diseases, including macular hole [41], retinitis pigmentosa [42, 43], and Stargardt disease [4]. However, these DL models were designed for binary classification using normal and pathological image data.
Therefore, a multiclass classification DL model is necessary to detect not only rare diseases but also major diseases such as diabetic retinopathy and age-related macular degeneration [10, 44] . One study demonstrated that CNN could classify five classes of OCT images using a large dataset without augmentation [45] . A recent study using both segmentation and multiclass classification networks improved the performance using affine and elastic transformations [35] . Another study using fundus photographs demonstrated the applicability of the FSL model based on principal component analysis and k-nearest neighbor [12] ; however, this approach was limited by the lack of sufficient interpretability. This study established that the accuracy of DL models and the quality of the images generated using the few-shot setting decreases significantly with a decrease in the amount of available data. We succeeded in improving the accuracy of OCT diagnosis of rare diseases by using the GAN technique. The main limitation of DL models in diagnosing rare retinal diseases is the inability to generalize decision boundaries from a very small number of datasets. DL using the FSL technique enables the model to learn a new task with limited information from a few instances by incorporating prior knowledge [14] . FSL relieves the burden of collecting a large amount of labeled data on rare diseases. In the medical field, FSL can learn even from extremely imbalanced disease data distribution using prior knowledge [12] . To solve this problem, several methods such as meta-learning, metric learning, and data augmentation have been proposed [31] . As most FSL methods are based on pre-trained DL networks, they generally lack interpretability regarding their operation [46] . Previous studies have demonstrated that GAN can improve FSL models by generating training situations to learn better decision boundaries between categories [14] . 
Recent studies using CT and MRI datasets have shown that the GAN-based data augmentation technique significantly improves the performance of machine learning models [47, 48]. GAN has also been successfully applied to cancer cell classification with insufficient training data [49]. CycleGAN has been used to improve the breast mass classification accuracy using a small dataset [50]. Consistent with previous studies using GAN-based augmentation, the accuracy of diagnosing rare retinal diseases was significantly improved using the CycleGAN model in the OCT domain.

Unlike the studies aiming at developing new GAN-based CNN models to accommodate the limited number of datasets [49], we used a standard CNN model that utilizes CycleGAN-based augmentation. This method is advantageous because researchers can easily check the output images of CycleGAN to assess the accuracy of the DL model. Synthetic OCT images can generalize rare disease classes based on a variety of normal OCT images and can guide the CNN model to avoid over-fitting to specific images [51]. In addition, the trained standard CNN model can be easily combined with Grad-CAM to improve interpretability. Previous studies have shown that CycleGAN is effective in generating synthetic images with morphologic feature transformation and in performing the segmentation task using a small number of datasets [51, 52]. However, we established that synthetic images contain several artifacts; therefore, future studies should be directed at increasing the quality of synthetic images generated by GAN with few-shot setting. Further clinical validation of the resulting synthetic images using real-world data from clinics is also necessary.

Fig. 12 ... according to the number of training images. c Example of pathological feature segmentation results generated by CycleGAN. Pathological images include central serous chorioretinopathy, macular telangiectasia, diabetic macular edema, and choroidal neovascularization
This study has several limitations. First, the OCT images generated by the CycleGAN model have a low resolution of 256 × 256 pixels. This is because CycleGAN incurs a high computational cost when training networks for high-resolution applications. The low resolution may affect the classification results of the DL model [53]. Second, this study does not include a volumetric analysis for OCT. A recent study demonstrated that there is a lack of standardization in the OCT acquisition and analysis protocol [40]. Future studies should consider the variations in OCT images and devices. Third, the dataset includes a limited number of rare disease classes. Although we attempted to collect rare disease data from web-based sources, we could not include all the retinal diseases that have been reported in the existing literature. A recent study demonstrated that a conventional DL model can classify over 100 disease classes if the data are prepared for training [54]. We believe that our CycleGAN-based augmentation for rare diseases can be adopted to address similar classification problems with a large number of classes.

In summary, our DL model using GAN was useful in improving the accuracy of OCT diagnosis of rare retinal diseases while maintaining the diagnostic performance for major diseases. In particular, the CycleGAN-based augmentation was effective for the generalization of few-shot OCT images of rare diseases to avoid over-fitting. Thus, by increasing the accuracy of diagnosing rare retinal diseases via FSL, clinicians can avoid neglecting rare diseases with DL assistance, thereby reducing diagnosis delay and the social burden on patients.

The online version contains supplementary material available at https://doi.org/10.1007/s11517-021-02321-1.
Author contribution Tae Keun Yoo and Joon Yul Choi conceived and designed this study; Tae Keun Yoo and Joon Yul Choi analyzed and described the data; Joon Yul Choi and Hong Kyu Kim collected the data; and all the authors contributed to the writing and approval of the final manuscript.

Ethical approval All procedures were performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any studies with human participants performed by any of the authors. This study did not require ethics committee approval; this is because the researchers used open web-based and deidentified data.

References
Why rare diseases are an important medical and social issue
Can a decision support system accelerate rare disease diagnosis? Evaluating the potential impact of Ada DX in a retrospective study
Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning
Automated classification of normal and Stargardt disease optical coherence tomography images using deep learning
A deep-learning approach for automated OCT en-face retinal vessel segmentation in cases of optic disc swelling using multiple en-face images as input
Identifying medical diagnoses and treatable diseases by image-based deep learning
Machine learning techniques in clinical vision sciences
Adopting machine learning to automatically identify candidate patients for corneal refractive surgery
Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
Multicategorical deep learning neural network to classify retinal images: a pilot study employing small database
Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification
Automatic detection of rare pathologies in fundus photographs using few-shot learning
Few-shot learning-based human activity recognition
Metagan: an adversarial approach to few-shot learning
Prototypical clustering networks for dermatological disease diagnosis
Few-shot learning approach for plant disease classification using images taken in the field
Meta-DermDiagnosis: few-shot skin disease identification using meta-learning
Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases
Zero- and few-shot learning for diseases recognition of Citrus aurantium L. using conditional adversarial autoencoders
Toward automated severe pharyngitis detection with smartphone camera using deep learning networks
Novel coronavirus-infected pneumonia on CT: a feasibility study of few-shot learning for computerized diagnosis of emergency diseases
Prevalence of and risk factors for diabetic macular edema in the United States
The prevalence of age-related maculopathy in Iceland: Reykjavik eye study
Clinical applications of spectral domain optical coherence tomography in retinal diseases
Association of corticosteroid use with incidence of central serous chorioretinopathy in South Korea
Frequency, genotype, and clinical spectrum of best vitelliform macular dystrophy: data from a National Center in Denmark
Prevalence of retinitis pigmentosa in south Indian population aged above 40 years
Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database
The prevalence estimates of macular telangiectasia type 2: the Melbourne collaborative cohort study
Prevalence of diagnosed macular hole, macular pucker, vitreomacular adhesions/traction, retinal tear/detachment, and pterygium in US health care claims databases
Few-shot learning: a survey
Joint pose and expression modeling for facial expression recognition
Improving few-shot user-specific gaze adaptation via gaze redirection synthesis
Few-shot unsupervised image-to-image translation
Clinically applicable deep learning for diagnosis and referral in retinal disease
Explainable machine learning approach as a tool to understand factors used to select the refractive surgery technique on the expert level
Using a convolutional Siamese Network for image-based plant species identification with small datasets. Biomim Basel Switz 5
The possibility of the combination of OCT and fundus images for improving the diagnostic accuracy of deep learning for age-related macular degeneration: a preliminary experiment
Comparing two K-category assignments by a K-category correlation coefficient
Methodological challenges of deep learning in optical coherence tomography for retinal diseases: a review
Accuracy of deep learning, a machine learning technology, using ultra-wide-field fundus ophthalmoscopy for detecting idiopathic macular holes
Accuracy of a deep convolutional neural network in detection of retinitis pigmentosa on ultrawide-field images
Application of a deep machine learning model for automatic measurement of EZ width in SD-OCT images of RP
Multi-retinal disease classification by reduced deep learning features
Deep learning-based automated classification of multi-categorical abnormalities from optical coherence tomography images
F-VAEGAN-D2: a feature generating framework for any-shot learning
GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification
Infinite brain MR images: PGGAN-based data augmentation for tumor detection
TOP-GAN: stain-free cancer cell classification using deep learning with a small training set
Improving breast mass classification by shared data with domain transformation using a generative adversarial network
Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks
A generative adversarial network approach to predicting postoperative appearance after orbital decompression surgery for thyroid eye disease
Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

He received his diploma in Mechanical Engineering from Seoul National University and received the Doctor of Medicine from Yonsei University. His research interest concerns ophthalmology and machine learning.

Joon Yul Choi is currently doing his post-doc at Epilepsy Center in Cleveland Clinic.

He received the Doctor of Medicine from Yonsei University and is a PhD student at Yonsei University. His areas of research interest are big data and data analysis.

Acknowledgements This work was technically assisted by Dr. Ik Hee Ryu and VISUWORKS, Inc., which is a Korean AI startup providing medical machine learning solutions.

The authors declare that they have no conflict of interest.