title: Diabetic Macular Edema Detection Using End-to-End Deep Fusion Model and Anatomical Landmark Visualization on an Edge Computing Device
authors: Wang, Ting-Yuan; Chen, Yi-Hao; Chen, Jiann-Torng; Liu, Jung-Tzu; Wu, Po-Yi; Chang, Sung-Yen; Lee, Ya-Wen; Su, Kuo-Chen; Chen, Ching-Long
date: 2022-04-04
journal: Front Med (Lausanne)
DOI: 10.3389/fmed.2022.851644

PURPOSE: Diabetic macular edema (DME) is a common cause of vision impairment and blindness in patients with diabetes. However, vision loss can be prevented by regular eye examinations during primary care. This study aimed to design an artificial intelligence (AI) system to facilitate ophthalmology referrals by physicians.

METHODS: We developed an end-to-end deep fusion model for DME classification and hard exudate (HE) detection. Based on the architecture of the fusion model, we also applied a dual model, which included an independent classifier and object detector, to perform these two tasks separately. We used 35,001 annotated fundus images collected from three hospitals in Taiwan between 2007 and 2018 to create a private dataset. The private dataset, Messidor-1, and Messidor-2 were used to assess the performance of the fusion model for DME classification and HE detection. A second object detector was trained to identify anatomical landmarks (optic disc and macula). We integrated the fusion model and the anatomical landmark detector, and evaluated their performance on an edge device, a device with limited compute resources.

RESULTS: For DME classification on our private testing dataset, Messidor-1, and Messidor-2, the areas under the receiver operating characteristic curve (AUC) for the fusion model were 98.1, 95.2, and 95.8%, the sensitivities were 96.4, 88.7, and 87.4%, the specificities were 90.1, 90.2, and 90.2%, and the accuracies were 90.8, 90.0, and 89.9%, respectively. In addition, the AUC was not significantly different between the fusion and dual models for the three datasets (p = 0.743, 0.942, and 0.114, respectively). For HE detection, the fusion model achieved a sensitivity of 79.5%, a specificity of 87.7%, and an accuracy of 86.3% on our private testing dataset. The sensitivity of the fusion model was higher than that of the dual model (p = 0.048). For optic disc and macula detection, the second object detector achieved accuracies of 98.4% (optic disc) and 99.3% (macula). The fusion model and the anatomical landmark detector can be deployed on a portable edge device.

CONCLUSION: This portable AI system exhibited excellent performance for the classification of DME and the visualization of HE and anatomical locations. It facilitates interpretability and can serve as a clinical reference for physicians. Clinically, this system could be applied to diabetic eye screening to improve the interpretation of fundus imaging in patients with DME.
Diabetes is a prevalent disease that affects ∼476 million people worldwide (1). Diabetic macular edema (DME), characterized by the accumulation of extracellular fluid that leaks from blood vessels in the macula (2), is one of the complications of diabetes mellitus. DME can appear at any stage of diabetic retinopathy (DR) and is the leading cause of severe vision loss in working-age adults with diabetes mellitus (3). The Early Treatment Diabetic Retinopathy Study (ETDRS) defined the criteria for DME and demonstrated the benefits of laser photocoagulation therapy (4). Currently, with the revolutionary development of intraocular medication, intravitreal injections of anti-vascular endothelial growth factor (anti-VEGF) and steroid agents have become the first-line treatment, replacing traditional laser photocoagulation, because they provide better vision recovery in patients with center-involved macular edema (5-7).

Early diagnosis plays an important role in DME treatment. Moreover, early management, such as intensive diabetes control, may reduce the risk of progressive retinopathy (8). Early diagnosis and preemptive treatment are facilitated by frequent diabetic eye screening, which reduces the risk of progression to blindness and the associated socioeconomic burden. To date, owing to developments in the field of ophthalmic imaging, detection of DME using optical coherence tomography (OCT) imaging is the gold standard in the decision-making process for DME treatment (9). However, because it requires expensive equipment and highly specialized technicians, OCT imaging is typically available only in high-income countries. In contrast, retinal photography is feasible and affordable in low-income countries and remote areas (10). However, the number of people with diabetes worldwide is increasing yearly and is estimated to reach 571 million by 2025 (1). The rapid growth in the number of diabetic patients is expected to increase the diagnostic burden associated with DME detection. As such, an efficacious and accurate automatic fundus imaging interpretation system is urgently needed.
In the past decade, several studies have focused on DME detection using feature engineering techniques, which extract features by selecting or transforming raw data. Among them, Siddalingaswamy et al. (11) identified DME by detecting hard exudates (HE) and the macula; decisions were then made based on the distance between the HE and the macula. Machine learning algorithms have also been applied in several studies for feature extraction in DME classification (12-15). The advantage of feature engineering is that it requires a smaller training dataset to achieve satisfactory performance. However, the identification of salient and useful features depends on the experience of clinicians and is thus subjective and limited. In contrast to feature engineering techniques, deep learning, particularly convolutional neural networks (CNNs), is gaining popularity and has achieved significant success in medical imaging applications. This approach learns feature extraction automatically by using a backbone network mainly comprising convolutional and pooling layers. Several studies have shown that various CNN architectures can effectively extract features from fundus images for subsequent classification of DR or DME (16-21).

Moreover, given that deep learning models lack interpretability and are viewed as black boxes (22), visualization of the lesion in fundus images is an important issue. Lesion visualization can improve model interpretability for non-ophthalmologist physicians. In addition, visualization is useful to physicians during an initial assessment before a patient is referred to an ophthalmologist for further evaluation, thereby substantially increasing the screening rate and reducing the workload of ophthalmologists. Lesion visualization could also help physicians monitor the status and progression of the disease.

Generally, deep learning models are implemented in cloud computing environments or on high-end computers, which provide more computing power and memory space. However, this is usually expensive and requires considerable network resources. These factors limit the application of deep learning models for medical image analysis in remote or resource-limited areas. Thus, an edge device is potentially suitable for the application of deep learning models for medical image analysis in these areas. Previous studies have demonstrated the feasibility of deploying deep learning models for medical image analysis on edge devices (23-25). However, a system with multiple models for disease classification and visualization requires more computing power and memory, so implementing such a system on an edge device is challenging.

In this study, we designed an end-to-end deep fusion network model to perform two deep learning tasks: one for the classification of DME and the other for the visualization of HE lesions. We used a private dataset and two open datasets to evaluate the performance of this fusion model. We also added a second object detector model to identify anatomical landmarks (optic disc and macula). These models were deployed on an edge device, and the private dataset was used to assess their performance. Overall, this system could be used for diabetic eye screening by non-specialist physicians or in remote or resource-limited areas to improve the early diagnosis of DME. As a result, diabetic patients may be referred for early assessment and appropriate treatment, which should lead to better outcomes.
We enrolled patients who had a diagnosis of diabetes mellitus according to ICD-9 codes 250.xx or ICD-10 codes E10-E14 between 2007 and 2018 from three medical centers in Taiwan. Patients younger than 20 years of age and patients of unknown sex were excluded. The retinal photographs were acquired with ZEISS (VISUCAM 200), Nidek (AFC-330), and Canon (CF-1, CR-DGI, CR2, or CR2-AF) fundus cameras with a 45° field-of-view (FOV) and were anonymized owing to the retrospective nature of the study. We collected 347,042 fundus images from 79,151 diabetic patients. For the present study, we included images containing both the optic disc and the macula for model development. Blurred fundus images, images with vitreous hemorrhage or vitreous opacity, images without the entire optic disc, without the entire macula, or without both the optic disc and the macula, images showing other retinal diseases, and low-quality images were excluded, leaving 101,145 fundus images from 51,042 diabetic patients for random sampling and annotation. Finally, 35,001 fundus images from 15,607 patients formed our private dataset for model development (flowchart shown in Figure 1).

In our private dataset, the mean age of the patients was 57.6 ± 11.8 years; 54.5% were male and 45.5% were female. A total of 8,496 patients had only one image, and 7,111 patients had more than one image per eye. The original image dimensions ranged from 522,728 pixels (724 × 722) to 12,212,224 pixels (4,288 × 2,848). All images were in JPG format.

The study was reviewed and approved by the institutional review boards (IRBs) of the three medical centers: Tri-Service General Hospital (IRB: 1-107-05-039), Chung Shan Medical University Hospital (IRB: CSH: CS18087), and China Medical University Hospital (IRB: CMUH10FREC3-062). Given that the identities of all patients in the three medical centers were encrypted before the fundus images were released, the requirement for signed informed consent of the included patients was waived.

Annotating DME Classification for Fundus Images
We recruited 38 ophthalmologists to annotate the fundus images. Each fundus image was annotated by a group of three ophthalmologists. According to the ETDRS criteria, DME was defined as any HE at or within 1 disc diameter (1DD) of the center of the macula (4). Each ophthalmologist annotated images using our annotation tool. We used the majority decision of the three ophthalmologists as the ground truth (GT) of the fundus images. The dataset was then split into training, validation, and testing sets at the patient level to prevent images from the same patient appearing in different sets (Figure 1). Of the 15,607 patients, the 8,496 patients with only one image were randomly sampled into the validation set (1,266 patients, 1,266 images) and the testing set (1,049 patients, 1,049 images); the remaining single-image patients and the 7,111 patients with multiple images were reserved as the training set (13,292 patients, 32,686 images). Table 1 lists the DME and non-DME profiles of these three subsets.

The HE lesions in each fundus image were also annotated, in a bounding-box format, by a group of three ophthalmologists randomly chosen from the 38 ophthalmologists. However, the three resulting annotations could differ in the number, size, and location of the boxes. We adopted the following procedure to obtain a final GT image for training purposes: (Step 1) The bounding boxes of the image labeled by two ophthalmologists were compared.
If an HE lesion was annotated and the intersection over union (IoU) was > 0.15, the larger annotated area was taken as the GT. (Step 2) The bounding boxes of the image labeled by the two ophthalmologists were compared; if an HE lesion was annotated and the IoU was ≤ 0.15, both bounding boxes were retained as the GT. After steps 1 and 2, we obtained the first GT image, as shown in Figure 2. (Step 3) The first GT image was compared with the image labeled by the third ophthalmologist using the same method as in steps 1 and 2, which yielded the final GT image.

In this study, ophthalmologists used bounding boxes to annotate HE lesions in the fundus images. The areas of the annotated bounding boxes in the original images ranged up to 196,672 pixels (9,791.57 ± 36,966.28 pixels). After the images were resized, the bounding-box areas in the model's input images ranged from 1.50 to 190,008.85 pixels (1,002.99 ± 2,719.79 pixels). However, the annotated bounding boxes only indicated the presence and location of HE lesions and did not represent their true size; therefore, the bounding boxes were usually larger than the actual HE lesions. The profiles of the HE labels in the three subsets are shown in Table 2.

Two open datasets were used to evaluate the performance of the proposed model and its ability to adapt to different datasets.

Messidor-1
The Messidor-1 (26) dataset contained 1,200 fundus images from three ophthalmologic departments in France and was annotated with DR and the risk of DME. All images were acquired using a Topcon TRC NW6 non-mydriatic retinal camera with a 45° FOV. Our grading scheme was slightly different from that of Messidor-1, in which DME was graded according to three categories, with 0, 1, and 2 representing "no visible HE," "HE present at least 1DD away from the macula," and "HE present within 1DD of the macula," respectively. As previously indicated, HE occurring within 1DD of the center of the macula can serve as a proxy for detecting DME; hence, grades 0 and 1 are equivalent to non-DME, and grade 2 is equivalent to DME in our classification scheme.

Messidor-2
The Messidor-2 (26, 27) dataset, an extension of the Messidor-1 dataset, contained 1,748 fundus images (1,744 annotated as gradable). In this study, we used the 1,744 gradable fundus images with the Messidor-2 annotations by Krause et al. (28).

We used EfficientDet-d1 (29) as the object detector because of its good balance between performance and resource usage. Because EfficientDet-d1 employs the feature-extraction part of EfficientNet-b1 (30) as its backbone, we could readily share this backbone in the fusion model. Lesion detection was implemented using a bi-directional feature pyramid network (BiFPN). The classification module consisted of three layers: a convolutional layer, a global average pooling layer, and a fully connected (FC) layer. The architecture of the fusion model is shown in Figure 3.

FIGURE 3 | The architecture of the proposed end-to-end deep fusion model. The red arrow denotes the classification path, which forms the same architecture as EfficientNet-b1. The blue arrow denotes the lesion detection path, which has the same architecture as EfficientDet-d1. The number (640, 320, 160, …) near each feature map denotes its resolution.

The fusion model is computationally efficient, as only one convolutional layer is needed to extract higher-level features from the output features of the EfficientDet-d1 backbone. We denote E_ob as the loss function of EfficientDet-d1 and E_cl as the loss function of the classification module. The loss function of the fusion model is given by Equation (1):

E = ω_ob · E_ob + ω_cl · E_cl, (1)

where ω_ob > 0 and ω_cl > 0 are hyperparameters used to linearly combine the loss functions of the object detector and the classifier. First, we used equal weights for ω_ob and ω_cl in the initial training.
We then analyzed the loss values obtained from the object detection and classification branches. Second, we set the weighting factors (ω_ob and ω_cl) inversely proportional to the loss values of the object detector and the classifier, respectively, to balance the two losses. Finally, we retrained the fusion model using ω_ob = 0.5 and ω_cl = 100 to balance the losses of the two branches and to avoid overfitting in either the classification or the object detection branch. Our results showed that the setting ω_ob = 0.5 and ω_cl = 100 achieved a satisfactory balance. The focal loss parameters α ≥ 0 and γ ≥ 0 were also set heuristically to address the large class imbalance encountered during training. In general, α, the weight assigned to the rare class, should be slightly reduced as γ is increased (31). Here, we used the default setting of γ = 2 and α = 0.25. The variable p_t is defined in Equation (2):

p_t = p if y = 1, and p_t = 1 − p otherwise, (2)

where p is the estimated probability for the positive class in the binary classification.

For comparison with the fusion model, we implemented a dual model, which consisted of two separate models, an image classifier and an object detector, trained and run separately. For a fair comparison, we used EfficientNet-b1 and EfficientDet-d1 as the image classifier and object detector, respectively, in our dual model. EfficientNet stacks basic fixed modules and adjusts hyperparameters such as the number of layers, the number of channels, and the input image resolution using a neural architecture search. In addition, EfficientNet achieved state-of-the-art performance on ImageNet without using additional data.

All images from the private, Messidor-1, and Messidor-2 datasets were preprocessed before being fed to our model. Each image was cropped to the fundus region with minimal black border (Supplementary Figure 1) and saved in JPG format. The cropped images were resized to the input size of 640 × 640 pixels. For image augmentation, we randomly flipped the images of the private dataset vertically or horizontally. We trained and tested the model on an Intel Xeon E5-2660 v4 computer with 396 GB of DRAM and an NVIDIA Tesla V100 GPU using PyTorch, with an initial learning rate of 0.0001 and a dropout rate of 0 (Supplementary Figure 2). To validate the feasibility of deploying our fusion model on an edge device, it was implemented on an NVIDIA Jetson Xavier NX with 8 GB of memory using PyTorch.

For the evaluation of DME classification performance, we used sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). All metrics are reported with 95% confidence intervals (CIs). Receiver operating characteristic (ROC) curves were used to illustrate the overall performance at different cutoffs for distinguishing between non-DME and DME. A two-proportion z-test was used to compare the observed proportions obtained from the two models, and the DeLong test (32) was used to compare the AUCs. Statistical significance was set at p < 0.05. In addition, we evaluated the performance of lesion detection following Tseng et al. (20).
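To make the shared-backbone design and the combined objective in Equations (1) and (2) concrete, the following minimal PyTorch sketch shows one way they could be wired together. The backbone and detection head are generic placeholders standing in for the EfficientDet-d1/EfficientNet-b1 components, and the helper names (ClassificationHead, FusionModel, binary_focal_loss, fusion_loss) are ours for illustration, not the authors' implementation.

```python
# Minimal sketch of the fusion idea (not the authors' code): one shared backbone
# feeds a detection branch and a classification head, and the total loss is
# E = w_ob * E_ob + w_cl * E_cl as in Equation (1).
import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """Convolutional layer -> global average pooling -> fully connected layer."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(256, 1)          # one logit for binary DME classification

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        x = self.conv(feats)
        x = self.pool(x).flatten(1)
        return self.fc(x).squeeze(1)


class FusionModel(nn.Module):
    """Shared backbone with a lesion-detection path and a classification path."""

    def __init__(self, backbone: nn.Module, detector_head: nn.Module, backbone_channels: int):
        super().__init__()
        self.backbone = backbone              # shared feature extractor (placeholder)
        self.detector_head = detector_head    # e.g., BiFPN + box/class heads (placeholder)
        self.cls_head = ClassificationHead(backbone_channels)

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)         # features computed once
        det_out = self.detector_head(feats)   # HE lesion detection path
        cls_logit = self.cls_head(feats)      # DME classification path
        return det_out, cls_logit


def binary_focal_loss(logit: torch.Tensor, target: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss for the binary classifier; p_t follows Equation (2)."""
    target = target.float()
    p = torch.sigmoid(logit)
    p_t = target * p + (1 - target) * (1 - p)           # Equation (2)
    alpha_t = target * alpha + (1 - target) * (1 - alpha)
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t + 1e-8)).mean()


W_OB, W_CL = 0.5, 100.0  # weights reported in the paper


def fusion_loss(det_loss: torch.Tensor, cls_logit: torch.Tensor,
                cls_target: torch.Tensor) -> torch.Tensor:
    """Equation (1): linear combination of the detection and classification losses."""
    return W_OB * det_loss + W_CL * binary_focal_loss(cls_logit, cls_target)
```

In this sketch, det_loss is assumed to be produced by the detection branch (E_ob in the paper); the key point is simply that both branches reuse the same backbone features and are optimized jointly through the weighted sum.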
We trained the fusion model and the dual model using the private dataset and compared their performance in three aspects: memory usage and execution time, DME classification, and HE detection.

We investigated the memory demand and execution time of the fusion and dual models when processing one image from the private testing dataset. We used the command-line utility nvidia-smi to measure the memory required by each model to process one fundus image, and the processing time per image was measured with the Python function time.time(). Table 3 shows that the fusion model required 1.6 GB of memory, whereas the dual model required 3.6 GB. The mean processing times of the fusion and dual models, averaged over the full testing dataset, were 2.8 ± 1.5 s and 4.5 ± 1.8 s, respectively. These results show that the fusion model reduced both the memory requirement and the execution time relative to the dual model.

The distribution of DME in the private dataset and the two open datasets (Messidor-1 and Messidor-2) is shown in Figure 4. Table 4 reports the performance of the fusion and dual models in terms of AUC, sensitivity, specificity, and accuracy. The AUCs of the two models were compared using the DeLong test for the three datasets, and no statistically significant difference was found between the models (p = 0.743, 0.942, and 0.114 for the private testing dataset, Messidor-1, and Messidor-2, respectively). Correspondingly, Figure 5 shows the receiver operating characteristic (ROC) curves of both models for the three datasets. These results demonstrate that the performance of the fusion model is similar to that of the dual model.

We used the fusion and dual models to detect HE lesions in our private testing dataset and evaluated them using image-level true positives, false positives, true negatives, and false negatives to calculate accuracy, sensitivity, and specificity. In HE lesion detection, a true positive image is defined as one in which at least one predicted HE area has an IoU > 0.15 with a GT location (as shown in Figure 6); a true negative image is one in which neither the GT nor the prediction contains any lesion; a false positive image is one with no GT lesion but at least one predicted lesion; and a false negative image is one with at least one GT lesion but either no prediction or no predicted location with an IoU > 0.15. As shown in Table 5, the results on our private testing dataset revealed that the sensitivity of the fusion model was higher than that of the dual model, and the difference was statistically significant (p = 0.048). The specificity and accuracy of the two models were not significantly different (p = 0.433 and p = 0.998, respectively). These results indicate that the fusion model could identify images containing HE lesions more reliably. Furthermore, for lesion visualization, our models can output fundus images with the detected HE lesions annotated, as shown in Figure 7.

Based on the preceding results, we established a novel end-to-end fusion model that can simultaneously perform disease classification and lesion detection. Clinically, anatomical landmarks such as the optic disc and the macula are examined by physicians to determine whether there are HE lesions within 1DD of the center of the macula. Thus, we constructed an object detector to detect anatomical landmarks and facilitate advanced visualization.
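As a sketch of how such landmark detections can be combined with the fusion model's HE boxes for visualization, the snippet below marks the optic disc, draws a circle of radius 1DD around the macula center, and flags HE boxes whose centers fall inside it. Approximating the disc diameter by the mean side length of the detected disc box, and the helper names themselves, are our assumptions rather than details reported in the paper.

```python
# Hypothetical post-processing step: combine landmark and HE detections for display.
# All boxes are [x1, y1, x2, y2] in pixel coordinates of the input image.
from typing import List, Sequence, Tuple
import cv2
import numpy as np

Box = Sequence[float]


def box_center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def annotate_and_check(image: np.ndarray, disc_box: Box, macula_box: Box,
                       he_boxes: List[Box]) -> Tuple[np.ndarray, bool]:
    """Draw the optic disc box, the 1DD circle around the macula center, and the HE
    boxes; return the annotated image and whether any HE center lies within 1DD."""
    out = image.copy()
    # Assumption: disc diameter approximated by the mean side length of the disc box.
    disc_d = ((disc_box[2] - disc_box[0]) + (disc_box[3] - disc_box[1])) / 2.0
    mx, my = box_center(macula_box)

    cv2.rectangle(out, (int(disc_box[0]), int(disc_box[1])),
                  (int(disc_box[2]), int(disc_box[3])), (255, 255, 255), 2)
    cv2.circle(out, (int(mx), int(my)), int(disc_d), (255, 255, 255), 2)

    dme_suspected = False
    for box in he_boxes:
        hx, hy = box_center(box)
        inside = (hx - mx) ** 2 + (hy - my) ** 2 <= disc_d ** 2
        dme_suspected = dme_suspected or inside
        color = (0, 0, 255) if inside else (0, 255, 255)  # highlight HE within 1DD
        cv2.rectangle(out, (int(box[0]), int(box[1])),
                      (int(box[2]), int(box[3])), color, 2)
    return out, dme_suspected
```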
We trained an object detector using YOLOv3 (33) to detect the optic disc and the macula; the details of the training process are provided in the Supplementary Material. The accuracies of this object detector for the detection of the optic disc and the macula were 98.4 and 99.3%, respectively. Furthermore, the object detector marks the optic disc with a white bounding box and the area within 1DD of the macula center with a white circle. These boxes and circles can be integrated into the image results, as shown in Figure 7. Figure 8 shows that physicians can instantly ascertain the presence of HE lesions within 1DD of the center of the macula, enabling them to diagnose DME more reliably. Taken together, these results show that lesion visualization makes the DME classification results of the fusion model easier to interpret.

To verify the feasibility of implementing the entire workflow on an edge device, we tested our fusion model and the anatomical landmark detector on an NVIDIA Jetson Xavier NX with 8 GB of memory. Together, the fusion model and the anatomical landmark detector required 7.4 ± 0.02 GB of memory and took 2.53 ± 0.72 s on average to infer a single fundus image. In contrast, the combination of the dual model and an anatomical landmark detector could not be implemented on the edge device owing to its memory constraints. In addition, we tested the fusion model for DME classification on the three datasets and for HE lesion detection using the NVIDIA Jetson Xavier NX with 8 GB of memory. The performance for DME classification and HE lesion detection on the edge device was the same as that on the Intel Xeon E5-2660 v4 computer, as shown in Tables 4, 5, respectively.

In this study, we proposed a novel end-to-end fusion model to simultaneously perform DME classification and HE lesion detection. The performance of the fusion model for DME classification was similar to that of the dual model, whereas the sensitivity of the fusion model for the detection of HE lesions was higher than that of the dual model. We further integrated the detection outputs from the fusion model and the anatomical landmark detector to improve lesion visualization. In addition, we implemented these two models on an edge device to facilitate portability and affordability in remote or resource-limited areas. As shown in Figure 9, we report for the first time the integration of the fusion model and a second object detector on an edge device for DME classification, HE detection, and optic disc and macula detection, providing lesion visualization and improved interpretability of the AI model.

FIGURE 9 | Overview of the proposed approach for implementing a system on an edge device that integrates DME classification, HE detection, and optic disc and macula detection to assist in the interpretation of fundus images by physicians.

This system allows physicians not only to obtain the results of DME classification but also to observe the location of HE lesions relative to the macula. This might assist physicians in assessing the necessity of referring diabetic patients to ophthalmologists for further examination and treatment.

Recently, several studies have used AI to classify DR with DME, or DME only, on the Messidor-1 and Messidor-2 datasets (16-19, 34-37). On Messidor-1, Sahlsten et al. (18) proposed an approach based on an ensemble of CNNs that detected referable DME with an AUC of 95.3%, a sensitivity of 57.5%, a specificity of 99.5%, and an accuracy of 91.6%.
Singh (37) used an improved Inception-v4 with an AUC of 91.7% to detect referable DME. Compared with these studies, the fusion model achieved AUCs of 95.2 and 95.8%, sensitivities of 88.7 and 87.4%, specificities of 90.2 and 90.2%, and accuracies of 90.0 and 89.9% on the Messidor-1 and Messidor-2 datasets, respectively. In this study, the classifier of the fusion model was constructed by integrating the EfficientDet-d1 backbone and a classification module; this classifier has the same architecture as EfficientNet-b1. We found that its DME classification performance was similar to that of the original EfficientNet-b1 in the dual model.

In fundus imaging, determining the presence and location of HE is useful for physicians in the diagnosis of DME. Several studies have used deep learning to detect HE lesions. Son et al. (38) used a class activation map (CAM) to generate a heatmap identifying the areas that contributed most to the model's decision when classifying DR and other ocular abnormalities. Lam et al. (39) used a sliding window to scan images and a CNN to detect whether HE lesions were present. In addition, Kurilová et al. (40) used a Faster R-CNN object detector to detect HE lesions in fundus images. In this study, the object detector of our fusion model was modified from EfficientDet-d1, with the backbone shared with the classification module during both the training and inference phases. We found that this modified detector had significantly higher sensitivity for the detection of HEs than the original EfficientDet-d1 in the dual model. The higher sensitivity might be because the classification and object detection tasks are complementary in our system.

Typical deep learning models lack interpretability, whereas visualization is useful for physicians to assess the result of DME classification by AI. To address this problem, we trained another object detector, YOLOv3, to detect anatomical landmarks (optic disc and macula). Our system integrated the fusion model and this second object detector to achieve visualization and increase the interpretability of the AI. We also applied this system to fundus images obtained from the open datasets to examine its effect. As shown in Figure 10, three fundus images were classified as DME by our system, which detected and annotated the HE lesions, the optic disc, and the 1DD region around the macula center. These output fundus images can increase the interpretability of AI results for physicians.

Deep learning models often require large amounts of memory and computing power, and it is difficult to deploy them on high-end computers in remote areas where resources are limited. Typically, edge devices or cloud computing are used to address this issue. However, cloud computing requires network resources, and in some remote areas Internet service is not good enough to support it. Beede et al. (41) found that 2 h were required to screen ten diabetic patients using their cloud eye-screening system deployed in Thailand because of sluggish Internet service. Although an edge device is portable and does not require a network connection, its small memory size and limited computing power are the primary hindrances. Singh and Kolekar (42) reduced the model size to resolve the storage issue of edge devices when classifying COVID-19 from chest computed tomography scans.
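Per-image memory and timing comparisons such as those in Table 3 can be reproduced with a small profiling routine. The paper measured memory with nvidia-smi and time with time.time(); the sketch below keeps time.time() but substitutes torch.cuda.max_memory_allocated() for the external memory query, which is our simplification, and the function name profile_model is hypothetical.

```python
# Sketch of per-image inference profiling on a GPU or edge device.
import time
from typing import Iterable, Tuple

import torch


@torch.no_grad()
def profile_model(model: torch.nn.Module, images: Iterable[torch.Tensor],
                  device: str = "cuda") -> Tuple[float, float]:
    """Return (mean seconds per image, peak GPU memory in GB) over the given images."""
    model = model.to(device).eval()
    if device.startswith("cuda"):
        torch.cuda.reset_peak_memory_stats()
    times = []
    for img in images:
        start = time.time()
        _ = model(img.unsqueeze(0).to(device))  # one fundus image per forward pass
        if device.startswith("cuda"):
            torch.cuda.synchronize()            # wait for the GPU before stopping the clock
        times.append(time.time() - start)
    peak_gb = torch.cuda.max_memory_allocated() / 1e9 if device.startswith("cuda") else 0.0
    return sum(times) / max(len(times), 1), peak_gb
```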
In our fusion model, the classifier and the object detector shared the backbone of the object detector. This design reduced the memory usage and execution time, as shown in Table 3. The fusion model is computationally efficient and can be deployed on an edge device together with an anatomical landmark detector. In addition, traditional fundus cameras are not equipped with the appropriate hardware (at least an NVIDIA GeForce GTX 1070 with 8 GB of memory), so a single model that processes the data on an edge device can resolve this issue; this is why we designed a deep learning model that runs on an edge device. Nonetheless, if the computer associated with the fundus camera has appropriate hardware, our model could also be integrated into the camera's computer system without the need for an independent edge device.

Our study has several strengths. First, we used a large number of fundus images to train the model. Second, our model yielded satisfactory results on both the private and the open datasets, indicating that it could be applied to fundus images from different ethnicities. Third, this system provides DME classification together with visualization of HE lesions, the optic disc, and the macula; it is therefore expected that non-ophthalmologist physicians would have more confidence in an AI-determined DME diagnosis. Fourth, this system can be deployed on an edge device, which is portable and affordable, so the proposed system could be applied to diabetic patients in remote or resource-limited areas.

This study also has several limitations. First, drusen and some features of silicone oil retention resemble HEs. Owing to limited data, these features were not well learned by our system, which could lead to false-positive DME results. Second, we did not integrate the fusion model and the anatomical landmark detector into a single fusion model. Third, some diseases, such as myelinated nerve fiber layer and optic disc edema, blur the boundary of the optic disc; these diseases could affect optic disc detection and cause inaccurate visualization of the 1DD region around the macula center. Based on the obtained results, our future work will involve applying the proposed approach to other object detectors whose backbone was originally a CNN image classifier, followed by integrating the fusion model and the anatomical landmark detector into one fusion model on an edge device. Furthermore, we will also train this system to classify the grade of DR and to annotate the locations of hard exudates, hemorrhages, soft exudates, microaneurysms, the optic disc, and the macula. This system will grade DR and DME, as well as provide lesion visualization to increase the interpretability of the AI results for physicians.

In conclusion, our system combines a novel end-to-end fusion model with a second object detector to perform DME classification, HE detection, and anatomical localization. It can identify DME and elucidate the relationship between HE and the macula. The entire system facilitates interpretability and can serve as a clinical reference for physicians. In addition, it can be implemented on a portable edge device. Clinically, this AI system can be used during the regular examination of DR to improve the interpretation of fundus imaging in patients with DME.
References
1. Global, regional, and national burden and trend of diabetes in 195 countries and territories: an analysis from 1990 to 2025
2. Macular edema. A complication of diabetic retinopathy
3. Diabetic retinopathy
4. Photocoagulation for diabetic macular edema. Early Treatment Diabetic Retinopathy Study report number 1. Early Treatment Diabetic Retinopathy Study research group
5. Ranibizumab for diabetic macular edema: results from 2 phase III randomized trials: RISE and RIDE
6. Intravitreal dexamethasone implant Ozurdex® in naïve and refractory patients with different subtypes of diabetic macular edema
7. Intravitreal aflibercept for diabetic macular edema
8. Retinopathy and nephropathy in patients with type 1 diabetes four years after a trial of intensive therapy
9. Optical coherence tomography (OCT) for detection of macular oedema in patients with diabetic retinopathy
10. Guidelines on diabetic eye care: the International Council of Ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings
11. Automatic detection and grading of severity level in exudative maculopathy
12. Automated detection of exudates and macula for grading of diabetic macular edema
13. Diabetic macular edema grading in retinal images using vector quantization and semi-supervised learning
14. Automatic assessment of macular edema from color retinal images
15. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets
16. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning
17. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
18. Deep learning fundus image analysis for diabetic retinopathy and macular edema grading
19. DMENet: diabetic macular edema diagnosis using hierarchical ensemble of CNNs
20. Leveraging multimodal deep learning architecture with retina lesion information to detect diabetic retinopathy
21. Deep learning for automated diabetic retinopathy screening fused with heterogeneous data from EHRs can lead to earlier referral decisions
22. Understanding intermediate layers using linear classifier probes
23. Robust methods for real-time diabetic foot ulcer detection and localization on mobile devices
24. Quadratic polynomial guided fuzzy C-means and dual attention mechanism for medical image segmentation
25. A study on the use of Edge TPUs for eye fundus image segmentation
26. Feedback on a publicly distributed image database: the Messidor database
27. Automated analysis of retinal images for detection of referable diabetic retinopathy
28. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy
29. EfficientDet: scalable and efficient object detection
30. EfficientNet: rethinking model scaling for convolutional neural networks
31. Focal loss for dense object detection
32. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
33. YOLOv3: an incremental improvement
34. Diabetic retinopathy screening using deep neural network
35. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading
36. ResNet based deep features and random forest classifier for diabetic retinopathy detection
37. Deep learning-based automated detection for diabetic retinopathy and diabetic macular oedema in retinal fundus photographs
38. Development and validation of deep learning models for screening multiple abnormal findings in retinal fundus images
39. Retinal lesion detection with deep learning using image patches
40. Support vector machine and deep-learning object detection for localisation of hard exudates
41. A human-centered evaluation of a deep learning system deployed in clinics for the detection of diabetic retinopathy
42. Deep learning empowered COVID-19 diagnosis using chest CT scan images for collaborative edge-cloud computing platform

Acknowledgments
We thank Tri-Service General Hospital, Chung Shan Medical University Hospital, and China Medical University Hospital in Taiwan for providing the fundus image data for this study. Messidor-1 and Messidor-2 are kindly provided by the Messidor program partners (see https://www.adcis.net/en/third-party/messidor/) and the LaTIM laboratory.

Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2022.851644

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.