key: cord-0760054-gg9gzvrj authors: Zhu, Haihua; Cao, Zheng; Lian, Luya; Ye, Guanchen; Gao, Honghao; Wu, Jian title: CariesNet: a deep learning approach for segmentation of multi-stage caries lesion from oral panoramic X-ray image date: 2022-01-07 journal: Neural Comput Appl DOI: 10.1007/s00521-021-06684-2 sha: 864f4da119019148263733af2609ba89d8a7b1de doc_id: 760054 cord_uid: gg9gzvrj

Dental caries is a common health issue throughout the world and can eventually lead to inflammation of the dental pulp and the root apex. Timely and effective treatment of dental caries is vital to reduce patients' pain. Traditional caries diagnosis methods, such as naked-eye inspection and panoramic radiograph examination, rely on experienced doctors, and they are time-consuming and prone to misdiagnosis. To this end, we propose a novel deep learning architecture called CariesNet to delineate different degrees of caries in panoramic radiographs. We first collect a high-quality panoramic radiograph dataset with 3127 well-delineated caries lesions, including shallow, moderate and deep caries. We then construct CariesNet as a U-shaped network with an additional full-scale axial attention module to segment these three caries types from oral panoramic images. Finally, we compare the segmentation performance of CariesNet with that of other baseline methods. Experiments show that our method achieves a mean Dice coefficient of 93.64% and an accuracy of 93.61% in segmenting the three levels of caries.

Dental caries is a localized disease of the hard tissues of the teeth caused by microorganisms in dental plaque [1]. It is one of the most common oral diseases. According to the 4th Chinese National Oral Health Epidemiological Survey in 2015, the prevalence of caries in the primary teeth of children aged 3-5 is as high as 70.81%, and the incidence of caries rises progressively with age.
The prevalence of dental caries is 80.7% in the 65-74 age group [2]. Dental caries also causes a huge social and economic burden: a global burden of disease study showed that about 3.5 billion people worldwide suffer from oral diseases, and the direct cost of treating these diseases is 298 billion dollars [3]. Moreover, the clinical diagnosis of dental caries is subjective. Traditionally, caries is detected by the attending doctor through visual inspection and probing, both of which depend on the examiner's judgment. Because early and hidden caries are difficult to detect, the misdiagnosis rate is high. If not treated in time, caries may gradually expand, invade the dental pulp, and trigger apical inflammation, apical abscess and other dental diseases, and the affected teeth may eventually fall out. Oral panoramic radiographs (X-rays) play a critical role in diagnosing dental diseases such as caries. As a preventive diagnostic tool, panoramic radiographs allow dentists to find hidden dental structures, bone loss, malignant or benign masses and cavities that cannot be seen during visual examination. Caries becomes visible radiographically once the tooth structure is sufficiently decalcified [1], and on X-ray images it shows different gray values at different stages of development. According to [4], shallow caries is defined as a caries radiolucency in the enamel or in the outer third of the dentin; moderate caries as a radiolucency in the middle third of the dentin; and deep caries as a radiolucency in the inner third of the dentin, with or without apparent pulp involvement. An example panoramic radiograph is shown in Fig. 1, where Boxes A, B and C correspond to shallow, moderate and deep caries, respectively.
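The depth-based definitions of [4] amount to a simple three-way rule. The sketch below assumes a hypothetical normalized measurement, `dentin_depth`, giving how far the radiolucency extends into the dentin (0.0 for a lesion confined to the enamel or at the enamel-dentin junction, 1.0 at the pulp). How such a depth would be measured from a radiograph is not described in the paper, and assigning the exact one-third boundaries to the shallower class is our own choice.

```python
from enum import Enum


class CariesStage(Enum):
    SHALLOW = "shallow"    # enamel or outer third of dentin
    MODERATE = "moderate"  # middle third of dentin
    DEEP = "deep"          # inner third of dentin, with or without pulp involvement


def caries_stage(dentin_depth: float) -> CariesStage:
    """Classify a radiolucency by its (hypothetical) normalized depth into dentin.

    dentin_depth: 0.0 = confined to enamel / at the enamel-dentin junction,
                  1.0 = reaching the pulp.
    """
    if not 0.0 <= dentin_depth <= 1.0:
        raise ValueError("dentin_depth must lie in [0, 1]")
    if dentin_depth <= 1.0 / 3.0:
        return CariesStage.SHALLOW
    if dentin_depth <= 2.0 / 3.0:
        return CariesStage.MODERATE
    return CariesStage.DEEP


# Example: a lesion reaching halfway through the dentin
print(caries_stage(0.5).value)  # moderate
```

In CariesNet these stages are not computed from a measured depth; they are the three class labels that the network learns to segment directly from the annotated images.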
Computer-aided diagnosis (CAD) systems offer a more efficient way to address these problems. Using the analytical and computational power of computers, a mathematical model for diagnosing a disease is established, and the classification, prediction and localization of the lesions of that disease can then greatly alleviate the burden on clinicians and reduce the difficulty of their work. With the rapid development of artificial intelligence in recent years, this technology has also gained popularity in medical imaging. Deep learning is the most widely used artificial intelligence technique in this field: it learns automatically from large datasets of medical images, typically using a convolutional neural network (CNN) to extract image features. In this paper, to achieve accurate segmentation of caries lesions, we propose a new deep learning network called CariesNet. Inspired by the structure of U-Net [5], we build a U-shaped neural network for oral panoramic image segmentation. In particular, we use a full-scale axial attention module as well as a partial decoder module to enhance segmentation performance. To sum up, the main contributions of this work are threefold. (1) We propose a novel deep architecture, CariesNet, for segmenting dental caries lesions in panoramic radiographs. The rest of this paper is organized as follows. Section 2 briefly reviews related work. Our proposed CariesNet method is described in Sect. 3. Section 4 reports the experimental results. Finally, Section 5 concludes the work.

2 Related works

2.1 Computer-aided diagnosis methods for dental caries

Computer systems can quantify changes in gray values in an image to support clinical diagnosis. In recent years, deep learning has also been applied to identify and diagnose dental caries. In 2016, Anias et al.
extracted 48 regions of interest from oral panoramic X-ray images by threshold segmentation and used a neural network trained with backpropagation (BP) to diagnose dental caries [6]. Ali et al. used three stacked sparse autoencoders to extract features of the apices and applied a softmax classifier to determine whether the teeth had caries [7]. In 2017, a linear adaptive particle swarm optimization (LA-PSO) algorithm was introduced to generate the learning rate for 120 panoramic images of decayed teeth, and the classification performance of LA-PSO was evaluated with a backpropagation neural network model [8]. Prajapati et al. applied transfer learning to build a CNN-based caries diagnosis model, using VGG-16 to detect caries in 251 X-ray images [9]; Zhang et al. constructed a computer-aided assessment system based on CBCT images to improve the accuracy of caries diagnosis [10]. In 2020, Lin et al. built a deep learning-based computer-aided caries diagnosis system to detect proximal (adjacent-surface) caries of permanent teeth in periapical X-ray images; their results showed that deep learning performs well in this task and can provide a reference for the early diagnosis of proximal caries [11]. Haghanifar et al. collected 480 oral panoramic X-ray images and proposed a teeth segmentation and caries detection workflow that achieved a caries detection accuracy of 90.52% [12]. However, collecting high-quality caries datasets and building efficient deep learning architectures remain major challenges. In dentistry, many methods have been proposed for computer-assisted image segmentation (see [13, 14] for comprehensive reviews). As in natural image processing, deep learning has been widely applied to computer vision tasks such as image classification and object detection [15].
Recently, an increasing number of deep learning-based methods have been developed for image segmentation. A typical example is the fully convolutional network (FCN), which performs end-to-end segmentation and is effective in diverse imaging applications (e.g., semantic segmentation [16, 17], video object detection [18, 19] and multi-modality classification [20]). However, FCN has a large number of parameters, which makes model training difficult. SegNet [21] adopted an encoder-decoder architecture to accelerate training. Building on FCNs and SegNet, U-Net [5] employed an encoder-decoder architecture with skip connections between the downsampling and upsampling layers to combine high-resolution features with the upsampled output. Several U-Net variants have been proposed to further improve performance, such as 3D U-Net [22], V-Net [23], UNet++ [24], SE-ResUnet [25] and Attention U-Net [26]. In particular, Fan et al. [27] proposed the efficient network PraNet to balance inference speed and segmentation performance. Besides these general segmentation frameworks, dedicated deep learning models have been developed for specific tasks, including several for segmenting X-ray images. Al-Antari et al. used DeepLab directly for segmentation [28]. Blain et al. proposed a modified U-Net to detect COVID-19 infections in chest X-ray images [29]. Moeskops et al. used different image modalities to train a multi-task segmentation model [30]. Trullo et al. integrated a conditional random field (CRF) module, formulated as a recurrent neural network (RNN), into an FCN [31]. Moreover, deep learning methods have been highly successful in other medical image segmentation tasks, such as the segmentation of cells [32], head and neck (HaN) anatomy [33], liver [34], brain structures [35] and the optic disc [36].
In this section, we describe the workflow of the proposed CariesNet. We first explain how a comprehensive oral panoramic X-ray image dataset was collected. Next, we introduce the CariesNet architecture and the full-scale axial attention (FSAA) module. Finally, we explain the loss function and the model training details. Most related studies on detecting dental problems in X-rays lack a sufficient number of images in their datasets. Large datasets allow models with more sophisticated architectures and more parameters to be trained. Such models can handle more complicated features and detect subtle abnormalities in the tooth texture, such as early-stage dental caries. Annotation is an essential but time-consuming step that must be performed by specialists, e.g., dentists or radiologists. To address the lack of data, we built a high-quality oral panoramic dataset. A set of 1159 panoramic images originating from dental treatment and routine care was collected at the Affiliated Stomatology Hospital, Zhejiang University School of Medicine, from 2015 to 2020. Data collection was ethically approved by the Chinese Stomatological Association ethics committee. Only panoramic images of permanent teeth were included in the dataset; panoramic images of primary teeth, or images on which no assessment was possible, were excluded. Most of the images were acquired with radiographic machines from Dentsply Sirona (Bensheim, Germany), mainly the Orthophos XG. On all panoramic images, each tooth was segmented and labeled using the FDI scheme by three dentists and checked by a fourth dentist. Across the 1159 oral panoramic images, 3217 caries regions were labeled as shallow, moderate or deep caries. Details of our caries dataset are shown in Table 1. In general, an oral panoramic radiograph is large, while the target caries regions are small.
Finding and delineating such small lesions is therefore challenging. The overall architecture of the network is shown in Fig. 2. We designed CariesNet following the overall architecture of PraNet [27], which is based on the reverse attention mechanism [37]. As shown in Fig. 2, CariesNet is a general U-shaped encoder-decoder framework that aggregates features extracted at multiple levels of a convolutional network. A traditional U-Net simply passes features to each decoder layer, so some high-level contextual information may be lost in the decoder. As in [27], we use a partial decoder to aggregate more high-level features in CariesNet. We use Res2Net [38] as an efficient backbone. Three high-level feature maps from the backbone are concatenated and fed to the partial decoder, which predicts an initial saliency map for dental caries, labeled as the global map in Fig. 2. Both the backbone features and the partial decoder features are then concatenated as the input of the attention module. In CariesNet, we replace PraNet's reverse attention (RA) module with the full-scale axial attention (FSAA) module, which is described in detail in Sect. 3.4. Next, the feature map is passed through a 1 × 1 convolution layer and added to the previous FSAA global map. In each high-level layer, the feature map produced by FSAA in the previous layer and the feature map from the backbone are likewise concatenated as the input of the next FSAA. We use three consecutive FSAAs to compute the high-level saliency map. Finally, a 4× bilinear upsampling followed by a sigmoid function is applied to the global feature map to obtain the final output. CariesNet can efficiently segment small caries regions from oral panoramic X-ray images.
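The final output step described above (4× bilinear upsampling followed by a sigmoid) can be illustrated in plain Python. This is a minimal sketch on a 2D list of logits, assuming the half-pixel-centre sampling convention (often called `align_corners=False`) used by common deep learning frameworks; the paper does not state which convention CariesNet uses, and the function names are hypothetical.

```python
import math


def bilinear_upsample(grid, scale):
    """Bilinearly upsample a 2D list by an integer factor.

    Output pixel centres are mapped back to input coordinates with the
    half-pixel convention and clamped to the input border.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(h * scale):
        y = min(max((i + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # interpolate horizontally on the two neighbouring rows, then vertically
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_probability_map(global_map, scale=4):
    """Hypothetical final step: 4x bilinear upsampling, then a per-pixel sigmoid."""
    return [[sigmoid(v) for v in row] for row in bilinear_upsample(global_map, scale)]


print(bilinear_upsample([[1.0, 3.0]], 2)[0])  # [1.0, 1.5, 2.5, 3.0]
```

Upsampling the logits before the sigmoid, rather than upsampling probabilities, keeps the interpolation linear in the network's raw outputs; a production implementation would use a tensor library's interpolation routine instead of nested loops.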
By aggregating the features of three high-level layers in the partial decoder, contextual information is effectively captured in the global map, so the target caries lesions are roughly located in an initial guidance area (the global map). The full-scale axial attention module then further mines boundary cues to refine the segmentation result. In summary, the Res2Net backbone features are forwarded to the partial decoder to generate the initial global map, and the full-scale axial attention module reconstructs accurate caries segmentation results from it. Experienced doctors generally delineate a caries lesion in two steps: first, they locate a coarse region that may contain the lesion; second, they annotate the accurate boundary of the target area. Since the partial decoder provides a rough saliency map, we propose the FSAA module to mine the boundary cues. This module extracts fine-grained feature maps that carry both high-level semantic information and low-level detail information. As shown in Fig. 3, the input high-level backbone feature map and the upsampled location map are first concatenated. Unlike a standard axial attention module, and in order to integrate more feature information, FSAA applies average pooling and max pooling at the same time. The pooled channel-domain features are mapped back through a fully connected layer to a vector whose dimension equals the number of channels of the original feature map, while the spatial-domain features are mapped through a convolution layer with an element-wise kernel to obtain a single-channel feature map of the same spatial size as the input.
We extract attention features from the channel domain and the spatial domain in parallel and let the network aggregate them through the element-wise convolution layer. To obtain a smoother attention feature map, we apply a sigmoid layer after the fusion layer. FSAA finally outputs an attention feature map that represents contextual information from a global view.

Loss Function

Binary cross-entropy (BCE) is commonly used as the loss function and can be formulated as follows:

$$L_{BCE} = -\frac{1}{f}\sum_{j=1}^{f}\left[n_j \log m_j + (1 - n_j)\log(1 - m_j)\right]$$

where $f$ is the number of pixels, and $m_j$ and $n_j$ denote the predicted value of pixel $j$ and its corresponding ground-truth value, respectively. However, the cross-entropy loss is highly susceptible to class imbalance (here, the small number of caries pixels relative to background pixels), which makes optimization inefficient and calls for an adaptive loss function. Therefore, the Dice loss is also used in our model:

$$L_{Dice} = 1 - \frac{2\sum_{j=1}^{f} m_j n_j}{\sum_{j=1}^{f} m_j + \sum_{j=1}^{f} n_j}$$

where $m_j$ is the predicted value and $n_j$ is the corresponding ground-truth value. We combine the BCE loss and the Dice loss in CariesNet, and the final loss function can be expressed as:

$$L = L_{BCE} + L_{Dice}$$

Several evaluation metrics, including the Dice coefficient, accuracy, precision, recall and F1 score, are adopted to compare the performance of CariesNet with that of other methods. The Dice coefficient (DSC) measures the overlap between the automatic and manual segmentations of dental caries and is calculated as follows:

$$DSC = \frac{2TP}{2TP + FP + FN}$$

where TP, FP, TN and FN denote true-positive, false-positive, true-negative and false-negative predictions, respectively.
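The BCE and Dice losses and the Dice coefficient above can be written as a short pure-Python sketch over flattened lists of per-pixel values, where `pred` holds the predicted probabilities $m_j$ and `target` the ground-truth labels $n_j$. The small `eps` terms (clamping inside the logarithm and smoothing in the soft Dice) and the equal weighting of the two losses are our assumptions, not details given by the authors; a real training loop would use a tensor library with automatic differentiation.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over f pixels (pred = m_j, target = n_j)."""
    total = 0.0
    for m, n in zip(pred, target):
        m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += n * math.log(m) + (1.0 - n) * math.log(1.0 - m)
    return -total / len(pred)


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Soft Dice loss: 1 - 2*sum(m*n) / (sum(m) + sum(n)), with eps smoothing."""
    inter = sum(m * n for m, n in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def combined_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """L = L_BCE + L_Dice (equal weights are an assumption)."""
    return bce_loss(pred, target) + dice_loss(pred, target)


def dice_coefficient(pred_mask: Sequence[int], true_mask: Sequence[int]) -> float:
    """DSC = 2TP / (2TP + FP + FN) for binary masks; 1.0 if both masks are empty."""
    tp = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    fp = sum(1 for p, t in zip(pred_mask, true_mask) if p and not t)
    fn = sum(1 for p, t in zip(pred_mask, true_mask) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0


print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

The Dice term is computed over the whole image rather than per pixel, so a handful of missed caries pixels changes it much more than it changes the pixel-averaged BCE; this is why the combination helps with the foreground/background imbalance described above.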
Accuracy is the overall pixel-wise accuracy of the segmentation over the caries types and the background:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is the proportion of pixels correctly classified as caries (true positives) among all pixels that the automatic segmentation classifies as caries:

$$Precision = \frac{TP}{TP + FP}$$

Recall is the proportion of caries pixels in the manual segmentation that are correctly identified by the automatic segmentation:

$$Recall = \frac{TP}{TP + FN}$$

The F1 score is the harmonic mean of precision and recall, with a value in [0, 1]:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

The results obtained on the caries dataset are reported in Table 2. In each test case, we split the oral panoramic image into two parts, the left and the right; for evaluation, the segmentation results of the two parts are merged. DeepLab is a widely used pixel-wise segmentation tool [39] that also uses an encoder-decoder structure. We use U-Net and DeepLabV3+ as baseline models. We also implement Res-UNet [40] with a Res2Net backbone, which serves as the backbone baseline in the ablation experiments. All deep learning models are tested on the same validation set, and the DSC, accuracy, F1 score, precision and recall results are shown in Table 2. Apart from the above comparison with state-of-the-art methods, we also conduct extensive ablation experiments to validate the effectiveness of each component of our method, including the partial decoder module, the full-scale axial attention module, the BCE/Dice loss function and the deep supervision strategy; the results are shown in Table 3. Figure 4 shows the segmentation results. CariesNet can accurately locate small caries lesions in oral panoramic radiographs. Deep caries lesions are marked in yellow in Fig. 4.
Moderate and shallow caries are marked in blue and green, respectively. To compare the methods more clearly, we enlarge a selected region of each result. The segmentation results of CariesNet, PraNet, U-Net, DeepLabv3 and Res-UNet are shown in Fig. 4. Compared with the other methods, CariesNet produces smoother and more accurate boundaries. In conclusion, we developed an automated system for caries diagnosis. Experiments demonstrate that the deep learning model can effectively segment dental caries lesions from oral panoramic X-ray images. In particular, we developed a state-of-the-art segmentation network, CariesNet, which integrates a partial decoder module and a full-scale axial attention module into the common U-shaped encoder-decoder structure. Experiments on our dataset, including validation and test studies, showed the capability of the new approach for this segmentation task. Comparison and ablation experiments also suggested that CariesNet performs very well in segmenting small lesions in large X-ray images.

Fig. 4 The visualization of segmentation results from CariesNet, PraNet, U-Net, DeepLabv3 and Res-Unet. Deep caries, moderate caries and shallow caries masks are labeled in yellow, blue and green, respectively.

References

A survey: segmentation in dental X-ray images for diagnosis of dental caries
The 4th national oral health survey in the mainland of China: background and methodology
Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories
Deep learning for caries detection and classification
Squeeze-and-excitation networks
Detection of caries in panoramic dental X-ray images using back-propagation neural network
Detection and classification of dental caries in X-ray images using deep neural networks. 2017 IEEE Int Conf Power, Control, Signals and Instrumentation Engineering (ICPCSI)
Classification of dental diseases using CNN and transfer learning
Study on the sensitivity of computer-aided detection of adjacent caries in cone-beam CT images
Study on the sensitivity of computer-aided detection of adjacent caries in cone-beam CT images
PaXNet: Dental caries detection in panoramic X-ray using ensemble transfer learning and capsule classifier
A review of semantic segmentation using deep neural networks
Applications of deep learning in dentistry. Oral Surg Oral Med Oral Pathol Oral Radiol
ImageNet classification with deep convolutional neural networks
Fully convolutional networks for semantic segmentation
Fully convolutional instance-aware semantic segmentation
Video salient object detection via fully convolutional networks
One-shot video object segmentation
Multimodality fusion learning for the automatic diagnosis of optic neuropathy
DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs
3D U-Net: Learning dense volumetric segmentation from sparse annotation
V-Net: Fully convolutional neural networks for volumetric medical image segmentation
UNet++: A nested U-Net architecture for medical image segmentation
Cascaded SE-ResUnet for segmentation of thoracic organs at risk
Attention U-Net: Learning where to look for the pancreas
PraNet: Parallel reverse attention network for polyp segmentation
A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification
Determination of disease severity in COVID-19 patients using deep learning in chest X-ray images
Deep learning for multi-task medical image segmentation in multiple modalities
Segmentation of organs at risk in thoracic CT images using a SharpMask architecture and conditional random fields
U-Net: Deep learning for cell counting, detection, and morphometry
AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy
Bottleneck feature supervised U-Net for pixel-wise liver and tumor segmentation
M-Net: A convolutional neural network for deep brain structure segmentation
Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network
Reverse attention for salient object detection
Res2Net: A new multi-scale backbone architecture
DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs
U-Net: Convolutional networks for biomedical image segmentation

Conflict of interest The authors declare that they have no conflict of interest.