key: cord-0058707-m1c28s1r authors: Filipe, Vítor; Teixeira, Pedro; Teixeira, Ana title: A Clustering Approach for Prediction of Diabetic Foot Using Thermal Images date: 2020-08-20 journal: Computational Science and Its Applications - ICCSA 2020 DOI: 10.1007/978-3-030-58808-3_45 sha: 40f3bd4c1c398f6d4c2180b87e1106c0af31dd90 doc_id: 58707 cord_uid: m1c28s1r

Diabetes Mellitus (DM) is one of the most prevalent diseases in the world, causing a high number of deaths. Diabetic foot is one of the main complications observed in diabetic patients, and it can lead to the development of ulcers. As the risk of ulceration is directly linked to an increase of the temperature in the plantar region, several studies use thermography as a method for the automatic identification of problems in diabetic foot. As the distribution of plantar temperature in diabetic patients does not follow a specific pattern, it is difficult to measure temperature changes; therefore, there is interest in developing methods that allow the detection of these abnormal changes. The objective of this work is to develop a methodology that uses thermograms of the feet of diabetic and healthy individuals and analyzes the diversity of thermal changes in the plantar region, classifying each foot as belonging to a DM or a healthy individual. Based on the concept of clustering, a binary classifier to predict diabetic foot is presented, together with a quantitative indicator and a classification threshold (evaluated and validated by several performance metrics). To measure the performance of the binary classifier, experiments were conducted on a public dataset (with thermograms of 122 DM individuals and 45 healthy ones), yielding the following metrics: Sensitivity = 0.73, F-measure = 0.81 and AUC = 0.84.
Diabetes Mellitus (DM), or diabetes, is a chronic disease characterized by the inability of the body to use its main source of energy, glucose (sugar), resulting in increased blood glucose levels (glycemia) [1]. Diabetes is a lifelong disease, which can have serious consequences if not well controlled [2]. The International Diabetes Federation (IDF) estimated that in 2015 there were about 415 million people with diabetes, and that by 2040 the number would increase to 642 million, representing 1 in 10 adults worldwide [3]. In 2017, diabetes resulted in approximately 3.2 to 5.0 million deaths [4]. Diabetic foot is one of the main complications observed in diabetic patients, and can be defined as infection, ulceration and/or destruction of deep tissues [5, 6]. Diabetic patients have a lifetime risk of between 12% and 25% of developing foot ulcers, which is mainly related to peripheral neuropathy, and often to peripheral vascular disease [7]. Peripheral vascular disease is a significant complication of diabetes and can produce changes in blood flow that in turn change the skin temperature. An increase in skin temperature can also indicate tissue damage or inflammation associated with some type of trauma or excessive pressure [8]. The risk of ulceration is directly linked to an increase of the temperature in the plantar region; thus, there is a growing interest in monitoring plantar temperature frequently [9, 10]. Infrared thermography (IRT) is a fast, non-invasive and non-contact method that allows visualizing the foot temperature distribution and analyzing the thermal changes that occur [11]. Temperature analysis involves the identification of characteristic patterns and the measurement of thermal changes. It has been shown that in healthy individuals there is a specific spatial pattern, called the butterfly pattern, while in the DM groups there is a wide variety of spatial patterns [12-14].
The use of thermal images for the detection of complications in diabetic foot assumes that a variation in plantar temperature is associated with these types of complications. Several research works on the application of thermography to the automatic identification of problems in diabetic feet can be found in the literature. These works can be classified into four categories, based on the type of analysis carried out: independent analysis of the temperature of the limbs, asymmetric analysis, analysis of the temperature distribution, and analysis of external stress [10, 15, 16]. Some brief considerations on these four categories follow. The analysis of external stress consists in applying an external stimulus to the patient, e.g. immersing the feet in cold water or walking for a while, and analyzing the response of the plantar temperature to that stimulus. The main drawback of this type of analysis is that it can be uncomfortable for some people [15, 17, 18]. The independent analysis of temperature yields representative temperature ranges for the different study groups; however, it cannot detect specific areas at risk of diabetic foot problems [10, 15]. Asymmetric temperature analysis consists in comparing the temperature of one foot with that of the contralateral foot, in order to define a limit that makes it possible to detect risk areas. This approach has shown good results in several studies; however, it has some limitations. For example, when the patient has similar complications in both feet, the risk areas cannot be detected, and when the patient has a partial or total amputation of a foot, there is no area to compare with [15, 19, 20]. The analysis of the temperature distribution has the main advantage that it does not rely on comparing the temperatures of the two feet, so each of the patient's feet can be analyzed separately.
This approach makes it possible to measure changes by calculating a representative value for each foot in the DM group. Therefore, the measurement depends on the temperature distribution and not on a spatial pattern [12, 21, 22]. Several works present a thermal analysis based on the observation of specific points or specific regions. Considering the temperature of the entire foot, it is possible to carry out a complete analysis of its general condition. However, as the foot does not have a uniform temperature, it is important to consider a regional division [23]. For example, in [24, 25], temperatures were recorded in thirty-three regions of interest (ROIs) on both feet (considering the points of the foot that are most likely to ulcerate) before the analysis. Nevertheless, as one of the main causes of diabetic foot ulceration is the decrease in blood supply, the division that has been most used and discussed in recent years [12-14, 26, 27] divides the feet into four regions according to the concept of angiosome, which is a region of tissue supplied with blood by a single artery. Some of the studies that partition the feet based on the angiosomes propose quantitative indices to estimate the temperature variation. For example, in [26] quantitative information on the distribution of plantar temperature is provided by identifying differences between the corresponding areas of the right and left feet, while the index presented in [12] takes into account not only the temperature difference between the regions, but also the temperature interval of the control group (consisting of healthy subjects), in order to avoid the limitations of the asymmetric approach.

The aim of this study is to develop a methodology to analyze the diversity of thermal changes in the plantar region of diabetic and healthy individuals, classifying each foot as belonging to a DM or to a healthy individual; thermographic images of the soles of the feet are used.
With this in mind, a binary classifier based on the clustering concept was developed to predict the diabetic foot, using both a quantitative temperature index (CTI) and a classification temperature threshold (CTT). The calculation of the CTI is based on the division of the foot into clusters and uses the average temperature of each region; thus, a narrow range of temperature values within each region gives a better index. The clustering method groups similar temperature values into a cluster, thereby dividing the foot into different areas (clusters) that present similar temperature values [28]. When clusters are used to divide the foot into regions, the temperature values within each region are similar and the range of temperatures within each cluster is small; thus, the resulting index is expected to be capable of measuring thermal variations. To measure the classifier's performance, binary experiments were performed using a public data set (with thermograms of 122 individuals with DM and 45 healthy ones); the performance metrics Sensitivity, Specificity, Precision, Accuracy, F-measure, and Area Under the Curve were used to evaluate and validate the proposed CTT. This paper is organized as follows: in Sect. 2 the proposed binary classifier is described. In Sect. 3 the database used is introduced and the results obtained are presented and analyzed. Finally, conclusions and guidelines for future work are presented in Sect. 4.

The proposed method has three stages of processing (Fig. 1): temperature clustering, index computation and classification. In the first stage the foot is divided into regions based on a temperature clustering algorithm. For each cluster, regional temperature parameters are calculated. Next, an index of temperature variation in relation to the reference values is computed.
Finally, a thresholding procedure is applied to the index in order to classify the subject as healthy or diabetic.

Clustering is the process of identifying natural groups within multidimensional data based on some similarity measure. The global objective is to group data points with similar characteristics and assign them to the same cluster. This method is used in several fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, and computer graphics [28]. One of the most used clustering algorithms is k-means, due to its simplicity and its computational speed when operating on a large data set. This algorithm divides a set of data into k clusters through an iterative process that minimizes the sum, over all clusters, of the distances of each object to its cluster centroid [29]. Considering an image $I(x,y)$, the objective is to group the pixels into $k$ clusters ($Cl_i$, $i = 1, \ldots, k$). Let $p(x,y)$ be an input pixel and $c_i$ the centroid of cluster $Cl_i$ (a centroid is the center of a cluster). The k-means algorithm is composed of the following steps:

1. Initialize the number of clusters, k, and randomly choose k pixels from the image as initial centroids.
2. For each pixel, calculate the Euclidean distance between the pixel and each centroid using (1):

   $$d = \lVert p(x,y) - c_i \rVert \qquad (1)$$

3. Assign the pixel to the nearest centroid based on the distances computed in (1).
4. After all pixels have been assigned, recalculate the value of each centroid as the mean of the pixels assigned to it, using (2):

   $$c_i = \frac{1}{|Cl_i|} \sum_{p(x,y) \in Cl_i} p(x,y) \qquad (2)$$

5. Repeat steps 2, 3 and 4 until the centroids stop moving, i.e. until the k-means algorithm has converged.
6. Reshape the cluster pixels into an image.
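The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `kmeans_temperature`, the convention that background pixels hold a constant value (`background`), the fixed random seed, and the final re-ordering of the clusters from coldest to hottest are all hypothetical choices made for the example. Because each pixel carries a single temperature value, the Euclidean distance in (1) reduces to an absolute difference.

```python
import random

def kmeans_temperature(image, k, background=0.0, max_iter=100, seed=0):
    """Cluster foot pixels by temperature (illustrative sketch).

    image: list of rows of temperatures; pixels equal to `background` are
    ignored (assumption: non-foot pixels are marked with a constant value).
    Returns (labels, centroids): labels has the shape of image, with -1 for
    background and 0..k-1 for clusters ordered from coldest to hottest.
    """
    coords = [(r, c) for r, row in enumerate(image)
              for c, t in enumerate(row) if t != background]
    temps = [image[r][c] for r, c in coords]

    # Step 1: randomly choose k distinct temperature values as initial centroids.
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(temps)), k)

    def nearest(t):
        # Steps 2-3: in one dimension the Euclidean distance is |t - c_i|.
        return min(range(k), key=lambda i: abs(t - centroids[i]))

    for _ in range(max_iter):
        assign = [nearest(t) for t in temps]
        # Step 4: each new centroid is the mean temperature of its pixels
        # (an empty cluster keeps its previous centroid).
        new = []
        for i in range(k):
            members = [t for t, a in zip(temps, assign) if a == i]
            new.append(sum(members) / len(members) if members else centroids[i])
        # Step 5: stop once the centroids no longer move.
        moved = max(abs(a - b) for a, b in zip(new, centroids))
        centroids = new
        if moved < 1e-9:
            break

    # Hypothetical convention: order clusters from coldest to hottest so that
    # cluster i of one foot can be compared with cluster i of another foot.
    order = sorted(range(k), key=lambda i: centroids[i])
    rank = {old: new_id for new_id, old in enumerate(order)}

    # Step 6: write the cluster labels back into an image-shaped grid.
    labels = [[-1] * len(row) for row in image]
    for (r, c), t in zip(coords, temps):
        labels[r][c] = rank[nearest(t)]
    return labels, [centroids[i] for i in order]
```

A call such as `labels, centroids = kmeans_temperature(thermogram, 5)` would return a label grid and the cluster centroids. Changing `seed` changes the initial centroids and can change the resulting partition.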
Although k-means has the great advantage of being easy to implement, it has some disadvantages. The main one is that the quality of the result depends on the arbitrary selection of the initial centroids; thus, different runs on the same image may produce different clusters [29]. In this case, as we are dealing with foot temperatures, the goal is to assign the pixels to different groups according to their temperature, so that all pixels in the same group have the closest possible temperature values. In Fig. 2 two sets of images, resulting from the application of the k-means algorithm to the thermograms (a) and (e), are presented. The first set ((a) to (d)) represents a non-diabetic individual, while the second one ((e) to (h)) represents a diabetic individual; the second, third and fourth images of each set correspond to the division of the sole of the foot into three, four and five clusters, respectively. After dividing the image into k clusters, regional temperature parameters are calculated for each of them, namely the average, standard deviation, maximum and minimum values.

Based on the concept of clustering, and in order to provide a quantitative estimate of the thermal changes in the foot caused by DM, a new quantitative index is proposed. This index is based on the temperature variation of each cluster in relation to a reference temperature obtained from the healthy individuals (control group). To obtain the reference temperature value for each region (cluster), the average temperature of the corresponding region (cluster) in the control group is computed.
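The regional parameters and the control-group reference values described above can be illustrated with a short sketch. The function names `cluster_stats` and `reference_temperatures` are hypothetical, as is the input convention (a grid of temperatures plus a grid of cluster labels with negative values for background). The sketch also assumes that cluster i of one foot corresponds to cluster i of every other foot, for example because the clusters were ordered by temperature, and it uses the population standard deviation; the text does not specify either choice.

```python
from statistics import mean, pstdev

def cluster_stats(image, labels, k):
    """Per-cluster temperature parameters (illustrative sketch).

    image and labels are grids of the same shape; labels holds the cluster
    index 0..k-1 of each foot pixel and a negative value for background.
    Returns a list with one dict per cluster (mean, std, min, max), or None
    for a cluster that has no pixels.
    """
    values = [[] for _ in range(k)]
    for img_row, lab_row in zip(image, labels):
        for t, lab in zip(img_row, lab_row):
            if 0 <= lab < k:  # skip background pixels
                values[lab].append(t)
    return [
        {"mean": mean(v), "std": pstdev(v), "min": min(v), "max": max(v)} if v else None
        for v in values
    ]

def reference_temperatures(control_feet_means):
    """Average, per cluster, of the cluster means of the control-group feet.

    control_feet_means: one list of k cluster means per control-group foot.
    Assumes cluster i denotes the same region in every foot.
    """
    return [mean(column) for column in zip(*control_feet_means)]
```

For each control-group foot one would collect `[s["mean"] for s in cluster_stats(image, labels, k)]` and pass the resulting lists to `reference_temperatures` to obtain one reference value per cluster.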
For each subject, the value of the CTI is calculated as the average of the positive differences between the temperatures of the clusters of an individual (IND) and the corresponding reference values, obtained from the control group, as in (3), where $k$ represents the number of clusters used, $n$ the number of feet of the individuals, $Tc_i$ is the reference temperature value of cluster $i$ and $IND_{ij}$ is the average temperature value of cluster $i$ of the $j$-th foot. This index is used to classify each foot as belonging to a healthy or diabetic subject. The CTI measures the differences between the distribution of a subject's plantar temperature and the reference temperatures obtained from the control group. Thus, taking into account the reference values and the 334 CTI values obtained using (3), it can be observed that higher index values correspond to greater temperature variations, implying a higher risk of the individual developing a foot lesion.

Based on the CTI values, the last stage of the method selects an appropriate threshold (CTT) to classify each thermogram into one of two categories: healthy or diabetic. The success of this classification depends on the CTT used. To measure the classifier's performance, several experiments were carried out using the metrics Sensitivity, Specificity, Precision, Accuracy, F-measure, and Area Under the Curve (AUC), defined by (4) to (8), where TP, FP, TN and FN represent the number of True Positive, False Positive, True Negative and False Negative cases, respectively. The AUC summarizes the entire location of the Receiver Operating Characteristic (ROC) curve and, rather than depending on a specific operating point, is an effective and combined measure of sensitivity and specificity that describes the inherent validity of diagnostic tests [30].
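As a rough illustration of the index, the thresholding stage and the confusion-matrix metrics, consider the sketch below. It is an interpretation, not the authors' code. It assumes that "the average of the positive differences" means the mean absolute difference between a foot's cluster means and the reference values, and that a foot is labeled diabetic when its CTI is strictly greater than the threshold. The function names `cti`, `classify` and `binary_metrics` are hypothetical. The metric formulas are the standard definitions in terms of TP, FP, TN and FN.

```python
def cti(foot_means, reference):
    """Index of one foot: mean absolute difference between its k cluster
    means and the k reference values (assumed reading of the definition)."""
    if len(foot_means) != len(reference):
        raise ValueError("foot_means and reference must have the same length")
    return sum(abs(t - tc) for t, tc in zip(foot_means, reference)) / len(reference)

def classify(cti_value, threshold):
    """True = diabetic, False = healthy (assumed rule: CTI above threshold)."""
    return cti_value > threshold

def binary_metrics(y_true, y_pred):
    """Standard metrics; positives (True) are feet of diabetic subjects."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f_measure": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Sweeping `threshold` over a range of values and recording the sensitivity and specificity at each one gives the points of a ROC curve for the index.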
As the classifier is being used to detect the presence of a disease, Sensitivity and Specificity are the most relevant metrics to assess the integrity of the classifier [31]:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (5)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (6)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

$$\text{F-measure} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (8)$$

To test the binary classifier, thermograms of individuals with and without DM were obtained from a public database that contains 334 individual plantar thermograms, corresponding to 167 individuals (122 diabetic and 45 healthy) [23] (Table 1). Each thermogram, which corresponds to the sole of the left or right foot, is already segmented and vertically adjusted. In order to determine the number of clusters that gives the best classification, several experiments were carried out, starting with three clusters and increasing this number up to six. For each of these cases, the reference values were computed using the control group temperature values and the CTI was calculated individually for each foot of both groups. These experiments showed that three clusters give worse metric results than four or five. However, although the metric values improve when the number of clusters increases from three to four and from four to five, the same does not happen when six clusters are considered; therefore, in this work, five clusters are used. It is worth mentioning that when six or more clusters are used, the values of the metrics do not improve because the temperature variations between clusters are small, so different clusters present very close temperature values, which affects the calculation of the CTI. For the five-cluster case, the reference average temperature values per cluster (Tc) and the corresponding standard deviations (Std. dev.), both measured in degrees Celsius (°C), are given in Table 2, and the reference CTI value, obtained with the control group data, is 1.23 ± 0.91 °C.
Table 3 shows the index values obtained for seven thermograms from the dataset (CG027, CG002, DM102, DM004, DM032, DM027, DM006 in [23]); the first two are from the control group and the remaining five from the DM group. The columns of the table contain: the subject's identification, the thermographic image, the image obtained after clustering, the CTI value, and the measures (mean, standard deviation, minimum and maximum values) of the entire foot and of each of the five clusters, respectively. During the thermal analysis of the feet, it could also be observed that when the clusters of a foot have average temperature values close to the control reference values (Table 2), low CTI values are obtained and the butterfly pattern can be observed; this is what happens with the subjects in the control group, e.g. Table 3 (CG027 and CG002). Nevertheless, notice that the foot in Table 3 (CG002) presents, in general, average cluster temperatures more distant from the reference values than those of Table 3 (CG027), and presents a CTI value more than 1 °C higher than that of Table 3 (CG027) (although still consistent with the reference CTI value); this shows that the CTI detects small temperature variations. For the feet of the DM group, wide variations of the CTI values can be observed: thermal changes range from slightly different from the butterfly pattern to a completely different pattern, and this is reflected in the CTI value. As shown in Table 3, the CTI is closer to the reference CTI value when slight temperature variations occur, as in the Table 3 (DM102) example. As variations become more evident (Table 3 (DM102 and DM004)), the CTI value increases, moving further and further away from the reference CTI value. It was also observed that, when the hot spots evolve, they not only become wider, but also warmer; thus, higher CTI values are obtained, e.g.
(Table 3 (DM032 and DM027)). The hot spots may even cover the entire plantar region, reaching very high temperatures and presenting very high CTI values, e.g. Table 3 (DM006).

Based on the CTI values obtained, a new classification methodology is proposed. As the success of the classification methodology depends on the threshold value used to determine whether the individual is diabetic or not, several threshold values were tested, taking the reference CTI value into account. Table 4 presents the performance measures used to evaluate the classification for each of the tested CTT values. In order to visualize the tradeoff between the true positive rate (sensitivity) and the false positive rate (1 - specificity) for different threshold values, the Receiver Operating Characteristic (ROC) curve is plotted in Fig. 3. The area under the curve (AUC) obtained is 0.838. Given that the problem under study concerns a disease, special attention must be paid to the values of the sensitivity and specificity metrics [31]. In order to obtain a CTT that balances these two performance metrics, the value of 1.9 is proposed as the limit to classify the individuals as diabetic or healthy. This value was chosen using an approach known as the point closest to the (0,1) corner in the ROC plane, which defines the optimal cut-point as the point that minimizes the Euclidean distance between the ROC curve and the (0,1) point [32].

Table 3. CTI and temperature measure values, per cluster, for some foot thermograms of the control and the DM groups in [23].

In this work, a methodology capable of analyzing a diversity of thermal variations in the plantar region, classifying each foot as belonging to a DM or to a healthy individual, is presented. In this approach, the cluster concept is used to obtain an index, the CTI, that measures the temperature variations of the sole of the foot.
The classification relies on a threshold that balances the sensitivity and specificity metrics. Based on the presented results, the concept of cluster proved to be an effective approach to help measure temperature variations in the soles of the feet, and the presented index is able to detect those variations: higher index values correspond to greater temperature variations, implying a higher risk of the individual developing a foot lesion. Additionally, the threshold that was determined makes it possible to classify an individual as having DM or being healthy. Therefore, this work provides health professionals with a classification instrument that can assist in medical diagnosis and in the early detection of injury risk, helping to prevent ulceration. As future work we intend to classify the thermograms of the DM individuals into multiple categories. We also intend to extend the dataset, in order to balance the number of healthy and DM individuals.

Definition, diagnosis, and classification
Prevention and treatment of the complications of diabetes mellitus
IDF Diabetes Atlas
What is the most effective way to reduce incidence of amputation in the diabetic foot?
Diabetic foot ulcers - a comprehensive review
Challenges in diagnosing infection in the diabetic foot
Diabetic foot disorders: a clinical practice guideline (2006 revision)
The Herschel heritage to medical thermography
Narrative review: Diabetic foot and infrared thermography
Infrared thermography
A quantitative index for classification of plantar thermal changes in the diabetic foot
Variations of plantar thermographic patterns in normal controls and nonulcer diabetic patients: Novel classification using angiosome concept
Morphological pattern classification system for plantar thermography of patients with diabetes
Computer aided diagnosis of diabetic foot using infrared thermography: a review
A systematic literature review for early detection of type II diabetes
Repeatability of infrared plantar thermography in diabetes patients: a pilot study
Characterization of diabetic peripheral neuropathy in infrared video sequences using independent component analysis
Asymmetry analysis based on genetic algorithms for the prediction of foot ulcers
Automatic detection of diabetic foot complications with infrared thermography by asymmetric analysis
Statistical approximation of plantar temperature distribution on diabetic subjects based on beta mixture model
Automatic classification of thermal patterns in diabetic foot based on morphological pattern spectrum
Plantar thermogram database for the study of diabetic foot complications
Thermal symmetry of healthy feet: a precursor to a thermal study of diabetic feet prior to skin breakdown
Between visit variability of thermal imaging of feet in people attending podiatric clinics with diabetic neuropathy at high risk of developing foot ulcers
Quantitative estimation of temperature variations in plantar angiosomes: A study case for diabetic foot
Thermal image processing for quantitative determination of temperature variations in plantar angiosomes
An overview of clustering methods
Image segmentation using k-means clustering algorithm and subtractive clustering algorithm
Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation
Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool
Defining an optimal cut-point value in ROC analysis: an alternative approach

Acknowledgments. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project UIDB/50014/2020.