key: cord-0111450-d0leyo6l authors: Jain, Rohit Kumar; Sharma, Prasen Kumar; Gaj, Sibaji; Sur, Arijit; Ghosh, Palash title: Knee Osteoarthritis Severity Prediction using an Attentive Multi-Scale Deep Convolutional Neural Network date: 2021-06-27 journal: nan DOI: nan sha: bcbe989ee2079dc879cd57782ae2d42c070187b2 doc_id: 111450 cord_uid: d0leyo6l Knee Osteoarthritis (OA) is a destructive joint disease identified by joint stiffness, pain, and functional disability, affecting millions of lives across the globe. It is generally assessed by evaluating physical symptoms, medical history, and other joint screening tests like radiographs, Magnetic Resonance Imaging (MRI), and Computed Tomography (CT) scans. Unfortunately, the conventional methods are highly subjective, which forms a barrier to detecting disease progression at an early stage. This paper presents a deep learning-based framework, namely OsteoHRNet, that automatically assesses knee OA severity in terms of Kellgren and Lawrence (KL) grade classification from X-rays. As a primary novelty, the proposed approach is built upon one of the most recent deep models, called the High-Resolution Network (HRNet), to capture the multi-scale features of knee X-rays. In addition, we have also incorporated an attention mechanism to filter out the counterproductive features and boost the performance further. Our proposed model has achieved the best multi-class accuracy of 71.74% and MAE of 0.311 on the baseline cohort of the OAI dataset, which is a remarkable gain over the existing best-published works. We have also employed the Gradient-based Class Activation Maps (Grad-CAM) visualization to justify the proposed network learning. Knee OA pain typically worsens with activity, unlike some other forms of arthritis where activity and exercising improve symptoms. It can also lead to instability, joint deformity, and reduction in joint functionality [2]. In addition, the knee joint space begins to narrow due to the loss of cartilage, leading to the progression of knee OA [1]. The following key changes, described by the word LOSS, mark the progression of knee OA:
• L - "loss of joint space", caused by the cartilage loss,
• O - "osteophyte formations", projections that form along the margins of the joint,
• S - "subarticular sclerosis", an increase in bone density along the joint line, and
• S - "subchondral cysts", caused by fluid-filled holes in the bone along the joints [3].
Radiographic screening (X-rays), MRI, and CT scans are a few of the common ways to detect the structural changes in the joint and diagnose knee OA's biological condition. However, the traditional treatment for knee OA may not be effective enough to completely cure the disease. Therefore, it is of utmost importance to detect the deformation of the joint at a stage before the loss becomes irreversible [4]. Generally, knee OA severity is measured in terms of the World Health Organization (WHO) approved KL grading scale [5]. KL grading is a 5-point semi-quantitative progressive ordinal scale ranging from grade 0 (low severity) to grade 4 (high severity). Fig. 1 shows the disease progression along with its corresponding KL grade. In general, a complete cure for this disease remains quite challenging to find, and OA management is mainly palliative [1], [6]. MRI screenings and CT scans are effective as they highlight the three-dimensional structure of the knee joints [7], [8].
However, they have certain drawbacks, including limited availability, high device expenses, the time required for diagnosis, and susceptibility to imaging artifacts [9], [10]. At the same time, X-rays are the most effective and economically feasible way of diagnosing the disease for routine knee OA screening. However, the currently adopted methods for assessing disease progression from X-ray images may not be very effective. They generally require a highly skilled practitioner to analyze the radiographic scans accurately and are thus highly subjective. In most cases, the practitioners require multiple tests to quantify the condition accurately, which is generally time-consuming. The analysis may differ based on their expertise and sometimes may be inaccurate. Further, multiple tests may be costly for some patients. A better and in-depth understanding of knee OA may result in timely prevention and treatment. It is believed that early treatment and preventive measures are the most effective way of managing knee OA. Unfortunately, to date, there has been no significant and predominant way of identifying the disease at an early stage. Recently, the use of Machine Learning (ML) and Deep Convolutional Neural Networks (CNNs) for knee OA analysis has shown remarkable ability to detect even the slightest differences in biological joint structural variations in X-rays [11]. Deep CNNs have been widely adopted in many medical imaging tasks, including classification of COVID-19, pneumonia, tumors, and bone fractures, polyp detection, etc. For example, CheXNet [12], a 121-layer deep CNN, performed substantially better than the average performance of four specialists in assessing pneumonia using plain radiographs [13]. However, medical images are difficult to collect, as the collection and annotation of such data are constrained by expert availability and data privacy concerns [13]. The Osteoarthritis Initiative (OAI) is a distributed, observational study of patients, which is publicly available. It enables the scientific and research community worldwide to work on knee OA progression and to develop new techniques beneficial for its detection and treatment. In this work, we have utilized the data acquired from the OAI repository and made available by Chen et al. [14], [15]. The dataset comprises bilateral posterior-anterior fixed-flexion knee radiographs of 4796 participants, including male and female subjects from the baseline cohort. Fig. 1 shows sample X-ray images pertaining to each KL grade. Several schemes have been developed for knee OA severity prediction in the past few years. Shamir et al. [16] utilized a weighted nearest neighbors algorithm that incorporated hand-crafted features such as Gabor filters, Chebyshev statistics, and multi-scale histograms. Antony et al. [17] proposed to utilize transfer learning of existing pre-trained deep CNNs. Later, Antony et al. [18] customized a deep CNN from scratch and optimized the network using a weighted combination of the traditional cross-entropy and the mean squared error, which served as dual-objective learning. Tiulpin et al. [19] developed a method, inspired by the deep Siamese network [20], for learning the similarity metric between pairs of radiographs. Gorriz et al. [21] developed an end-to-end attention-based network, bypassing the need to localize the knee joint, to quantify knee OA severity automatically. Chen et al.
[15] proposed to utilize a pre-trained VGG-19 [22] along with an adjustable ordinal loss that penalizes misclassifications in proportion to their distance from the true grade. Yong et al. [23] utilized the pre-trained DenseNet-161 [24], along with an ordinal regression module (ORM), in order to account for the ordinality of the KL grading. They further optimized the network using the cumulative link (CL) loss function. Deep CNNs are renowned for learning the highly correlated features in an image. In addition, it is a widely known fact that the first few layers of a deep CNN contribute to the learning of low-level features in an image, whereas the last few layers contribute to the learning of the high-level features, enabling the final classification by adaptively learning spatial hierarchies of features [25]. While the low-level features are the minute details of an image, including points, lines, edges, etc., the high-level features comprise several low-level features, which make up the more prominent and robust structures for classification. However, in general, knee X-rays do not contain many edge-like or low-level structures. Due to a lack of such vital information, it may be difficult for a deep CNN to learn an efficient classification, particularly in the case of knee OA, where one KL grade is not very distinct from another unless carefully inspected (see Fig. 1). A few of the most recent state-of-the-art methods [15], [23] have directly utilized the existing popular image classification models in a plug-and-play fashion, without tailoring the network engineering to the given problem. It should be mentioned that while a majority of those methods were built for a generic image classification problem, a few of them were explicitly designed using architectural search, e.g., MobileNetV2 [26]. Moreover, for the knee OA severity classification, the best-performing deep CNNs presented were enormous in size, exceeding 500 MB [15]. As a result, such models may require substantially high computational resources, making them challenging to deploy in real-time environments. Therefore, it may be said that the direct usage of popular classification models may not be appropriate. Although some recent methods [19], [21], [23], [27] have started to design models specific to knee OA, given the amount of information present in knee X-rays, they still fall short in terms of accuracy and computational overhead. For example, Zhang et al. [27] utilized the Convolutional Block Attention Module, namely CBAM [28], after every residual layer in their proposed architecture, which may not be computationally efficient. The attention module has undoubtedly performed well in many high-level vision tasks; however, one must not overlook its computational overhead, considering the presence of fully connected layers. The applicability of deep CNNs in medical imaging heavily depends on the amount of data available for efficient learning. As an alternative, many deep learning-based methods have utilized data augmentation techniques to further boost the performance, which has not received much consideration in the existing knee OA works. Based on the aforementioned drawbacks of the existing best-published works, our contributions are five-fold, as follows: 1) We propose an efficient deep CNN for knee OA severity prediction in terms of KL grades using X-ray images. Unlike existing methods, our proposed scheme is not a blind plug-and-play of popular deep models.
The proposed scheme has been built upon a high-resolution network, namely HRNet [29], which takes the spatial scale of the X-ray image into account for efficient classification. 2) We also propose to utilize the attention mechanism only once in the entire network, to reduce the computational overhead while adaptively filtering out the counterproductive features just before classification. 3) Also, instead of relying on traditional entropy-based minimization, we have adopted the ordinal loss [15] to optimize the proposed scheme. 4) To further boost the performance of the proposed scheme, we have incorporated data augmentation techniques, which have not been much considered in any recent work so far. 5) Lastly, we present an extensive set of experiments and Grad-CAM [30] visualizations to justify the importance of each module of the proposed framework. The rest of the paper is organized as follows: Section III presents the proposed method and the adopted cost function. Section IV briefly describes the incorporated dataset, training details, competing methods, and evaluation metrics. Section V presents the quantitative and qualitative comparison against the best-published works. Section VI presents a brief discussion on the learning of the proposed scheme in terms of Grad-CAM visualization of the obtained results. Section VII demonstrates the ablation study against various components, and finally, the paper is concluded in Section VIII. This section presents the details of the proposed model, followed by a brief description of the incorporated cost function. The proposed framework is built upon the HRNet and the Convolutional Block Attention Module (CBAM) in a serially cascaded manner. A descriptive representation of the proposed model is shown in Fig. 2. High-Resolution Network (HRNet) [29] is a novel multi-resolution deep CNN that maintains high-resolution feature representations throughout the network. It starts as a stream of 2D convolutions and subsequently adds high-to-low resolution streams to form the following stages. It then merges the multi-resolution streams in parallel for information exchange [29], as shown in Fig. 2 (marked as High-Resolution Network). HRNet tends to generate reliable multi-resolution representations with strong spatial sensitivity. This is achieved by utilizing parallel connections instead of serial ones (see Fig. 3(a)) and by recurrent fusion of the intermediate representations from the multi-resolution streams (see Fig. 3(b)). As a result, it enables the network to learn more highly correlated and semantically robust spatial features. This motivates us to incorporate HRNet for processing knee X-ray images, which lack such rich spatial features. Formally, let $D_{ij}$ denote the sub-network in the $i$-th stage with resolution index $j$. The spatial resolution of this branch is $1/2^{j-1}$ of that of the high-resolution (HR) branch. For example, HRNet, which consists of four different resolution scales, can be illustrated as follows:

$$\begin{matrix} D_{11} & \rightarrow & D_{21} & \rightarrow & D_{31} & \rightarrow & D_{41} \\ & \searrow & D_{22} & \rightarrow & D_{32} & \rightarrow & D_{42} \\ & & & \searrow & D_{33} & \rightarrow & D_{43} \\ & & & & & \searrow & D_{44} \end{matrix}$$

Later, the obtained multi-resolution feature maps are fused to exchange the learned multi-scale information, as shown in Fig. 4. For this, HRNet utilizes bilinear upsampling followed by a 1 × 1 convolution to adjust the number of channels when transforming a lower-resolution feature map to a higher-resolution scale, or a strided 3 × 3 convolution otherwise.
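To make the fusion rule concrete, the following is a minimal PyTorch sketch of such a multi-resolution exchange unit. It is an illustrative reimplementation under the assumptions stated above (bilinear upsampling plus a 1 × 1 convolution for low-to-high transfer, one strided 3 × 3 convolution per factor-2 step for high-to-low), not the authors' exact code; the module name FuseToScale and the channel configuration are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseToScale(nn.Module):
    """Fuse multi-resolution feature maps into a single target scale.

    Lower-resolution inputs are bilinearly upsampled and passed through a
    1x1 convolution to match the channel count; higher-resolution inputs
    are downsampled with one strided 3x3 convolution per factor-2 step.
    """

    def __init__(self, channels, target):
        super().__init__()
        self.target = target
        self.transforms = nn.ModuleList()
        for j, c in enumerate(channels):
            if j == target:                      # same scale: pass through
                self.transforms.append(nn.Identity())
            elif j > target:                     # lower resolution: 1x1 conv
                self.transforms.append(nn.Conv2d(c, channels[target], 1))
            else:                                # higher resolution: strided 3x3 convs
                steps, layers, ch = target - j, [], c
                for s in range(steps):
                    out_ch = channels[target] if s == steps - 1 else ch
                    layers.append(nn.Conv2d(ch, out_ch, 3, stride=2, padding=1))
                    ch = out_ch
                self.transforms.append(nn.Sequential(*layers))

    def forward(self, feats):
        h, w = feats[self.target].shape[2:]
        fused = 0
        for j, (f, t) in enumerate(zip(feats, self.transforms)):
            if j > self.target:                  # upsample before the 1x1 conv
                f = F.interpolate(f, size=(h, w), mode="bilinear",
                                  align_corners=False)
            fused = fused + t(f)                 # sum the aligned representations
        return fused

# Usage: three streams at 1x, 1/2x, and 1/4x resolution fused into the HR scale.
feats = [torch.randn(1, 32, 56, 56),
         torch.randn(1, 64, 28, 28),
         torch.randn(1, 128, 14, 14)]
print(FuseToScale([32, 64, 128], target=0)(feats).shape)  # -> (1, 32, 56, 56)
```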
Convolutional Block Attention Module (CBAM) consists of two sequential sub-modules: (a) a channel attention module and (b) a spatial attention module [28]. Given an input feature map P ∈ R^{C×H×W}, CBAM sequentially infers a one-dimensional channel attention map Map_c ∈ R^{C×1×1} and a two-dimensional spatial attention map Map_s ∈ R^{1×H×W}. Thus, we obtain a final refined attention map, here denoted as T, and the comprehensive attention mechanism can be summarized as

$$P' = Map_c(P) \otimes P, \qquad T = Map_s(P') \otimes P',$$

where ⊗ signifies element-wise multiplication. Map_c is first generated by making use of the cross-channel relationship of the features, as

$$Map_c(P) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(P)) + \mathrm{MLP}(\mathrm{MaxPool}(P))\big),$$

and Map_s is subsequently obtained from the channel-refined features as

$$Map_s(P') = \sigma\big(k^{7\times7}([\mathrm{AvgPool}(P'); \mathrm{MaxPool}(P')])\big),$$

where σ denotes the sigmoid function and k^{7×7} denotes the convolution operation with a kernel of size 7 × 7. We propose a deep CNN, called OsteoHRNet, that utilizes the HRNet as the backbone and is further empowered with an attention mechanism for the knee KL grade classification. CBAM is integrated at the end of the HRNet, followed by a fully connected (FC) output layer, as depicted in Fig. 2. It may be said that the integration of the CBAM module after HRNet has been beneficial in learning adaptively enriched features for an efficient KL grade classification. It can also be observed that the proposed one-time integration of CBAM is computationally light compared to the multiple additions in the existing work [27]. The resultant output from the CBAM is then fed into the final fully connected layer, which outputs the KL grade probabilities for the given input X-ray image. HRNet has been considered for reliable feature extraction, whereas the capabilities of CBAM are leveraged to help the model better focus on relevant features. A majority of the existing works on knee OA severity classification have considered the nominal nature of KL grades for classification. However, inspired by the idea of Chen et al. [15], we approach this task as an ordinal regression problem and therefore utilize an ordinal loss function instead of the traditional cross-entropy. The ordinal loss function used in this paper is a weighted variant of the traditional cross-entropy. Given the ordinality in the KL grading, it must be acknowledged that extra information is provided by the progressive grading. This approach penalizes distant-grade misclassifications more than nearby ones, according to the penalty weights. For example, a grade 1 image classified as grade 3 is penalized more severely than one classified as grade 2, and even more severely when classified as grade 4. An ordinal matrix C_{n×n} holds the penalty weights between the outcome and the true grade, i.e., c_{uv} denotes the penalty weight for predicting grade v as u, with n = 5. In this study, with five KL grades to classify and c_{uu} = 1, the adopted ordinal loss can be written as

$$\mathcal{L}_{ord} = -\sum_{u=1}^{n} c_{uv}\,\log(q_u),$$

where u and v are the predicted and true KL grades of the input image, respectively, and p_u is the output probability from the final output layer of the architecture, with q_u = p_u if u = v and q_u = 1 − p_u otherwise. We have utilized a fixed penalty matrix of this form for our experimentation (compact implementation sketches of the CBAM module and of this loss are given after Table I below). We have utilized the X-ray radiographs acquired from the OAI repository, made available by Chen et al. [14]. The images obtained are of 4796 participants, including men and women. Given that we focus primarily on the KL grades, radiographs with annotated KL grades from the baseline cohort are acquired to assess our method.

TABLE I: KL grade distribution across the training, testing, and validation sets.

Dataset      Grade 0   Grade 1   Grade 2   Grade 3   Grade 4   Total
Training        2286      1046      1516       757       173    5778
Testing          639       296       447       223        51    1656
Validation       328       153       212       106        27     826
Total           3253      1495      2175      1086       251    8260
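As referenced above, the CBAM equations translate almost line-for-line into code. The following is a minimal PyTorch sketch of the module as defined in [28]; the channel reduction ratio of 16 is the default from that paper, not a value confirmed by this one.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Map_c: squeeze spatial dims by avg- and max-pooling, share an MLP,
    and gate the channels with a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        avg = self.mlp(x.mean(dim=(2, 3)))   # MLP(AvgPool(P))
        mx = self.mlp(x.amax(dim=(2, 3)))    # MLP(MaxPool(P))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Map_s: pool across channels, then a 7x7 convolution and a sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # AvgPool across channels
        mx = x.amax(dim=1, keepdim=True)     # MaxPool across channels
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequential channel-then-spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, p):
        p = p * self.ca(p)                   # P' = Map_c(P) ⊗ P
        return p * self.sa(p)                # T  = Map_s(P') ⊗ P'
```

Likewise, here is a minimal sketch of the penalty-weighted ordinal loss written out above. The penalty matrix shown is a hypothetical placeholder with c_uu = 1 and weights growing with grade distance; the exact off-diagonal values used in the paper are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Hypothetical penalty matrix: c_uu = 1 on the diagonal, weights growing
# with |u - v|. The paper's exact off-diagonal weights are not shown here.
PENALTY = torch.tensor([[1., 2., 3., 4., 5.],
                        [2., 1., 2., 3., 4.],
                        [3., 2., 1., 2., 3.],
                        [4., 3., 2., 1., 2.],
                        [5., 4., 3., 2., 1.]])

def ordinal_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L = -sum_u c_{u,v} log(q_u), with q_u = p_u if u == v, else 1 - p_u."""
    p = F.softmax(logits, dim=1)                   # (B, 5) grade probabilities
    one_hot = F.one_hot(target, p.size(1)).bool()  # marks the true grade v
    q = torch.where(one_hot, p, 1.0 - p)           # q_v = p_v, q_u = 1 - p_u
    c = PENALTY.to(logits.device)[:, target].T     # c_{u,v} per sample, (B, 5)
    return -(c * q.clamp_min(1e-8).log()).sum(dim=1).mean()

# Usage on a dummy batch of two images with true grades 1 and 3.
logits = torch.randn(2, 5, requires_grad=True)
loss = ordinal_loss(logits, torch.tensor([1, 3]))
loss.backward()
```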
The dataset, a total of 8260 radiographs including the left and right knees, was split into training, testing, and validation sets in the ratio 7:2:1, with a balanced distribution across all KL grades [14]. Table I shows the train, test, and validation distribution of the dataset. The entire code is developed using the PyTorch [31] framework, and all the experiments have been conducted on a 12 GB Tesla K40c GPU. Furthermore, the training of all the experimental models was optimized using stochastic gradient descent (SGD) for 30 epochs with an initial learning rate of 5e-4. Additionally, owing to the GPU capacity, the batch size was set to 24. In [15], the authors proposed to utilize the pre-trained VGG-19 [32] network with a novel ordinal loss function. Yong et al. [23] proposed to utilize the DenseNet-161 [24] with the ordinal regression module (ORM). We have compared OsteoHRNet against the results obtained by the best-published studies mentioned above for a robust comparison. In this study, we have utilized the following three evaluation metrics to analyze and compare the performance of our proposed model: (a) multi-class accuracy, (b) Quadratic Weighted Cohen's Kappa coefficient (QWK), and (c) Mean Absolute Error (MAE). Traditionally, multi-class accuracy is defined as the average number of outcomes matching the ground truth across all the classes. Accuracy for five classes with N instances is formulated as

$$\text{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} F(y_i, \hat{y}_i),$$

where F is a function that returns 1 if the prediction is correct and 0 otherwise. MAE is the mean of the absolute errors of the individual predictions over all the input instances. The error in a prediction is the difference between the predicted and the true value for that given instance. MAE for five classes with N instances can be expressed as

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|,$$

where y_i and ŷ_i are the true and the predicted grades, respectively. A weighted Cohen's Kappa is a metric that accounts for the similarity between predictions and the actual values. The Kappa coefficient is a chance-adjusted index of agreement measuring inter-annotator reliability for qualitative prediction. The Quadratic Weighted Kappa (QWK) is evaluated using a predefined table of weights, which measures the extent of non-alignment between the two raters; the greater the disagreement, the greater the weight. O is the contingency matrix for K classes, such that O_{p,p̂} denotes the count of grade-p̂ images predicted as grade p. The weight w is defined as

$$w_{p,\hat{p}} = \frac{(p - \hat{p})^2}{(K-1)^2}.$$

Next, E is calculated as the normalized outer product between the predicted grades' and the original grades' histogram vectors, and the QWK is obtained as

$$\kappa = 1 - \frac{\sum_{p,\hat{p}} w_{p,\hat{p}}\, O_{p,\hat{p}}}{\sum_{p,\hat{p}} w_{p,\hat{p}}\, E_{p,\hat{p}}}$$

(a computational sketch of these metrics is given at the end of this subsection). Of the three metrics, higher values are better for accuracy and QWK, whereas lower values are better for MAE. It can be observed from Table II that the proposed method has outperformed the existing best-published works [15], [23] in terms of classification accuracy, MAE, and QWK. It should be mentioned that Yong et al. [23] reported the macro accuracy and contingency matrix of their best model. For a fair comparison, we have converted their results to an equivalent multi-class accuracy of 70.23%, whereas Chen et al. [15] reported a best multi-class accuracy of 69.69%. OsteoHRNet has achieved a maximum multi-class accuracy of 71.74%, a multi-class average accuracy of 70.52%, an MAE of 0.311, and a QWK of 0.869, which is a significant improvement over [15], [23]. Fig. 5 presents the confusion matrices obtained by the proposed and existing methods [15], [23] when fed with the 1656 test images.
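As referenced above, here is a minimal NumPy sketch of the three metrics following their standard definitions; the function names are illustrative, and the expected matrix E in the QWK is normalized to the same total count as O.

```python
import numpy as np

def multiclass_accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the ground-truth grade."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def mean_absolute_error(y_true, y_pred):
    """Mean |y_i - yhat_i| over all N instances."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.abs(y_true - y_pred).mean())

def quadratic_weighted_kappa(y_true, y_pred, k=5):
    """QWK = 1 - sum(w * O) / sum(w * E), with w_ij = (i - j)^2 / (k - 1)^2.

    O is the contingency matrix of true vs. predicted grades; E is the outer
    product of the two grade histograms, normalized to the same total as O.
    """
    O = np.zeros((k, k))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    i, j = np.indices((k, k))
    w = (i - j) ** 2 / (k - 1) ** 2
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return float(1.0 - (w * O).sum() / (w * E).sum())

# Usage on a toy set of six graded knees.
y_true, y_pred = [0, 1, 2, 3, 4, 2], [0, 1, 2, 2, 4, 3]
print(multiclass_accuracy(y_true, y_pred))     # ~0.667
print(mean_absolute_error(y_true, y_pred))     # ~0.333
print(quadratic_weighted_kappa(y_true, y_pred))
```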
Furthermore, we have employed the Gradient-weighted Class Activation Maps (Grad-CAM) [30] visualization technique to demonstrate the superiority of the proposed OsteoHRNet. It also helps in showcasing the most relevant regions the network has learned to focus on in the X-ray images; Figs. 6, 7, 8, and 9 present these visualizations. It is evident from Fig. 5 that OsteoHRNet has significantly outperformed the previous works [15], [23]. It should be mentioned that OsteoHRNet classifies the higher-grade X-rays very accurately while reducing the misclassification between far-apart grades. In comparison to the existing methods, there has been a significant increase in correct classifications for grade 2. Furthermore, the nearby misclassifications between the higher grades (grade 2-grade 3, grade 3-grade 4) are the lowest for the proposed method, which is noteworthy. Also, analysis of the obtained Grad-CAM visualizations of such incorrect classifications shows that OsteoHRNet tries to locate joint space narrowing and osteophytes, in accordance with the medical characteristics. At the same time, VGG-19 [15] is confused and focuses on the entire knee, giving importance to features irrelevant for KL grade classification, as seen in Figs. 11 and 12. Owing to its superior network learning, our model is highly relevant to the medical setting of KL grade classification. Furthermore, the Grad-CAM visualizations of our model can be provided to medical practitioners to instill confidence in the findings. However, our study has some limitations, and certain radiographs could not be correctly classified due to the lack of rich features in those radiographs. Fig. 14 shows nearby-grade misclassifications, which are, to a great extent, unavoidable. However, there is high inter- and intra-observer variability (correlation coefficient = 0.83) in manual knee KL grading [33]. Thus, our proposed fully automated KL grading method can be extended to clinical settings to obtain reliable and reproducible OA grading. This section presents an ablation study to demonstrate the contributions made by each sub-module of the proposed OsteoHRNet. For this, we have evaluated the following baselines:
1) HRNet: the original HRNet trained on the adopted dataset.
2) HRNet + CBAM: the original HRNet followed by the CBAM module, trained on the adopted dataset.
3) OsteoHRNet: the original HRNet followed by the CBAM module, trained on the adopted dataset with data augmentation techniques employed during training to enhance the performance of the proposed model.
It can be observed from Table III that the addition of the CBAM module and the data augmentation techniques has immensely improved the performance compared to the curtailed baselines. The CBAM module might have adaptively learned the relevant features from the HRNet. Such features may have contributed more towards an efficient classification compared to the features learned by the original HRNet [29], VGG-19 [32], or DenseNet-161 [24]. Fig. 13 demonstrates the Grad-CAM visualizations for our ablation study. It can be observed that the proposed OsteoHRNet has learned robust features progressively with each component of our proposed network. Thus, it is verified that each component of our network contributes to the final knee OA KL grade prediction. This paper proposes a novel OsteoHRNet by adopting the HRNet as the backbone and integrating the CBAM module for improved knee OA severity prediction from plain radiographs.
The proposed network was able to perform exceptionally well and attain significant improvements over the previously proposed methods, owing to HRNet's capability to maintain high-resolution features throughout the network and its ability to capture reliable spatial features. The intermediate extracted features were significantly refined with the help of the attention mechanism; therefore, radiographs with inter-class similarity and intra-class variation could be distinguished better. Moreover, we have employed the Grad-CAM visualizations to validate that the model has learned the most relevant spatial features in the radiographs. In the future, we will work on the entire OAI multi-modal data and consider all the cohorts in our study.

REFERENCES
[1] Fully automatic quantification of knee osteoarthritis severity on plain radiographs
[2] Knee osteoarthritis: A primer
[3] The truth behind subchondral cysts in osteoarthritis of the knee
[4] The epidemiology, etiology, diagnosis, and treatment of osteoarthritis of the knee
[5] Radiological assessment of osteoarthrosis
[6] Early detection of radiographic knee osteoarthritis using computer-aided analysis
[7] The osteoarthritis initiative: Report on the design rationale for the magnetic resonance imaging protocol for the knee
[8] Learning-based cost functions for 3-D and 4-D multi-surface multi-object segmentation of knee MRI: Data from the osteoarthritis initiative
[9] A comparative systematic literature review on knee bone reports from MRI, X-rays and CT scans using deep learning and machine learning methodologies
[10] General practitioners referring adults to MR imaging for knee pain: A randomized controlled trial to assess cost-effectiveness
[11] Machine learning in knee osteoarthritis: A review
[12] CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning
[13] Deep convolutional neural network based medical image classification for disease diagnosis
[14] Knee osteoarthritis severity grading dataset
[15] Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss
[16] Knee X-ray image analysis method for automated detection of osteoarthritis
[17] Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks
[18] Automatic detection of knee joints and quantification of knee osteoarthritis severity using convolutional neural networks
[19] Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach
[20] Learning a similarity metric discriminatively, with application to face verification
[21] Assessing knee OA severity with CNN attention-based end-to-end architectures
[22] Very deep convolutional networks for large-scale image recognition
[23] Knee osteoarthritis severity classification with ordinal regression module
[24] Densely connected convolutional networks
[25] Convolutional neural networks: An overview and application in radiology
[26] MobileNetV2: Inverted residuals and linear bottlenecks
[27] Attention-based CNN for KL grade classification: Data from the osteoarthritis initiative
[28] CBAM: Convolutional block attention module
[29] Deep high-resolution representation learning for visual recognition
[30] Grad-CAM: Why did you say that?
[31] PyTorch: An imperative style, high-performance deep learning library
[32] Very deep convolutional networks for large-scale image recognition
[33] Classifications in brief: Kellgren-Lawrence classification of osteoarthritis