key: cord-0025072-1kob3bve
authors: Liu, Peng-ran; Zhang, Jia-yao; Xue, Ming-di; Duan, Yu-yu; Hu, Jia-lang; Liu, Song-xiang; Xie, Yi; Wang, Hong-lin; Wang, Jun-wen; Huo, Tong-tong; Ye, Zhe-wei
title: Artificial Intelligence to Diagnose Tibial Plateau Fractures: An Intelligent Assistant for Orthopedic Physicians
date: 2021-12-31
journal: Curr Med Sci
DOI: 10.1007/s11596-021-2501-4
sha: a1867c6e6990bbcb7de8992efb27e70ad23bbdb2
doc_id: 25072
cord_uid: 1kob3bve

OBJECTIVE: To explore a new artificial intelligence (AI)-aided method to assist the clinical diagnosis of tibial plateau fractures (TPFs) and further measure its validity and feasibility. METHODS: A total of 542 X-rays of TPFs were collected as a reference database. An AI algorithm (RetinaNet) was trained to analyze and detect TPF on the X-rays. The ability of the AI algorithm was determined by indexes such as detection accuracy and time taken for analysis. The algorithm performance was also compared with orthopedic physicians. RESULTS: The AI algorithm showed a detection accuracy of 0.91 for the identification of TPF, which was similar to the performance of orthopedic physicians (0.92±0.03). The average time spent for analysis of the AI was 0.56 s, which was 16 times faster than human performance (8.44±3.26 s). CONCLUSION: The AI algorithm is a valid and efficient method for the clinical diagnosis of TPF. It can be a useful assistant for orthopedic physicians, which largely promotes clinical workflow and further guarantees the health and security of patients.

Tibial plateau fracture (TPF) is a fracture of the proximal tibia at the knee joint. It is a severe articular injury with a broad spectrum of damage to the locomotor system, and it is often accompanied by poor clinical outcomes and limited articular function [1, 2]. As the pivotal site of force conduction in the lower extremity, the proximal tibia can sustain a compression fracture, split fracture, bone defect, or other structural injury under an excessive violent load [3, 4]. Currently, conventional X-ray remains the primary diagnostic method for detecting TPF in the orthopedics department because it is convenient, rapid, and simple compared with other imaging examinations. Image reading is a basic clinical skill that every qualified orthopedic physician must master, as it underpins the accurate diagnosis of TPF and effective treatment for patients. However, overloaded clinical work caused by surging patient demand and insufficient medical resources increases the risk of missed diagnoses and misdiagnoses, especially when the fracture is not obvious on an X-ray, as with a minor, non-displaced, or occult fracture. Studies have indicated that under severe and urgent conditions, errors in radiograph reading are common and are associated with uncooperative patients, inadequate clinical history, time-critical decisions, and simultaneous clinical work; in such settings, the rate of missed image-based diagnosis and misdiagnosis can even exceed 40%, posing a great threat to the health and safety of patients [5] [6] [7]. Thus, an accurate and safe auxiliary method to quickly detect TPF has been of great interest to orthopedic physicians.

Artificial intelligence (AI), an interdisciplinary field drawing on computer technology, mathematics, cybernetics, and determinism, is a new technology of the 21st century that has profoundly transformed how the world operates. The concept of AI is to study human intelligence (HI) and to build computers that use intelligent algorithms to simulate HI and even surpass it [8]. With the appearance of machine learning (ML), deep learning (DL), and convolutional neural networks (CNNs), the primary AI techniques that are well suited to feature extraction and learning, AI has become an established method in image analysis. It has also gradually given rise to several functional applications, such as (1) computer vision, (2) speech recognition, (3) natural language recognition, (4) decision planning, and (5) big data analysis, which have been applied to traditional industries including the medical field. In past studies, AI has been successfully used to assist in tumor detection on pathological sections [9], gastrointestinal disease detection using capsule endoscopy [10], thyroid disease detection by ultrasound [11], pulmonary nodule and cancer detection with computerized tomography (CT), and the detection of some orthopedic diseases on imaging-based examination, such as scoliosis [12], osteoarthritis [13], and meniscus and cruciate ligament injuries [14]. All of these applications have shown satisfying results in improving detection accuracy and reducing the clinical workload. However, to the best of our knowledge, there are no studies involving AI-assisted TPF detection.

Therefore, the present study explored whether AI could be used for TPF detection on X-ray images and compared the performance of the AI with that of veteran orthopedic physicians in order to verify the feasibility and ability of AI-aided diagnosis. These results may aid in the development of a novel method for the clinical diagnosis and treatment of TPF.

Owing to the difficulty of data acquisition, the research was performed as a multi-center study among five Chinese triple-A grade hospitals: the Wuhan Union Hospital, Wuhan Puai Hospital, Wuhan Puren Hospital, Xiangya Changde Hospital, and Northern Jiangsu People's Hospital. After data collection between August 2020 and August 2021, a total of 542 anteroposterior knee joint X-rays of patients with TPF were acquired and included in the final database. Next, the X-rays in the database were converted from Digital Imaging and Communications in Medicine (DICOM) files to Joint Photographic Experts Group (JPEG) files with a matrix size of 1080×1333 pixels using Photoshop 20.0 (Adobe Corp., USA). Using the random-number-table function in Excel (Microsoft Corp., USA), the 542 JPEG files were numbered and randomly divided into two datasets: the training dataset (458 files, for algorithm learning and training) and the test dataset (84 files, for algorithm validation). The ratio of the two datasets was nearly 17:3.
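The random division described above can be sketched in a few lines of Python. This is only an illustration of a reproducible 458/84 split: the study used Excel's random-number function, and the file names, the `split_dataset` helper, and the seed below are hypothetical.

```python
import random


def split_dataset(file_ids, n_test, seed=0):
    """Shuffle the file IDs reproducibly and hold out n_test of them for testing."""
    rng = random.Random(seed)      # fixed seed so the split can be repeated
    shuffled = list(file_ids)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the 542 numbered JPEG files.
file_ids = [f"tpf_{i:03d}.jpg" for i in range(1, 543)]
train_ids, test_ids = split_dataset(file_ids, n_test=84)
print(len(train_ids), len(test_ids))
```

Fixing the seed is a design choice that makes the split auditable; any other source of randomness would serve equally well for the division itself.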

With the training dataset, a DL-CNN recognition algorithm was set up and trained to learn the appearance of TPF on knee joint X-rays. After training, the algorithm could automatically recognize and label suspicious TPF areas on knee joint X-rays in the test dataset, and could thus serve as a detection assistant. Finally, the performance of the CNN algorithm, including accuracy and time spent on analysis, was compared with that of a panel of five veteran orthopedic physicians in the Orthopedics Department of Wuhan Union Hospital. To protect patient privacy, all identifying information on the X-rays was anonymized or removed. The study was approved by the Ethics Committee of Wuhan Union Hospital.

Firstly, a DL-CNN recognition algorithm [RetinaNet, primarily proposed by Kaiming He in 2018 (structure of the original RetinaNet shown in fig. 1)] was designed and improved. RetinaNet extracts image features with the Residual Network (ResNet), fuses the context information with the Feature Pyramid Network (FPN), and predicts the classification and location of objects with two separate CNNs. Compared to other frequently used algorithms (such as Faster R-CNN), RetinaNet has obvious detection advantages in terms of speed and accuracy. Subsequently, the original RetinaNet algorithm was further improved by applying the structure of MobileNet Version 2. For instance, the ResNet was replaced by Depthwise Separable Convolution in the feature extraction process. Additionally, a triple FPN was applied to enlarge the high-level feature maps by upsampling; these were then added to the low-level feature maps to increase the semantic information and the detection accuracy for small targets in the low-level maps. The structure of the improved RetinaNet is shown in fig. 2.
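Two of the ideas in this paragraph can be illustrated with plain Python. The sketch below is not the authors' network: it only counts weights for a standard versus a depthwise separable convolution (biases ignored) and shows an FPN-style top-down step in which a high-level map is upsampled and added to a low-level map. The layer sizes and the nearest-neighbour upsampling are assumptions chosen for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of a 2D list (assumed upsampling mode)."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def top_down_merge(high, low):
    """Upsample the high-level map and add it element-wise to the low-level map."""
    up = upsample2x(high)
    return [[u + l for u, l in zip(up_row, low_row)] for up_row, low_row in zip(up, low)]


# Hypothetical layer size: 3 x 3 kernel, 256 input and 256 output channels.
print(standard_conv_params(3, 256, 256))        # 589824
print(depthwise_separable_params(3, 256, 256))  # 67840

high = [[1, 2], [3, 4]]
low = [[1] * 4 for _ in range(4)]
print(top_down_merge(high, low))
```

For this hypothetical layer, the separable form needs roughly one ninth of the weights, which is the usual motivation for adopting a MobileNet-style backbone.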

The training dataset (458 files with TPF) was labeled by two senior orthopedic physicians with more than 10 years of experience using the labeling software LabelImg (https://github.com/tzutalin/LabelImg). The fracture line in each file was carefully labeled as a region of interest (ROI) for training the improved RetinaNet. Data augmentation of the training dataset was then performed by the algorithm, including image flipping, rotation, cropping, and blurring, which doubled the original training dataset (916 files with TPF).
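For a detection task, each augmented image needs its ROI boxes transformed in the same way. The following minimal sketch shows only a horizontal flip, with images as nested lists and boxes as `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates; the helper names and the tiny dummy image are hypothetical, and the study's own augmentation also used rotation, cropping, and blurring.

```python
def hflip_image(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def hflip_box(box, width):
    """Mirror a box (x_min, y_min, x_max, y_max) inside an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return (width - x_max, y_min, width - x_min, y_max)


def augment(samples):
    """Return the original samples plus one flipped copy of each, doubling the dataset."""
    out = list(samples)
    for image, boxes in samples:
        width = len(image[0])
        out.append((hflip_image(image), [hflip_box(b, width) for b in boxes]))
    return out


# Dummy 2 x 4 image with one labeled box, repeated 458 times as a stand-in dataset.
sample = ([[0, 1, 2, 3], [4, 5, 6, 7]], [(0, 0, 1, 2)])
augmented = augment([sample] * 458)
print(len(augmented))    # 916
print(augmented[-1][1])  # [(3, 0, 4, 2)]
```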

The improved RetinaNet was trained with the labeled and augmented training dataset using the Adam optimizer. The training parameters were set as follows: batch size, 2; dropout, 8; initial learning rate, 1e-3; and learning rate, 0.1.
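As a reminder of what the Adam optimizer does at each step, here is a minimal pure-Python sketch of a single Adam update. Only the step size of 1e-3 comes from the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are an assumption, as are the helper names and the toy objective.

```python
import math


def init_state(n_params):
    """Create empty first- and second-moment estimates for n_params parameters."""
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter list."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and the squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3), starting from w = 0.
w = [0.0]
state = init_state(1)
w = adam_step(w, [2 * (w[0] - 3)], state)
print(w)
```

On the first step, the bias-corrected update has magnitude close to the step size regardless of the gradient's scale, so the parameter moves by about 1e-3 toward the minimum.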

The ability of the trained, improved RetinaNet to detect TPF on X-ray images was assessed using the test dataset, and this was considered the final algorithm performance. Each of the 84 files with TPF in the test dataset was analyzed by the trained, improved RetinaNet, which automatically labeled the suspicious fracture line. The output results were judged by the two senior orthopedic physicians with more than 10 years of working experience. Any disagreement was arbitrated using the radiological reports of the 84 X-rays from the imaging department.

To assess the diagnostic performance of front-line orthopedic physicians, a panel of five volunteer orthopedic physicians from the Orthopedics Department of Wuhan Union Hospital was also set up; each had more than five years of working experience in image reading and orthopedics. These five physicians were independent of this study and did not participate in any other part of the project. Each panel member individually diagnosed the test dataset "blind" (without any reminder or radiological report) as TPF or non-TPF, with no time limit. The diagnostic results of each physician were collected and judged by the same two senior orthopedic physicians to determine their accuracy, which, together with the time taken for diagnosis, was considered the performance of the orthopedic physicians.

To compare the performance of the improved RetinaNet with that of the orthopedic physicians, the accuracy and analysis time of both were collected. The indexes were evaluated using the Student's t-test, and the procedure of the entire study is illustrated in fig. 3.

The data from this study were presented as the mean±standard deviation (SD) or percentage, and the statistical analysis was performed using GraphPad Prism 7.0 software (GraphPad Corp., USA). The significance between the algorithm group and orthopedic physician group was evaluated using the Student's t-test. P<0.05 was considered to indicate statistical significance.

Fig. 3 Flow diagram for the entire study

The final loss-curve-graph and P-S-curve-graph of the improved RetinaNet model were produced using Python script (https://www.Python.org) and are shown in fig. 4. The loss was composed of three parts: regression loss, classification loss, and total loss. As shown in fig. 4A and B, the loss curve of the training and test datasets decreased rapidly and then flattened out, which indicated that the model could converge well. To quantify the detection performance of the model, the P-S curve was drawn to show the diagnostic capability of the algorithm (fig. 4C). The curve was created by changing the threshold value; the abscissa represented sensitivity, and the ordinate represented precision. The detection performance of the algorithm could then be intuitively demonstrated by calculating the area under the curve.
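A precision-sensitivity curve of this kind can be sketched as follows. The scores and labels are hypothetical placeholders, not study data, and the trapezoidal integration (starting from sensitivity 0 at the precision of the first point) is one common convention rather than the authors' stated procedure.

```python
def precision_sensitivity_curve(scores, labels):
    """Sweep the confidence threshold and return (sensitivity, precision) points.

    labels[i] is 1 if detection i is a true fracture and 0 if it is a false alarm;
    every true fracture is assumed to appear among the detections.
    """
    n_pos = sum(labels)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def area_under_curve(points):
    """Trapezoidal area under the curve, starting at sensitivity 0."""
    prev_s, prev_p = 0.0, points[0][1]
    area = 0.0
    for s, p in points:
        area += (s - prev_s) * (p + prev_p) / 2
        prev_s, prev_p = s, p
    return area


# Hypothetical detection scores and ground-truth labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 1, 0, 1]
curve = precision_sensitivity_curve(scores, labels)
auc = area_under_curve(curve)
print(curve)
print(round(auc, 4))
```

Lowering the threshold admits more detections, so sensitivity can only rise along the curve while precision may fall; the area summarises that trade-off in one number.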

After training, the algorithm automatically labeled the suspicious TPF area on each test X-ray with a rectangle (fig. 5). According to the judgment of the two senior orthopedic physicians, there were seven diagnostic errors (missed diagnosis or misdiagnosis) in the 84 files, and the accuracy of the algorithm was 0.91. The time spent on the entire test process was 47 s, which gave an average of 0.56 s per image.
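The two summary figures follow directly from the counts given in this paragraph. The short calculation below uses only those counts; the variable names are illustrative. The accuracy evaluates to about 0.917, consistent with the reported 0.91 when truncated to two decimals.

```python
n_images = 84       # test X-rays
n_errors = 7        # missed diagnoses or misdiagnoses
total_seconds = 47  # time for the whole test run

accuracy = (n_images - n_errors) / n_images
seconds_per_image = total_seconds / n_images

print(f"accuracy = {accuracy:.3f}")                    # accuracy = 0.917
print(f"time per image = {seconds_per_image:.2f} s")   # time per image = 0.56 s
```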

The manual diagnostic results from the panel of orthopedic physicians were collected to calculate the accuracy and the time spent, and these data are shown in table 1.

The performance of the algorithm was compared with that of the orthopedic physicians, and the results are shown in table 2 and fig. 6. The accuracy was 0.91 for the algorithm and 0.92±0.03 for the orthopedic physicians, and this difference between the two groups was not statistically significant (P=0.86). However, the algorithm was on average 16 times faster than the physicians (0.56 s versus 8.44±3.26 s), a statistically significant difference between the two groups (P<0.001).
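The text does not state which variant of the Student's t-test was applied. As one possible reading, the sketch below computes a one-sample t statistic that compares the panel's mean (n = 5) against the algorithm's single value, using only the summary statistics reported above. The function name and the one-sample form are assumptions; converting the statistic into a P value requires the t distribution (for example from a statistics package) and is not shown.

```python
import math


def one_sample_t(mean, sd, n, reference):
    """Return the t statistic and degrees of freedom for mean vs. a fixed reference."""
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Summary statistics from the text: five physicians versus the algorithm's value.
t_accuracy, df = one_sample_t(mean=0.92, sd=0.03, n=5, reference=0.91)
t_time, _ = one_sample_t(mean=8.44, sd=3.26, n=5, reference=0.56)

print(f"accuracy: t = {t_accuracy:.2f}, df = {df}")
print(f"time:     t = {t_time:.2f}, df = {df}")
```

Because the inputs are rounded summary values, the statistics are approximate; the per-physician data in table 1 would be needed to reproduce the reported P values.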

TPF is a complex fracture in the knee joint caused by high-impact events. The clinical diagnosis of suspected TPF is often inadequate because of an unclear X-ray presentation, as with a minor, non-displaced, or occult fracture, which can lead to missed diagnoses or misdiagnoses, especially in emergency situations [15] [16] [17]. A missed diagnosis or misdiagnosis can result in severely negative clinical outcomes for patients, including delayed treatment and poor recovery. Therefore, a valid solution to help orthopedic physicians avoid incorrect diagnoses is required to improve patient outcomes.

In this study, AI technology was used to construct an improved RetinaNet algorithm that could assist orthopedic physicians in recognizing TPF on knee joint X-rays, and it showed satisfying performance, with an accuracy of 0.91 and an average analysis time of 0.56 s. Compared with human performance (accuracy of 0.92±0.03), the AI algorithm exhibited a similar accuracy of TPF recognition. However, the clinical panel in this study worked under non-emergency conditions and without time limits, which does not simulate the real circumstances of the emergency department and time-limited situations; the authors therefore believe that the performance of the AI algorithm in this study would compare even more favorably in the real clinical environment. As for the time spent on the diagnosis of TPF, the AI algorithm greatly reduced the time (16 times faster) compared with orthopedic physicians, and it could therefore be a powerful auxiliary method that relieves the burden of rapid TPF diagnosis in the clinic.

Fig. 6 The comparison between the algorithm and orthopedic physicians. n.s.: not significant. *** P<0.001

Various studies have already demonstrated the ability and feasibility of AI models for detecting different clinical diseases on X-ray images, with results similar to those of the current study. For instance, in pulmonary disease detection, Hyunsuk designed an AI model trained on 5485 chest X-rays; the sensitivity and specificity reached 86.2% and 85.0%, respectively, for lung nodule recognition and 75.0% and 83.3% for lung cancer detection. In Hyunsuk's research, the AI model performed even better than the professional radiologist, and could also be a favorable assistant to reduce excessive workload and save medical resources [18]. In the detection of pneumonia, Wang set up a CNN algorithm to learn from 1647 chest images of confirmed patients and 800 from normal patients. After the training process, the sensitivity and specificity of the model for pneumonia detection reached 92.3% and 85.1%, respectively. The average AI diagnostic time was 0.55 min, which was 15 min less than human performance, providing great help for rapid clinical diagnosis [19].

Parallel studies have been conducted for the detection of orthopedic diseases. Similar to TPF in this study, the distal radius fracture (DRF) is a common fracture in orthopedics whose rapid diagnosis is also difficult. Gan therefore set up a CNN algorithm based on 2340 DRF patients and assessed its diagnostic ability for the detection of DRF on X-ray images. Gan indicated that the network exhibited a similar performance to the orthopedic physicians and could be feasible in clinical application as an auxiliary method under extended conditions [20]. Also concerning the detection of DRF, Lindsey developed a CNN model using a larger database that consisted of 135 409 cases of DRF patients. After the DL process, the accuracy of the model was higher than the diagnostic ability of 18 senior professional orthopedic physicians. The sensitivity and specificity increased by 10.7% and 9.4% on average, which effectively reduced the risk of missed diagnosis and misdiagnosis in DRF detection and provided substantial improvements to patient care [21]. Similarly, for the detection of humeral supracondylar fractures (HSFs), Choi collected data of HSF patients who visited the orthopedic department within five years and designed an AI network. After training, the algorithm achieved efficient results with high sensitivity for HSF recognition, which plays an important role in rapid clinical diagnosis [22]. In the detection of proximal humeral fractures (PHFs), Chung analyzed four classifications of PHF (346 greater tuberosity fractures, 514 surgical neck fractures, 269 three-part fractures, and 247 four-part fractures) using an AI model. The final result was an effective algorithm for the detection of PHF, and Chung further noted that the AI was able to improve the Neer classification, which is of profound significance for clinical PHF diagnosis and treatment [23]. Besides fracture detection, AI technology has also played an inspiring role in the diagnosis of other orthopedic diseases, such as scoliosis, arthritis, bone tumors, and meniscus and ligament injuries [24] [25] [26]. Taken together, these previous studies support the validity of AI-aided methods for clinical diagnosis.

In the current study, an AI algorithm was applied for the detection of TPF. The results indicated that TPF diagnosis was greatly facilitated, and the accuracy reached the level of human performance under non-emergent circumstances. Therefore, we believe the AI algorithm can be fully competent as an assistant to orthopedic physicians for the detection of TPF on X-ray. With the support of AI technology, the clinical workload could be reduced, medical resources could be saved, and, more importantly, the health and safety of patients could be better guaranteed. However, the current study also has some deficiencies. (1) Despite the data augmentation used to expand the training set, the database in this study was small, which could influence the final performance of the AI algorithm. In future research, the scale of the database should be enlarged to improve the performance of the algorithm. (2) The X-ray images collected in this study consisted only of anteroposterior films, which might not be optimal for the automated diagnosis of TPF. Moreover, normal knee joint X-rays were not included in this study. As such, sensitivity and specificity could not be assessed. In future research, lateral X-rays and X-ray images of normal knee joints should be included.

(3) The algorithm in this study still has room for improvement to achieve better performance. Finally, (4) the automatic diagnosis in this study was limited to fracture line recognition and did not include fracture classification. Classification is a necessary step in fracture treatment and provides a crucial reference for developing a surgical plan. In future research, fracture classification recognition will be added to the algorithm. Additionally, the algorithm achieved a satisfying performance but lacks further confirmation. If a miscalculation occurs during real clinical application, the medical responsibility still cannot be identified because the relevant laws are incomplete. However, this should not be a reason to limit the development of AI-aided diagnostic tools, as new medical models still have a bright future.

In conclusion, the AI-based diagnostic algorithm is a valid and efficient method for the diagnosis of TPF. It can serve as a useful intelligent assistant for orthopedic physicians, which can streamline clinical workflow and help to guarantee the health and security of patients.

Fractures of tibial plateaus. A review of the literature

Approaches and fixation of the posterolateral fracture fragment in tibial plateau fractures: a review with an emphasis on rim plating via modified anterolateral approach

Comparative Analysis of Mechanism-Associated 3-Dimensional Tibial Plateau Fracture Patterns

Autologous Iliac Bone Graft Compared with Biphasic Hydroxyapatite and Calcium Sulfate Cement for the Treatment of Bone Defects in Tibial Plateau Fractures: A Prospective, Randomized, Open-Label, Multicenter Study

Diagnostic errors in an accident and emergency department

Defending the "missed" radiographic diagnosis

Errors in imaging patients in the emergency setting

Artificial Intelligence and Orthopaedics: An Introduction for Clinicians

Pathology Image Analysis Using Segmentation Deep Learning Algorithms

Gastroenterologist-Level Identification of Small-Bowel Diseases and Normal Variants by Capsule Endoscopy Using a Deep-Learning Model

Artificial Intelligence-Based Thyroid Nodule Classification Using Information from Spatial and Frequency Domains

A narrative review of machine learning as promising revolution in clinical practice of scoliosis

Applying Densely Connected Convolutional Neural Networks for Staging Osteoarthritis Severity from Plain Radiographs

3D convolutional neural networks for detection and severity staging of meniscus and PFJ cartilage morphological degenerative changes in osteoarthritis and anterior cruciate ligament subjects

3D mapping and classification of tibial plateau fractures

The effect of coronal splits on the structural stability of bi-condylar tibial plateau fractures: a biomechanical investigation

Arthroscopy Assisted Reduction Percutaneous Internal Fixation versus Open Reduction Internal Fixation for Low Energy Tibia Plateau Fractures

Validation of a Deep Learning Algorithm for the Detection of Malignant Pulmonary Nodules in Chest Radiographs

Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation

Artificial intelligence detection of distal radius fractures: a comparison between the convolutional neural network and professional assessments

Deep neural network improves fracture detection by clinicians

Using a Dual-Input Convolutional Neural Network for Automated Detection of Pediatric Supracondylar Fracture on Conventional Radiography

Automated detection and classification of the proximal humerus fracture by using deep learning algorithm

Applications of Artificial Intelligence in Musculoskeletal Imaging: From the Request to the Report

The Use of Artificial Intelligence in the Evaluation of Knee Pathology

An Application of Artificial Intelligence to Diagnostic Imaging of Spine Disease: Estimating Spinal Alignment From Moire Images

The authors would like to thank the generous help and support from the Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology.

The authors declare that there are no conflicts of interest relevant to this article.