title: Developing a Training Web Application for Improving the COVID-19 Diagnostic Accuracy on Chest X-ray
authors: Fernández-Miranda, P. Menéndez; Bellón, P. Sanz; del Barrio, A. Pérez; Iglesias, L. Lloret; García, P. Solís; Aguilar-Gómez, F.; González, D. Rodríguez; Vega, J. A.
date: 2021-03-08
journal: J Digit Imaging
DOI: 10.1007/s10278-021-00424-7

In December 2019, a new coronavirus known as 2019-nCoV emerged in Wuhan, China. The virus has spread globally and the infection was declared a pandemic in March 2020. Although most cases of coronavirus disease 2019 (COVID-19) are mild, some of them rapidly develop acute respiratory distress syndrome. Chest X-rays (CXR) are essential in clinical management, but evaluating COVID-19 on CXR can be challenging. In this context, we developed COVID-19 TRAINING, a free Web application for training on the evaluation of COVID-19 CXR. The application included 196 CXR belonging to three categories: non-pathological, pathological compatible with COVID-19, and pathological non-compatible with COVID-19. On the training screen, images were shown to users, who chose a diagnosis among those three possibilities. At any time, users could finish the training session and be evaluated through the estimation of their diagnostic accuracy values: sensitivity, specificity, predictive values, and global accuracy. Images were hand-labeled by four thoracic radiologists. Average values for sensitivity, specificity, and global accuracy were .72, .64, and .68. Users who achieved better sensitivity registered lower specificity (p < .0001) and those with higher specificity had lower sensitivity (p < .0001). Users who sent more answers achieved better accuracy (p = .0002). The application COVID-19 TRAINING provides a revolutionary tool to learn the necessary skills to evaluate COVID-19 on CXR. Diagnosis training applications could provide a novel manner of evaluating medical professionals based on their diagnostic accuracy values, and an efficient method to collect valuable data for research purposes.

In December 2019, a new coronavirus named 2019-nCoV, also known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was isolated in the airway epithelial cells of a cluster of patients with pneumonia of unknown cause in Wuhan, China [1]. Since then, the infection caused by 2019-nCoV has rapidly spread globally [2], affecting more than 4 million people in about 215 countries with more than 280,000 reported deaths to date [3]. On the 11th of March 2020, the World Health Organization (WHO) declared the novel coronavirus outbreak a global pandemic [4]. Most cases of coronavirus disease 2019 (COVID-19) are mild, with symptoms that are usually self-limiting and resolve within 2 weeks [5]. However, other patients progress rapidly and develop acute respiratory distress syndrome (ARDS) and septic shock, eventually resulting in multiple organ failure [6]. At the time of this writing, fever and cough were the main clinical manifestations, followed by dyspnea, myalgia or weakness, and chest tightness [7]. Furthermore, a variable percentage of patients report decreased smell function or even anosmia and dysgeusia [8]. Currently, COVID-19 is diagnosed using molecular detection methods such as the reverse-transcriptase polymerase chain reaction (RT-PCR) test, regarded as the standard of reference [9][10][11].
Nevertheless, although RT-PCR COVID-19 testing has a specificity of 100% [11], it shows a potentially high false negative rate that deteriorates its sensitivity, so it is not a definitive diagnostic method [10, 12]. Other techniques for the detection of COVID-19 are being used to increase diagnostic efficiency, especially medical imaging modalities such as chest X-rays (CXR) and computed tomography (CT) scans [9, 10, 13]. In patients with a high level of clinical suspicion of COVID-19 and negative RT-PCR, CXR can be key to identifying false negatives of RT-PCR COVID-19 testing, as CXR abnormalities may appear before the patient eventually tests positive on RT-PCR [13]. Moreover, in populations around the world with limited access to reliable real-time molecular diagnostic methods, the use of CXR for early disease detection also plays a crucial role [14]. Additionally, imaging is critical in assessing the severity and progression of a COVID-19 infection [15]. Thus, physicians evaluating COVID-19 on medical images should be aware of its imaging manifestations and radiological features, which have been well described [16].

However, the interpretation of chest radiographs is a challenging task, requiring experience and expertise [17][18][19]. The American College of Radiology (ACR) recommends that qualified radiologists be available to interpret all radiographs obtained in Emergency Departments (ED) [20], and previous studies have reported suboptimal performance in the interpretation of CXR by ED physicians compared with expert radiologists [21][22][23]. Importantly, the number of chest radiographs per ED visit has increased during the last decades [24], which is a practical limitation with regard to the full-time availability of expert radiologists [25]. This situation has deteriorated since the pandemic outbreak appeared in the worldwide clinical scenario. In the context of the COVID-19 pandemic and the high demand for CXR reporting, many CXR are being interpreted by non-expert physicians who have been forced to acquire the competence of detecting and evaluating the radiological features of COVID-19.

Aware of this, we have developed a free Web application to ease the learning process of interpreting COVID-19 CXR for physicians, residents, students, and anybody else interested in acquiring this competence. As far as we know, this application, called COVID-19 TRAINING (https://xray.covid.ifca.es/en), is the first available tool that allows a single user to calculate their diagnostic accuracy values: sensitivity, specificity, positive predictive value, negative predictive value, and global accuracy. The COVID-19 TRAINING application can set a precedent for further applications because this is the first time that a single physician is treated as a diagnostic tool in their own right. Usually, diagnostic accuracy values are calculated for diagnostic techniques or for groups of physicians, but not for a single professional. In our opinion, providing this kind of tool to physicians can help them evaluate their performance against specific metrics and follow the progression of their diagnostic efficiency.

We collected 196 CXR in our Institution belonging to 93 patients: 33 females (35.48%) and 60 males (64.51%). The average age of these patients was 61.43 years with a standard deviation (SD) of 17.20 years and a range from 19 to 88 (Table 1).
The chest radiographs were all hand-labeled and classified by four expert thoracic radiologists into three categories: non-pathological, pathological compatible with COVID-19, and pathological non-compatible with COVID-19. Classification and inclusion criteria for each category were as follows: (1) non-pathological: (a) the CXR was obtained before the appearance of the SARS-CoV-2 virus; and (b) four reports made

The exclusion criteria for the three categories were as follows: (a) low-quality image, (b) improper alignment of the X-ray tube to the film, and (c) not meeting any of the other inclusion criteria. The chest radiographs were obtained with different equipment (portable and conventional machines), in different projections (posterior-anterior and anterior-posterior), and with different patient positions (standing and supine decubitus), in order to reproduce the most realistic clinical scenario. Data about the patient distribution for each category are shown in Table 1. This distribution aims to guarantee that the application shows the user a proportional number of images from each category, with a slight predominance of the category pathological compatible with COVID-19, which contains the most interesting images for the users. In addition, the distribution also resembles the real clinical context, where females are less frequently infected than males [26].

This work was approved by the Ethical Committee of our Institution, and all the data collected to develop the application were fully de-identified before transfer to the development team.

Subject information and links to the corresponding images are stored in a SQLite database. The application has been implemented using Python 3 [27] for the backend and Flask, a Python framework, for generating the client side or frontend. Some external libraries were used to expand the possibilities of the frontend, providing multilanguage support (currently Spanish and English) and a more structured style hierarchy. Other good practices based on Clean Code principles were also applied. All the code is publicly available in an open access repository [28]. User management is supported by an Authentication and Authorization Infrastructure based on the OpenID Connect standard. A public beta version was published on the 11th of April 2020, and the final version was launched on the 24th of April.

The COVID-19 TRAINING application was designed to help doctors, residents, medical students, and anybody else interested in acquiring the skills to recognize the radiological findings of COVID-19 on CXR. It includes CXR with different levels of difficulty so that everyone is able to join. The application is available for free, adapted both for mobile and tablet (Fig. 1) as well as for computer screens (Fig. 2), and offered in Spanish and English. Access to and navigation of the application are shown in a video provided in Appendix 1.

First step: The first screen that the user finds when opening the application is the welcome page (Figs. 1 and 2). It serves as the starting point of the application and shows a headline and a paragraph briefly describing its functionalities. On this interface two buttons are displayed: "COVID-19 TRAINING," which provides easy access to the login screen, and "Radiological features of COVID-19 on X-Ray," which opens a summary of the radiological COVID-19 findings that the user should be able to identify on CXR before starting the training.
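The full implementation is publicly available in the repository cited above [28]. Purely as an illustration of the stack just described (a Python 3/Flask backend with case data in SQLite), the sketch below shows how a training endpoint might serve a random case and record a user's answer; the route names, table names, and columns are hypothetical and are not taken from the actual code.

```python
# Minimal sketch of a Flask training endpoint backed by SQLite.
# Hypothetical schema and routes; not the application's actual code.
import random
import sqlite3

from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "training.db"  # assumed database file with 'cases' and 'answers' tables

CATEGORIES = (
    "non-pathological",
    "pathological compatible with COVID-19",
    "pathological non-compatible with COVID-19",
)


def get_db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row  # rows behave like dicts
    return conn


@app.route("/api/case")
def next_case():
    """Return a random chest X-ray case (image link, age, gender)."""
    with get_db() as db:
        rows = db.execute("SELECT id, image_url, age, gender FROM cases").fetchall()
    case = random.choice(rows)  # assumes the cases table is non-empty
    return jsonify(dict(case))


@app.route("/api/answer", methods=["POST"])
def record_answer():
    """Store the diagnosis chosen by the user for a given case."""
    payload = request.get_json()
    if payload["diagnosis"] not in CATEGORIES:
        return jsonify({"error": "unknown category"}), 400
    with get_db() as db:
        db.execute(
            "INSERT INTO answers (user_id, case_id, diagnosis) VALUES (?, ?, ?)",
            (payload["user_id"], payload["case_id"], payload["diagnosis"]),
        )
    return jsonify({"status": "recorded"})
```

In the real application, authentication is delegated to the OpenID Connect-based infrastructure mentioned above rather than handled inside the route handlers, and the frontend is generated with Flask rather than served as a separate API client.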
Second step: The login screen requires the user to accept the terms and conditions and to press the "Login" button to access the authentication interface (Fig. 3a). Authentication can be done easily, either by entering a Google (Gmail) account or by creating a new account through the "Register" button in the lowest part of the screen (Fig. 3b). If the user prefers the second option, they only need to provide their name, surname, email address, and password. An email will be automatically sent with a link to verify the account.

Third step: Finally, a drop-down menu allows the user to select their specialty or professional category (Fig. 3c, d). The available options are shown in Table 2.

Once the authentication is done and the profile category is selected, a new screen with a CXR opens. Besides the CXR, the patient's age and gender are also provided (Fig. 4a). On the left side of the screen, the user can choose in a drop-down menu one diagnosis among the three possible categories described before: non-pathological, pathological compatible with COVID-19, and pathological non-compatible with COVID-19 (Fig. 4b). After selecting the desired option and pressing the "Next" button, the answer is registered and a new case is loaded. At any time, the user can finish the training session by pressing the "End Test" button (Fig. 4). Each time the user takes a test, a new set of training images is selected.

A virtual magnifying glass is also implemented. Users simply need to press on the image to summon the magnifying glass, and they will see a zoomed image within its radius, without disturbing the rest of the page (Fig. 5). To remove it, they only need to press on the image again.

The "End Test" button links to the results interface, where the user's sensitivity, specificity, predictive values, and global accuracy are provided after each training session (Figs. 6 and 7). If users scroll down on this interface, the questions they answered incorrectly are displayed, indicating the given response and the correct answer, so they can learn from their mistakes. Pathological non-compatible with COVID-19 images also include a reference to the real pathology of the patient.

The COVID-19 TRAINING application allows users to evaluate themselves through the estimation of their diagnostic accuracy values: sensitivity, specificity, positive predictive value, negative predictive value, and global accuracy. Diagnostic accuracy values are calculated for COVID-19 diagnosis. The application registers the answers of the users and classifies them into the following four conventional categories [29, 30]: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

The sensitivity represents the user's ability to determine the COVID-19 cases correctly. It accounts for the proportion of true positives among patient cases: TP/(TP + FN) [29]. The specificity shows the user's capacity to rule out COVID-19 correctly. To estimate specificity, the proportion of true negatives among healthy cases is calculated: TN/(TN + FP) [29]. The positive predictive value (PPV) defines the probability of having COVID-19 when the user classifies the CXR into the category pathological compatible with COVID-19. Therefore, it represents the proportion of COVID-19 patients within the patients with a positive CXR for COVID-19 according to the user's criteria: TP/(TP + FP) [30]. By contrast, the negative predictive value (NPV) describes the probability of not having COVID-19 when the user does not classify the CXR into that category: TN/(TN + FN) [30].
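As a minimal sketch of how these diagnostic accuracy values, together with the global accuracy defined in the next paragraph, can be derived from the four counts, consider the following Python function. The function name is ours, and the example counts are invented solely so that the output reproduces the average values reported in the Results (0.72, 0.64, and 0.68); they are not actual user data.

```python
# Sketch: diagnostic accuracy values from the four conventional counts.
# Illustrative only; the application's own code may differ.
def diagnostic_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    def ratio(num, den):
        return num / den if den else float("nan")  # guard against empty groups

    return {
        "sensitivity": ratio(tp, tp + fn),            # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),            # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),                    # TP / (TP + FP)
        "npv": ratio(tn, tn + fn),                    # TN / (TN + FN)
        "global_accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Invented example: 36 true positives, 14 false negatives,
# 32 true negatives and 18 false positives.
print(diagnostic_values(tp=36, fp=18, tn=32, fn=14))
# -> sensitivity 0.72, specificity 0.64, global accuracy 0.68
```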
Finally, global accuracy depicts the ability of the user to differentiate COVID-19 patients from non-COVID-19 patients. It is the proportion of true positives and true negatives among all evaluated cases: (TP + TN)/(TP + TN + FP + FN) [29].

The information gathered by the application includes TP, TN, FP, FN, sensitivity, specificity, PPV, NPV, and global accuracy for each user and each time they take the test. The specialty or professional category selected after authentication and the answers given to each CXR are also stored. Nevertheless, the specialty or category chosen before starting the training has only been recorded since the launch of the final version on the 24th of April; the beta test did not record it. Engagement with the application is tracked using a customized data capture system and validated with Google Analytics [31]. This program is an effective resource for measuring the diffusion and understanding the geodemographics of users [32].

Statistical analysis was performed with IBM SPSS Statistics [33]. Averages for diagnostic accuracy values were computed following two different approaches: first, population diagnostic values were calculated from the totals of TP, TN, FP, and FN collected; second, the means of the diagnostic values obtained by the individual users were estimated. The test used to assess differences in performance between two groups was the chi-squared test. A p < 0.05 was considered statistically significant.

After the beta launch on the 11th of April, the application had 431 users within the first 3 days and 704 within the first week, with a total number of answers of 23,130. This version had users in more than 20 countries, according to Google Analytics reports. Average values for sensitivity, specificity, and global accuracy were 0.72, 0.64, and 0.68 (Table 5). The hardest CXR obtained odds of being answered correctly of 0.32 (Table 6). In contrast, the odds of answering the easiest CXR correctly were 18.70 (Table 7). Both CXR belonged to the category pathological compatible with COVID-19 (Figs. 11a and 12).

As would be expected, users who achieved a sensitivity equal to or higher than the average sensitivity (0.72) also registered lower specificity (p < 0.0001). Similarly, the users with higher specificity, those who achieved the average specificity (0.64) or more, had lower sensitivity (p < 0.0001). In addition, the users who sent more answers than the 50th percentile (68 answers) achieved better final average global accuracy (p < 0.0001); excluding users who sent fewer than 10 answers from the analysis, the results were also statistically significant (p = 0.0002) (Table 8). The beta version did not ask for the user's category or specialty, so no data about it were recorded before the final version launched on the 24th of April.

Fig. 6 Screen of results, as viewed from a desktop device.

Several medical educational applications have been developed [33][34][35], but only a few of them are focused on diagnosis training, and even fewer on COVID-19 diagnosis. Currently, society is facing a health situation without precedent [36]. In this context, the role of the medical community is vital [37, 38] and the medical education of those who are taking part in the solution is essential [18, 39]. The COVID-19 TRAINING application was developed to help professionals acquire the competencies required to diagnose COVID-19 on CXR. This application brings a new way to ease the learning of the skills necessary to successfully evaluate COVID-19 on CXR.
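Returning to the statistical analysis described above, the group comparisons reported in Table 8 rely on chi-squared tests. One plausible form of such a comparison, tabulating true-positive and false-negative answer counts for two groups of users, is sketched below with SciPy; the counts are placeholders rather than study data, and the grouping labels are illustrative.

```python
# Sketch of a chi-squared comparison between two groups of users,
# here comparing the proportion of correct positive answers (TP vs. FN).
# Placeholder counts, not the study's actual data.
from scipy.stats import chi2_contingency

#            TP    FN
table = [
    [430, 250],  # group 1: e.g., users with sensitivity below the mean
    [780, 190],  # group 2: e.g., users with sensitivity at or above the mean
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4g}")
# A p value below .05 would indicate a statistically significant difference
# between the two groups.
```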
In our view, this is a novel educational technique that could set a precedent for developing further applications for training in the diagnosis of other pathologies. Furthermore, we have introduced an original evaluation method based on the estimation of the users' diagnostic accuracy values for COVID-19. In our opinion, this mechanism of assessment could prove useful for medical professionals since it provides them with information about their sensitivity, specificity, predictive values, and global accuracy. Armed with that knowledge, they can recognize and improve their weaknesses.

The results obtained with the first version of the application showed that those who achieved better sensitivity also had lower specificity. Similarly, those who registered better specificity also had lower sensitivity. These results illustrate that, usually, increasing sensitivity entails decreasing specificity, and vice versa. Our application can help users find an optimal balance between their sensitivity and specificity and, consequently, achieve their best possible global accuracy.

In addition, the analysis and display of the failed answers is also a key feature of this tool. At the end of the test, users can review their incorrect responses, so they become aware of their mistakes and recognize their weak points. This feature can also be useful in research to identify the cases missed by many users. Later on, the examination of these problematic images may give clues to previously undetected areas for improvement in medical practice.

Following this approach, we have analyzed the hardest and the easiest CXR, trying to understand why users found the respective cases difficult or obvious. The most difficult CXR was a COVID-19 case that had subtle and ill-defined ground-glass opacities in the right lung. The most evident findings on this image were in the right upper lobe, nearly overlapping with the first costochondral junction (Fig. 11). The first costochondral junction is a well-recognized pitfall on CXR and sometimes mimics a rounded opacity [40]. Perhaps, in situations where COVID-19 is suspected and a prominence of the first costochondral, first costosternal, or sternoclavicular junction is seen on CXR, further evaluation should be performed. In contrast, the easiest CXR was also a COVID-19 patient, but in this case the image showed evident peripheral bilateral opacities, which are among the most typical radiological findings of this disease [7, 13] (Fig. 12).

Overall, this application seems to be helpful. From the beginning, it spread remarkably fast, with hundreds of users registered from more than 20 different countries within the first days and more than 20,000 answers sent. Additionally, many medical societies shared this first beta version on their webpages. Results showed that the users who sent more answers achieved better global accuracy. In addition, only 6 users (0.9%) exceeded the number of cases contained in our dataset, so most of them did not repeat answers. Users who sent fewer than 10 answers were excluded from the analysis to avoid potential biases. These facts could indicate that the application improves the users' diagnostic skills. Finally, we also found that medical educational applications may be used in research as a new method to collect relevant information.
Since the application was officially launched on the 24th of April, data about the users' specialty or professional category have been recorded. Our intention is to use these data to analyze the differences in diagnostic values between users belonging to different specialties or categories, and to try to estimate the real utility of CXR in the evaluation of COVID-19.

Fig. 11 a The most difficult chest X-ray. It is a 66-year-old female with a chest X-ray belonging to the category pathological compatible with COVID-19. On this image, subtle ground-glass opacities in the right lung are visible (arrow); the most evident opacity is in the right upper lobe (circle). b On a subsequent chest X-ray performed on the same patient three days later, this ground-glass opacity in the upper lobe became even clearer (circle). In addition, new opacities appeared in the lower lobes on this chest X-ray (arrows).

Fig. 12 The easiest chest X-ray. It is a 69-year-old male diagnosed with COVID-19. The image shows extensive, multiple, and bilateral opacities (arrows) indicating a severe form of COVID-19 pneumonia.

Table 8 Comparison of the results between groups and p values for chi-squared tests. CI confidence interval, TP true positive, FN false negative, TN true negative, FP false positive.
a Group 1: users who achieved a specificity lower than the mean (.64); group 2: users who achieved a specificity equal to or higher than the mean (.64).
b Group 1: users who registered a sensitivity lower than the mean (.72); group 2: users who registered a sensitivity equal to or higher than the mean (.72).
c Group 1: users who sent a number of answers equal to the 50th percentile (68 answers) or fewer; group 2: users who sent more than 68 answers.
d Group 1: users who sent a number of answers between 10 and the 50th percentile (68 answers); group 2: users who sent more than 68 answers.

In the present COVID-19 pandemic, the medical education of the professionals involved in patient care is vital, and COVID-19 TRAINING brings a different solution to help them in this purpose. Applications focused on training in diagnosis could provide a new and original manner of evaluation for medical professionals. The assessment of users by estimating their diagnostic accuracy values makes them aware of their weak points. In addition, this kind of application also collects valuable information that can be used for research purposes.
A Novel Coronavirus from patients with pneumonia in China
Coronavirus infections: Epidemiological, clinical and immunological features and hypotheses
WHO's Coronavirus disease (COVID-19) outbreak situation dashboard
WHO Declares COVID-19 a Pandemic
(COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention
Insight into 2019 novel coronavirus - an updated interim review and lessons from SARS-CoV and MERS-CoV
Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2: A systematic review and meta-analysis
Olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (COVID-19): a multicenter European study
Diagnosing COVID-19: The Disease and Tools for Detection
Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China
Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19
Frequency and distribution of chest radiographic findings in COVID-19 positive patients
Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review
COVID-19 outbreak in Italy: Experimental chest X-ray scoring system for quantifying and monitoring disease progression
Coronavirus (COVID-19) Outbreak: What the Department of Radiology Should Know
Deep learning for chest radiograph diagnosis in the Emergency Department
How visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology
A think-aloud study to inform the design of radiograph interpretation practice
ACR practice parameter for radiologist coverage of imaging performed in hospital emergency departments
Interpretation of Emergency Department radiographs: a comparison of emergency medicine physicians with radiologists, residents with faculty, and film with digital display
Chest radiographs in the emergency department: is the radiologist really necessary?
Accuracy of radiographic readings in the emergency department
Increasing Utilization of Chest Imaging in US Emergency Departments From 1994 to 2015
Survey of afterhours coverage of emergency department imaging studies by US academic radiology departments
COVID-19: What has been learned and to be learned about the novel coronavirus disease
El Ashal G: Part 1: Simple Definition and Calculation of Accuracy
Using Google Analytics to evaluate the impact of the CyberTraining project
App Review: The Radiology Assistant 2.0
A systematic review of healthcare applications for smartphones
Available at https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid
Supporting the Health Care Workforce During the COVID-19 Global Epidemic
Fighting COVID-19: Enabling Graduating Students to Start Internship Early at Their Own Medical School
Clinical and computed tomographic (CT) images characteristics in the patients with COVID-19 infection: What should radiologists need to know?
Pulmonary pseudonodules on computed tomography: a common pitfall caused by degenerative arthritis

We would like to acknowledge all the professionals who have battled in an exemplary manner to safeguard the health and life of all citizens worldwide since the beginning of this pandemic. The authors received no financial support for this work.
The authors declare that they have no conflict of interest. A link to a video navigating the application: https://api.cloud.ifca.es:8080/swift/v1/covid19/VIDEO%20APP.mov.