Lessons learned in transitioning to AI in the medical imaging of COVID-19
El Naqa, Issam; Li, Hui; Fuhrman, Jordan; Hu, Qiyuan; Gorre, Naveena; Chen, Weijie; Giger, Maryellen L.
J Med Imaging (Bellingham), 2021-10-01. DOI: 10.1117/1.jmi.8.s1.010902

The coronavirus disease 2019 (COVID-19) pandemic has wreaked havoc across the world. It also created a need for the urgent development of efficacious predictive diagnostics, specifically, artificial intelligence (AI) methods applied to medical imaging. This has led to the convergence of experts from multiple disciplines to solve this global pandemic, including clinicians, medical physicists, imaging scientists, computer scientists, and informatics experts, to bring to bear the best of these fields for solving the challenges of the COVID-19 pandemic. However, such a convergence over a very brief period of time has had unintended consequences and created its own challenges. As part of the Medical Imaging Data and Resource Center initiative, we discuss the lessons learned from career transitions across the three involved disciplines (radiology, medical imaging physics, and computer science) and draw recommendations based on these experiences by analyzing the challenges associated with each of the three associated transition types: (1) AI of non-imaging data to AI of medical imaging data, (2) medical imaging clinician to AI of medical imaging, and (3) AI of medical imaging to AI of COVID-19 imaging. The lessons learned from these career transitions, and the diffusion of knowledge among them, could be accomplished more effectively by recognizing their associated intricacies. These lessons learned in transitioning to AI in the medical imaging of COVID-19 can inform and enhance future AI applications, making the whole of the transitions more than the sum of each discipline, for confronting an emergency like the COVID-19 pandemic or solving emerging problems in biomedicine.

Challenges associated with this disease have also identified new opportunities for developing better diagnostic tools and therapeutics that are expected to survive this pandemic and lead to a brighter future in medical imaging analytics. 1,16 Among the many lessons learned from this pandemic is the development of efficacious predictive diagnostics using medical imaging analyses 17-24 and advanced artificial intelligence (AI) tools. 25-28 There has been a tremendous uptake in the utilization of AI for analyzing chest radiographs and computed tomography lung images for COVID-19, as presented in Fig. 1. However, many of these studies replicate similar analyses, with limitations on datasets and validation methods and, thus, unclear findings and paths to clinical utility. 29 Moreover, earlier studies have utilized pediatric datasets, whereas most COVID-19 patients have been adults, which created an age bias effect. 30,31 Despite AI's known potential in radiological sciences, 32 issues related to image quality, preprocessing, representative training/tuning/testing datasets [the "training/validation/testing" terminology has been popularly used, especially in the computer science community; however, the terminology has been found to be confusing for many in the medical imaging and clinical communities.
The "validation" refers to using a dataset (typically small) to examine the performance of a trained model under certain hyperparameters and tune the hyperparameters based on the model performance on that dataset. Therefore, we use "training/tuning/testing."] datasets, choices of improper AI algorithms and structures and interpretability of AI findings seem to have contributed to irreproducible and conflicting results in the literature. 33 Though this convergence of many forces to solve a global pandemic is admirable, to meet its intended goals, it has to be conducted not in a single step but as a transitional process requiring collective efforts among medical imaging professionals and data analytic experts to bring to bear the best of these fields in solving biomedical challenges such as the diagnose and prognosis of COVID-19 pandemic. As noted earlier, there is a need to bring varying scientific and clinical expertise to develop better AI tools for COVID-19 diagnosis and prognosis, in which medical imaging will continue to play a pivotal role. Depending on the practitioner's background and expertise, there are three types of career transitions that can be extended from the COVID-19 AI era as discussed below and summarized in Fig. 2 . (1) Transition of an AI expert with no imaging background into AI of medical imaging data, which is typically the case of a computer scientist with interest in medical imaging. (2) Transition of a medical imaging clinician (e.g., radiologist) into AI of medical imaging. (3) Transition of combined (1) and (2) of AI of medical imaging into confronting an emerging challenge such as AI of COVID imaging. Fig. 1 Number of publications on COVID-19 with a PubMed search of "COVID" and "imaging" and ("machine learning" or "deep learning" or "AI"). 2 Potential Use Cases and Issues In addition to AI, knowledge of both medical presentation and imaging physics is necessary for successfully developing AI applications in medical imaging. This generally applies to computer scientists who are AI experts but with no previous background in imaging in general or medical imaging specifically. In this scenario, the AI expert may overly simplify the treatment of medical images, with their 2D and 3D pixels or components (i.e., slices), as deterministic independent data elements on which AI tools can be directly adapted, without realizing the intricates of medical imaging modalities, particularly various sources of variability that may affect the robustness/ generalizability of AI algorithms. For instance, akin to medical imaging are the distinct multidimensional nature of 2D (e.g., CXRs) or 3D images (e.g., CT), relative or absolute quantitative nature, the different manufacturers or models of image acquisition systems or the different acquisition parameters (e.g., mAs and kVp) for a particular system, image reconstruction methods (e.g., analytical or iterative), the need for clinical expertise in annotation/truthing (as compared with crowdsourcing in natural image annotation), and the specific clinical context that needs to be considered part of the evaluation process. 34 Among the many potential problems, we highlight the following cases due to their prevalence in the reported literature. 
Among the many potential problems, we highlight the following cases due to their prevalence in the reported literature.

Leakage between training and testing in transitioning from AI to medical imaging AI

Understanding the nature of medical images and the clinical context in which they are used is a necessary prerequisite for any successful application of an AI algorithm for diagnostic or prognostic medical purposes. How training models should be built, and how the data will be used in that process, can be a main source of confusion, error, and subsequent bias. 35 For instance, in a diagnostic learning task, a CT scan comprising many slices could be erroneously treated and evaluated on a 2D slice-by-slice basis under the inaccurate assumption that slices from the same patient are independent, while they are typically correlated. As such, slices from the same patient may be included in both training and testing sets, thereby yielding overly optimistic testing results due to this data leakage. Similarly, in the case of prognostic studies for predicting death or the need for ventilation, it is sometimes unclear whether images from the same patient at different time points were mistakenly used for both training and testing. 36

In traditional machine learning models based on feature engineering, feature selection is an essential process in the building of computational tools for the AI interpretation of medical images. This process is prone to statistical bias and potentially overly optimistic results if not done properly; for example, if the entire dataset were first used for feature selection and then partitioned into apparently non-overlapping subsets for training and testing, the test results would be optimistically biased. 37 Proper methods should be used to avoid such pitfalls, such as nested cross-validation or other resampling or independent testing methods in algorithm development and evaluation. 38 Although the new generation of deep learning algorithms bypasses the feature selection process by embedding the data representation as part of the algorithm training itself [e.g., the convolutional layers in convolutional neural networks (CNNs)], proper safeguards should still be applied when tuning a large number of hyperparameters in this case (i.e., model selection).

Another commonly encountered issue is false-positive findings due to multiple testing, i.e., when multiple hypotheses are tested without appropriate statistical adjustment, a truly null hypothesis may be found significant purely by chance. An infamous example in this area is the analysis of fMRI data from a dead Atlantic salmon. Using standard acquisition, preprocessing, and analysis techniques, the authors showed that active voxel clusters could be observed in the dead salmon's brain when statistical thresholds uncorrected for multiple comparisons were used, i.e., false-positive "active" voxel clusters in a dead fish. 39 To avoid such pitfalls, appropriate statistical methods must be prespecified in a study to control the type I error rate. 40,41
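Two of the pitfalls above, slice-level leakage and feature selection performed outside the resampling loop, can be avoided with standard tooling. Below is a minimal, hypothetical sketch using scikit-learn on synthetic stand-in data: slices are grouped by patient identifier so that no patient contributes to both training and testing, and feature selection sits inside a pipeline so it is refit on each training fold only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Toy stand-ins: 300 slices, 50 radiomic features, 30 patients (10 slices each).
X = rng.normal(size=(300, 50))
y = rng.integers(0, 2, size=300)           # e.g., COVID-19 positive vs. negative
patient_id = np.repeat(np.arange(30), 10)  # all slices share their patient's ID

# Feature selection lives inside the pipeline, so it is fit on training folds only.
model = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# GroupKFold keeps every slice of a patient on the same side of each split.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(model, X, y, groups=patient_id, cv=cv, scoring="roc_auc")
print("Per-fold AUC:", np.round(scores, 3))
```

On real data, repeating the same experiment with an ordinary slice-level split will often report optimistically higher scores whenever slices within a patient are correlated with the label, which is exactly the leakage effect described above.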
Confusion can also arise when there is no clarity in the description of the clinical problem. 29 For instance, a clinical problem might be posed as a diagnostic problem of differentiating between COVID-19 cases and control cases. In this scenario, the control cohort might be healthy individuals or might be individuals with other lung diseases such as pneumonia (in which the presentation of the disease on the medical image might mimic that of COVID-19). Those two different control cohorts will yield a wide range of classification performances and, thus, confusion about the ultimate clinical use of the AI. In another context, confusion can arise about the purpose of the AI algorithm. Is the AI algorithm for identifying which images have COVID-19 presentations (classification task), or is the AI algorithm for localizing regions of COVID-19 involvement within an image (segmentation task)? This can also manifest in the performance evaluation. These clinical tasks will require different evaluation metrics and clarification of sensitivity and/or specificity, especially when imbalances arise in the dataset.

An AI algorithm trained in a non-medical context may also not transfer well to medical imaging AI, in part because restrictive deidentification or HIPAA concerns may limit data availability. For instance, the required data may be fragmented or distributed across multiple systems. This is in addition to issues related to the need for proper data curation processes to ensure that the necessary data quality is available. Different hospitals may apply their CT protocols differently, with variations in image acquisition systems and parameters (e.g., slice thickness and reconstruction), indicating the need for harmonization. Otherwise, the process may be subject to the "garbage in, garbage out" pitfall. On the other hand, transfer learning, 42,43 with fine-tuning and feature extraction, has proven to be quite useful in AI of medical imaging, especially for tasks related to image analysis such as segmentation, denoising, and classification (leading to diagnosis and other tasks). Therefore, to mitigate the problem of mismatch between the intended training process and the clinical task, there needs to be transparent communication between the AI and the medical imaging domain experts.

Several dynamic algorithms, such as reinforcement learning, tend to continuously update their training by exploring the environment, which could work well in a safe setting such as a board game but may have legal and ethical ramifications in the real world (e.g., self-driving accidents), especially in the context of medicine, where life-or-death decisions are being made. The traditional approach to "continuous" learning is implemented through manually controlled periodic updates to "locked" algorithms. 44 Recent technological advances in continuous learning AI have the potential to provide fully automatic updates by a computer, with the promise of adapting continuously to changes in local data. 45 However, how to assess such an adaptive AI system to ensure safety and effectiveness in the real world is an open question, and there are several ongoing research efforts and discussions on this topic across academia and regulatory agencies to ensure that the deployed AI algorithm is still correctly performing the originally intended task. 44-46 This process may require an AI monitoring program and periodic evaluations over the lifetime of the algorithm.

The ready availability and user-friendliness of recent deep learning software have allowed novices as well as experts in AI to rapidly develop, apply, and/or evaluate AI on sets of medical imaging data. However, lack of sufficient technical expertise in medical image acquisition, the subtleties of AI algorithm engineering, and statistics may result in bias, inappropriate statistical tests, and erroneous interpretations of findings.
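As a concrete illustration of the transfer-learning and fine-tuning practice discussed above, the following is a minimal sketch (assuming PyTorch and torchvision; the two-class COVID-19 CXR task and the dummy batch are hypothetical stand-ins for a real data loader) that reuses an ImageNet-pretrained backbone as a frozen feature extractor and trains only a new classification head.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone (assumes torchvision >= 0.13 for the weights API).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze all pretrained layers.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical two-class task
# (e.g., COVID-19 vs. non-COVID-19 chest radiographs).
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch standing in for a real DataLoader.
images = torch.randn(8, 3, 224, 224)   # grayscale CXRs would be replicated to 3 channels
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
print(f"dummy-batch loss: {loss.item():.3f}")
```

Whether to freeze the backbone, fine-tune the last few blocks, or fine-tune everything is itself a choice that should be informed by the amount of target-domain data and by how far CXR/CT statistics are from natural images.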
As machine learning methods are increasingly developed for medical imaging applications, numerous models are made available by investigators either publicly or through collaborations. However, direct applications of off-the-shelf models are often unsuccessful. Needless to say, models trained for a specific task, e.g., detecting a certain disease on a certain imaging modality, are not suitable for direct application to a different task, because the features useful for performing different tasks can be vastly different. Moreover, even when the task in question remains the same, off-the-shelf models developed on one dataset often fail to generalize when applied to another dataset. There are several reasons for this lack of robustness across datasets for machine learning models in medical imaging applications, including differences in imaging acquisition protocols, scanners, patient populations, and image preprocessing methods. Features extracted by machine learning models, both human-engineered features and features automatically learned by deep learning, are often sensitive to these differences between the source domain and the target domain. Therefore, it is good practice to preprocess the images in the target domain the same way as for the development set, harmonize differences in images resulting from image acquisition and population characteristics, and fine-tune the pretrained model on a subset of the images in the target domain when appropriate.

Another significant issue is the cross-institution generalizability of deep learning CNNs, i.e., a CNN trained with data from one clinical site or institution is found to have a significant drop in performance on data from another site. This has been frequently reported in the recent literature, for instance, on the classification of abnormal chest radiographs, 47 on the diagnosis of pneumonia, 48 and on the detection of COVID-19 from chest radiographs. 49 In addition to the possible reasons discussed in the previous paragraph, a particular issue found by these authors is that CNNs readily learned features specific to sites (such as metal tokens placed on patients at a specific site) rather than the pathology information in the image. As such, proper testing procedures that mimic real-world implementation are crucial before any AI clinical deployment. 50

Often, off-the-shelf texture analysis software packages are used to quantify the variation perceived in a medical image, the output of which is subsequently related to disease states using AI algorithms. 51 Assessment of texture provides quantitative values of pattern intensity, spatial frequency content (i.e., coarseness), and directionality. 52 Caution is needed, however, since, as noted by Foy et al., 51 algorithm parameters, such as those used for normalization of pixel values prior to processing, can vary between software packages that are calculating the same mathematical feature. An example is the calculation of texture features from gray-level co-occurrence matrices (GLCMs), in which GLCM parameters may be unique to the specific software package. Entropy, for example, involves the calculation of a logarithm, which might be to the base e, the base 2, or the base 10, with each yielding different output texture values of entropy. 51 There have been attempts to standardize the calculation of radiomic features. 53
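The entropy example can be made concrete with a few lines of code. The sketch below (a minimal illustration assuming scikit-image >= 0.19, where the function is spelled graycomatrix, and a tiny synthetic image standing in for a real region of interest) computes the same GLCM entropy with three different logarithm bases; the values differ by constant factors, so two packages that silently use different bases will report different "entropy" for identical images.

```python
import numpy as np
from skimage.feature import graycomatrix  # scikit-image >= 0.19 spelling

# Tiny synthetic 4-level image standing in for a real ROI.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]], dtype=np.uint8)

glcm = graycomatrix(image, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)
p = glcm[:, :, 0, 0]
p = p[p > 0]  # drop zero-probability entries before taking logarithms

for name, log in [("log2", np.log2), ("ln", np.log), ("log10", np.log10)]:
    print(f"entropy ({name}): {-(p * log(p)).sum():.4f}")
```

The ratio between the reported values is simply a constant (ln 2 or ln 10), but a model trained on one convention will not transfer silently to the other, and gray-level quantization and pixel-value normalization choices behave the same way.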
It is important to understand both the mathematical formulation of the radiomic feature and the software package implementation to achieve the intended results, as recommended by the Image Biomarker Standardization Initiative. 53

The novice developer of AI should be careful to avoid overfitting and bias in AI algorithms, often caused by limited datasets in terms of the number, distribution, and collection sites of cases. 54 Even if independent testing using cases from a single institution yields promising performance levels, the AI algorithm may not be assumed to be generalizable until it is tested on an independent multi-institutional dataset, with results given in terms of performance on the specific task, including its variation (reproducibility and repeatability), as well as an assessment of whether or not the algorithm is biased. Sharing of data in such instances may be challenging; therefore, the availability of public resources such as the Medical Imaging Data and Resource Center (MIDRC) or the utilization of emerging methods based on federated learning would be helpful in this process. 55

The situation of an AI/CAD researcher with medical imaging experience transitioning to COVID-19 applications is anticipated to be the most likely of the aforementioned scenarios. The researcher is generally familiar with common radiological practices, algorithms, and applications of AI to medical images but has focused on other problems, such as deep learning applied to image reconstruction or interpretation for non-COVID-19 disease. However, there are aspects of COVID-19 AI research that may be unfamiliar to the researcher. We discuss three instances here but note that there are other unique aspects to COVID-19 imaging; these three were selected due to their frequent occurrence and interest.

Observer variability refers to different readers assessing the same input image and producing different results. 56,57 While not uncommon in medical imaging, the degree to which observer variability may impact an AI model, either during training or evaluation, is dependent on the imaging task. 58-62 For example, in COVID-19 detection problems, the standard ground truth identifying disease positivity is RT-PCR, which may have less variability compared with imaging. 63,64 Alternatively, medical image segmentation tasks generally utilize a ground truth mask delineated by expert radiologists. Several studies have analyzed observer variability for tasks related to COVID-19 assessment, such as disease detection, diagnosis, segmentation, and prognosis, including detection of abnormalities on CXRs, segmentation of disease on CT scans, and even other diagnostic tests such as RT-PCR. 63-69 Methods that can alleviate complications related to observer variability include statistical resampling techniques such as bootstrapping, imbalance correction such as the synthetic minority over-sampling technique (SMOTE), and the use of appropriate evaluation metrics such as precision-recall and receiver operating characteristic (ROC) analyses. 70,71 Furthermore, implementation of clearly defined, task-relevant scores such as CO-RADS can standardize evaluations and reduce the impact of such interobserver variability. 72
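As a small illustration of the bootstrapping mentioned above, the sketch below (a minimal example with made-up labels and scores, assuming NumPy and scikit-learn) resamples cases with replacement to put a confidence interval around an AUC estimate; the same pattern applies to precision-recall summaries.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical test set: labels and model scores for 200 cases.
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.25, size=200), 0, 1)

point_auc = roc_auc_score(y_true, y_score)

# Case-level bootstrap: resample whole cases (patients), not individual slices.
boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    if len(np.unique(y_true[idx])) < 2:   # skip degenerate resamples
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUC = {point_auc:.3f} (95% bootstrap CI: {lo:.3f}-{hi:.3f})")
```

Resampling at the case level rather than the slice level keeps the interval honest when multiple images per patient are present, for the same reason discussed under data leakage.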
Most diseases have existed for many years and have thus undergone extensive investigation to determine appropriate imaging protocols for detection, diagnosis, and prognosis evaluations. However, protocols for COVID-19 patient imaging have rapidly changed over the past year as researchers investigated optimal practices for a variety of situational evaluations. 73 Further, different radiological societies may have inconsistent or conflicting recommendations based on regional and national standards. 74-80 The effect of these phenomena is twofold. (1) Existing COVID-19 imaging datasets may contain images that would not be appropriate for general application due to outdated image acquisition protocols and thus should not be included in AI training without careful curation, especially if acquired externally to the region or institution in which the AI system will be utilized. (2) Task selection for an AI system should be performed based on local practice, which in turn should be based on radiological society or WHO recommendations. For example, most imaging societies do not recommend the use of CT for COVID-19 screening due to the increased patient dose, with likely no information gain compared with RT-PCR and chest radiography; thus, AI algorithms trained for COVID-19 detection may prioritize detection on CXR rather than CT for widespread implementation. While such studies may still provide some value, AI algorithms should generally be developed with appropriate task selection and data curation that are relevant to current image acquisition guidelines for their intended application.

One of the most common obstacles in training unbiased medical imaging AI systems is insufficient relevant patient imaging data. 81 This obstacle is exacerbated by the brief existence of COVID-19, as the total pipeline of image acquisition, curation, and labeling for ML usage can be time-consuming. During the pandemic, several teams have attempted to utilize publicly available datasets of both COVID-19 and non-COVID-19 patient images to develop generalized models for clinical use. 35 Thus, two key problems are currently posed: (1) data, particularly private data, are generally skewed such that the amount of COVID-19-positive data is notably smaller than that of COVID-19-negative data, and (2) the use of publicly available datasets can lead to a high risk of model bias, either due to a lack of information (such as consistent confirmation of COVID-19 diagnosis through RT-PCR and/or other clinical tests) or through crossover of images between databases, leading to biased evaluations. 64 Further, some public datasets include images not in DICOM format (e.g., images scraped from published papers). Thus, it is critically important to understand the source of the imaging data and to evaluate appropriately. For example, evaluation should be performed on completely independent private (when possible) and public datasets, such as the image testing data in the MIDRC.
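The crossover problem above, i.e., the same image appearing in two public datasets and ending up on both sides of a train/test split, can be screened for cheaply. Below is a minimal, hypothetical sketch (the folder names are made up) that hashes file contents in two dataset folders and reports exact duplicates; near-duplicates such as re-encoded or cropped copies would require perceptual hashing instead.

```python
import hashlib
from pathlib import Path

def hash_files(folder: Path) -> dict[str, Path]:
    """Map SHA-1 of file bytes -> file path for every file in a dataset folder."""
    hashes = {}
    for path in folder.rglob("*"):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            hashes[digest] = path
    return hashes

# Hypothetical public dataset folders used for training and for testing.
train_hashes = hash_files(Path("public_dataset_A"))
test_hashes = hash_files(Path("public_dataset_B"))

overlap = set(train_hashes) & set(test_hashes)
print(f"{len(overlap)} exact-duplicate files shared between the two datasets")
for digest in list(overlap)[:10]:
    print(train_hashes[digest], "<->", test_hashes[digest])
```

Any nonzero overlap is a warning that performance estimated on the second dataset is not independent of the first.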
In this section, we discuss the transition from AI of medical imaging to AI of COVID-19 imaging in the clinical context of pan- or multi-omics. 82 Clinicians nowadays are equipped with a wealth of information that may include patient demographics and history, routine biochemical and hematological data, genetic/genomic testing, and multi-modality imaging exams. Two tracks of research in such a clinical context hold great promise to benefit patients and our healthcare system: correlation and integration. Correlation of different types of information may not only offer a better understanding of disease characteristics but also allow for the replacement of invasive/expensive testing with non-invasive and less costly imaging exams, or identify the role of imaging in different clinical tasks or stages of patient management. On the other hand, when multiple types of measurements are found to be complementary, their optimal integration is of great interest for better patient outcomes. 83 Despite these potential benefits, lessons have been learned from the pre-COVID-19 era, mostly in the area of AI for cancer diagnosis and treatment: 32 limited datasets and lack of validation data, lack of expandability, and the need for rigorous statistical analysis, to name a few. 26,84-86 Like many other diseases, 52,87-90 there are many clinical tasks related to COVID-19 that can be interrogated using information extracted from medical images. Below, we discuss the potential role of AI for COVID-19 imaging in the clinical context of integrative modeling with pan- or multi-omics.

Although RT-PCR is the reference standard for COVID-19 diagnosis, molecular testing may not be readily available or may lead to false-negative results for patients under investigation for COVID-19; under these circumstances, CXR/CT can be critically useful in the diagnosis of the disease. CXR/CT in COVID-19 diagnosis has moderate sensitivity and specificity in current radiological practice, 91 possibly because of clinicians' limited experience in reading COVID-19 cases. AI/ML models can be trained using these images with RT-PCR as the reference standard. These trained models can then be applied to patients to make a rapid diagnosis when only imaging data are available. 92-94 AI/ML models have been shown to help clinicians differentiate COVID-19-induced pneumonia from other types of pneumonia, including differentiation of common and unique radiographic features. 26 Thus, AI/ML models, including both deep learning and human-engineered radiomics, have the potential to aid clinicians with improved diagnostic performance.

Baseline pulmonary CXR/CT imaging characteristics can be used to assess the severity/extent of disease. 95 A severity score derived from lung images can be used for triaging the patient to decide between home discharge, regular hospital admission, or intensive care unit admission. 96 The imaging characteristics can be used for prognosis assessment 97 and evaluation of the extent of disease, 11 and they have the potential to predict the length of hospital stay. 98 Although COVID-19 is seen as a disease that primarily affects the lungs, it may also damage other organs, including the heart 93,99 and brain, 93,99-102 and many long-term COVID-19 effects are still unknown. COVID-19 patients have to be closely monitored even after their recovery, when imaging and AI may play a pivotal role. Baseline pulmonary CXR/CT imaging characteristics can also be used for patient management to determine the therapeutic treatment plan. The temporal changes can be used to monitor the patients' response to treatment and to inform the decision for discharge. 97 Lung ultrasound has also been suggested as an imaging tool for the COVID-19 pandemic in disease triage, diagnosis, prognosis, severity assessment, and monitoring of treatment response. 13
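To illustrate how a simple image-derived severity measure of the kind described above might be constructed, the sketch below is a toy example only: it assumes a CT volume in Hounsfield units and a precomputed binary lung mask, and the -600 HU opacity threshold and synthetic volume are made up for illustration rather than being validated clinical values. It computes the fraction of lung volume occupied by high-attenuation tissue.

```python
import numpy as np

def opacity_fraction(ct_hu: np.ndarray, lung_mask: np.ndarray,
                     threshold_hu: float = -600.0) -> float:
    """Fraction of lung voxels whose attenuation exceeds a HU threshold.

    ct_hu: CT volume in Hounsfield units; lung_mask: boolean lung segmentation.
    """
    lung_voxels = ct_hu[lung_mask]
    return float((lung_voxels > threshold_hu).mean())

# Toy volume: mostly aerated lung (~ -800 HU) with a small consolidated region.
ct = np.full((64, 128, 128), -800.0)
mask = np.zeros_like(ct, dtype=bool)
mask[:, 32:96, 32:96] = True          # hypothetical lung mask
ct[20:30, 40:60, 40:60] = -200.0      # hypothetical consolidation

score = opacity_fraction(ct, mask)
print(f"opacity fraction: {score:.2%}")
# A clinically usable severity score would be calibrated against outcomes
# (e.g., ICU admission) rather than relying on an arbitrary threshold.
```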
Radiomics is the high-throughput extraction of a large number of quantitative features from medical images. 103 By integrating quantitative image features with clinical, pathological, and genomic data, i.e., radiogenomics, we will be able to study the relationship or association between imaging-based phenotypes and genomic biomarkers. This would potentially allow us to discover radiomic features that can be used as diagnostic or prognostic biomarkers for monitoring patients, assessing response to treatment, and predicting the risk of recurrence. Studying the association between image-based phenotypes and underlying genetic mechanisms 104 may help us better understand the disease mechanism and potentially advance precision medicine for COVID-19 or similar diseases in the future. However, small datasets will be a challenge for radiogenomic studies, since only limited cases with comprehensive data, including imaging, clinical, pathological, genomic, and other -omic data, are available for this relatively new disease. In addition, batch effects have to be considered when combining diverse imaging and genomic data, and data harmonization/calibration has to be performed prior to any radiogenomic analysis. Moreover, some data resources, such as lung pathology images, are limited to postmortem evaluations. 105 Nevertheless, radiogenomics is a valuable tool for linking imaging findings to the underlying biological mechanisms of disease.

Medical imaging data are well structured and stored in an industry-standard format such as DICOM. However, other information needed to carry out scientific investigations and answer relevant clinical questions, including clinical records, genomic data, and pathological data, is not as well structured as medical images. Major efforts are needed to extract relevant information from these different sources to solve clinical questions and advance knowledge to better understand the disease. Preprocessing, such as data aggregation and data harmonization, is needed for these less-structured data to be suitable for use in subsequent AI development. 106 Medical imaging data, even if in DICOM format, may still need to be harmonized, since imaging data collected from different institutions may be acquired using different acquisition protocols and machines from different manufacturers. Pathology slides have varying staining protocols and imaging formats; however, advances in digital pathology promise to alleviate many of these issues. 107 Genomic data from different sources may need to be harmonized/corrected to minimize batch effects. Natural language processing algorithms may be needed to efficiently process unstructured clinical records. 108 All of these preprocessing and query procedures must be done prior to performing any prediction and association analysis to advance the knowledge of the disease (Fig. 3).

Though several recommendations have emerged to improve the reporting of AI for biomedical applications 109-113 and imaging specifically, 114 little has yet been developed on the career transition process across radiology, medical imaging physics, and computer science. We recognize the great potential of imaging and AI in a variety of applications related to COVID-19 (such as those discussed in Sec. 3), and three transitions were identified with the following precautions.

1. AI of non-imaging data to AI of medical imaging data
• The nature of imaging (multi-slice) may increase the risk of leakage between training and testing sets and should be cautiously addressed.
• Though feature and model selection are similar across AI algorithms, given the large number of free and intercorrelated parameters in imaging applications, they require additional safeguards to avoid bias.
• Imaging tasks can share similar or synonymous terminology that requires comprehensive understanding for successful implementation. This also needs to be addressed in the development and preparation of the corresponding training datasets.
• The nature of medicine is that its data continue to evolve and change over time, which imposes further needs for continuous learning protocols.

2. Medical imaging clinician to AI of medical imaging
• The application of pretrained models (transfer learning) needs to be consistent and informed by the clinical task at hand.
• Packaged AI and texture analysis tools, though handy, require training before their proper use in clinical applications.
• AI systems are prone to overfitting and bias; therefore, dedicated training and consultation with experts are helpful to ensure generalizability and reproducibility.

3. AI of medical imaging to AI of COVID-19 imaging
• COVID-19 is a complex disease, and observer variability should be considered in the application of AI algorithms.
• Understanding COVID-19 imaging protocols is needed to mitigate acquisition inconsistencies and to properly harmonize data.
• There are many different COVID-19 datasets available; however, little standardization has gone into them, so proper preprocessing is required for successful application and a reproducible AI model.

Disclosures
M. L. G. is a stockholder in R2 Technology/Hologic and QView, receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverain Medical, Mitsubishi, and Toshiba, and was a cofounder of Quantitative Insights (now consultant to Qlarity Imaging). H. L. receives royalties from Hologic. It is the University of Chicago Conflict of Interest Policy that investigators disclose publicly actual or potential significant financial interests that would reasonably appear to be directly and significantly affected by the research activities. I. E. N. is a deputy editor for Medical Physics and reports a relationship with Scientific Advisory Endectra, LLC.

References
Emerging lessons from COVID-19 for the US clinical research enterprise
COVID-19 in 2021-continuing uncertainty
Calling for benefit-risk evaluations of COVID-19 control measures
Most patients hospitalized with COVID-19 have lasting symptoms
The potential future of the COVID-19 pandemic: will SARS-CoV-2 become a recurrent seasonal infection?
2020 revealed how poorly the US was prepared for COVID-19-and future pandemics
Portable chest x-ray in coronavirus disease-19 (COVID-19): a pictorial review
Frequency and distribution of chest radiographic findings in patients positive for COVID-19
Chest CT in COVID-19: what the radiologist needs to know
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study
Diagnosing COVID-19 pneumonia in a pandemic setting: lung ultrasound versus CT (LUVCT)-a multicentre, prospective, observational study
Is there a role for lung ultrasound during the COVID-19 pandemic?
Lung ultrasound score predicts outcomes in COVID-19 patients admitted to the emergency department
Lung ultrasound findings in patients with coronavirus disease (COVID-19)
RSNA international trends: a global perspective on the COVID-19 pandemic and radiology in late 2020
Imaging of coronavirus disease (COVID-19), a pictorial review
Chest CT in patients with COVID-19: toward a better appreciation of study results and clinical applicability
Automated assessment of COVID-19 reporting and data system and chest CT severity scores in patients suspected of having COVID-19 using artificial intelligence
Sensitivity and specificity of chest computed tomography scan based on RT-PCR in COVID-19 diagnosis
CT-derived chest muscle metrics for outcome prediction in patients with COVID-19
Chest radiography features help to predict a favorable outcome in patients with coronavirus disease 2019
Visualization of SARS-CoV-2 in the lung
Debate of chest CT and RT-PCR test for the diagnosis of COVID-19
Diagnosis of coronavirus disease 2019 pneumonia by using chest radiography: value of artificial intelligence
Artificial intelligence augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other origin at chest CT
The potential of artificial intelligence to analyze chest radiographs for signs of COVID-19 pneumonia
DeepCOVID-XR: an artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large US clinical dataset
Artificial intelligence of COVID-19 imaging: a hammer in search of a nail
A retrospective cohort study of 12,306 pediatric COVID-19 patients in the United States
Pediatric chest x-ray in COVID-19 infection
Artificial intelligence: reshaping the practice of radiological sciences in the 21st century
Current limitations to identify COVID-19 using artificial intelligence with chest x-ray imaging
Artificial intelligence in cancer imaging: clinical challenges and applications
Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans
Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal
Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification
Development of a fully cross-validated Bayesian network approach for local control prediction in lung cancer
The principled control of false positives in neuroimaging
A general introduction to adjustment for multiple comparisons
A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion
Deep learning in medical image analysis
Understanding the mechanisms of deep transfer learning for medical images
Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD)-discussion paper and request for feedback
Continuous learning AI in radiology: implementation principles and early applications
Approval policies for modifications to machine learning-based software as a medical device: a study of bio-creep
Generalizable inter-institutional classification of abnormal chest radiographs using efficient convolutional neural networks
Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study
Discovery of a generalization gap of convolutional neural networks on COVID-19 x-rays classification
Prospective clinical deployment of machine learning in radiation oncology
Variation in algorithm implementation across radiomics software
Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer
The Image Biomarker Standardization Initiative: standardized quantitative radiomics for high-throughput image-based phenotyping
A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 NIH/RSNA/ACR/the Academy workshop
Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data
Statistical methods for assessing observer variability in clinical measures
Assessing observer variability: a user's guide
Analysis of interobserver and intraobserver variability in CT tumor measurements
Inter-observer variability of manual contour delineation of structures in CT
Intra-observer and interobserver variability of biventricular function, volumes and mass in patients with congenital heart disease measured by CMR imaging
Intra- and interobserver variability in CT measurements in oncology
Observer variability for classification of pulmonary nodules on low-dose CT images and its effect on nodule management
Sensitivity of chest CT for COVID-19: comparison to RT-PCR
Diagnostic performance of CT and reverse transcriptase polymerase chain reaction for coronavirus disease 2019: a meta-analysis
RSNA expert consensus statement on reporting chest CT findings related to COVID-19: interobserver agreement between chest radiologists
Diagnostic accuracy of North America Expert Consensus Statement on reporting CT findings in patients suspected of having COVID-19 infection: an Italian single-center experience
From community-acquired pneumonia to COVID-19: a deep learning-based method for quantitative analysis of COVID-19 on thick-section CT scans
Lung ultrasound predicts clinical course and outcomes in COVID-19 patients
Analyzing inter-reader variability affecting deep ensemble learning for COVID-19 detection in chest radiographs
Evaluating Learning Algorithms: A Classification Perspective
SMOTE: synthetic minority over-sampling technique
CO-RADS: a categorical CT assessment scheme for patients suspected of having COVID-19-definition and evaluation
Use of chest imaging in the diagnosis and management of COVID-19: a WHO rapid advice guide
The role of CT in patients suspected with COVID-19 infection
ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection
CT and COVID-19: Chinese experience and recommendations concerning detection, staging and follow-up
International expert consensus statement on chest imaging in pediatric COVID-19 patient management: imaging findings, imaging study reporting, and imaging study recommendations
The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society
Radiological Society of North America expert consensus document on reporting chest CT findings related to COVID-19: endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA
Imaging of coronavirus disease 2019: a Chinese expert consensus statement
The effectiveness of data augmentation in image classification using deep learning
Biomedical informatics and panomics for evidence-based radiation therapy
Integrating multiomics information in deep learning architectures for joint actuarial outcome prediction in non-small cell lung cancer patients after radiation therapy
Machine and deep learning methods for radiomics
Machine learning and modeling: data, validation, communication challenges
Radiation therapy outcomes models in the era of radiomics and radiogenomics: uncertainties and validation
Fusion of quantitative image and genomic biomarkers to improve prognosis assessment of early stage lung cancer patients
Predicting outcomes in glioblastoma patients using computerized analysis of tumor shape: preliminary data
Validation of quantitative analysis of multiparametric prostate MR images for prostate cancer detection and aggressiveness assessment: a cross-imager study
Computerized detection of colonic polyps at CT colonography on the basis of volumetric features: pilot study
Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): analysis of nine patients treated in Korea
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
COVID-19 and the cardiovascular system
Role of standard and soft tissue chest radiography images in COVID-19 diagnosis using deep learning
Severity assessment of COVID-19 using CT image features and laboratory indices
The role of initial chest x-ray in triaging patients with suspected COVID-19 during the pandemic
Cascaded deep transfer learning on thoracic CT in COVID-19 patients treated with steroids
Chest CT findings in coronavirus disease-19 (COVID-19), relationship to duration of infection
COVID-19 can affect the heart
Brain and lung imaging correlation in patients with COVID-19: could the severity of lung disease reflect the prevalence of acute abnormalities on neuroimaging? A global multicenter observational study
Neurologic manifestations of hospitalized patients with coronavirus disease
How COVID-19 can damage the brain
Radiomics: images are more than pictures, they are data
Deciphering genomic underpinnings of quantitative MRI-based radiomic phenotypes of invasive breast carcinoma
Pulmonary pathology and COVID-19: lessons from autopsy. The experience of European Pulmonary Pathologists
Regulatory frameworks for development and evaluation of artificial intelligence-based diagnostic imaging algorithms: summary and recommendations
Digital pathology: advantages, limitations and emerging perspectives
Natural language processing in medicine: a review
Reporting of artificial intelligence prediction models
Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist
Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension
Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: the STARD-AI steering group
AI in medical physics: guidelines for publication
Checklist for artificial intelligence in medical imaging (CLAIM), a guide for authors and reviewers

Acknowledgments
This research was part of the Collaborative Research Project #10 of the Medical Imaging Data Resource Center and was made possible by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health, under Contract Nos. 75N92020C00008 and 75N92020C00021. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors also would like to thank Dr. Berkman Sahiner from the FDA for valuable discussions and suggestions.