key: cord-0784774-wtpj72oo
authors: Bai, Harrison X.; Thomasian, Nicole M.
title: RICORD: A Precedent for Open AI in COVID-19 Image Analytics
date: 2021-01-05
journal: Radiology
DOI: 10.1148/radiol.2020204214
sha: 3a1608ca86ea360a2d58ca1bf6a7419e2c8a826f
doc_id: 784774
cord_uid: wtpj72oo

The COVID-19 pandemic continues to surge across the globe, recently eclipsing prior peak records in daily cases (1). The finding that a novel pathogen known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes COVID-19 was the result of an open science endeavor to provide rapid sequencing of the viral genome (2). Data sharing initiatives to foster accelerated COVID-19 research have since followed, running the gamut from open access academic journal resource hubs and bioinformatics consortiums to open partnerships in therapeutics discovery (3). In recognizing the role of open data curation in supporting the global pandemic response, the RSNA, with partners at four international sourcing institutions and the National Institutes of Health (NIH), has developed the RSNA International COVID-19 Open Annotated Radiology Database (RICORD), presented by Tsai and Simpson et al in this issue of Radiology (4). The RICORD data set is the first public, expertly annotated data set for COVID-19 thoracic imaging that encompasses both multinational and multimodal data. The artificial intelligence (AI) community can leverage the robustness of the RICORD imaging data to accelerate advances in clinical diagnostics, prognostics, and management of SARS-CoV-2.

Characterization of SARS-CoV-2 on chest images is ongoing, but evidence to date suggests that the syndrome possesses the radiographic qualities of an organizing pneumonia. Stereotyped imaging features on chest CT scans include ground-glass opacities in a peripheral distribution, often rounded with bi- or multilobar involvement (5, 6).
The field has already moved beyond early efforts in COVID-19 computer vision techniques that centered around obtaining an early diagnosis and indexing the severity of SARS-CoV-2 disease. At this stage, the primary utility for AI-based COVID-19 chest imaging applications is forecasting disease and monitoring therapy in correlation with clinical data (7).

RICORD is a groundbreaking undertaking in the promotion of quality and accessible imaging data. First, a lack of heterogeneous and high-volume data complicated early efforts to use machine learning to characterize SARS-CoV-2 on chest images. The diversity in the RICORD images answers this need for better data generalizability with its 240 chest CT scans and 1000 chest radiographs across four international sites. RICORD also circumvents another critical barrier to the advancement of AI-driven bioinformatics research: a paucity of large, labeled data sets with open data use privileges. In particular, the lack of access to large-volume data undermines the development of deep learning applications, but the recent resurgence of this access has enabled key advances in image interpretation. Until workarounds for overfitting in low-volume settings, such as pretraining or data augmentation, become routine, small cohort size will continue to impede the use of deep learning models.

We suspect RICORD will be helpful to many different stakeholders within the machine learning community, as these stakeholders can tailor its use to their individual needs. For clinical radiology groups, RICORD will likely serve as a pristine external validation set to test AI algorithms developed on their respective multi-institutional data sets. Other nonaffiliated clinicians or academics, such as computer scientists or engineers without easy access to clinical data repositories, can use RICORD thoracic imaging data for primary algorithm development.
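As a rough illustration of the external validation use case described above, the following Python sketch scores a model's binary predictions against reference labels from a held-out set. The function name, the label encoding (1 for COVID-19 positive, 0 for negative), and the toy values are assumptions made for illustration only; none of them come from RICORD or its documentation.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data standing in for an external validation set.
external_labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_predictions = [1, 0, 1, 0, 1, 0, 1, 0]

sens, spec = sensitivity_specificity(external_labels, model_predictions)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice, the reference labels would come from the expert annotations of the external data set, and the predictions from a model trained elsewhere, so that no external case influences model development.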
Beyond advancing AI data quality, RICORD also promotes consensus methodologies in image preprocessing and database curation. For example, RICORD features a harmonized annotation schema collated by the RSNA, the Society of Thoracic Radiology, and the European Society of Medical Imaging Informatics. Future AI endeavors can leverage this new common syntax for machine learning, thereby reducing interinstitutional variability in local pipelines. Further, for uniform image acquisition and de-identification, all thoracic imaging data in RICORD is in the Digital Imaging and Communications in Medicine format, which is the international standard. Taken together, these consensus methodologies enhance data interoperability to increase the total pool of available data for easy extraction by the research community.

The rapid deployment of safe and effective machine learning solutions is necessary to keep pace with the ever-shifting clinical and therapeutic needs of the evolving COVID-19 pandemic. One of the most time-consuming and resource-intensive aspects of machine learning algorithm development is image data preprocessing. Open data curation can ease some of the burden of preprocessing techniques by preventing the duplication of efforts within the machine learning community as it relates to data cleaning. RICORD is again distinguished here by its commitment to the utmost rigor in annotation practices. CT scans were annotated by six thoracic subspecialist radiologists, and chest radiographs were triple-annotated, with final adjudication by an experienced thoracic subspecialist (average 15 years of experience) in cases without a majority consensus. In this way, with its ready-to-use data that can be effortlessly siphoned into AI development pipelines, RICORD slashes the workload for researchers without sacrificing quality.
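The triple-annotation workflow with adjudication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RICORD annotation software: the function name, the label strings, and the adjudicator callback are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def consensus_label(
    annotations: Sequence[str],
    adjudicate: Optional[Callable[[Sequence[str]], str]] = None,
) -> Optional[str]:
    """Return the strict-majority label, else defer to an adjudicator.

    Returns None when there is no majority and no adjudicator is given.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    label, votes = Counter(annotations).most_common(1)[0]
    if votes > len(annotations) / 2:  # strict majority among the readers
        return label
    if adjudicate is None:
        return None  # flag the case for later expert review
    return adjudicate(annotations)


# Hypothetical label strings for three independent readers.
agreed = consensus_label(["typical", "typical", "atypical"])
unresolved = consensus_label(["typical", "atypical", "negative"])
# The lambda stands in for the experienced subspecialist's final decision.
resolved = consensus_label(
    ["typical", "atypical", "negative"], adjudicate=lambda labels: "atypical"
)
print(agreed, unresolved, resolved)
```

Returning None instead of picking an arbitrary label keeps unresolved cases visible, so they can be routed to a human adjudicator instead of being silently assigned.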
Another strategy for optimizing open machine learning data curation is through cloud-based infrastructures that can provide ease of scaling and intuitive coupling to analytic pipelines. Poor data interoperability associated with conventional servers can manifest as poor image elasticity and compatibility. Migration to the cloud can circumvent this barrier to unlock previously siloed data for use by the machine learning community. With the launch of the NIH's Imaging Data Commons in October 2020, efforts to converge The Cancer Imaging Archive repository with the cloud are already underway (8). The prospect of featuring RICORD on the NIH's emerging cloud-based Imaging Data Commons infrastructure would further improve the data archiving, exchange, viewing, latency, and distribution experience for developers.

Moving forward, we expect that RICORD will enable powerful advances in computer vision applications for COVID-19 through links to clinical metrics, such as laboratory and outcome data. Machine learning algorithms can leverage the inclusion of longitudinal follow-up data to forecast SARS-CoV-2 disease progression and to support clinical trial monitoring of therapeutic candidates for COVID-19 (9). The addition of more detailed clinical data to RICORD is coming soon in the form of the Medical Imaging and Data Resource Center, which forms a larger RSNA COVID-19 collaboration with partners at the American College of Radiology and the American Association of Physicists in Medicine. This next iteration would be further strengthened by the addition of more data about the clinical distributions and characteristics of the SARS-CoV-2-negative cohort.

RICORD lays the groundwork for future AI data sharing initiatives by delineating consensus methodologies in a superbly curated public data set. The RICORD imaging initiative also fosters an ethos of collaboration and transparency in medicine that highlights the importance of open bioinformatics as a path to ethical AI.
We envision that RICORD will not only power the development of machine learning algorithms in the context of COVID-19 but will also act as a future catalyst for the rapid deployment of AI solutions to meet future global health needs.

1. An interactive web-based dashboard to track COVID-19 in real time
2. A new coronavirus associated with human respiratory disease in China
3. COVID-19 Resources
4. The RSNA International COVID-19 Open Radiology Database (RICORD)
5. Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT
6. CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV)
7. CT quantification of pneumonia lesions in early days predicts progression to severe illness in a cohort of COVID-19 patients
8. Imaging Data Commons
9. Prognostic Value and Reproducibility of AI-assisted Analysis of Lung Involvement in COVID-19 on Low-Dose Submillisievert Chest CT: Sample Size Implications for Clinical Trials