key: cord-0903870-affb4yln
authors: Jacob, Joseph; Alexander, Daniel; Baillie, J. Kenneth; Berka, Rosalind; Bertolli, Ottavia; Blackwood, James; Buchan, Iain; Bloomfield, Claire; Cushnan, Dominic; Docherty, Annemarie; Edey, Anthony; Favaro, Alberto; Gleeson, Fergus; Halling-Brown, Mark; Hare, Samanjit; Jefferson, Emily; Johnstone, Annette; Kirby, Myles; Mcstay, Ruth; Nair, Arjun; Openshaw, Peter J.M.; Parker, Geoff; Reilly, Gerry; Robinson, Graham; Roditi, Giles; Rodrigues, Jonathan C.L.; Sebire, Neil; Semple, Malcolm G.; Sudlow, Catherine; Woznitza, Nick; Joshi, Indra
title: Using imaging to combat a pandemic: rationale for developing the UK National COVID-19 Chest Imaging Database
date: 2020
journal: Eur Respir J
DOI: 10.1183/13993003.01809-2020
sha: 38795ba78b5bceab7746eaf9f8f1444eb0bc04ae
doc_id: 903870
cord_uid: affb4yln

The National COVID-19 Chest Imaging Database (NCCID) is a repository of chest X-ray, CT and MRI images and clinical data from COVID-19 patients across the UK, to support research and development of AI technology that may proffer insights into the disease.

Since the emergence of the novel coronavirus SARS-CoV-2 in Wuhan, China in 2019 (1), the resulting COVID-19 disease has rapidly transitioned into a global pandemic, with over 31,000 deaths and 215,000 infections in the UK as of 10th May 2020 (2). Confirmation of SARS-CoV-2 infection requires reverse-transcriptase polymerase chain reaction (RT-PCR) testing (3). Chest radiographic and/or computed tomography imaging is also central to diagnosis and management (4). The scale of the COVID-19 pandemic has resulted in the acquisition of huge volumes of imaging data. Traditionally, research using imaging data has relied on data collated within single hospitals or, at most, groups of hospitals. Endeavours on a local scale are constrained because the collected data might not capture all patient subgroups or disease manifestations.
It has long been recognised that there is an acute need to curate larger, more comprehensive datasets to better understand a disease. COVID-19 has arrived in an era where advances in computational power, aligned with an increased availability of big data and the development of self-learning neural networks, have begun to redefine research in medicine. In recent years, computer algorithms trained on imaging data widely available on the internet have been adapted to the task of medical image analysis (5, 6). For computer algorithms to be successfully applied to medical image analysis, it is imperative that they train on large volumes of representative imaging data. Such datasets are typically orders of magnitude larger than traditional imaging research datasets, and beyond the capacity of traditional research e-infrastructure.

The National Health Service in the United Kingdom has long sought national repositories of linked clinical and imaging data, which are essential for applications of artificial intelligence (AI) computing systems in healthcare. Historically, logistical barriers to this have seemed insurmountable and progress has been notoriously slow. Yet one aspect of the COVID-19 response in the UK has been the issuing of a notice under Regulation 3(4) of the Health Service Control of Patient Information Regulations 2002 (COPI notice), which has temporarily eased data sharing restrictions to facilitate COVID-19-specific public health research and scientific collaboration over the course of the emergency.

The British Society of Thoracic Imaging research network (7) began a multicentre COVID-19 imaging study which grew into a partnership with NHSX to create the National COVID-19 Chest Imaging Database (NCCID) (8). NCCID has put in place mechanisms to collate all chest imaging and prespecified clinical data from every UK hospital where patients undergo an RT-PCR test for COVID-19.
This will include all RT-PCR positive patients and a representative sample of RT-PCR negative patients. The study aims to identify information in COVID-19 imaging that may be inconspicuous to the human eye, but which is extractable by computer algorithms. Such buried information may allow the early identification of patients at risk of deterioration, thereby anticipating future intensive care needs. AI algorithms may also demonstrate how comprehensive characterisation of COVID-19 may improve care and outcomes.

The NCCID data and image transfer solutions are robust and secure, and include methods adapted from techniques tried and tested in numerous research studies involving large-scale medical image collection (9). To maximise efficient resource utilisation in busy hospitals during the course of the pandemic, we are linking our imaging data to the ISARIC WHO Clinical Characterisation Protocol for Severe Emerging Infection UK (ISARIC CCP-UK) (10) and aim to link to ICNARC (11). ISARIC investigators are collating clinical information and biological samples for COVID-19 cases of all ages admitted to hospitals, whilst ICNARC collates detailed data from adults in the intensive care setting. The study has also been supported by Health Data Research UK as part of its UK response to COVID-19.

An endeavour of this scale, rarely attempted before in the NHS, can only succeed if the radiology and scientific community contribute their time, effort and available data. ISARIC has open source processes and has already established data and material sharing from UK COVID-19 cases (12). The NCCID initiative will create an open, well-governed database for researchers from academia and industry to add to COVID-19 knowledge collectively. Public, patient and professional trust is vital to NCCID and COVID-19 in general, and we consider accessibility and feedback on data uses as central to good governance.
The NCCID data access committee aims to connect researchers who are asking related scientific questions using complementary methodologies. NCCID will also set aside a portion of its data to enable validation of AI models. This will maintain the highest standards of governance throughout the emergency and allow technology developers to validate their algorithms promptly.

1) Chest radiographs: The United Kingdom prioritised the use of chest radiographs in the clinical work-up of patients suspected of COVID-19 (13). The choice of chest radiographs as the primary diagnostic imaging test was pragmatic given the reported limited sensitivity of CT imaging in COVID-19 diagnosis (14). The choice also reflected concerns regarding seeding of infection via CT scanners contaminated with virus particles, as well as the limited numbers of CT scanners serving the UK population when compared with other European countries (15). Accordingly, a large proportion of imaging in NCCID will consist of chest radiographs acquired at initial presentation of the patient to hospital and throughout the patient's hospital stay (Figure 1). NCCID will collect chest radiographs in all RT-PCR COVID-19 positive patients in hospitals throughout the UK. In addition, a number of chest radiographs from RT-PCR COVID-19 negative patients will be collected from sites as a representative control population.

2) Computed tomography chest imaging: NCCID will collect all chest CT imaging in RT-PCR COVID-19 positive patients. This will include non-contrast enhanced chest CT imaging, CT pulmonary angiograms and CT coronary angiograms.

3) Prior chest imaging: For all RT-PCR COVID-19 positive patients, NCCID will acquire all chest imaging performed in the previous 3 years. For RT-PCR COVID-19 negative patients, NCCID will acquire any chest imaging performed in the previous 4 weeks.
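
The look-back rule in item 3 can be illustrated with a simple date filter. This is only a minimal sketch, not the NCCID implementation: the function name `is_eligible_prior_study`, the use of the RT-PCR test date as the reference point, and the approximation of 3 years as 1095 days are all assumptions made for illustration, since the text does not specify them.

```python
from datetime import date, timedelta

# Look-back windows as stated in the text: 3 years for RT-PCR positive
# patients and 4 weeks for RT-PCR negative patients.
# Assumption: 3 years is approximated as 3 * 365 days.
LOOKBACK_POSITIVE = timedelta(days=3 * 365)
LOOKBACK_NEGATIVE = timedelta(weeks=4)


def is_eligible_prior_study(study_date: date, test_date: date, pcr_positive: bool) -> bool:
    """Return True if a prior chest study falls inside the look-back window.

    Assumption: the window is measured back from the RT-PCR test date.
    Studies dated after the test are not "prior" imaging, so this
    function returns False for them; they would be handled separately.
    """
    lookback = LOOKBACK_POSITIVE if pcr_positive else LOOKBACK_NEGATIVE
    return test_date - lookback <= study_date <= test_date


if __name__ == "__main__":
    test_day = date(2020, 4, 1)
    # A radiograph from mid-2018 is inside the 3-year window for a positive patient.
    print(is_eligible_prior_study(date(2018, 6, 1), test_day, pcr_positive=True))
    # The same radiograph is outside the 4-week window for a negative patient.
    print(is_eligible_prior_study(date(2018, 6, 1), test_day, pcr_positive=False))
```

Keeping the two windows as named constants makes the asymmetry between positive and negative patients explicit and easy to change if the collection protocol is revised.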
Clinical data collected on RT-PCR COVID-19 positive patients will include demographic information, patient co-morbidity, smoking and medication histories, clinical observations, admission blood test results and outcomes including time of intensive care unit admission and survival. The imaging will link to other clinical databases such as ISARIC to allow analyses against more detailed clinical information.

We anticipate that in May 2020 the NCCID data will be released to interested academic and commercial groups. Given that the NCCID will be of particular interest to AI researchers, a portion of the data will be segregated to allow rigorous and independent validation of AI models. Data access will be initiated through online applications, completed and submitted according to the instructions on the NCCID website (8). Applications will then be assessed by a central data access committee comprising scientific advisors, technology advisors, information-governance advisors, patient/ethics advisors, and system advisors to evaluate the positive impact on the NHS overall. Successful applicants can access the data in a cloud-based Amazon S3 bucket and transfer it to their own computing infrastructure, which is required to meet high standards of IT security.

In these challenging times, research efforts necessary to better understand COVID-19 cannot afford to be fragmented and uncoordinated. Accelerating insights and discovery necessitates leveraging economies of scale and resource amongst researchers, institutions and companies. We believe NCCID will quickly improve COVID-19 understanding and patient care and build on early insights into COVID-19 (16-18). The scale of the current pandemic requires the best minds, data, algorithms and research programmes to come together quickly. We are fortunate in the UK to have an abundance of internationally recognised researchers working on COVID-19.
We hope that NCCID provides a common UK resource to fuel international efforts in tackling COVID-19 and allows better preparedness for similar future needs.

Unsupervised deep learning methods could help identify discrete COVID-19 patient phenotypes that manifest differing disease courses. Image analysis may also allow prediction of those patients that are likely to manifest chronic multisystem sequelae following their acute infection, which may inform future resource prioritisation in healthcare systems.

Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Strategies for the prevention and management of coronavirus disease 2019
An update on COVID-19 for the radiologist - A British Society of Thoracic Imaging statement
Clinically applicable deep learning for diagnosis and referral in retinal disease
Dermatologist-level classification of skin cancer with deep neural networks
A UK-wide British Society of Thoracic Imaging COVID-19 imaging repository and database: design, rationale and implications for education and research
OPTIMAM Mammography Image Database: a large scale resource of mammography images and clinical data
Open source clinical science for emerging infections. The Lancet Infectious Diseases
A British Society of Thoracic Imaging statement: considerations in designing local imaging diagnostic algorithms for the COVID-19 pandemic
Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection
Healthcare resource statistics - technical resources and medical technology
Clinical and CT features of early stage patients with COVID-19: a retrospective analysis of imported cases in
High-resolution computed tomography features of 17 cases of coronavirus disease 2019 in Sichuan province, China
The clinical dynamics of 18 cases of COVID-19 outside of Wuhan, China