key: cord-0979363-73su71zn authors: Chiu, W. H. K.; Poplavskiy, D.; Zhang, S.; Yu, P. L. H.; Kuo, M. D. title: Dynamic Prediction of SARS-CoV-2 RT-PCR status on Chest Radiographs using Deep Learning Enabled Radiogenomics date: 2021-01-15 journal: nan DOI: 10.1101/2021.01.10.21249370 sha: 40f32d96de31a86adf6f46b67c01ebc533809bbd doc_id: 979363 cord_uid: 73su71zn

The copyright holder for this preprint (this version posted January 15, 2021; https://doi.org/10.1101/2021.01.10.21249370; medRxiv preprint), which was not certified by peer review, is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) is the gold standard for diagnosis of SARS-CoV-2 infection, but it requires specialized equipment and reagents and suffers from long turnaround times. Chest imaging, while valuable, currently detects only COVID-19 pneumonia; whether it can predict actual RT-PCR SARS-CoV-2 status is unknown. Radiogenomics may provide an effective and accurate RT-PCR-based surrogate.

We describe a deep learning radiogenomics (DLR) model (RadGen) that predicts a patient's RT-PCR SARS-CoV-2 status solely from their frontal chest radiograph (CXR). The RadGen architecture, based on SE-ResNeXt-50-32x4d, was pretrained on ImageNet, ChestX-ray14, and 28,430 CXR from PadChest and Kaggle before being fine-tuned using CXR from a multinational cohort of RT-PCR tested patients from Hong Kong, GITHUB, SIRM and BIMCV (6,326 images) [3] [4] [5]. The model first predicted and selected only frontal CXR images, then predicted a segmentation mask of the cropped lung areas to reduce model fitting to unrelated parts of the image, and finally used the segmented area as input for the RT-PCR SARS-CoV-2 binary classification task. The final prediction score was an ensemble formed by averaging the outputs of 4 models.

A time course analysis of RadGen was performed using an autocorrelation function (ACF) plot, which describes how well RadGen predicts a patient's RT-PCR SARS-CoV-2 status over the course of their entire SARS-CoV-2 infection period. It revealed a peak lag of 2 days for radiogenomic signature manifestation on CXR after initial RT-PCR diagnosis. The per-film false negative rate was 10.0% (26/261) within 7 days of the first RT-PCR positive test and 9.3% (21/225) after 7 days. The per-film false positive rate was 68.1% (32/47) within one week of achieving RT-PCR confirmed viral clearance and 11.3% (45/397) after one week (Fig 1).

Leveraging a DLR strategy and a rich body of training datasets spanning Asian and Western countries (reflecting a diverse set of clinical containment protocols), a wide spectrum of clinical presentations including mild and asymptomatic disease, and a prospectively collected multi-timepoint RT-PCR SARS-CoV-2 positive patient cohort, we generated a DLR model capable of predicting a patient's RT-PCR status from CXR. Interestingly, we also show that RadGen can non-invasively 'track' a patient's RT-PCR SARS-CoV-2 status over the course of infection, from diagnosis to viral clearance. We observed a time-delayed correlation between RadGen and RT-PCR both at the time of initial RT-PCR positivity and at the time of achieving RT-PCR viral clearance. This is not unexpected, as SARS-CoV-2 genomic dosage changes have been shown to take time to accumulate and be phenotypically reflected on a cellular, organ and systems level. Further, it is known that SARS-CoV-2 RNA can persist long after active infectivity and symptom resolution [6]; thus, that RadGen performs this well, particularly in a mild/asymptomatic cohort, is notable. In conclusion, the feasibility for DLR models to dynamically track RT-PCR SARS-CoV-2 changes on an individual level significantly expands the scope of radiogenomics.
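The per-film error rates quoted above follow directly from the reported counts. As a minimal sketch (the function and variable names are illustrative assumptions, not taken from the authors' code), the percentages can be recomputed in Python as follows:

```python
def rate_percent(errors: int, total: int) -> float:
    """Return errors/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * errors / total, 1)


# Per-film counts as reported in the text: (misclassified films, total films).
# The dictionary labels are hypothetical names chosen for this illustration.
reported_counts = {
    "FNR, within 7 days of first positive RT-PCR": (26, 261),
    "FNR, after 7 days of first positive RT-PCR": (21, 225),
    "FPR, within 1 week of RT-PCR viral clearance": (32, 47),
    "FPR, after 1 week of RT-PCR viral clearance": (45, 397),
}

rates = {
    label: rate_percent(errors, total)
    for label, (errors, total) in reported_counts.items()
}

for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
```

The recomputed values agree with the reported 10.0%, 9.3%, 68.1% and 11.3%. Note that the false positive rate within one week of clearance rests on only 47 films, so it is the least precisely estimated of the four.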
Behind the Numbers: Decoding Molecular Phenotypes with Radiogenomics-Guiding Principles and Technical Considerations
Decoding global gene expression programs in liver cancer by noninvasive imaging
Detection of COVID-19 Using Deep Learning Algorithms on Chest Radiographs
A large chest xray image dataset with multi-label annotated reports. arXiv e-prints