key: cord-0792322-3wt7jih4
authors: Sonnweber, T.; Tymoszuk, P.; Sachanic, S.; Boehm, A.; Pizzini, A.; Luger, A.; Schwabl, C.; Nairz, M.; Katharina, K.; Koppelstaetter, S.; Aichner, M.; Bernhard, P.; Egger, A.; Hoermann, G.; Ewald Woell, E.; Weiss, G.; Widmann, G.; Tancevski, I.; Loeffler-Ragg, J.
title: Investigating phenotypes of pulmonary COVID-19 recovery: a longitudinal observational prospective multicenter trial
date: 2021-06-25
journal: nan
DOI: 10.1101/2021.06.22.21259316
sha: e5744dce830b5ae11c7543b4cece11de059cc136
doc_id: 792322
cord_uid: 3wt7jih4

Background: COVID-19 is associated with long-term pulmonary symptoms and may result in chronic pulmonary impairment. The optimal procedures to prevent, identify, monitor, and treat these pulmonary sequelae are elusive.

Research question: To characterize the kinetics of pulmonary recovery, risk factors and constellations of clinical features linked to persisting radiological lung findings after COVID-19.

Study design and methods: A longitudinal, prospective, multicenter, observational cohort study including COVID-19 patients (n = 108). Longitudinal pulmonary imaging and functional readouts, symptom prevalence, clinical and laboratory parameters were collected during acute COVID-19 and at 60-, 100- and 180-day follow-up visits. Recovery kinetics and risk factors were investigated by logistic regression. Classification of clinical features and study participants was accomplished by k-means clustering, the k-nearest neighbors (kNN), and naive Bayes algorithms.

Results: At the six-month follow-up, 51.9% of participants reported persistent symptoms with physical performance impairment (27.8%) and dyspnea (24.1%) being the most frequent. Structural lung abnormalities were still present in 45.4% of the cohort, ranging from 12% in the outpatients to 78% in the subjects treated in the ICU during acute infection.
The strongest risk factors of persisting lung findings were elevated interleukin-6 (IL6) and C-reactive protein (CRP) during recovery and hospitalization during acute COVID-19. Clustering analysis revealed an association of the lung lesions with increased anti-S1/S2 antibody, IL6, CRP, and D-dimer levels at the early follow-up, suggesting non-resolving inflammation as a mechanism of the perturbed recovery. Finally, we demonstrate the robustness of risk class assignment and prediction of individual risk of delayed lung recovery employing clustering and machine learning algorithms.

Interpretation: Severity of acute infection and systemic inflammation are strongly linked to persistent post-COVID-19 lung abnormality. Automated screening of multi-parameter health record data may assist the identification of patients at risk of delayed pulmonary recovery and optimize COVID-19 follow-up management.

Clinical Trial Registration: ClinicalTrials.gov: NCT04416100

The ongoing COVID-19 pandemic challenges health care systems worldwide. As of June 2021, the Johns Hopkins dashboard 1 reports 178 million global cases and 3.8 million COVID-19-related deaths 2 . Although the vast majority of COVID-19 patients display mild disease, approximately 10-15% of cases progress to a severe condition and approximately 5% suffer from critical illness 3, 4 . Similar to severe acute respiratory syndrome (SARS), a significant portion of COVID-19 patients report lingering or recurring clinical impairment, and cardiopulmonary recovery may take several months to years 5-11 . This observation has led to the introduction of the term 'long COVID', defined by persistence of COVID-19 symptoms for more than four weeks, and the 'post-COVID-19 syndrome', referring to symptom persistence for more than twelve weeks 12, 13 . Evidence-based strategies for prediction, monitoring and treatment of post-acute COVID-19 sequelae are urgently needed.
We herein prospectively analyzed the prevalence of non-resolving lung abnormalities, risk factors and clinical feature sets associated with delayed pulmonary recovery during the first six months of COVID-19 convalescence and tested whether a multi-parameter machine learning approach may help to discern subjects at risk of persistent lung damage.

CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 25, 2021. ; https://doi.org/10.1101/2021.06.22.21259316 doi: medRxiv preprint

The CovILD ("Development of interstitial lung disease in COVID-19") study 5 was initiated in April 2020. Adult residents of Tyrol, Austria, with typical clinical presentation and a positive SARS-CoV-2 PCR test from a nasal or oropharyngeal swab 14 [...] outbreaks, the regional health system was able to guarantee the best standard of care including intensive therapy and mechanical ventilation if necessary. None of the participants received corticosteroids as a therapy of the acute infection.

In total, 190 COVID-19 patients were screened for study participation. Of these, 18 subjects declined to give informed consent and 27 declared difficulties attending the study follow-ups. Of the 145 enrolled participants, 37 were excluded from analysis due to an incomplete data record precluding classification analyses (Supplementary Figure S1). All participants gave written informed consent. The study was approved by the institutional review board at the Medical University of Innsbruck (approval number: 1103/2020), and registered at ClinicalTrials.gov (NCT04416100).

We retrospectively assessed patient characteristics during acute COVID-19 and performed follow-up investigations at 60 days (63 ± 23 days (mean ± SD); visit 1), 100 days (103 ± 21; visit 2) and 180 days (190 ± 15; visit 3) after the diagnosis of COVID-19.
Each visit included clinical examination, assessment of symptoms and performance status with a standardized questionnaire, lung function testing, capillary blood gas analysis, trans-thoracic echocardiography, and a low-dose computed tomography (CT) scan of the chest. CT scans were evaluated for the presence of ground-glass opacities (GGO), consolidations, bronchial dilation, and reticulations as defined by the Fleischner Society. Lung findings were graded with a CT severity score (0-25 points), as previously published 5 . Lung function impairment was defined by at least one of the following: (1) forced vital capacity (FVC) < 80% predicted, (2) forced expiratory volume in 1 second (FEV1) < 80% predicted, (3) FEV1:FVC < 70% predicted, (4) total lung capacity (TLC) < 80% predicted or (5) diffusing capacity of carbon monoxide (DLCO) < 80% predicted.

The recorded laboratory parameters encompassed blood hemoglobin, ferritin, C-reactive protein (CRP), interleukin-6 (IL6), N-terminal pro natriuretic peptide (NT-proBNP), D-dimer and anti-S1/S2 protein SARS-CoV2 immunoglobulin gamma (anti-S1/S2 IgG). The full list of variables with stratification scheme and procedure details are provided in Supplementary Table S1 and Supplementary Methods.

Statistical analyses were performed with R version 4.0.3 as presented in Supplementary Figure S1. Kinetics of symptom and radiological lung finding resolution were assessed with mixed-effect logistic regression 15 . Risk factor modeling was performed with fixed-effect logistic regression. Clustering of binary clinical features and of study participants was analyzed with the k-means algorithm 16 .
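The lung function impairment definition given earlier in this section (at least one readout below its cutoff) can be illustrated with a minimal Python sketch. The study itself used R; the dictionary keys, the function name and the rule of skipping missing readouts are assumptions made for this example and are not taken from the study protocol.

```python
from typing import Mapping, Optional

# Cutoffs as stated in the text (percent predicted; FEV1:FVC < 70%).
# The key names are hypothetical and chosen only for this illustration.
THRESHOLDS = {
    "fvc_pct_pred": 80.0,
    "fev1_pct_pred": 80.0,
    "fev1_fvc": 70.0,
    "tlc_pct_pred": 80.0,
    "dlco_pct_pred": 80.0,
}


def lung_function_impaired(readouts: Mapping[str, Optional[float]]) -> bool:
    """Return True if at least one available readout is below its cutoff.

    Readouts that are absent or None are skipped; this treatment of
    missing values is an assumption, not the study's documented rule.
    """
    for name, cutoff in THRESHOLDS.items():
        value = readouts.get(name)
        if value is not None and value < cutoff:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical participant: normal spirometry, reduced diffusing capacity.
    visit = {"fvc_pct_pred": 95.0, "fev1_pct_pred": 92.0,
             "fev1_fvc": 78.0, "tlc_pct_pred": 88.0, "dlco_pct_pred": 72.0}
    print(lung_function_impaired(visit))
```

Keeping the cutoffs in one table makes it easy to check the rule against the written definition and to adjust a single criterion without touching the logic.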
Prediction of lung lesions by distance-weighted kNN 17 and naive Bayes 18 algorithms was tested in 200 random training/test subset splits of the cohort data (training n = 80, test n = 28). P values were corrected for multiple comparisons by the Benjamini-Hochberg method 19 ; effects were termed significant for p < 0.05. Details of statistical analysis are provided in Supplementary Methods.

The CovILD study cohort subset included in the current report (n = 108) predominantly consisted of males (54.6%), and participants were aged between 19 and 87 years (Table 1). Most participants displayed preexisting comorbidity (79.6%), predominantly cardiovascular and metabolic diseases. The cohort included patients with mild (outpatient care, n = 26, 24.1%), moderate (hospitalization without oxygen supply, n = 31, 28.7%), severe (hospitalization with oxygen supply, n = 33, 30.6%), and critical (ICU treatment, n = 18, 16.7%) acute COVID-19.

During 180 days of COVID-19 convalescence, most patients, irrespective of the severity of the acute infection, demonstrated a significant contraction of the surveyed disease symptoms (Figure 1A). Nevertheless, 180 days after the disease onset, 51.9% of the study subjects still reported COVID-19-related complaints, with self-reported impaired physical performance (27.8%) and exertional dyspnea (24.1%) being the most frequent (Figure 1B). Prevalence of all investigated symptoms except for sleep disorders declined significantly, even though the pace of their resolution was remarkably slower in the late (100- and 180-day follow-ups) than in the early recovery phase (until the 60-day follow-up). Impairment of lung function could be discerned in 33% of the entire cohort.
Remarkably, except for the critical acute COVID-19 subjects (60 days: 72%, 180 days post-COVID-19: 53%), no significant reduction of the functional lung impairment prevalence was observed (Figure 2). Abnormal structural lung findings were still found in 45.4% of patients and moderate-to-severe radiological lung alterations (CT severity score > five points) were present in 20% of participants. Interestingly, the radiological lung findings demonstrated only weak co-occurrence (less than 50%) with the impaired lung function at all post-COVID-19 visits (Supplementary Figure S2).

As expected, the prevalence and recovery of CT lung findings were related to the severity of acute infection. The highest prevalence of any abnormalities, GGO and lesions scored above five CT severity points at the 180-day follow-up was observed in the individuals with severe and critical acute disease (Figure 2). Notably, the hospitalized group with oxygen therapy demonstrated the fastest recovery kinetics (91% and 52% subjects with any abnormalities at the 60- and 180-day visit, respectively). Furthermore, the remaining severity strata showed only a minor drop in the lung finding prevalence between the day 100 and day 180 visits, in particular regarding the moderate-to-severe pulmonary lesions (Figure 2).
To search for risk factors associated with delayed pulmonary recovery, we screened a set of 50 binary demographic, clinical and biochemical parameters recorded during the acute SARS-CoV2 infection and at the 60-day visit (Supplementary Table S2) for correlation with the three readouts of persistent lung abnormalities at the six-month follow-up (Supplementary Table S2). Among the candidate features, 22 were significantly associated with both the risk of any radiological lung abnormality and GGOs, and only eight were linked to moderate-to-severe CT [...]

To discern constellations of non-CT parameters linked to protracted lung recovery, we subjected the initial variable pool and the readouts of persistent lung abnormality to unsupervised clustering analysis. By this means, four clusters of clinical features were identified (Figure 4A, Table S3). Surprisingly, whereas any CT lung abnormality and GGOs were assigned to one common cluster (Cluster #3), CT pathology scored above five severity points was associated with a separate set of co-occurring features (Cluster #4) (Figure 4A). The ten closest cluster neighbors of any non-resolving radiological lung findings and GGOs included anti-S1/S2 IgG above the cohort median as a readout of the anti-viral immune response strength, elevated D-dimer as a marker of coagulation dysfunction and microvascular injury determined at the early follow-up, together with prolonged hospitalization, oxygen and anti-infective therapy during acute COVID-19, male sex, multimorbidity, cardiovascular disease and metabolic disorders (Figure 4B). In turn, more severe pulmonary pathology was closely linked to elevated markers of inflammation (IL6, CRP) and anemia at the early follow-up, along with hallmarks of critical severity of acute COVID-19 such as anti-coagulative/anti-platelet therapy and ICU stay. Furthermore, long-term immunosuppressive treatment, immunodeficiency, chronic kidney disease and history of smoking or COPD were also tightly linked to non-resolving moderate-to-severe CT lesions (Figure 4B).

Next, we sought to define a subset of subjects at risk of delayed pulmonary recovery with a similar unsupervised clustering procedure applied to the study participants. For clustering, we used the set of 50 non-CT clinical features used in the risk analysis and, subsequently, investigated the prevalence of lung abnormalities at the 180-day follow-up in the participant subsets. By this approach, three sub-populations could be discerned, termed further 'low-', 'intermediate-' and 'high-risk' subsets (Figure 5, Supplementary Figure S3B). In sum, 23 clustering factors were significantly more frequent in the intermediate- or high-risk group than in the low-risk subset, including primarily readouts of the anti-SARS-CoV2 immunity (anti-S1/S2 IgG), disease severity (hospitalization, oxygen therapy and ICU stay, antibiotic therapy, weight loss), multi-morbidity (more than three comorbidities, CVD, hypertension, metabolic disorders), impaired lung function, age and male sex together with polysymptomatic acute COVID-19 (more than six symptoms, cough, fever). Interestingly, participants older than 65, those suffering from hypercholesterolemia, males and those with functional lung impairment, elevated NT-proBNP and D-dimer levels at the 60-day follow-up were over-represented in the high-risk compared with the intermediate-risk group. In turn, polysymptomatic acute COVID-19 was more specific for the intermediate-risk subset (Figure 6A).
Most importantly, any persistent CT lung abnormalities and GGO were significantly more prevalent in the intermediate- and high-risk than in the low-risk subset (Figure 6B). In turn, CT lung lesions scored more than five severity points were significantly enriched only in the high-risk subset and displayed comparably low prevalence in the remaining sub-populations (Figure 6B [...] Figure S4A). However, adjustment of the risk modeling for the hospitalization/ventilation and ICU status had no substantial effect on the prediction of persistent lung abnormalities by the risk subset assignment (Supplementary Figure S4B).

Finally, given the significant differences in pulmonary recovery between the participant risk subsets, we asked if the long-term lung abnormality could be reliably predicted based solely on the non-CT parameters available until the 60-day follow-up. To this end, we applied two simple machine learning procedures, the k-nearest neighbor (kNN) 17 and naive Bayes 18 algorithms, to the pool of 50 non-CT binary clinical features and multiple random splits of the study cohort into training and test data sets. This approach differentiated between complete pulmonary recovery and presence of any CT lung lesions at the 180-day follow-up with an accuracy exceeding 70% (kNN: 71%, naive Bayes: 75%) and sensitivity ranging from 63% (naive Bayes) to 75% (kNN). Of note, similar prediction quality was achieved at detection of persistent GGOs by both tested procedures.
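The repeated random-split evaluation described above can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' R implementation: the inverse-distance weighting kernel, the function names and the handling of all-zero feature vectors are assumptions, and only ties between classes (not between equidistant neighbors) are resolved at random here.

```python
import random
from collections import defaultdict


def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors (0.0 if both are all-zero)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def knn_predict(train_x, train_y, query, k=5, rng=random):
    """Distance-weighted kNN vote; ties between classes are broken at random."""
    nearest = sorted(
        ((jaccard_distance(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)  # inverse-distance weight (assumed kernel)
    best = max(votes.values())
    return rng.choice(sorted(lab for lab, v in votes.items() if v == best))


def evaluate_split(x, y, n_train, rng):
    """Train on a random subset and return (accuracy, sensitivity, specificity)."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    train, test = idx[:n_train], idx[n_train:]
    train_x = [x[i] for i in train]
    train_y = [y[i] for i in train]
    tp = tn = fp = fn = 0
    for i in test:
        pred = knn_predict(train_x, train_y, x[i], rng=rng)
        if y[i] == 1:
            tp += pred == 1
            fn += pred != 1
        else:
            tn += pred == 0
            fp += pred != 0
    accuracy = (tp + tn) / len(test)
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # undefined without positives
    specificity = tn / (tn + fp) if (tn + fp) else None  # undefined without negatives
    return accuracy, sensitivity, specificity


def repeated_evaluation(x, y, n_train=80, n_splits=200, seed=0):
    """Repeat the random training/test split, as in the 200-split scheme."""
    rng = random.Random(seed)
    return [evaluate_split(x, y, n_train, rng) for _ in range(n_splits)]
```

With the cohort data, `x` would hold the 50 binary features per participant and `y` the presence (1) or absence (0) of a CT readout at the 180-day visit; summarizing the returned triples by their median and 2.5%/97.5% percentiles mirrors the way the prediction statistics are reported.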
In contrast, only the naive Bayes algorithm succeeded at predicting the less frequent moderate-to-severe lung lesions (accuracy: 64%, sensitivity: 71%) whereas the kNN procedure failed to identify the majority of them (sensitivity: 33%) (Figure 7, Supplementary Table S4). Importantly, the investigated procedures efficiently identified the non-resolving pulmonary findings both in the mild-to-moderate (outpatients and inpatients without oxygen) and severe-to-critical (ventilated inpatients and ICU) participant subsets, even though the specificity varied between the procedures and CT abnormality readouts (Supplementary Figures S5 and S6).

As reported previously 5,10 , we could discern a measurable to complete pulmonary recovery assessed by lung function or CT at the six-month follow-up even if the acute disease was severe or critical. However, persistent COVID-19 symptoms and structural lung abnormalities were detected in more than 40% and reduced lung function in approximately one-third of the participants. Furthermore, we could observe a deceleration of the pulmonary and symptom recovery between the three- and six-month assessments, which may point toward chronicity of the post-COVID-19 pulmonary damage. Notably, such prolonged structural and functional lung recovery in a time window of two to five years after acute disease was reported for SARS 9 [...]

Employing an in-depth clustering, we could in addition observe distinct patterns of clinical features co-occurring with any CT lung abnormality, GGOs and moderate-to-severe lung lesions at the six-month follow-up. Interestingly, high anti-S1/S2 IgG and D-dimer levels at the two-month follow-up, but not the inflammatory markers, were closely associated with the first two abnormality readouts.
This lets us speculate that the magnitude of the adaptive anti-viral immunity and organ damage without a systemic inflammatory background are drivers of the mild and moderate long-term lung abnormalities frequently observed in our cohort. In turn, CT lesions graded over five severity points were primarily associated with elevated IL6, CRP and inflammatory anemia 28 during early convalescence, smoking and COPD. Thus, genesis and persistence of the less frequent moderate-to-severe pulmonary lesions may additionally require an interplay between strong, prolonged inflammation and pre-existing lung injury.

With a similar classification technique based on non-CT clinical features of acute COVID-19 and early recovery, we could characterize three sub-populations of convalescents significantly differing in the overall prevalence and severity of CT lung findings at the six-month follow-up. Importantly, although multiple readouts of the COVID-19 severity were implicitly included in the clustering algorithm, the intermediate- or high-risk subset assignment remained strongly predictive of long-term lung abnormality even upon adjustment for the hospitalization/ventilation and ICU status. This further underlines the vital importance of parameters not directly connected to the severity of acute infection, e.g. ongoing inflammation or pre-existing lung injury, for the comprehensive risk assessment.

Finally, in addition to the unsupervised clustering, we demonstrate the utility of two technically unrelated machine learning procedures, kNN 17 and naive Bayes 18 , at assessing the individual risk of a perturbed recovery based solely on non-CT readouts available until the two-month follow-up. Despite the lack of optimization of the variable pool and the small study cohort sub-populations used as model training sets, high prediction correctness was achieved for any CT lung findings and GGOs at the 180-day visit by both procedures. In turn, only the naive Bayes algorithm, inherently more sensitive towards rare events, succeeded at identification of the less frequent moderate-to-severe lesions. In clinical practice, such automated procedures provided with a larger multi-center training data set may serve as inexpensive, fast and reliable tools for screening e.g. medical records for COVID-19 convalescents at risk of poor pulmonary recovery who require a denser follow-up and lung imaging. Recently, a similar approach was proposed for monitoring COVID-19 patients for the need of respiratory support 29 .

Our study bears limitations primarily concerning the low sample size and the cross-sectional character of the trial. Furthermore, data incompleteness and selection bias linked to disease severity (e.g. mild cases were not subjected to CT scans during acute COVID-19) resulted in a considerable dropout rate and potentially confounded the clustering and risk prediction analyses. Additionally, the candidate risk factors and the risk-assessment algorithms of perturbed pulmonary recovery presented here call for verification in a larger, independent multi-center collective of COVID-19 convalescents.

In summary, we herein present a comprehensive description of the resolution of symptoms and structural pulmonary abnormalities in the first 6 months of COVID-19 convalescence. We report a high frequency of lung abnormalities and symptoms, present in almost half of the studied population, and flattened recovery kinetics beyond three months post-COVID-19.
Systematic risk modeling and clustering analysis revealed a set of clinical variables linked to protracted recovery apart from the severity of acute infection, such as inflammatory markers, anti-S1/S2 IgG, multi-morbidity, and male sex. Of practical importance, we demonstrate that automated classification algorithms may help to identify individuals at risk of persistent lung lesions and reallocate resources to prevent long-term disability.

We acknowledge the commitment of the staff and providers of our institutions through the COVID-19 crisis and the suffering and loss of our patients as well as their families.

Percentages of any symptoms present in the study cohort stratified by the severity of acute disease (A) and particular symptom frequencies in the entire cohort (B) were calculated.
Statistical significance was assessed by mixed-effect logistic regression and p values were obtained by the likelihood ratio test (LRT). In (B) separate models were fitted to each severity group. P values were corrected for multiple comparisons by the Benjamini-Hochberg method. Outpatient: n = 26, hospitalized without oxygen: n = 31, hospitalized with oxygen: n = 33, ICU: n = 18, entire cohort: n = 108.

Percentages of subjects with any lung lesions detected by CT, GGOs, CT lung abnormalities scored > five severity points and functional lung impairment in the study cohort stratified by the severity of acute disease were calculated. Statistical significance was assessed by mixed-effect logistic regression and p values were obtained by the likelihood ratio test (LRT). Separate models were fitted to each severity group. P values were corrected for multiple comparisons by the Benjamini-Hochberg method. Outpatient: n = 26, hospitalized without oxygen: n = 31, hospitalized with oxygen: n = 33, ICU: n = 18, entire cohort: n = 108.

[...] points at the 180-day follow-up was investigated by univariate logistic regression. Statistical significance of OR estimates was assessed by the Wald Z test; p values were corrected for multiple comparisons by the Benjamini-Hochberg method. Points with whiskers represent OR with 95% CI, point color codes for significance and the correlation sign. Dashed lines represent OR = 1. N = 108. V0: acute COVID-19, V1: 60-day follow-up, V3: 180-day follow-up.
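The univariate screening summarized in the caption above (odds ratio with 95% CI, Wald Z test, Benjamini-Hochberg correction) can be sketched for a single binary risk factor as follows. For a binary predictor, the univariate logistic regression estimate coincides with the odds ratio of the 2x2 table, which is what this stdlib-only Python sketch uses. The function names and the example counts are hypothetical, and all four table cells are assumed to be non-zero.

```python
import math


def odds_ratio_wald(a, b, c, d, z_crit=1.959964):
    """Odds ratio, 95% CI and two-sided Wald p value for a 2x2 table.

    a: factor present, outcome present   b: factor present, outcome absent
    c: factor absent, outcome present    d: factor absent, outcome absent
    All cells are assumed to be non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts for two candidate risk factors, not study data.
    tables = [(20, 10, 10, 20), (15, 15, 14, 16)]
    results = [odds_ratio_wald(*t) for t in tables]
    adjusted = benjamini_hochberg([p for _, _, p in results])
    for (or_, ci, p), p_adj in zip(results, adjusted):
        print(round(or_, 2), tuple(round(v, 2) for v in ci), round(p, 4), round(p_adj, 4))
```

An effect would then be termed significant when its adjusted p value falls below 0.05, matching the threshold stated in the statistical methods.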
[Heat map panel (B): row labels are the binary clinical features recorded at V0 and V1 and the CT readouts at V3; legend: absent, present.]

Study participants (n = 108) were subjected to k-means clustering with respect to 50 non-CT binary clinical features recorded at the onset and the 60-day follow-up, using the Jaccard distance measure. Three separate participant subsets were identified, termed 'Low-', 'Intermediate-' and 'High-Risk Subset'. V0: acute COVID-19, V1: 60-day follow-up, V3: 180-day follow-up. (A) Cluster assignment plot. For visualization, the distance matrices were subjected to three-dimensional MDS (multi-dimensional scaling). Each point represents a single study participant, color codes for the cluster assignment. (B) Presence or absence of the clustering features in the participants assigned to the low-, intermediate- and high-risk subsets.
Study subjects were assigned to the low-, intermediate- and high-risk subsets by k-means clustering as presented in Figure 5. Differences in prevalence of clinical features between the intermediate- or high-risk subset and the low-risk subset were modeled with logistic regression. Statistical significance of OR estimates was assessed by the Wald Z test; p values were corrected for multiple comparisons by the Benjamini-Hochberg method. V0: acute COVID-19, V1: 60-day follow-up, V3: 180-day follow-up. (A) Points with whiskers represent OR with 95% CI, point color codes for significance and the correlation sign. Dashed lines represent OR = 1. (B) Prevalence of any lung abnormalities, GGOs and lesions graded > five severity points in CT at the 180-day follow-up in the low-, intermediate- and high-risk subsets.

Ability to predict any CT lung lesions, GGOs and lesions graded > five severity points at the 180-day follow-up (V3) based on the set of 50 non-CT binary clinical parameters recorded during acute COVID-19 and the 60-day follow-up was tested with the distance-weighted k-nearest neighbors (kNN) (k = 5, Jaccard distance between the subjects, random tie resolution) and naive Bayes algorithms.
Correct prediction rate, sensitivity and specificity of the algorithm were assessed with 200 random training/test splits of the initial data set (training: n = 80, test: n = 28). The significance of the correct prediction rates, sensitivity and specificity versus random predictions was determined by the Mann-Whitney U test. Each point represents a single training/test split analyzed, diamonds with whiskers represent expected values (median), 2.5% and 97.5% percentile of the statistic.

An interactive web-based dashboard to track COVID-19 in real time
COVID-19 Map - Johns Hopkins Coronavirus Resource Center
Serology-informed estimates of SARS-CoV-2 infection fatality risk in
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Cardiopulmonary recovery after COVID-19 - an observational prospective multi-center trial
Functional characteristics of patients with SARS-CoV-2 pneumonia at 30 days post-infection
Abnormal pulmonary function in COVID-19 patients at time of hospital discharge
Persistent symptoms in patients after acute COVID-19
The long-term impact of severe acute respiratory syndrome on pulmonary function, exercise capacity and health status
6-month consequences of COVID-19 in patients discharged from hospital: a cohort study
Mental morbidities and chronic fatigue in severe acute respiratory syndrome survivors long-term follow-up
Managing the long term effects of covid-19: Summary of NICE, SIGN, and RCGP rapid guideline
Overview | COVID-19 rapid guideline: managing the long-term effects of COVID-19 | Guidance | NICE
Fitting linear mixed-effects models using lme4
Algorithm AS 136: A K-Means Clustering Algorithm
Pattern recognition and neural networks
Naïve bayes classification in R
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing
Attributes and predictors of long COVID
NICE guideline on long COVID
The prevalence of long COVID symptoms and COVID-19 complications - Office for National Statistics
COVID-19 interstitial pneumonia: monitoring the clinical course in survivors
Radiologic outcomes at 5 years after severe ARDS
Quality of life, pulmonary function, and tomographic scan abnormalities after ARDS
Medium-term impact of COVID-19 on pulmonary function, functional capacity and quality of life
risk factors, and biomarkers in systemic sclerosis with interstitial lung disease
Persisting alterations of iron homeostasis in COVID-19 are associated with non-resolving lung pathologies and poor patients' performance: a prospective observational cohort study
Symptom clusters in COVID-19: A potential clinical prediction tool from the COVID Symptom Study app

Radial plots displaying 10 nearest cluster neighbors for any lung lesions and GGOs (cluster 3) and of CT lung lesions graded > five severity points