key: cord-0989998-3nb1op6g
authors: Orlando, Valentina; Rea, Federico; Savaré, Laura; Guarino, Ilaria; Mucherino, Sara; Perrella, Alessandro; Trama, Ugo; Coscioni, Enrico; Menditto, Enrica; Corrao, Giovanni
title: Development and validation of a clinical risk score to predict the risk of SARS-CoV-2 infection from administrative data: a population-based cohort study from Italy
date: 2020-07-23
journal: bioRxiv
DOI: 10.1101/2020.07.23.217331
sha: e7537b3b7a898f28b4f4e1729263b7278603335f
doc_id: 989998
cord_uid: 3nb1op6g

Background The novel coronavirus (SARS-CoV-2) pandemic spread rapidly worldwide, with cases increasing exponentially in Italy. To date, there is a lack of studies describing the clinical characteristics of the population most at risk of infection. Hence, we aimed to identify clinical predictors of SARS-CoV-2 infection risk and to develop and validate a score predicting SARS-CoV-2 infection risk, comparing it with unspecific surrogates.

Methods A retrospective case-control study using an administrative health-related database was carried out in Southern Italy (Campania region) among beneficiaries of the Regional Health Service aged over 30 years. For each subject with a confirmed Covid-19 diagnosis (case), up to five controls were randomly matched for gender, age and municipality of residence. Odds ratios and 90% confidence intervals for associations between candidate predictors and risk of infection were estimated by means of conditional logistic regression. The SARS-CoV-2 Infection Score (SIS) was developed by generating a total aggregate score, obtained by assigning a weight to each selected covariate using coefficients estimated from the model. Finally, the score was categorized by assigning increasing values from 1 to 4. SIS was validated by comparison with specific and unspecific predictors of SARS-CoV-2 infection.
Results Subjects suffering from diabetes, anaemias, Parkinson’s disease, mental disorders, cardiovascular and inflammatory bowel and kidney diseases showed increased risk of SARS-CoV-2 infection. Similar estimates were recorded for men and women, and for those younger and older than 65 years. Fifteen conditions significantly contributed to the SIS. As the SIS value increases, risk progressively increases: the odds of SARS-CoV-2 infection among people with the highest SIS value (SIS=4) were 1.74 times higher than among those unaffected by any SIS contributing condition (SIS=1).

Conclusion This study identified conditions and diseases making individuals more vulnerable to SARS-CoV-2 infection. Our results offer decision-makers a support tool for identifying the population most at risk, allowing the adoption of preventive measures to minimize the damage of a potential new relapse.

Introduction

[...] specific class of disease and/or one hospital discharge with the diagnoses coded with the specific ICD-9-CM (S2 Table).

Conditional logistic regression was used to estimate odds ratios (ORs), with 90% confidence intervals (CIs), for the association between candidate predictors and the odds of SARS-CoV-2 infection. Predictors entered the model as dichotomous covariates, i.e., with value 0 or 1 according to whether the specific condition was not or was recorded at least once within the two years prior to baseline (2018-2019). Unadjusted and mutually adjusted models were fitted by including the covariates one at a time and all together, respectively. Power considerations suggested excluding covariates whose prevalence among controls fell below the 0.12% threshold, i.e., predictors for which our sample size was not sufficient to detect an OR of at least 3 with a power of 0.80 and a two-sided type I error of 0.10. In addition, some conditions were grouped together when strong uncertainty in the algorithm did not allow them to be distinguished.
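As a rough illustration of the estimation step described above, the sketch below fits a conditional logistic model with a single dichotomous covariate by Newton's method and returns the OR with a 90% Wald interval. The function name, the data layout (one tuple per matched set) and the choice of a Wald interval are assumptions made for illustration; the source does not state which software or interval type was used, and the mutually adjusted models would require a multivariable fit.

```python
import math

def conditional_or(matched_sets, z=1.6449, tol=1e-10, max_iter=50):
    """Conditional logistic regression for one binary covariate.

    matched_sets: list of (case_exposure, [control_exposures]) with 0/1 values,
                  one entry per matched set (hypothetical layout).
    z:            normal quantile for a two-sided 90% interval.
    Returns (odds_ratio, lower_limit, upper_limit).
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for case_x, control_xs in matched_sets:
            xs = [case_x] + list(control_xs)
            w = [math.exp(beta * x) for x in xs]
            # Weighted mean exposure within the set under the current beta.
            mean_x = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
            grad += case_x - mean_x            # score contribution
            info += mean_x * (1.0 - mean_x)    # variance of a 0/1 covariate
        if info == 0.0:
            raise ValueError("no matched set is discordant in exposure")
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = math.sqrt(1.0 / info)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Toy data: 1:1 matched pairs, where the estimate reduces to the ratio of
# discordant pairs (6 exposed-case pairs vs 3 exposed-control pairs).
pairs = [(1, [0])] * 6 + [(0, [1])] * 3
or_hat, lower, upper = conditional_or(pairs)
print(round(or_hat, 3), round(lower, 3), round(upper, 3))
```

The same function accepts sets with up to five controls, for example `(1, [0, 0, 1, 0, 0])`, since each set contributes only through the exposures of its members.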
With the aim of testing the hypothesis that predictors may affect the severity of clinical manifestations of SARS-CoV-2 infection, rather than infection per se, analyses were restricted to strata having fatal infection. Stratifications for sex and age categories (<65 years, ≥65 years) were performed as it [...] by 10 and rounding it to the nearest whole number [31]. The weights thus obtained were then summed to generate a total aggregate score. To simplify the system, i.e., with the aim of accounting for excessive heterogeneity of the total aggregate score, the latter was categorized by assigning increasing values of 1, 2, 3 and 4 to the categories of the aggregate score of 0, 1-2, 3-4 and ≥5, respectively. The index so obtained was denoted SARS-CoV-2 Infection Score (SIS).

Performance of SIS was explored by applying the corresponding weights to the so-called validation set, consisting of the 1,048 1:5 case-control sets that did not enter the training set. To evaluate the clinical utility of SIS for predicting infection, we considered receiver operating characteristic (ROC) curve analysis and used the area under the ROC curve (AUC) as a global summary of the discriminatory capacity of the scores [32].

Some unspecific scores surrogating the general clinical profile of each case and control included in the study were considered. In particular, the number of drugs with different 3rd level ATC codes dispensed to, and of comorbidities with different ICD-9-CM codes experienced by, each case and control within the two years prior to baseline (2018-2019) was recorded. Categorization was made by assigning increasing values of 1, 2, 3 and 4 to 0, 1-4, 5-9 and ≥10 drugs (comedication score) and 1, 2, 3 and 4 to 0, 1-2 and ≥3 comorbidities (comorbidity score).
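The scoring and evaluation steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the condition names and coefficients are hypothetical, multiplying each coefficient by 10 before rounding is an assumption (the sentence describing that step is incomplete in the text), and the AUC is the plain unmatched concordance estimate rather than the matched-design estimator cited in [32].

```python
import math

def weight_from_coefficient(beta, scale=10):
    """Integer weight: coefficient times `scale`, rounded to the nearest
    whole number (scale=10 and half-up rounding are assumptions)."""
    return int(math.floor(beta * scale + 0.5))

def aggregate_score(conditions, weights):
    """Sum the weights of the conditions recorded for one subject."""
    return sum(weights[c] for c in conditions if c in weights)

def sis_category(total):
    """Map aggregate scores 0, 1-2, 3-4, >=5 to SIS values 1, 2, 3, 4."""
    if total <= 0:
        return 1
    if total <= 2:
        return 2
    if total <= 4:
        return 3
    return 4

def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control,
    counting ties as one half (unmatched simplification)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical coefficients (log odds ratios), not values from Table 2.
coefficients = {"condition_a": math.log(1.2), "condition_b": math.log(1.5)}
weights = {name: weight_from_coefficient(b) for name, b in coefficients.items()}

subject = ["condition_a", "condition_b"]
print(weights, sis_category(aggregate_score(subject, weights)))
```

Keeping the categorization in its own function makes it easy to swap in the cut-points of the comedication score (0, 1-4, 5-9, ≥10 drugs) without touching the weighting step.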
Cases and controls were also categorized according to the Multisource Comorbidity Score (MCS), a new index of patients' clinical status derived from inpatient diagnostic information and outpatient drug prescriptions provided by Italian regional data and validated for outcome prediction [22, 33]. To simplify [...] (Table 1).

In particular, patients suffering from diabetes, anaemias, mental disorders (dementia / Alzheimer's disease, psychosis and anxiety), Parkinson's disease, glaucoma, diseases of the circulatory system (heart failure and hypertension), chronic respiratory, inflammatory bowel, and rheumatologic conditions showed statistical evidence of increased risk of infection with respect to patients who did not suffer from them. Likely because of low power, only 7 conditions were significantly associated with the risk of fatal Covid-19 disease, but there was no relevant difference in the estimates with respect to the risk of SARS-CoV-2 infection as a whole (Table 1).

The same analysis was conducted separately for women and men positive to [...], showing statistical evidence of increased risk of infection for women suffering from anaemias, dementia/Alzheimer, psychosis, anxiety, epilepsy, heart failure, kidney diseases and particularly cystic fibrosis (S3 Table). Conversely, higher risk of infection was observed among men suffering [...] Table).

Fifteen conditions significantly contributed to the SIS, the corresponding weights being reported in Table 2. Factors which most contributed to the total aggregate score were dementia / Alzheimer's disease, kidney disease, psychosis, inflammatory bowel disease and rheumatologic conditions, while diabetes, anaemias, anxiety, Parkinson's disease, glaucoma, heart failure, hypertension, arrhythmia, thyroid disorders and chronic respiratory disease provided small, although significant, contributions.
Generic/unspecific scores surrogating clinical profile were associated with the risk of SARS-CoV-2 infection: patients with ≥ 10 drug treatments, those with ≥ 3 comorbidities, and those with MCS value ≥ 4 showed increased risks of 65%, 36% and 45% with respect to patients in the reference category of cotreatments, comorbidities and MCS (value = I), respectively (Table 3). The AUC (90% CI) of SIS, cotreatment and comorbidity scores and MCS was 0.54 (0.52 to 0.56), 0.52 (0.50 to 0.54), 0.53 (0.51 to 0.55), and 0.53 (0.51 to 0.55), respectively (Fig 2). There was no evidence that specific and unspecific scores had different discriminatory ability.

Our study shows that several diseases and conditions are significantly and independently associated with the risk of SARS-CoV-2 infection. Beyond conditions that make the respiratory system particularly vulnerable (e.g., chronic obstructive pulmonary disease and asthma), comorbidities spanning practically all diagnostic categories are involved. Predictors belonging to nutritional and metabolic (diabetes), cardiovascular (heart failure and hypertension) and renal diseases were widely expected, since it is accepted that SARS-CoV-2 has major implications for the cardiovascular system. Indeed, patients with heart failure [34], diabetes [35] [36] [37], hypertension [38] and kidney disease [39] [40] [41] have been consistently identified as particularly vulnerable populations, and our study confirmed these findings. In addition, we confirmed that people whose immune system is weakened by a medical condition or treatment are at a higher risk. Among these, those living with haemoglobin disorders [42], inflammatory bowel disease [43] and immune-rheumatological diseases [44] must be considered vulnerable groups for Covid-19 infection.
Mental health and cognitive function might have independent utility in understanding the burden of respiratory disease, since they may influence the risk of contracting the infection, at least in part by impairing innate or adaptive immunity [45] and diminishing the precautions taken to minimize risk. Another explanation of our findings is that people with a history of depression [46], psychosis [47] and stress disorders [48] could experience elevated rates of an array of respiratory infections because these conditions often require treatment in a psychiatric care facility, and the risk of infection can be particularly high in these facilities. Finally, our study adds evidence regarding the impact of diseases and conditions on the risk of SARS-CoV-2 infection in men and women. As pointed out by a recent study [49], sex- and age-disaggregated data are essential for understanding the distribution of infection risk in the population and the extent to which sex and age affect clinical outcomes.

Although our results confirm that a wide range of diseases and conditions likely increase vulnerability to SARS-CoV-2 infection, and probably to its more severe clinical manifestations, we were not able to develop a score that accurately predicts the risk of infection. In addition, we found that the predictive ability of the score obtained by weighting risk factors of SARS-CoV-2 infection was no better than that of generic scores of comorbidity and comedication. This expands upon previous findings of individual comorbidities as independent risk factors for SARS-CoV-2 infection [50, 51], and confirms our substantial inability to predict the risk of SARS-CoV-2 infection. The reasons are likely linked to the several limitations of our approach which, in general, generate estimates biased towards the null. First, exposure misclassification arises from our inability to carefully capture conditions and diseases through algorithms based on healthcare utilization databases [52].
Second, it is well known that outcome misclassification can bias epidemiologic results. For Covid-19, suboptimal test sensitivity, despite excellent specificity, results in an overestimation of cases in the early stages of an outbreak, and substantial underestimation of cases as prevalence increases [53]. It should be noted, however, that both exposure and outcome misclassification likely drew estimates towards the null (i.e., underestimated the strength of the association between the presence of the conditions and the outcome risk), so generating uncertainty for the weighting approach used to develop the score. Third, we lacked specific data on clinical outcome for stratifying Covid-19 positive patients in terms of home isolation, hospitalization and admission to intensive care. Fourth, the lack of information on biologic markers potentially able to predict infection, and the severity of its clinical manifestations, is another limitation of our study; for example, according to the current literature, some laboratory hallmarks have been shown to predict infection, particularly in more severe cases [54]. Finally, our choice of accepting a 0.10 type I error, and of consequently reporting 90% confidence intervals, is justified by the exploratory nature of our study, but at the same time likely generates false positive signals, so limiting the discriminant power of the score.

In conclusion, taking the limitations we discussed into account, we identified conditions and diseases that make people more vulnerable to SARS-CoV-2 infection. These findings help inform public health and clinical decisions regarding risk stratification. However, further research is needed to develop a score that reliably predicts the risk, possibly by integrating healthcare utilization data with clinical and biological data.
Our results can be an important tool for all clinical and political stakeholders, allowing identification of the population most at risk of contracting Covid-19 and facilitating the provision of appropriate preventive/therapeutic measures, especially in view of a hypothetical new autumn outbreak. Adopting preventive measures can help to minimize the damage generated by a potential new relapse that the health systems will face.

[1] An interactive web-based dashboard to track COVID-19 in real time
[2] Case-Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy
[3] Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19
[4] Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19
[5] A simple algorithm helps early identification of SARS-CoV-2 infection patients with severe progression
[6] Epidemiological and Clinical Predictors of COVID-19
[7] Development and validation of the COVID-19 severity index (CSI): a prognostic tool for early respiratory decompensation. medRxiv preprint
[8] Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis
[9] Treatment Patterns of Diabetes in Italy: A Population-Based Study
[10] Biological therapy utilization, switching, and cost among patients with psoriasis: retrospective analysis of administrative databases in Southern Italy
[11] Prevalence of antibiotic prescription in southern Italian outpatients: real-world data analysis of socioeconomic and sociodemographic variables at a municipality level
[12] Osteoporosis drugs in real-world clinical practice: an analysis of persistence
[13] Prescription Patterns of Antidiabetic Treatment in the Elderly. Results from Southern Italy
[14] Adherence to chronic medication in older populations: application of a common protocol among three European cohorts
[15] Assessment and potential determinants of compliance and persistence to antiosteoporosis therapy in Italy
[16] Drug Utilization Pattern of Antibiotics: The Role of Age, Sex and Municipalities in Determining Variation
[17] Drug-utilisation Profiles and COVID-19: Retrospective Cohort Study in Italy
[18] Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
[19] General authorisation to process personal data for scientific research purposes -1
[20] The prevalence and ingredient cost of chronic comorbidity in the Irish elderly population with medication treated type 2 diabetes: A retrospective cross-sectional study using a national pharmacy claims database
[21] The validity of the Rx-Risk Comorbidity Index using medicines mapped to the Anatomical Therapeutic Chemical (ATC) Classification System
[22] Developing and validating a novel multisource comorbidity score from administrative data: a large population-based cohort study from Italy
[23] Von Willebrand's Disease
[24] Using pharmacy data to identify those with chronic conditions in
[25] Naltrexone: A Pan-Addiction Treatment?
[26] A Systematic Review of Case-Identification Algorithms Based on Italian Healthcare Administrative Databases for Three Relevant Diseases of the Nervous System: Parkinson's Disease, Multiple Sclerosis, and Epilepsy
[27] Prevalence of Multiple Sclerosis in Tuscany (Central Italy): A Study Based on Validated Administrative Data
[28] Prevalence of multiple sclerosis in the Lazio region, Italy: use of an algorithm based on health information systems
[29] Can we use the pharmacy data to estimate the prevalence of chronic conditions? a comparison of multiple data sources
[30] The lasso method for variable selection in the Cox model
[31] A combined comorbidity score predicted mortality in elderly patients better than existing scores
[32] Estimating the receiver operating characteristic curve in matched case control studies
[33] Measuring multimorbidity inequality across Italy through the multisource comorbidity score: a nationwide study
[34] Susceptibility and prognosis of COVID-19 patients with cardiovascular disease
[35] Risk of Infection in Type 1 and Type 2 Diabetes Compared With the General Population: A Matched Cohort Study
[36] Immune dysfunction in patients with diabetes mellitus (DM)
[37] Common infections in diabetes: pathogenesis, management and relationship to glycaemic control
[38] Renin-Angiotensin-Aldosterone System Blockers and the Risk of Covid-19
[39] Should COVID-19 Concern Nephrologists? Why and to What Extent? The Emerging Impasse of Angiotensin Blockade
[40] Kidney disease is associated with in-hospital death of patients with COVID-19
[41] Human kidney is a target for novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. medRxiv
[42] Management of Hemoglobin Disorders During the COVID-19
[43] Aspectos y consideraciones generales en la enfermedad inflamatoria intestinal durante la pandemia por COVID-19
[44] COVID-19 infection and rheumatoid arthritis: Faraway, so close!
[45] Psychological stress and susceptibility to the common cold
[46] Depression and the risk of severe infections: prospective analyses on a nationwide representative sample
[47] Risk of pneumonia and pneumococcal disease in people with severe mental illness: English record linkage studies
[48] Posttraumatic Stress Disorder and Incident Infections: A Nationwide Cohort Study
[49] Sex Differences in Mortality from COVID-19 Pandemic: Are Men Vulnerable and Women Protected?
[50] Comorbidity and its impact on 1590 patients with COVID-19 in China: a nationwide analysis
[51] Charlson Comorbidity Index Score and Risk of Severe Outcome and Death in Danish
[52] Effects of non-differential exposure misclassification on false conclusions in hypothesis-generating studies
[53] Towards reduction in bias in epidemic curves due to outcome misclassification through Bayesian analysis of time-series of laboratory test results: Case study of COVID-19 in Alberta, Canada and Philadelphia, USA
[54] Prompt predicting of early clinical deterioration of moderate-to-severe COVID-19 patients: usefulness of a combined score using IL-6 in a preliminary study

Supporting Information

S1 Table. Campania Region Database (CaReDB) characteristics. ATC = Anatomical Therapeutic Chemical; ICD-9-CM = International Classification of Diseases, 9th Revision, Clinical Modification. 2018 for hospital-discharge records and 2014-2019 for outpatient pharmacy records.

List of diseases and conditions candidate for predicting SARS-CoV-2 infection, and corresponding ICD-CM and ATC codes used for detecting them.

Odds ratio (OR), and 90% confidence intervals (CI), for the relationship between selected diseases/conditions and the risk of SARS-CoV-2 infection, stratified according to gender.

Odds ratio (OR), and 90% confidence intervals (CI), for the relationship between selected diseases/conditions and the risk of SARS-CoV-2 infection.