key: cord-0900973-2szz1jmi authors: Benetti, E.; Giliberti, A.; Emiliozzi, A.; Velentino, F.; Bergantini, L.; Fallerini, C.; Anedda, F.; Amitrano, S.; Conticini, E.; Tita, R.; DAlessandro, M.; Fava, F.; Marcantonio, S.; Baldassarri, M.; Bruttini, M.; Mazzei, M. A.; Montagnani, F.; Mandala, M.; Bargagli, E.; Furini, S.; COVID-19 MULTICENTER STUDY,; Renieri, A.; Mari, F. title: Clinical and molecular characterization of COVID-19 hospitalized patients date: 2020-05-25 journal: nan DOI: 10.1101/2020.05.22.20108845 sha: cae00defe18b093ee6db3b50ae3c9e4ebaa1abc8 doc_id: 900973 cord_uid: 2szz1jmi
Clinical and molecular characterization by Whole Exome Sequencing (WES) is reported in 35 COVID-19 patients attending the University Hospital in Siena, Italy, from April 7 to May 7, 2020. Eighty percent of patients required respiratory assistance, half of them being on mechanical ventilation. Fifty-one percent had hepatic involvement and hyposmia was ascertained in 3 patients. Searching for common genes by collapsing methods against 150 WES of controls of the Italian population failed to give straightforward statistically significant results, with the exception of two genes. This result is not unexpected, since we are facing the most challenging common disorder triggered by environmental factors with a strong underlying heritability (50%). The lesson learned from Autism Spectrum Disorders prompted us to re-analyse the cohort treating each patient as an independent case, following a Mendelian-like model. We identified for each patient an average of 2.5 pathogenic mutations involved in virus infection susceptibility and pointing to one or more rare disorder(s). To our knowledge, this is the first report on WES and COVID-19. Our results suggest a combined model for COVID-19 susceptibility, with a number of common susceptibility genes which represent the favorable background in which additional host private mutations may determine disease progression.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted May 25, 2020. All rights reserved. No reuse allowed without permission.
Italy was the first European country to experience the epidemic wave of SARS-CoV-2 infection, with an apparently more severe clinical picture than in other countries. Indeed, the case fatality rate peaked at 14% in Italy, while it has remained stable at around 5% in China. [...] fully explain the differences in clinical severity. A reasonable hypothesis is that these different outcomes are rooted in predisposing host genetic factors, leading to different immunogenicity/cytokine responses as well as to specific receptor permissiveness to the virus and antiviral defence [3] [4] [5] [6].
Similarly, during the study of host genetics in influenza disease, a pattern of genetic markers has been identified which underlies increased susceptibility to a more severe clinical outcome (as reviewed in [7]). This hypothesis is also supported by a recent work reporting 50% heritability of COVID-19 symptoms [8]. The identification of host genetic variants associated with disease severity is of [...] Fig. 1).
In the two most severe groups (groups 1 and 2, including 13 patients) there are 11 males and 2 females, while in the two mildest groups (groups 3 and 4, including 22 patients) there are 13 males and 9 females. Patients were also assigned a lung imaging grading according to X-rays and CT scans. The mean value is 13 for the high care intensity group, 12 for the intermediate care intensity group, 8 for the low care intensity group and 5 for the very low care intensity group. Regarding immunological findings, a decrease in the total number of peripheral CD4+ T cells was identified in 13 subjects, while the NK cell count was impaired in 10 patients. Six patients showed a reduction of both parameters. IL-6 serum level was elevated in 13 patients. Based on blood groups, the cohort is divided into 15 patients of group 0, 16 patients of group A, 4 patients of group B and none of group AB. Hyposmia was present in 3 out of 34 evaluated cases (8.8%), and hypogeusia was present in the same subjects plus another case. These four cases belong to the first three severity groups. Liver involvement was present in 7 cases (20%) and pancreas involvement in 4 cases (11%); 10 patients presented both (29%). Heart involvement was detected in 13 cases (37%). Nine patients (25%) showed kidney involvement. Fibrinogen values below 200 mg/dL were identified in 2 cases (6%), between 200 and 400 mg/dL in 7 cases (20%), and above 400 mg/dL in 22 cases (63%).
D-dimer value below 500 ng/mL was present in 1 case (3%), between 500 and 5000 ng/mL in 26 cases (74%), and in 7 cases (20%) was 10 times higher than the normal value (>5000 ng/mL) (Table 1).
Table 1. Clinical characteristics of COVID-19 patients admitted to the University Hospital of Siena (Italy). Values are listed as Group 1 / Group 2 / Group 3 / Group 4; [...] marks a value or label missing from the extracted table.
No. of subjects: 6 / 7 / 15 / 7
Median age (range): 65 (55-70) / 58 (31-74) / 66 (49-98) / 58 (31-74)
Gender, male: 5 / 6 / 7 / 6
Gender, female: 1 / 1 / 8 / 1
Blood group A: 2 / 3 / 6 / 5
Blood group B: 2 / 1 / 1 / 0
Blood group 0: 2 / 3 / 8 / [...]
[...] normal value: 2 / 3 / 1 / [...]
>400 (mg/dL), hyperfibrinogenemia: 3 / 3 / 12 / 4
Unknown: 0 / 2 / 0 / [...]
Less than 10 times higher than normal value: 3 / 5 / 13 / 5
Normal: 0 / 0 / 0 / 1
More than 10 times higher than normal: 3 / 2 / 2 / 0
Hepatic (H)/Pancreatic (P) involvement, H and P: 2 / 5 / 6 / 1
Hepatic (H)/Pancreatic (P) involvement, H only: 3 / 0 / 1 / 1
Hepatic (H)/Pancreatic (P) involvement, P only: 0 / 0 / 1 / 1
Hepatic (H)/Pancreatic (P) involvement, none: 1 / 2 / 7 / 4
Kidney involvement, yes: 0 / 3 / 5 / 1
Kidney involvement, no: 6 / 4 / 10 / 6
Co-morbidities (values as extracted; the assignment to groups is not recoverable where fewer than four values survive): cardiovascular disease 1, 2, 3; hypertension 2, 2, 8; tumor 2, 1, 2, 1; diabetes 4; pulmonary disease 1.
[...] Table S1) and FAM104B and NDUFAF7, although to a lesser extent (Fig. 2 and Supplementary Table S1). For all these genes, the susceptibility factor is represented by the functioning (or more functioning) gene. We also identified two additional genes, [...] Table S2). In these latter cases, the functioning gene represents indeed a protective factor. We then tested the hypothesis that COVID-19 susceptibility is due to different variants in different individuals.
Recently acquired knowledge on the genetic bases of Autism Spectrum Disorders suggests that a common disorder could be the sum of many different rare disorders, and that this genetic landscape can appear indistinguishable at the clinical level [9]. Therefore, we analyzed our cohort treating each patient as an independent case, following a Mendelian-like model. According to the "pathogenic" definition in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/), for each patient we identified an average of 1 mutated gene involved in viral infection susceptibility and pointing to one or more rare disorder(s) or a carrier status of rare disorders (Fig. 1). Following the pipeline used in routine clinical practice for WES analysis in rare disorders, we then checked for rare variants "predicted" to be relevant for infection by means of common annotation tools. We thus identified an average of 1-5 additional variants per patient, which add to the previously identified pathogenic variants (Fig. 1, Supplementary Table S3). We then checked the cohort for known non-rare variants classified as either "pathogenic" or "protective" in the ClinVar database and related to viral infection. Variants in six different genes matched the terms "viral infection" and "pathogenic" according to ClinVar (Fig. 1). Overall, a mean of 3 genes with "pathogenic" common variants involved in viral infection susceptibility were present (Fig. 1).
Among the common protective variants, we list as examples three variants which confer protection against Human Immunodeficiency Virus (HIV), the first two, and leprosy, the third one: a CCR2 variant (rs1799864) identified in 8 patients, a CCR5 variant (rs1800940) in one patient and a TLR1 variant (rs5743618) in 26 patients (not shown). An IL4R variant (rs1805015) associated with slow HIV progression was present in 8 patients (not shown). Although not identified by the unbiased collapsing gene analysis, a number of obvious candidate genes were specifically analyzed. First of all, we noticed that the SARS-CoV-2 receptor, the ACE2 protein, is preserved in the cohort, the only change being a silent mutation (V749V) present in 2 males and 2 heterozygous females. This is in line with our previous suggestion that either rare variants or polymorphisms may impact infectivity [10]. The IFITM3 polymorphism (rs12252) was found in heterozygosity in 4 patients, as expected from its frequency. Eight patients had heterozygous missense mutations in the CFTR gene reported as VUS/mild variants, 7 of the 8 being among the more severely affected patients.
In this study, we present a cohort of 35 COVID-19 patients admitted between April and May 2020 to the University Hospital of Siena, who were clinically characterized by a team of 29 MDs belonging to 7 different specialties. As expected, the majority of hospitalized patients are males, confirming previously published data reporting a predominance of males among the most severely affected COVID-19 patients [11]. The distribution of blood types in our patients is not statistically different from that of the general population according to a chi-square test with alpha equal to 0.05 (data not shown). Lung imaging involvement, evaluated through a modified lung imaging grading system [12], did not completely correlate with respiratory impairment, since among the 13 patients who required mechanical ventilation (groups 1 and 2) grading was either moderate (10) or mild (3).
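The blood-group comparison mentioned above (a chi-square test at alpha 0.05) can be illustrated with a minimal goodness-of-fit sketch. The observed counts (15, 16, 4, 0 for groups 0, A, B, AB) come from the text; the reference frequencies are hypothetical placeholders, not the population values used by the authors, and the function names are illustrative only.

```python
import math

def chi_square_gof(observed, expected_freqs):
    """Pearson chi-square statistic for observed counts vs. expected frequencies."""
    n = sum(observed)
    return sum((o - n * f) ** 2 / (n * f) for o, f in zip(observed, expected_freqs))

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Observed counts from the text: blood groups 0, A, B, AB in the 35 patients.
observed = [15, 16, 4, 0]
# HYPOTHETICAL reference frequencies, for illustration only.
reference = [0.40, 0.40, 0.15, 0.05]

stat = chi_square_gof(observed, reference)
p_value = chi2_sf_df3(stat)  # four categories give 3 degrees of freedom
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}, reject at 0.05: {p_value < 0.05}")
```

With so few patients the expected count for group AB is small, so an exact test may be preferable in practice; the sketch only shows the mechanics of the comparison.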
In line with our previous data, lymphocyte subset immunophenotyping revealed a decrease in the total number of CD4 and NK cells, especially in the most severe patients [13]. Laboratory tests revealed a multiple-organ involvement, confirming that COVID-19 is a systemic disease rather than just a lung disorder (Fig. 1). We thus propose that only a detailed clinical characterization can make it possible to disentangle the complex relationship between genes and signs/symptoms. In order to test the hypothesis that COVID-19 susceptibility is due to one or more genes in common among patients, we used the gene burden test to compare the rate of disrupting mutations per gene. This test has already been successfully applied to discover susceptibility genes for Respiratory Syncytial Virus infection [14]. We identified 2 fairly robust genes whose damage represents a protective factor: OR4C5 and ZNF717. OR4C5 is a "resurrected" pseudogene, known to be non-functioning in half of the European population. In our genome, in addition to 413 Olfactory Receptor (OR) loci, there are 244 segregating pseudogenes, 26 of which are "resurrected" from a pseudogene status. The OR4C5 locus belongs to this subgroup and it has the highest intrapopulation variability. Expression of the "resurrected" pseudogene OR4C5 may help in triggering the natural immunity leading to virus and cell death. It is interesting to note that the Protein Atlas shows OR4C5 protein expression in the liver without the corresponding mRNA expression (www.proteinatlas.org).
Usually, dissociation between protein and mRNA means that the protein is produced elsewhere and then transported into the organ. This is the case, for instance, for neurotransmitters that are synthesized in neuronal cell bodies located in abdominal ganglia and are subsequently transported to the liver through the axons innervating the organ [19]. Thus, we may speculate that OR4C5 reaches the liver through nerve terminals. If this is the case, those individuals expressing the resurrected OR4C5 gene may have more triggers of innate immunity and subsequently higher organ damage. It is noteworthy that in our cohort the putative expression of OR4C5 (white boxes) is identified in patients with liver damage (Fig. 1). Previous studies reported a prevalence of olfactory disorders in the COVID-19 population ranging from 5% to 98%.
Kruppel-associated box zinc-finger protein 717 (ZNF717) belongs to a large group of transcriptional regulators playing important roles in different cellular processes, including cell proliferation, differentiation and apoptosis, and in the regulation of viral replication and transcription. ZNF717 variations were detected in more than 10% of the WGS samples in Hepatitis B Virus (HBV)-related hepatocellular carcinoma (HCC) [22], where it has been identified as a potential driver gene with high-frequency mutations at both single-cell and population levels. Moreover, this gene is one of the most recurrently somatically mutated genes in gastric cancer, together with TP53 [23]. From a functional standpoint, it acts through the regulation of the IL-6/STAT3 pathway.
ZNF717 knockdown in HCC cell lines results in increased levels of IL-6 and upregulation of STAT3 and its target genes [24]. According to our results, people in whom this gene is more damaged are more protected from SARS-CoV-2 infection, likely because their innate immunity responds more effectively.
PRKRA (protein kinase activator A, also known as PACT; OMIM# *603424) is a protein kinase activated by double-stranded RNA which mediates the effects of interferon in response to viral infection, resulting in antiviral activity [25]. Multiple transcript variants have been identified, including a polymorphism removing the canonical ATG. Mutations in the PRKRA gene have been associated with autosomal recessive dystonia [26]. The innate immune response is activated by the detection of viral structures, potentially via dsRNA-binding partners such as PRKRA [27]. In line with PRKRA antiviral activity, in our cohort we found a higher burden of potentially deleterious variants in COVID-19 patients, suggesting that these variants might reduce PRKRA functionality, potentially impairing the IFN-mediated immune response.
[...] the LAPTM4B (Lysosomal Protein Transmembrane 4 Beta) gene, which was selected using the gene burden test (p-value 4.04625E-06 and adjusted p-value 0.02911682). LAPTM4B protein is involved in the endosomal network, which eventually enables productive viral infection [28]. In particular, in HBV infection, endocytosis of the EGF Receptor (EGFR) drives the translocation of HBV particles from the cell surface to the endosomal network, enabling productive viral infection [28].
In this process, LAPTM4B downregulates the formation of late lysosomes, suppressing EGFR lysosomal degradation and thus leading to a prolonged permanence of EGFR on the cell surface [29]. Accordingly, LAPTM4B knockdown significantly promotes HBV infection [28]. This fits with our results, which indicate a significantly increased probability of deleterious changes in COVID-19 patients compared to controls.
Being affected by a rare disorder and/or being a carrier of rare disorders may represent a susceptibility factor to infections (Fig. 1). Having this in mind, and driven by the lesson learned from the studies on the genetic bases of Autism Spectrum Disorders, we explored the possibility that each patient could have one or a unique (personalized approach) combination of rare pathogenic or highly relevant variants related for different reasons to infection susceptibility [9]. For instance, one male patient is affected by Glucose-6-phosphate dehydrogenase (G6PD) deficiency (rs137852318), the most common enzymopathy in humans, affecting 400 million people worldwide. G6PD-deficient cells are more susceptible to several viruses, including coronavirus, and in G6PD-deficient cells innate immunity is downregulated, in line with the very low levels of IL-6 observed in this patient (Fig. 1) [30]. The same patient also presents a ZEB1-linked corneal dystrophy. The ZEB1 gene is known to function in immune cells, playing an important role in establishing both the effector response and future immunity in response to pathogens [31].
In addition, two sisters have TGFBI mutations, associated with corneal dystrophy; several patients are carriers of pseudoxanthoma elasticum, bearing ABCC6 gene mutations; others have likely hypomorphic mutations in CHD7 or COL5A1/2 variants. All these genes play a role as modulators of immune cell activity and/or response to infections [32-39]. Other rare variants were identified in the following interesting genes: ADAR, involved in viral RNA editing; CLEC4M, an alternative receptor for SARS-CoV [40]; HCRTR1/2, receptors of Hypocretin, important in the regulation of fatigue during infections [41]; FURIN, a serine protease that cleaves the SARS-CoV-2 minor capsid protein important for ACE2 contact and viral entry into the host cells [42, 43].
Additional interesting variants have been identified in NOS3 and OPRM1. COVID-19 aggravates the Nitric Oxide (NO) production deficit in patients with NOS3 polymorphisms. Management of the eNOS/iNOS ratio (endothelial/inducible NO synthase) and of the NO level can prevent the development of severe acute respiratory distress syndrome [44]. From an immunological point of view, NO is mainly produced through iNOS, which can be selectively expressed both in epithelial and white blood cells. It is postulated that NO plays a crucial role in innate and specific host defense, particularly against protozoa and bacteria. However, its role in viral infection is debated, as conflicting results have been reported in the literature. Even though iNOS is generally overexpressed in patients with active viral infection, experiments conducted in murine models show that the inhibition of iNOS leads
to a significant improvement of HSV-viral pneumonia, despite the impairment of viral clearance [45]. Moreover, NO production may facilitate virus mutations and the selection of more resistant strains. However, it seems that NO may have different effects according to the causative agent. Notably, a few studies demonstrated that NO was able to significantly reduce viral infection and replication of SARS-CoV through two distinct mechanisms: impairment of the fusion between the spike protein and its receptor ACE2, and reduction of viral RNA production [46]. Very few data are currently available on NO-specific effects in COVID-19. The promising results reported in SARS-CoV infection may suggest a similar effectiveness in COVID-19, considering that SARS-CoV and SARS-CoV-2 share more than 70% of their RNA sequence and have shown similar mechanisms of infection and viral replication. Clinical trials are ongoing to evaluate the effectiveness of inhaled NO in COVID-19 patients: although inhaled NO is conceived as a vasodilator therapy, and is therefore indicated for the optimization of ventilation/perfusion mismatch in intubated patients, the results may help to elucidate its potential antiviral properties [47, 48]. Opioid ligands may regulate the expression of chemokines and chemokine receptors [49]. Due to the immunomodulatory effect of morphine, OPRM1 has been supposed to be involved in the immune response and in HIV expansion [49, 50].
We also identified common "pathogenic" variants in genes known to be linked to viral infection, such as MBL2, IRGM and SAA1, and/or to specific organ damage, such as PRSS1.
For the genes reported last above, and for the pathogenic variants or predicted variants relevant for infection, no statistically significant difference in variant frequency was found between cases and controls, looking either at the single variant or at the single gene as a burden effect of variants. However, as depicted in the overall Fig. 1, we could hypothesize a combined model in which common susceptibility genes add to less common or private susceptibility variants. A specific combination of these 2 categories may determine the type (organotropism) and severity of the disease. Our observations, related to the huge amount of data on both the phenome and genome sides and represented in Figure 1, could also lay the bases for association rule mining approaches. Artificial intelligence techniques based on pattern recognition may discover an intelligible picture which appears blurred at present. Further analyses in a larger cohort of cases are mandatory in order to test this hypothesis of a combined model for COVID-19 susceptibility, with a number of common susceptibility genes which represent the fertile background in which additional private, rare or low-frequency mutations confer to the host the most favourable environment for virus growth and organ damage.
Thirty-five patients admitted to the University Hospital in Siena, Italy, from April 7 to May 7, 2020 were recruited. The study was consistent with Institutional guidelines and approved by the University Hospital (Azienda Ospedaliera Universitaria Senese) Ethical Review Board, Siena, Italy (Prot n. 16929, dated March 16, 2020). Written informed consent was
obtained from all patients. Peripheral blood samples in EDTA-containing tubes and detailed clinical data were collected. All these data were inserted in a section dedicated to COVID-19 of the established and certified Biobank and Registry of the Medical Genetics Unit of the Hospital. An example of the clinical questionnaire is illustrated in Supplementary Fig. S1. Each patient was assigned a continuous quantitative respiratory score, the PaO2/FiO2 ratio (P/F; normal values >300), taken as the worst value during the hospitalization. Patients were also assigned a lung imaging grading according to X-rays and CT scans. In particular, lung involvement was scored through imaging at the time of admission and during hospitalization (worst score), annotating the chest X-ray (CXR) score (in 34 patients) and the CT score in 1 patient for whom X-rays were not available. To obtain the score (from 0 to 28), each CXR was divided into four quadrants (right upper, right lower, left upper and left lower), and for each quadrant the presence of consolidation (0 = no consolidation, 1 = <50%, 2 = >50%), ground glass opacities (GGOs: 0 = no GGOs, 1 = <50%, 2 = >50%) and reticulation (0 = no reticulation, 1 = <50%, 2 = >50%) was recorded, together with pleural effusion on the left or right side (0 = no, 1 = minimal, 2 = large). The same score was applied for CT (1 patient). For each patient, the presence of hyposmia and hypogeusia was also investigated through otolaryngology examination, Burghart sniffin' sticks [59] and a visual analog scale (VAS). Whenever the sign was present, a score ranging from 0 to 10 was assigned to each patient using the VAS, where 0 means the best sense of smell and 10 represents the absence of smell sensation [60].
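The chest X-ray scoring scheme described above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation: the names `cxr_score`, `QUADRANTS` and `FINDINGS` are hypothetical, and the sketch assumes that pleural effusion is graded once per side, which is what makes the maximum equal to 28 (4 quadrants x 3 findings x 2 points, plus 2 points per side).

```python
QUADRANTS = ("right_upper", "right_lower", "left_upper", "left_lower")
FINDINGS = ("consolidation", "ggo", "reticulation")

def _check_grade(grade, label):
    """Each item is graded 0, 1 or 2, as in the text."""
    if grade not in (0, 1, 2):
        raise ValueError(f"{label}: grade must be 0, 1 or 2, got {grade!r}")
    return grade

def cxr_score(quadrants, effusion_left=0, effusion_right=0):
    """Sum the per-quadrant grades and the two pleural effusion grades (0 to 28).

    `quadrants` maps a quadrant name to a dict of finding grades;
    missing quadrants or findings count as 0.
    """
    total = 0
    for quadrant in QUADRANTS:
        grades = quadrants.get(quadrant, {})
        for finding in FINDINGS:
            total += _check_grade(grades.get(finding, 0), f"{quadrant}/{finding}")
    total += _check_grade(effusion_left, "effusion_left")
    total += _check_grade(effusion_right, "effusion_right")
    return total

# Example: bilateral lower-lobe ground glass with a minimal right effusion.
example = {
    "right_lower": {"ggo": 2, "consolidation": 1},
    "left_lower": {"ggo": 1},
}
print(cxr_score(example, effusion_right=1))
```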
The presence of hepatic involvement was defined on the basis of a clear elevation of hepatic enzymes, with glutamic pyruvic transaminase (ALT) and glutamic oxaloacetic transaminase (AST) both higher than 40 IU/L. Pancreatic involvement was defined on the basis of an increase of pancreatic enzymes, with pancreatic amylase higher than 53 IU/L and lipase higher than 60 IU/L. Heart involvement was defined on the basis of one or more of the following abnormal data: Troponin T (indicative of ischemic disorder), NT-proBNP (indicative of heart failure) and arrhythmias (indicative of electrical disorder). Kidney involvement was defined by a creatinine value higher than 1.20 mg/dL in males and higher than 1.10 mg/dL in females.
Genomic DNA was extracted from peripheral blood using the MagCore® Genomic DNA Whole Blood kit (RBC Biosciences) according to the manufacturer's protocol. Whole exome sequencing analysis was performed on the Illumina NovaSeq 6000 system (Illumina, San [...]). Reads were mapped to the hg19 reference genome by the Burrows-Wheeler aligner BWA [61]. Variant calling was performed according to the GATK4 best practice guidelines [62]. Namely, duplicates were first removed by MarkDuplicates, and base qualities were recalibrated using BaseRecalibrator and ApplyBQSR. HaplotypeCaller was used to calculate Genomic VCF files for each sample, which were then used for multi-sample calling by GenomicsDBImport and GenotypeGVCFs. In order to improve the specificity-sensitivity balance, variant quality scores were calculated by VariantRecalibrator and ApplyVQSR.
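The calling steps listed above can be sketched as GATK4 command lines. The snippet below only builds the argument lists and does not run anything; all file names, the interval file and the known-sites resource are hypothetical, and the VariantRecalibrator/ApplyVQSR step is left out because its resource and annotation arguments are not given in the text. It is a sketch of a typical invocation order, not the authors' actual scripts.

```python
def per_sample_commands(sample, ref="hg19.fa", known_sites="known_sites.vcf.gz"):
    """Per-sample steps: duplicate marking, base recalibration, GVCF calling."""
    bam = f"{sample}.sorted.bam"          # hypothetical BWA output, already sorted
    dedup = f"{sample}.dedup.bam"
    table = f"{sample}.recal.table"
    recal = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt"],
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup,
         "--known-sites", known_sites, "-O", table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup,
         "--bqsr-recal-file", table, "-O", recal],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal,
         "-O", gvcf, "-ERC", "GVCF"],
    ]

def joint_calling_commands(gvcfs, ref="hg19.fa", workspace="cohort_db",
                           intervals="exome_targets.interval_list",
                           out_vcf="cohort.vcf.gz"):
    """Multi-sample steps: import the GVCFs, then genotype them jointly."""
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    genotype_cmd = ["gatk", "GenotypeGVCFs", "-R", ref,
                    "-V", f"gendb://{workspace}", "-O", out_vcf]
    return [import_cmd, genotype_cmd]

for cmd in per_sample_commands("patient01"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the inputs exist.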
Variants were annotated by ANNOVAR [63], and with the number of articles answering the query "gene_name AND viral infection" in PubMed, where gene_name is the name of the gene affected by the variant. In order to identify candidate genes according to the Mendelian-like model, rare variants were filtered by a prioritization approach. We used the ExAC database (http://exac.broadinstitute.org/), in particular the ExAC_NFE reported frequency, to filter variants according to a minor allele frequency < 0.01. Synonymous, intronic and non-coding variants were excluded from the analysis. The mutation disease database ClinVar (ncbi.nlm.nih.gov/clinvar/) was used to identify previous pathogenicity classifications, and variants reported as likely benign/benign were discarded. Filtering and prioritization of variants was completed using the CADD_Phred pathogenicity prediction tool. Finally, we selected genes involved in infection susceptibility using the term "viral infection" as the PubMed database search.
In order to identify genes with a different prevalence of functionally relevant variants between COVID-19 patients and control samples, the following score was calculated: [...] [64], which provides an estimate of the likelihood that the variant has deleterious functional effects (i.e. variants more likely to have a functional effect contribute more to the score). The sum in equation (1) was performed over all the variants in the gene for which the DANN score was available. Genes with fewer than 5 annotated variants were discarded from the analysis. The scores calculated by equation (1) were ranked for all the samples, and the sum of the ranking for the COVID-19 samples, named [...], was calculated.
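The rank-based comparison described in this section (per-sample gene scores ranked across all samples, with the rank sum of the COVID-19 samples compared against its distribution under permuted labels) can be sketched as follows. Several details are assumptions: equation (1) is taken here to be a plain sum of the DANN scores of a sample's variants in the gene, ties receive average ranks, and the p-value is two-sided. The function names are illustrative.

```python
import math
import random
from statistics import mean, pstdev

def gene_score(dann_scores):
    """ASSUMED form of equation (1): sum of the DANN scores of a sample's variants."""
    return sum(dann_scores)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_permutation_test(case_scores, control_scores, n_perm=10_000, seed=0):
    """Return (observed rank sum of cases, z-score, two-sided normal p-value)."""
    ranks = average_ranks(list(case_scores) + list(control_scores))
    n_cases = len(case_scores)
    observed = sum(ranks[:n_cases])
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        rng.shuffle(ranks)               # equivalent to permuting the sample labels
        null.append(sum(ranks[:n_cases]))
    mu, sigma = mean(null), pstdev(null)  # null mean and standard deviation
    z = (observed - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of a normal distribution
    return observed, z, p

# Toy data: four cases carrying higher scores than ten controls.
cases = [gene_score(v) for v in ([0.9, 0.8], [0.99, 0.7], [0.95, 0.9], [0.99, 0.98])]
controls = [0.1 * k for k in range(1, 11)]
observed, z, p = rank_sum_permutation_test(cases, controls)
print(observed, round(z, 2), round(p, 4))
```

In a real analysis this test would be repeated for every gene, followed by a multiple-testing correction such as the adjusted p-value quoted for LAPTM4B.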
Then, sample labels were permuted 10,000 times, and these permutations were used to estimate the average value and the standard deviation of [...] under the null hypothesis. The p-value was calculated assuming a normal distribution for the sum of the ranking [65].
Data about the gene-based analyses and variants are available as Supplementary Material. The results of variant calling are available as aggregated data in the Network for Italian Genomes database (http://www.nig.cineca.it). The datasets generated during the current study are available from the corresponding author on reasonable request.
[3] Liu, R., et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell. 86, 367-377 (1996).
In-silico predicted deleterious variants in genes relevant for infection and pathogenic variants (both common and rare) reported in the ClinVar database are described, and a further subdivision between genes involved in a Mendelian disorder and/or viral infection susceptibility is provided. For all these gene categories, dark grey identifies the homozygous status of the variants and light grey the heterozygous status. Finally, the statistically significant genes obtained from the gene burden analysis are listed: gene represents a susceptibility factor. For this category, white indicates a higher mutational burden while grey indicates a lower mutational burden.

ILs and ILRs: IL2RB (N43S)

Clinical characteristics of COVID-19 patients admitted to the University Hospital of Siena (Italy)
Columns: Subject characteristics, Group 1, Group 2, Group 3, Group
Table fragment: Less than 10 times higher than normal value: 3 5 1 3 5; normal: 0 0 0

Risk of meticillin resistant Staphylococcus aureus and Clostridium difficile in patients with a documented penicillin allergy: population based matched cohort study
TNF-α polymorphisms affect persistence and progression of HBV infection
CD40 polymorphisms were associated with HCV infection susceptibility among Chinese population
Host Single Nucleotide Polymorphisms Modulating Influenza A Virus Disease in Humans
Self-reported symptoms of covid-19 including symptoms most predictive of SARS-CoV-2 infection, are heritable
Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism
ACE2 gene variants underlie interindividual variability and susceptibility to COVID-19 in Italian population
Sex difference and smoking predisposition in patients with COVID-19. The Lancet Respiratory Medicine
COVID-19 outbreak in Italy: Experimental chest xray scoring system for quantifying and monitoring disease progression
Peripheral lymphocyte subset monitoring in COVID19 patients: a prospective Italian real-life case series
Exome capture sequencing reveals new insights into hepatitis B virus-induced hepatocellular carcinoma at the early stage of tumorigenesis
Comprehensive characterization of the genomic alterations in human gastric cancer
Diverse modes of clonal evolution in HBV-related hepatocellular carcinoma revealed by single-cell genome sequencing
Antiviral activity of double-stranded RNA-binding protein PACT against influenza A virus mediated via suppression of viral RNA polymerase
The prevalence of PRKRA mutations in idiopathic dystonia
PACT is required for MDA5-mediated immunoresponses triggered by Cardiovirus infection via interaction with LGP2. Biochemical and biophysical research communications
Epidermal growth factor receptor is a host-entry cofactor triggering hepatitis B virus internalization
LAPTM4B is a PtdIns(4,5)P2 effector that regulates EGFR signaling, lysosomal sorting, and degradation. The EMBO journal
Glucose-6-phosphate dehydrogenase deficiency enhances human coronavirus 229E infection. The Journal of infectious diseases
Homozygous L-SIGN (CLEC4M) plays a protective role in SARS coronavirus infection
Tumor necrosis factor-alpha regulates the Hypocretin system via mRNA degradation and ubiquitination
Guanylate-binding proteins 2 and 5 exert broad antiviral activity by inhibiting furin-mediated processing of viral envelope proteins
Cell entry mechanisms of SARS-CoV-2
Highlights of COVID-19 pathogenesis. Insights into Oxidative Damage
Suppression of herpes simplex virus type 1 (HSV-1)-induced pneumonia in mice by inhibition of inducible nitric oxide synthase (iNOS, NOS2)
Dual effect of nitric oxide on SARS-CoV replication: viral RNA production and palmitoylation of the S protein are affected
Nitric oxide inhibits the replication cycle of severe acute respiratory syndrome coronavirus
Outpatient Inhaled Nitric Oxide in a Patient with Vasoreactive IPAH and COVID-19 Infection
Opioid and nociceptin receptors regulate cytokine and cytokine receptor expression
Visual analogue scales (VAS): Measuring instruments for the documentation of symptoms and therapy monitoring in cases of allergic rhinitis in everyday health care
Fast and accurate long-read alignment with Burrows-Wheeler transform
Scaling accurate genetic variant discovery to tens of thousands of samples
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
DANN: a deep learning approach for annotating the pathogenicity of genetic variants
Statistical Analysis of Rare Sequence Variants: An Overview of Collapsing Methods. Genetic epidemiology

This study is part of GEN-COVID (https://sites.google.com/dbm.unisi.it/gen-covid), the Italian multicenter study aimed to identify the COVID-19 host genetic bases. The Genetic and COVID-19 Biobank of Siena, member of BBMRI-IT, of Telethon Network of Genetic

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
68 70 62 68 55 55 42 60 51 75 81 62 60 62 77 49 98 49 67 66 91 60 65 60 64 74 81 87 55 65 65 58 31 74 31
Blood Group: A B A B 0 0 A 0 B A 0 A 0 0 0 0 A 0 0 A 0 A A A B 0 0 A A A A 0 0 A A
Respiratory Severity P/F score (worst value): 67 96 80 103 93 149 Unknown 126 200 130 156 162 210 339 293 318 140 285 368 100-200 264 280 Unknown 100-200 200-300 200-300 279 312 >300 304 347 Unknown >300 400 Unknown
Lung Imaging Grading (0-28) (worst): 13 13 16 14 11 10 13 11 13 10 15 13 9 5 6 7 14 5 8 11 10 10 3 9 7 14 8 6 4 8 9 9 3 5