key: cord-0771837-r22nmvkq
authors: Curtis, David
title: Variants in ACE2 and TMPRSS2 Genes Are Not Major Determinants of COVID-19 Severity in UK Biobank Subjects
date: 2021-03-22
journal: Hum Hered
DOI: 10.1159/000515200
sha: 51a042e31af02c546ec528e01fcc15265a1940f3
doc_id: 771837
cord_uid: r22nmvkq

It is plausible that variants in the ACE2 and TMPRSS2 genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects, of whom 82 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether there were differences between these cases and the other sequenced subjects in the overall burden of rare, damaging variants in ACE2 or TMPRSS2. There were no statistically significant differences in weighted burden scores between cases and controls for either gene. There were no individual DNA sequence variants with a markedly different frequency between cases and controls. Whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not the main explanation for why some people develop severe symptoms in response to infection with SARS-CoV-2. This research was conducted using the UK Biobank Resource.

There is wide variation in the severity of symptoms in patients infected with SARS-CoV-2, and there are reports in the UK that members of ethnic minorities are more severely affected. An obvious possible explanation for these findings would be that genetic polymorphisms affecting the structure or function of key proteins could influence host susceptibility and/or responses to infection.
If these polymorphisms varied in frequency between different ethnic groups, this could contribute to differential outcomes. Two key proteins involved in SARS-CoV-2 infective processes are ACE2, which is expressed on the cell surface and acts as a receptor for the viral S protein, and TMPRSS2, which cleaves the S protein to allow fusion of the viral and cellular membranes [1]. Variants in the genes coding for these proteins might contribute to different responses to infection. A recent Italian study examining ACE2 sequence variants in 131 COVID-19 patients and 258 controls reported that overall there was an excess of variants among controls (p = 0.029) [2]. This result was partially driven by two common variants, Asn720Asp (rs41303171), which occurred in 2 cases and 11 controls, and Val749Val (rs35803318), which occurred in 5 cases and 25 controls. Another Italian study, using a different sample of 131 cases who tested positive for COVID-19, of whom 98 required ventilation, and 1,000 controls, found that the cumulative frequency of variants was as expected from population frequencies and there was no association with severity [3]. Here, we present the results of a study comparing frequencies of variants in ACE2 and TMPRSS2 between cases with severe COVID-19 and controls.

This is an Open Access article licensed under the Creative Commons Attribution-NonCommercial-4.0 International License (CC BY-NC) (http://www.karger.com/Services/OpenAccessLicense), applicable to the online version of the article only. Usage and distribution for commercial purposes requires written permission.

The COVID-19 results table was downloaded from UK Biobank on April 28, 2020. This contained results for 1,474 subjects who had undergone testing for SARS-CoV-2 infection between March 16 and April 14, 2020 [4].
During this period, testing in the UK was done almost exclusively on patients admitted to hospital with a clinical diagnosis of probable COVID-19, and thus patients testing positive can be assumed to have had severe disease because patients with milder symptoms were generally left at home. Of the subjects tested, 669 tested positive, meaning that they had at least one swab which demonstrated the presence of viral RNA at detectable levels, and of these 82 were exome sequenced. The proportion of infected subjects who require hospitalisation rises with age but is still only 0.18 for those aged 80 or over [5]. Thus, the subjects who tested positive could be regarded as cases with an unusually severe response to infection, whereas the subjects who tested negative or who were not tested could be regarded as unscreened controls, most of whom would not have severe symptoms even if infected. No attempt was made to discriminate between these subjects on other measures of severity, such as use of oxygen or admission to intensive care.

The exome sequence data consisted of the variant call files for 49,953 subjects who had undergone exome sequencing and been genotyped using the GRCh38 assembly with coverage 20× at 94.6% of sites on average [6]. All variants were annotated using VEP, PolyPhen, and SIFT [7-9]. To obtain population principal components reflecting ancestry, version 1.90 beta of PLINK (https://www.cog-genomics.org/plink2) was run with the options --maf 0.1 --pca header tabs --make-rel [10-12]. SCOREASSOC was then used to carry out a weighted burden analysis to test whether, in ACE2 or TMPRSS2, sequence variants which were rarer and/or predicted to have more severe functional effects occurred more commonly in cases, that is, subjects who tested positive for SARS-CoV-2, than all the other sequenced subjects. All available variants in each gene were included in the analyses.
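The PLINK step described above for deriving ancestry principal components can be sketched as a command line assembled in Python. Only the options `--maf 0.1 --pca header tabs --make-rel` are taken from the text; the binary name, the use of `--bfile` and `--out`, and both file prefixes are illustrative assumptions and are not stated in the paper.

```python
import shlex


def build_plink_pca_command(bfile_prefix, out_prefix, plink_binary="plink"):
    """Return the argument list for a PLINK 1.9 run producing principal components.

    Only --maf 0.1 --pca header tabs --make-rel come from the paper.
    The binary name and both file prefixes are hypothetical.
    """
    return [
        plink_binary,
        "--bfile", bfile_prefix,     # assumption: genotypes stored as PLINK binary files
        "--maf", "0.1",              # keep only variants with minor allele frequency >= 0.1
        "--pca", "header", "tabs",   # principal components, tab-delimited, with a header row
        "--make-rel",                # also write the relationship matrix
        "--out", out_prefix,
    ]


# Hypothetical file prefixes, used purely for illustration.
command = build_plink_pca_command("ukb_exomes", "ukb_exomes_pca")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it on a machine where PLINK is installed; building the list separately keeps the options easy to inspect.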
As originally described, variants were weighted according to frequency so that rare variants were accorded 10 times the weight of common variants [13]. Variants were additionally weighted according to their functional annotation using the default weights provided with the GENEVARASSOC program, which was used to generate input files for weighted burden analysis by SCOREASSOC [13-15]. For example, a weight of 5 was assigned for a synonymous variant, 10 for a non-synonymous variant, and 20 for a stop-gained variant. Additionally, 10 was added to the weight if the PolyPhen annotation was possibly or probably damaging and also if the SIFT annotation was deleterious, meaning that a non-synonymous variant annotated as both damaging and deleterious would be assigned an overall weight of 30. ACE2 is located on the X chromosome and hemizygous males were treated as if they were homozygous for each variant, meaning that variant frequencies would be expected to be equal in males and females. Weighted burden testing using GENEVARASSOC and SCOREASSOC was carried out to see whether the overall burden of rare, functional variants differed between cases and controls using both t tests and likelihood ratio tests using ridge regression analysis incorporating the first 20 principal components, as described previously [15]. The two common variants referred to above, rs41303171 and rs35803318, had been genotyped in the whole UK Biobank sample, so their allele counts were compared between the 669 cases who had tested positive and all the remaining 487,708 subjects using the χ² test.

The genotype counts and frequencies of variants are presented in online supplementary Table 1. There were no individual variants with markedly different frequencies between cases and controls. Of course, for both genes there were many rare variants which were observed in controls but not in cases, but this is as expected given the disparity in sample sizes.
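The weighting scheme described in the methods can be illustrated with a minimal sketch. The functional weights (5, 10, 20, plus 10 each for PolyPhen and SIFT) are the examples quoted in the text. The frequency weight is an assumption: the text states only that rare variants receive 10 times the weight of common ones, so a simple parabola running from 10 at a frequency of 0 to 1 at a frequency of 0.5 is used here as a stand-in and should not be read as the SCOREASSOC formula. All function names and the dictionary layout are hypothetical.

```python
def functional_weight(consequence, polyphen_damaging=False, sift_deleterious=False):
    """Functional weight using the example values quoted in the text."""
    base = {"synonymous": 5, "non_synonymous": 10, "stop_gained": 20}[consequence]
    # 10 is added for a damaging PolyPhen call and again for a deleterious SIFT call.
    return base + (10 if polyphen_damaging else 0) + (10 if sift_deleterious else 0)


def frequency_weight(maf, rare_to_common_ratio=10.0):
    """ASSUMED curve: weight falls from the ratio at MAF 0 to 1 at MAF 0.5."""
    return 1.0 + (rare_to_common_ratio - 1.0) * (1.0 - 2.0 * maf) ** 2


def allele_count(alt_alleles, hemizygous_male=False):
    """Hemizygous males carrying an X-linked variant are counted as homozygous."""
    return 2 if (hemizygous_male and alt_alleles > 0) else alt_alleles


def burden_score(variants):
    """Sum of frequency weight x functional weight x allele count for one subject."""
    return sum(
        frequency_weight(v["maf"])
        * functional_weight(v["consequence"], v["polyphen_damaging"], v["sift_deleterious"])
        * allele_count(v["alt_alleles"], v["hemizygous_male"])
        for v in variants
    )


# Hypothetical subject: a male carrying one very rare damaging ACE2 variant
# and one common synonymous autosomal variant.
subject = [
    {"maf": 0.0, "consequence": "non_synonymous", "polyphen_damaging": True,
     "sift_deleterious": True, "alt_alleles": 1, "hemizygous_male": True},
    {"maf": 0.5, "consequence": "synonymous", "polyphen_damaging": False,
     "sift_deleterious": False, "alt_alleles": 1, "hemizygous_male": False},
]
print(burden_score(subject))
```

In the analysis itself, per-subject scores of this kind are what would be compared between cases and controls, by t test or by regression with the principal components as covariates.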
With respect to the common variants which had been genotyped in the entire UK Biobank sample, the frequency of rs35803318 was 0.039 in cases and 0.044 in controls, and the frequency of rs41303171 was 0.025 in cases and 0.026 in controls. Neither of these differences was statistically significant. Although the number of severely affected subjects who were sequenced is very small, it is nevertheless possible to draw some preliminary conclusions, and given the importance of the topic, it seems reasonable to communicate these findings. In general, the results are negative. It is not the case that a large proportion of severely affected subjects have a particular genetic variant in one of these genes which is relatively rare in the general population. Nor is it the case that there is a common variant which confers strong protection against severe infection. It remains possible that there might be rare variants which have a major effect on risk in individual subjects, but such effects would only be detected with larger sample sizes. The fact that the weighted burden scores were higher in controls than in cases is consistent with the hypothesis that rare genetic variants in TMPRSS2 with functional effects disrupting functioning of the protein might be protective against severe infection. Although this is biologically plausible, it should be emphasised that the results obtained are not statistically significant. This could be investigated further by carrying out targeted sequencing of this gene in a sample of a few hundred severely affected subjects. In conclusion, genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not the main explanation for why some people develop severe symptoms in response to infection with SARS-CoV-2. 
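The allele-count comparison applied to the two common variants can be sketched as a Pearson χ² test on a 2 × 2 table of alternate and reference allele counts in cases and controls. This is a minimal sketch without continuity correction, which may differ from the exact procedure used in the study. The counts in the example are hypothetical, chosen only to give frequencies near the reported 0.039 and 0.044; they are not the study's allele counts, which are not given in the text.

```python
import math


def chi_square_2x2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic and p value (1 df) for a 2 x 2 allele table."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical allele counts for illustration only.
statistic, p_value = chi_square_2x2(case_alt=39, case_ref=961,
                                    control_alt=4400, control_ref=95600)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```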
[1] SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
[2] ACE2 gene variants may underlie interindividual variability and susceptibility to COVID-19 in the Italian population
[3] Analysis of ACE2 genetic variants in 131 Italian SARS-CoV-2-positive patients. Hum Genomics
[4] Dynamic linkage of COVID-19 test results between Public Health England's Second Generation Surveillance System and UK Biobank. Microb Genom
[5] Estimates of the severity of coronavirus disease 2019: a model-based analysis
[6] Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank. bioRxiv
[7] The Ensembl Variant Effect Predictor
[8] Predicting functional effect of human missense mutations using PolyPhen-2
[9] Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm
[10] PLINK: a tool set for whole-genome association and population-based linkage analyses
[11] Second-generation PLINK: rising to the challenge of larger and richer datasets
[12] International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
[13] A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway
[14] Pathway analysis of whole exome sequence data provides further support for the involvement of histone modification in the aetiology of schizophrenia
[15] A weighted burden test using logistic regression for integrated analysis of sequence variants, copy number variants and polygenic risk score

This research was conducted using the UK Biobank Resource. The author wishes to acknowledge the staff supporting the High Performance Computing Cluster, Computer Science Department, University College London.

UK Biobank obtained ethics approval from the North West Multi-Centre Research Ethics Committee, which covers the UK (approval number: 11/NW/0382), and written informed consent from all participants.
UK Biobank approved the application for use of the data (ID 51119). Analysis of the data was approved by the University College London Research Ethics Committee (approval number 11527/001).

The author declares that he has no conflict of interest.

This work did not receive any external funding but was carried out in part using resources provided by BBSRC equipment grant BB/R01356X/1.

The raw data are available on application from UK Biobank. Detailed results with unredacted variant counts cannot be made available because they might be used for subject identification.