key: cord-0844601-3kuisye2
authors: Srivastava, Anshika; Pandey, Rudra Kumar; Singh, Prajjval Pratap; Kumar, Pramod; Rasalkar, Avinash Arvind; Tamang, Rakesh; van Driem, George; Shrivastava, Pankaj; Chaubey, Gyaneshwer
title: Most frequent South Asian haplotypes of ACE2 share identity by descent with East Eurasian populations
date: 2020-09-16
journal: PLoS One
DOI: 10.1371/journal.pone.0238255
sha: 13ebcb3df40fff3b6981078cf1eaade9cb3105f4
doc_id: 844601
cord_uid: 3kuisye2

Human angiotensin-converting enzyme 2 (ACE2) has been shown to be the receptor of the novel coronavirus SARS-CoV-2, and variation in this gene may affect the susceptibility of a population. We therefore analysed ACE2 sequence data from 393 samples worldwide, focusing on South Asia. Genetically, South Asians are more closely related to West Eurasian populations than to East Eurasians. In the present analyses of ACE2, however, we observed that the majority of South Asian haplotypes are closer to those of East Eurasians than to those of West Eurasians. The phylogenetic analysis suggested that the South Asian haplotypes shared with East Eurasians involve two unique event polymorphisms (rs4646120 and rs2285666). In contrast with European/American populations, both SNPs have largely similar frequencies in East Eurasians and South Asians. Therefore, it is likely that among South Asians, host susceptibility to the novel coronavirus SARS-CoV-2 will be more similar to that of East Eurasians than to that of Europeans.

The novel coronavirus SARS-CoV-2, the causative agent of the ongoing COVID-19 pandemic, presents one of the major challenges to humanity today [1]. Recent studies have demonstrated that angiotensin-converting enzyme 2 (ACE2), encoded by a gene located on the X-chromosome, is the host receptor for the virus [1, 2]. A decreased level of ACE2 expression mitigates the severity of the disease.
Over-expression and a unique genetic polymorphism of the receptor among Asians have been ruled out in recent work [3, 4]. ACE2 also maintains cardiovascular homeostasis and electrolyte balance and protects against lung injury caused by acid aspiration [5]. A comprehensive understanding of ACE2 variation among ethnic groups has hitherto been lacking. The South Asian subcontinent harbours diverse and endogamous ethnic groups [6]. Most of the genomes of South Asia are autochthonous but show a considerable amount of sharing with East and West Eurasia [7]. However, when overall genome sharing with East and West Eurasia is compared, South Asians show greater genetic affinity with West Eurasia [8] [9] [10]. The only exception is the Tibeto-Burman speaking populations, who share a large amount of ancestry with East Eurasia [11]. The genetic structure of ACE2 haplotypes among South Asian populations is not known. Therefore, we analysed ACE2 in previously published whole-genome data of South Asians and various world populations [12, 13] (S1 Table).

The research was approved by the Institutional Ethical Committee of Banaras Hindu University, Varanasi, India. To analyse ACE2 among various populations, we extracted the sequences from the published datasets [12, 13] using PLINK 1.9 [14]. It has been shown that the 1000 Genomes dataset for South Asia does not capture the complete South Asian variation, mainly because Austroasiatic populations were not sampled [15]. Hence, we used the Pagani et al. data [12] as the primary dataset and confirmed the results with the 1000 Genomes data [13]. We extracted 447 samples designated as a diversity set panel in the Pagani et al. data [12]. After excluding samples from Africa and Sahul, as well as relatives up to the second degree, we used 393 samples in all our analyses (S1 Table). A total of 248 polymorphisms were observed in the Pagani et al. data [12] (S2 Table).
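For the sequence-based tools used downstream, the extracted PLINK genotypes were converted to FASTA with IUPAC ambiguity codes by a customised script, which is not reproduced in the paper. The following is a minimal sketch of such a ped-to-IUPAC conversion, not the authors' script. It assumes a standard whitespace-delimited PLINK .ped line (six leading columns followed by allele pairs), single-nucleotide alleles only, and `0` as the missing-allele code; the function names `ped_line_to_iupac` and `to_fasta` are hypothetical.

```python
# IUPAC codes for unordered pairs of nucleotides (homozygous and heterozygous).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def ped_line_to_iupac(line, missing="0"):
    """Convert one PLINK .ped line into (sample_id, IUPAC sequence).

    Columns 1-6 are FID, IID, PAT, MAT, SEX and PHENOTYPE; the rest are
    allele pairs, one pair per variant. Multi-base alleles (indels) are
    outside the scope of this sketch and raise KeyError.
    """
    fields = line.split()
    sample_id = fields[1]
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("odd number of allele columns in .ped line")
    codes = []
    for a1, a2 in zip(alleles[0::2], alleles[1::2]):
        if a1 == missing or a2 == missing:
            codes.append("N")  # missing genotype becomes an unknown base
        else:
            codes.append(IUPAC[frozenset((a1.upper(), a2.upper()))])
    return sample_id, "".join(codes)


def to_fasta(records):
    """Format (sample_id, sequence) pairs as FASTA text."""
    return "".join(f">{sample_id}\n{seq}\n" for sample_id, seq in records)


if __name__ == "__main__":
    # Hypothetical example line: four variants for one male sample.
    example = "F1 S1 0 0 1 -9 A A A G 0 0 C T"
    print(to_fasta([ped_line_to_iupac(example)]))
```

Because ACE2 is X-linked, male genotypes in a .ped file normally appear as identical allele pairs, so they map to unambiguous bases here; heterozygous codes arise only for female samples.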
Linkage disequilibrium (LD) maps for each of the groups were analysed with Haploview [16] (S1 Fig). For both datasets, we converted the PLINK files to FASTA files (ped to IUPAC) with a customised script. DnaSP v6 [17] was used to phase the data, to calculate population-wise genetic distances and to generate the Arlequin and Network input files. The neighbour joining (NJ) tree was constructed with MEGA X [18] (Fig 1A). Nei's genetic distances and pairwise differences were calculated with Arlequin 3.5 [19] and plotted with R v3.1 [20] (Fig 1B and S2 Fig). Network v5 [21] and Network Publisher were used to construct the median joining (MJ) networks (Fig 2 and S3 Fig). The spatial maps of rs4646120 and rs2285666 were drawn with the PGG toolkit (S4 Fig) [22].

Our pooled data yielded 248 high quality polymorphisms (S2 Table). In the LD plot analysis, significant LD blocks of different sizes were present among Caucasus, Central Asian, South Asian, mainland Southeast Asian, insular Southeast Asian and Siberian populations (S1 Fig). Europeans showed the lowest level of LD. We used a haplotype based approach for the comparison. In contrast with the genome-wide analysis [8] [9] [10], the NJ tree based on Fst distances clustered South Asians together with insular and mainland Southeast Asian populations (Fig 1A). This unexpected result suggested a closer genetic affinity of South Asians with East Eurasians for ACE2. The pairwise difference analysis suggested lower diversity for South Asian, Southeast Asian and Siberian populations (Fig 1B). Similarly, among the 1000 Genomes populations, East Asian populations showed the lowest diversity (S2 Fig). The phylogenetic analysis of haplotypes among the studied populations helped to identify the SNPs responsible for the affinity of South Asians with East Eurasians (Fig 2 and S3 Fig). Three major distinct haplotypes were observed.
Haplotype 1 (ht1) was more common in West Eurasians, including Central Asian populations, whereas haplotype 2 (ht2) was frequent among East Eurasians, South Asians and Americans (Fig 2 and S3 Fig). Haplotype 3 (ht3) was harboured mainly by East Eurasians and South Asians. Haplotype 2 (ht2) originated from SNP rs4646120, whereas ht3 was derived from SNP rs2285666. Phylogenetically, both of these SNPs play a key role in the distinction between East and West Eurasian populations (Fig 2 and S3 and S4 Figs). Interestingly, the most frequent haplotypes of South Asia involve these SNPs. A recent study has also highlighted that rs2285666 reaches its highest frequency among Chinese populations (0.5) and shows significant frequency differences among the 1000 Genomes populations (S4 Fig) [4]. In our study, we also found a high frequency (0.6) of this SNP among South Asians (S2 Table and S4 Fig). Moreover, we found that a synonymous coding region variant, rs35803318, was most frequent among Americans (0.15), followed by Europeans (0.055), Caucasus populations (0.051) and Central Asians (0.021), whilst this site was not polymorphic for West Asians, South Asians, Southeast Asians and Siberians (S2 Table).

Redcliffe Life Sciences Pvt Ltd. India provided support in the form of salaries for author AR. The specific roles of this author are articulated in the 'author contributions' section. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Competing interests: One of our authors, AR, is a full-time employee of Redcliffe Life Sciences Pvt Ltd. India. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Phylogenetic analysis has suggested that the majority of South Asian samples share the monophyletic haplotypes 2 and 3 with East Eurasians, through the unique polymorphism events rs4646120 and rs2285666.
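The population frequencies quoted above are allele frequencies at an X-linked locus, where males carry one copy of ACE2 and females carry two. The paper does not describe how its counts were made, so the following is only a sketch of one common way to compute such frequencies, under the assumption that each sample is given as a sex code (1 for male, 2 for female, as in PLINK) and a count of alternate alleles. The function names and the example records are hypothetical and are not data from the study.

```python
from collections import defaultdict

MALE, FEMALE = 1, 2  # PLINK-style sex codes (assumed input convention)


def x_allele_frequency(samples):
    """Alternate-allele frequency at an X-linked site.

    samples: iterable of (sex, alt_count) pairs. Males contribute one
    chromosome and females two; alt_count=None marks a missing genotype.
    Returns NaN when no genotypes are available.
    """
    alt = total = 0
    for sex, alt_count in samples:
        if alt_count is None:
            continue  # skip missing genotypes
        if sex not in (MALE, FEMALE):
            raise ValueError(f"unknown sex code: {sex!r}")
        copies = 1 if sex == MALE else 2
        if not 0 <= alt_count <= copies:
            raise ValueError("alt_count exceeds the number of X copies")
        alt += alt_count
        total += copies
    return alt / total if total else float("nan")


def frequency_by_population(records):
    """records: iterable of (population, sex, alt_count) triples."""
    groups = defaultdict(list)
    for population, sex, alt_count in records:
        groups[population].append((sex, alt_count))
    return {pop: x_allele_frequency(s) for pop, s in groups.items()}


if __name__ == "__main__":
    # Hypothetical toy records, not taken from the study.
    toy = [
        ("PopA", MALE, 1), ("PopA", MALE, 0),
        ("PopA", FEMALE, 2), ("PopA", FEMALE, 1),
        ("PopB", FEMALE, 0), ("PopB", MALE, None),
    ]
    print(frequency_by_population(toy))
```

Counting chromosomes rather than individuals matters here: treating hemizygous males as diploid would double-weight their alleles and bias the estimate when the sex ratio differs between populations.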
Recent studies have suggested that the reference allele reduces ACE2 expression by up to 50%, resulting in greater severity of SARS-CoV-2 infection [23] [24] [25]. Additionally, the synonymous coding region variant rs35803318 was significantly more polymorphic among Americans and Europeans than among South Asians. Hence, it is likely that among South Asians, host susceptibility to the novel coronavirus SARS-CoV-2 more closely resembles that of East/Southeast Asians than that of Europeans or Americans.

1. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
2. A pneumonia outbreak associated with a new coronavirus of probable bat origin
3. Asians do not exhibit elevated expression or unique genetic polymorphisms for ACE2, the cell-entry receptor of SARS-CoV-2. Preprints 2020
4. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations
5. Prevalence of comorbidities in the novel Wuhan coronavirus (COVID-19) infection: a systematic review and meta-analysis
6. The promise of discovering population-specific disease-associated genes in South Asia
7. Peopling of South Asia: investigating the caste-tribe continuum in India
8. Genetic diversity in India and the inference of Eurasian population expansion
9. Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia
10. Reconstructing Indian population history
11. Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture
12. Genomic analyses inform on migration events during the peopling of Eurasia
13. A map of human genome variation from population-scale sequencing
14. Second-generation PLINK: rising to the challenge of larger and richer datasets
15. Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset
16. Haploview: analysis and visualization of LD and haplotype maps
17. DNA sequence polymorphism analysis of large data sets
18. MEGA X: molecular evolutionary genetics analysis across computing platforms
19. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows
20. R: A language and environment for statistical computing
21. Röhl A. Median-joining networks for inferring intraspecific phylogenies
22. SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations
23. ACE2 and TMPRSS2 variants and expression as candidates to sex and country differences in COVID-19 severity in Italy
24. The ACE 2 G8790A Polymorphism: Involvement in Type 2 Diabetes Mellitus Combined with Cerebral Stroke
25. Decoding SARS-CoV-2 Hijacking of Host Mitochondria in Pathogenesis of COVID-19

We thank both reviewers and the Editor for their constructive suggestions.

Conceptualization: Rakesh Tamang, Gyaneshwer Chaubey. Rasalkar, Gyaneshwer Chaubey.