key: cord-0740970-27wffjl0
authors: Burkholz, Scott; Pokhrel, Suman; Kraemer, Benjamin R.; Mochly-Rosen, Daria; Carback, Richard T.; Hodge, Tom; Harris, Paul; Ciotlos, Serban; Wang, Lu; Herst, C.V.; Rubsamen, Reid
title: Paired SARS-CoV-2 spike protein mutations observed during ongoing SARS-CoV-2 viral transfer from humans to minks and Back to humans
date: 2021-05-07
journal: Infect Genet Evol
DOI: 10.1016/j.meegid.2021.104897
sha: f688a580a9c8e354b1dca99de419f9a920fb512e
doc_id: 740970
cord_uid: 27wffjl0

A mutation analysis of SARS-CoV-2 genomes collected around the world, sorted by sequence, date, geographic location, and species, has revealed a large number of variants relative to the initial reference sequence from Wuhan. This analysis also reveals that humans infected with SARS-CoV-2 have infected mink populations in the Netherlands, Denmark, the United States, and Canada. In these animals, a small set of mutations in the spike protein receptor binding domain (RBD), often occurring in specific combinations, has transferred back into humans. The viral genomic mutations observed in minks in the Netherlands and Denmark show the potential for new mutations in the SARS-CoV-2 spike protein RBD to be introduced into humans by zoonotic transfer. Our data suggest that close attention to viral transfer from humans to farm animals and pets will be required to prevent the build-up of a viral reservoir for potential future zoonotic transfer.

Coronaviruses are thought to have ancient origins extending back tens of millions of years, with coevolution tied to bats and birds 1. This subfamily of viruses has proofreading mechanisms, rare in other RNA viruses, that reduce the frequency of mutations that might alter viral fitness 2. The D614G mutation, which lies outside the RBD, is an example of a fitness-enhancing mutation in the spike glycoprotein that became the most prevalent variant as the virus spread through human populations 3.
The recently observed and more infectious D796H mutation paired with ΔH69/V70, also outside the RBD, first observed in January 2020, has spread throughout Southeast England 4. Data from next-generation sequencing have shown that the SARS-CoV-2 genome mutates at about half the rate of influenza and about a quarter of the rate seen for HIV, with an average difference of about 10 nucleotides between samples 5. The Global Initiative on Sharing Avian Influenza Data (GISAID) 6,7 has catalogued over 235,299 SARS-CoV-2 sequences to date from samples provided by laboratories around the world. This diversity is profound: well over 12,000 mutations have been shown to exist, with the potential for non-synonymous substitutions, insertions, or deletions that change amino acids and could in turn cause structural and functional changes in viral proteins 5, 8-10. While coronaviruses initially developed in animals and transferred to humans, transfer back to animals and then back to humans again has recently been observed in the Neovison vison species of mink, currently being raised on farms around the world.

to the WIV04 reference. This was done to prevent improper mutation identification from confounding downstream analysis. After all these filtering procedures, 782 human and 251 mink sequences remained for analysis. The statistical package R 17 was used to plot mutation frequency by geography, date, and species to reveal patterns indicative of zoonotic transfer. Sequence identifiers used for the results, and their respective authors, are listed in supplementary table 1. The SARS-CoV-2 reference sequence WIV04 (MN996528.1) 15 was aligned against a SARS-CoV-1 reference sequence (NC_004718.3) 18 using MAFFT 14. Residue positions were visualized in Jalview 19. The PDB file 7A98 20 was downloaded from RCSB.org. Positions of interest were visualized in MOE 21 for figure 1b.
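To illustrate how a sequence compared with the WIV04 reference yields mutation names such as F486L, and how those calls can then be grouped by species and location, here is a minimal Python sketch. It assumes the protein sequences have already been aligned to the reference (for example with MAFFT), that the query has no insertions relative to the reference, and that ambiguous residues are coded as X. The function names and the (species, country) grouping are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter


def call_substitutions(reference: str, query: str, offset: int = 1) -> list[str]:
    """Name substitutions in ref-position-alt notation, e.g. 'F486L'.

    Assumes `query` is already aligned to `reference` (equal length, no
    insertions relative to the reference) and that the first reference
    residue is numbered `offset`.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for i, (ref_aa, alt_aa) in enumerate(zip(reference, query)):
        # Skip identical residues, alignment gaps ('-') and ambiguous
        # residues ('X') so that they are not reported as mutations.
        if ref_aa == alt_aa or alt_aa in ("-", "X"):
            continue
        calls.append(f"{ref_aa}{i + offset}{alt_aa}")
    return calls


def tally(samples):
    """Count each substitution per (species, country) group.

    `samples` is an iterable of (species, country, substitutions) tuples.
    """
    counts = Counter()
    for species, country, substitutions in samples:
        for sub in substitutions:
            counts[(species, country, sub)] += 1
    return counts


# Hypothetical toy fragment whose first residue is numbered 482.
print(call_substitutions("ACDEF", "ACDXL", offset=482))  # ['F486L']
```

Skipping X residues mirrors the intent of the filtering step, which is to keep sequencing ambiguity from being counted as a real mutation; a fuller pipeline would also need to handle insertions and report deletions.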
Molecular Operating Environment (MOE) 21 software was used with PDB 7A98 20, prepared with the QuickPrep functionality at the default settings to optimize the H-bond network and perform energy minimization on the system. Affinity calculations were performed using the 7A98.A (spike protein monomer) and 7A98.D (ACE2) chains. Residues in spike protein (7A98.A) within (kcal/mol) between the variants and the reference sequence were calculated as per MOE's definition 21. The potential effect of variants was predicted using PROVEAN 22 with R 17, and SIFT 23. Residues in ACE2 protein (7A98.D) within 4.5 Å of the spike protein (7A98.A) were selected, and the residue scan application was run with the spike protein (7A98.A) defined as the ligand. The changes in affinity (kcal/mol) of mink, mouse, and hamster ACE2 sequences relative to human ACE2 were calculated as per MOE's definition 21.

IQ-TREE 2 24 was used to generate a phylogenetic tree by maximum likelihood. The RBD region, plus 25 amino acids on each end, of the filtered, processed human and mink samples was used as input for the analysis. The "FLU+I" model was chosen as the best-fitting model, followed by ultrafast bootstrapping until convergence and branch testing. FigTree (http://tree.bio.ed.ac.uk/software/figtree/) was used to visualize and color the phylogenetic tree.

2.6 ACE2 sequence alignment

ACE2 protein sequences were obtained from UniProt 25 for human (identifier: Q9BYF1-1), mouse (identifier: Q8R0I0-1), and hamster (identifier: A0A1U7QTA1-1), and from the NCBI protein database for mink (identifier: QPL12211.1). The alignment was visualized in Jalview 19 and colored according to sequence identity. The similarity scores for the entire protein sequence and for the spike receptor binding domain motif were calculated in MOE 21. Residues in ACE2 within

The F486L mutation is a substitution of leucine for phenylalanine occurring within the RBM of the SARS-CoV-2 spike glycoprotein (figure 1 A, B).
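The selection step described in the methods above, in which residues within 4.5 Å of the partner chain (spike chain 7A98.A versus ACE2 chain 7A98.D) are chosen before running the residue scan, can be sketched as a simple distance search. This is a brute-force illustration, not MOE's implementation; it assumes atom coordinates have already been parsed from the PDB file into (chain, residue number, x, y, z) tuples, and the function name and toy coordinates are hypothetical.

```python
import math


def interface_residues(atoms, target_chain, partner_chain, cutoff=4.5):
    """Return residue numbers in `target_chain` that have at least one atom
    within `cutoff` Å of any atom in `partner_chain`.

    `atoms` is a list of (chain_id, residue_number, x, y, z) tuples, e.g.
    parsed beforehand from a PDB file. Brute-force O(n * m) search.
    """
    target = [(res, (x, y, z)) for ch, res, x, y, z in atoms if ch == target_chain]
    partner = [(x, y, z) for ch, _res, x, y, z in atoms if ch == partner_chain]
    selected = set()
    for res, xyz in target:
        if res in selected:
            continue  # this residue was already selected via another atom
        if any(math.dist(xyz, p) <= cutoff for p in partner):
            selected.add(res)
    return sorted(selected)


# Toy coordinates in Å (hypothetical, not taken from 7A98).
atoms = [
    ("A", 486, 0.0, 0.0, 0.0),
    ("A", 501, 10.0, 0.0, 0.0),
    ("D", 34, 3.0, 0.0, 0.0),
    ("D", 353, 20.0, 0.0, 0.0),
]
print(interface_residues(atoms, "D", "A"))  # [34]
```

For a full spike/ACE2 complex, a spatial index (such as a neighbor-search grid) would be faster than comparing every atom pair, but the selection criterion is the same.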
These amino acids are similar in physicochemical properties, with an aromatic ring replaced by an aliphatic chain. This F486L variant, conserved across SARS-CoV-1 and SARS-CoV-2, was first seen in the strain RaTG13, from a Rhinolophus affinis bat sample collected in Yunnan, China, in 2013. In 2017, this variant was also found in Manis javanica, a species of pangolin. F486L was not present in human sequences from the dataset at the start of the pandemic, but started to appear in minks at the end of April 2020. Since that time, we found 125 sequences with this mutation from mink samples collected in the Netherlands. A sample submission date places the first known potential transfer back from minks to humans in August 2020, also in the Netherlands. Although sample collection dates are unavailable for these human samples, submission dates show that a larger number of human samples with this variant were reported in the Netherlands in October and November 2020. One human sample from Scotland, collected in October 2020, shows that F486L may be viable alone and can occur de novo, without evidence of zoonotic transfer. Based on the mutations seen with F486L (L452M and Q314K), and considering the potential for sequencing error, this case is likely not linked to the Netherlands sequences (table 1). The L452M and Q314K variants were almost always observed to appear concurrently with

The six mutations we studied each result from a single nucleotide polymorphism (SNP) and are among those with the least potential consequence on stability (kcal/mol, supplementary

The presence of F486L paired with either L452M or Q314K in humans and in minks indicates two separate transfer events between the species. These mutation pairs were observed in the Netherlands, but not elsewhere (table 1). The mutation F486L did not make a correlated jump from minks to humans until L452M, Q314K, or N501T was simultaneously present as a second mutation (figure 3).
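As an illustration of how pairings such as F486L with L452M or Q314K can be tallied once mutations have been called for each sample, here is a minimal Python sketch that counts co-occurring pairs per species and lists which mutations accompany a focal mutation. The input layout, function names, and toy samples are hypothetical; the study's own grouping by date and geography is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def pair_counts(samples):
    """Count co-occurring mutation pairs per species.

    `samples` is an iterable of (species, mutations) tuples, where
    `mutations` holds substitution labels such as 'F486L'.
    """
    counts = Counter()
    for species, mutations in samples:
        # Sorting makes ('F486L', 'L452M') and ('L452M', 'F486L') the same key.
        for a, b in combinations(sorted(set(mutations)), 2):
            counts[(species, a, b)] += 1
    return counts


def partners_of(samples, focal):
    """For samples carrying `focal`, count each accompanying mutation per
    species; samples where `focal` occurs alone are counted under None."""
    partners = Counter()
    for species, mutations in samples:
        muts = set(mutations)
        if focal not in muts:
            continue
        others = muts - {focal}
        if not others:
            partners[(species, None)] += 1
        for other in others:
            partners[(species, other)] += 1
    return partners


# Hypothetical toy samples, not the study data.
samples = [
    ("mink", ["F486L", "L452M"]),
    ("mink", ["F486L", "Q314K"]),
    ("human", ["L452M", "F486L"]),
    ("human", ["F486L"]),
]
print(partners_of(samples, "F486L")[("human", None)])  # 1
```

Counting the "alone" case separately makes it easy to distinguish a sample like the Scottish F486L-only sequence from the paired sequences seen in the Netherlands.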
This illustrates the possibility that multiple mutations are required to preserve fitness and facilitate inter-species transfer, particularly in relation to the host's ACE2 protein. Mutations in viruses have previously been described to occur in pairs for functional purposes 33, and have furthermore been shown to have evolutionary relationships involving pairs of variants 34. In Denmark, Y453F showed a pattern potentially arising from transmission to humans, then to minks, and back to humans again. Thousands of miles away in the US, isolated incidences of F486L, N501T, and V367F present evidence for the same type of transfer event in the same species of mink. These data support the interpretation that paired mutations facilitate, or are required for, zoonotic transfer. Although some submitted human sequences from the Netherlands do not have recorded collection dates, the submission timeline within the GISAID database 6,7 supports the conclusion that spread and transfer from minks to humans is occurring, as suggested by the submitters of the sequences 35.

Current antibody-based therapeutics are focused on antibody binding to the SARS-CoV-2 spike protein RBD 36. Similarly, vaccines that rely on an antibody response to the spike protein may depend on the stability of the primary amino acid sequence of the RBD to maintain their ability to generate neutralizing antibody responses 37. It is therefore critical to understand the extent to which SARS-CoV-2 mutations are occurring in regions targeted by antibodies. Whether mutations in the RBD have a beneficial or deleterious effect on viral fitness, on RBD binding affinity to ACE2, and/or on infectivity is also not known. The mutation N501T does not appear to be spreading rapidly and may be showing decreased fitness in humans.
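The idea that two mutations may act together, rather than as the simple sum of their separate effects, can be expressed with the standard double-mutant-cycle (non-additivity) relation. This is not a calculation reported in this study; it assumes that a change in stability or affinity (in kcal/mol, for example from a residue scan) is available for each single variant A and B, for the pair AB, and for the reference sequence.

```latex
\Delta\Delta G_{X} = \Delta G_{X} - \Delta G_{\mathrm{ref}}, \qquad X \in \{A,\, B,\, AB\}

\Delta\Delta G_{\mathrm{coupling}} = \Delta\Delta G_{AB} - \left( \Delta\Delta G_{A} + \Delta\Delta G_{B} \right)
```

A coupling term near zero means the two mutations contribute independently; a clearly non-zero value indicates that the pair has a combined effect that neither single mutation predicts.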
Y453F, however, now present in 629 human samples from Denmark beginning in June 2020, may be conferring increased viral fitness, potentially facilitating its spread into human populations. Our calculations also suggest that the N501T variant decreases protein stability and affinity for hACE2, but that the change is minimal. The pairing of certain mutations should be tested in vitro in the future. A single mutation could have a combinatory effect when paired with another, producing a completely different effect. The data suggest that two mutations together may be required for bi-directional zoonotic transfer to occur. While the number of cases with these pairings is not growing exponentially around the world, a third mutation could occur, further changing binding affinity and stability characteristics. In Denmark, 629 human samples and 85 mink samples have shown the presence of Y453F. As this mutation may escape antibody neutralization 38, the emergence of this variant may increase resistance to monoclonal antibody therapy or convalescent sera therapy.

Mutation analysis of SARS-CoV-2 from humans and other associated species that can carry the virus has provided evidence for zoonotic transfer and helped identify how variants can arise. As multiple species can be infected by SARS-CoV-2 and may contribute to this mutational potential 39, vigilant, continual sequencing of the virus in the coming months and years is of high importance. Several species known to be infected by this virus have not been sequenced to the same extent, and although a large number of viruses isolated from minks were sequenced in the Netherlands and Denmark, as of this writing mink sequences from the United States are not publicly available. Tracking viral mutations isolated from minks and other farm animals that might later appear in humans is important.
In Oregon, for example, the location of the mink farm with a SARS-CoV-2 outbreak has not been disclosed 40.

Humans and minks have as much as 92% amino acid sequence similarity between their respective ACE2 receptor proteins (supplementary figure 1 & table 5) 42, 43. Based on affinity calculations in which mouse residues are substituted for human ACE2 residues, the affinity of ACE2 for spike decreases in most cases (supplemental table 5). However, when mink residues are substituted for human ACE2 residues, the affinity is minimally affected (except for G354H; supplemental table 5). This may explain why mice do not get infected by SARS-CoV-2, whereas the virus thrives in human and mink hosts. Indeed, a novel artificial intelligence algorithm has shown that minks, along with bats, could be a reservoir of SARS-CoV-2 44, and samples from cats, dogs, ferrets, hamsters, primates, and tree shrews demonstrate that all of these species have been infected with SARS-CoV-2 45. Multiple factors contribute to zoonotic transfer, including ACE2 expression level and close contact with the same or other species; therefore, viral sequences from animals that may be in contact with humans are of great importance. The consequences of observed mutations can be analyzed in silico, at least initially. Stability

References

A Case for the Ancient Origin of Coronaviruses
Profile of a killer: the complex biology powering the coronavirus pandemic
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Neutralising antibodies drive Spike mediated SARS-CoV-2 evasion
Geographic and Genomic Distribution of SARS-CoV-2 Mutations
Emerging genetic diversity among clinical isolates of SARS-CoV-2: Lessons for today
Natural variants in SARS-CoV-2 S protein pinpoint structural and functional hotspots: implications for prophylaxis and therapeutic strategies
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Structural basis of receptor recognition by SARS-CoV-2
Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
A pneumonia outbreak associated with a new coronavirus of probable bat origin
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Analysis of multimerization of the SARS coronavirus nucleocapsid protein
Jalview Version 2: a multiple sequence alignment editor and analysis workbench
SARS-CoV-2 Spike Glycoprotein with 3 ACE2 Bound (Worldwide Protein Data Bank)
PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels
SIFT: predicting amino acid changes that affect protein function
IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era
UniProt: a worldwide hub of protein knowledge
Mink Farm Tests Positive with SARS-CoV-2
Thousands of Minks Dead as COVID Outbreak Escalates on Utah Farms
More Than 3K Mink Dead From Coronavirus At Taylor County Mink Farm
Two Taylor County mink farms under quarantine after more than 5,000 animals died from COVID-19
Fourth state confirms mink farm coronavirus outbreaks as U.S. looks to avoid Denmark's disaster
Agricultural Statistics Board, United States Department of Agriculture (USDA)
New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0
Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus
Protein 3D Structure Computed from Evolutionary Sequence Variation
Jumping back and forth: anthropozoonotic and zoonotic transmission of SARS-CoV-2 on mink farms
A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2
SARS-CoV-2 immunity: review and applications to phase 3 vaccine candidates
COVID mink analysis shows mutations are not dangerous, yet
SARS-CoV-2 jumping the species barrier: Zoonotic lessons from SARS, MERS and recent advances to combat this pandemic virus
Brown, agencies condemned for secrecy surrounding Oregon COVID-19 mink farm outbreak
COVID-19 outbreak declared at mink farm in B.C.'s Fraser Valley
Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates
Comparative ACE2 variation and primate COVID-19 risk
Host and infectivity prediction of Wuhan 2019 novel coronavirus using deep learning algorithm
CDC. COVID-19 and Animals
An Approach for a Synthetic CTL Vaccine Design against Zika Flavivirus Using Class I and Class II Epitopes Identified by Computer Modeling
Viral-Induced Enhanced Disease Illness
Recent Advances in the Vaccine Development Against Middle East Respiratory Syndrome-Coronavirus
High-resolution mapping and characterization of epitopes in COVID-19 patients

The authors would like to thank scientists throughout the world who rapidly provided SARS-CoV-2 sequences for ongoing analyses during this pandemic. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The Stanford team recognizes the intellectual support of the SPARK At Stanford Program.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.