key: cord-0822122-47wpgkxw
authors: Gelbart, Maoz; Stern, Adi
title: Site-Specific Evolutionary Rate Shifts in HIV-1 and SIV
date: 2020-11-16
journal: Viruses
DOI: 10.3390/v12111312
sha: 958ca1ec5cfe70e608fc9d79b25852c111a4924f
doc_id: 822122
cord_uid: 47wpgkxw

Site-specific evolutionary rate shifts are defined as protein sites, where the rate of substitution has changed dramatically across the phylogeny. With respect to a given clade, sites may either undergo a rate acceleration or a rate deceleration, reflecting a site that was conserved and became variable, or vice-versa, respectively. Sites displaying such a dramatic evolutionary change may point to a loss or gain of function at the protein site, reflecting adaptation, or they may indicate epistatic interactions among sites. Here, we analyzed full genomes of HIV and SIV-1 and identified 271 rate-shifting sites along the HIV-1/SIV phylogeny. The majority of rate shifts occurred at long branches, often corresponding to cross-species transmission branches. We noted that in most proteins, the number of rate accelerations and decelerations was equal, and we suggest that this reflects epistatic interactions among sites. However, several accessory proteins were enriched for either accelerations or decelerations, and we suggest that this may be a signature of adaptation to new hosts. Interestingly, the non-pandemic HIV-1 group O clade exhibited a substantially higher number of rate-shift events than the pandemic group M clade. We propose that this may be a reflection of the height of the species barrier between gorillas and humans versus chimpanzees and humans. Our results provide a genome-wide view of the constraints operating on proteins of HIV-1 and SIV.

The human immunodeficiency viruses HIV-1 and HIV-2 are the causative agents of AIDS in humans, infecting millions of people worldwide. Both viruses emerged from a clade of lentiviruses known as the simian immunodeficiency viruses (SIV), which naturally infect a variety of non-human primate species. HIV in humans arose from several independent transmission events of primate SIVs that resulted in HIV-1 groups M and N (from SIV infecting chimpanzees, SIVcpz), HIV-1 groups O and P (from SIV infecting gorillas, SIVgor), and HIV-2 groups A through H (from SIV naturally infecting sooty mangabeys, SIVsmm) [1] [2] [3] . The gorilla-infecting lentivirus, SIVgor, is itself a result of a transmission of SIVcpz from chimpanzees to gorillas [4, 5] . Phylogenetic analyses date the most common recent ancestors of HIV groups M and O to the beginning of the 20th century, making it a relatively new human pathogen [6, 7] . Similar analyses of SIVgor date the most recent common ancestor of this virus in gorilla somewhere in the 19th century [4] . SIVcpz itself was found to be a transmission from other primates, leading to two lineages of SIVcpz: SIVcpzptt infecting the chimpanzee subspecies Pan troglodytes troglodytes of central Africa, and SIVcpzpts infecting the Pan troglodytes schweinfurthii chimpanzee subspecies of eastern Africa [5, [8] [9] [10] [11] [12] .

How viruses are able to cross the species barrier is a subject of much interest, since many pandemic human viruses arose from zoonosis events: the influenza strain of Spanish flu H1N1, Ebola Virus,

In order to collect sequences for this study, the Los Alamos HIV sequence database (available online at http://www.hiv.lanl.gov [36] , retrieved at 17 April 2016) was queried for HIV-1 sequences from the same strain that spanned all nine HIV-1 open reading frames (ORFs: gag, pol, vif, vpr, tat, rev, vpu, env, and nef ) and for SIVcpz and SIVgor strains that spanned the corresponding ORFs (with the exception of vpx). This led to 2004 sequences of HIV-1 group M, 45 sequences of HIV-1 group O, 9 sequences of HIV-1 group N, 2 sequences of HIV-1 group P, 4 sequences of SIVgor, and 29 sequences of SIVcpz. The final study dataset was constructed with an equal number of group M and group O sequences (N = 45), by sampling sequences so that the n most distant strains (in terms of genetic distance) were sampled. Due to extremely high similarity, HIV-1 groups N and P sequences were reduced to a single representative strain from each. The IIIB_LAI strain was added manually as a reference sequence.

Initial MSA's of the nine proteins were performed using PRANK and iteratively improved until convergence [37] . In order to reconstruct of the phylogenetic relationship between the sequences, we concatenated the alignments of Gag, Pol, Vif, Vpr, Tat, and Env and provided this as input for PhyML ( [38] , File S1). We next used the reconstructed phylogeny as a guide tree to realign each ORF with PRANK (File S1). JpHMM was used to validate that the strains used in the analysis are not inter-group recombinants [39] .

In order to identify evolutionary rate shifts, we used RASER [25] to analyze each of the nine proteins separately, with the proteome-based phylogeny as input. RASER is a likelihood-based phylogenetic method for detecting a change in the substitution rates in each site of a given protein, along all possible branches in the phylogeny. First, a likelihood ratio test against a null model of no rate shifts is performed in order to assess if a model enabling rate shifts better fits the data. Next, the posterior probability of rate shift is calculated at each site and sites with a posterior probability higher than 0.6 are reported here. Finally, for each such site, the method lists the lineages where the rate shift occurred with the highest probability and further categorizes each sites as undergoing either a rate deceleration or a rate acceleration. All sites are reported in File S2. Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rate-shift events at each branch, divided by protein size (Table 1). Another table containing the same  rate shifts as percentage from total rate shifts at each branch, without taking into account the protein  size, is present as Table S2 . Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1). Another table containing the same rate  shifts as percentage from total rate shifts at each branch, without taking into account the protein size,  is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . Table 1 . The proportion of rate shifts (rate decelerations and rate accelerations in upper panel and lower panel, respectively) detected in each of the major branches of the phylogeny, relative to the protein size. The right-most column shows the absolute number of rate shifts detected in the branch. Cell background color intensity increases with the proportion of rate shifts. 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . Table 1 . The proportion of rate shifts (rate decelerations and rate accelerations in upper panel and lower panel, respectively) detected in each of the major branches of the phylogeny, relative to the protein size. The right-most column shows the absolute number of rate shifts detected in the branch. Cell background color intensity increases with the proportion of rate shifts. 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] Rate-shift events along the HIV/SIV phylogeny were constructed as the percentage of total rateshift events at each branch, divided by protein size (Table 1) . Another table containing the same rate shifts as percentage from total rate shifts at each branch, without taking into account the protein size, is present as Table S2 . 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rateshifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] 

We have previously developed a maximum-likelihood-based method called RASER that identifies sites that undergo a significant change in their rate of evolution along a given branch of a phylogenetic tree (see for example Figure 1 ) [25] . RASER is based on two stages: first, it tests if a rate-shifting model better fits the data than a non-rate-shifting model based on a likelihood ratio test. Second, RASER reports the posterior probability that a protein site evolves under to a rate-shifting model and infers the branch in the phylogeny where the rate shift most likely occurred. Sites are said to be "rate decelerating" or "rate accelerating" at a given branch, if their evolutionary rate is, respectively, slower or faster in one clade of the tree as compared to the complementary clade spanned by the branch. For example, given the branch that separates HIV group M from SIVcpz, a site may be conserved (slower evolutionary rate) in the HIV group M clade and variable (faster evolutionary rate) in the rest of the tree. Notably, these terms are always relative: a rate deceleration at one clade can also be interpreted as rate acceleration in the complementary clade. Here, we sought to identify and characterize rate-shifting sites in the HIV/SIV phylogeny. We queried the Los Alamos HIV database [36] for all available HIV-1, SIVcpz and SIVgor full genome sequences. Rate-shift analyses were performed on the nine translated open reading frames of the virus, and likelihood ratio tests were found to be highly significant in favor of a rate-shift model across all HIV proteins (Table S1 ).

RASER identified a total of 271 rate-shifting sites along the HIV-1/SIV phylogeny. One hundred and thirty-seven of them were identified as rate deceleration events, and an additional 134 were identified as rate accelerations. Of the 271 rate-shifting sites, 230 were attributed to four branches:

Viruses 2020, 12, 1312 4 of 11 the branch separating group M from SIVcpz, the branch separating group O from SIVgor, the branch separating SIVgor and group O from SIVcpz, and the branch separating SIVcpzpts from other viruses ( Figure 2B) . Notably, these branches are the longest branches in the phylogeny and at least two of them represent cross-species transmission events. Figure 1 . Projection of Env 458 on the HIV-1/SIV phylogeny. Each leaf (corresponding to an HIV/SIV strain) is color-coded based on the amino acid present at Env 458 . The legend shows the three most prevalent amino acids found at this site. The substitution rate of Env 458 , which interacts with the CD4 receptor, was found to be slower in HIV-1 group M than in the rest of the phylogeny, with the branch separating both clades marked by an arrow.

In order to test for sequence sampling effects on the identified rate-shift patterns, we repeated the analysis with increased amount of group M sequences (n = 183), reduced amount of group O sequences (n = 13), and no SIVgor sequences (File S3). With this dataset, more rate decelerations were identified in the branch leading to group M, but chi-squared tests for differences in group M rate-shift distributions showed no significant difference (p = 0.11 and 0.87 for rate decelerations and accelerations, respectively), indicating that the patterns of rate shifts between the large group M sample and the smaller group M sample are similar.

We calculated the relative contribution of rate-shift events in each protein when controlling for protein size, for each branch corresponding to a virus speciation event ( Figure 2B ). While rate-shift patterns differed between the different lineages, one common theme was that, at all prominent branches, a large fraction of rate-shift events occurred at non-structural proteins, i.e., the regulatory and accessory proteins of the virus (Table 1 and Table S2 ). When comparing rate accelerations with the corresponding rate decelerations, the branch separating SIVcpzpts from the rest of the tree showed a very similar pattern of rate accelerations/decelerations: a very similar number of rate decelerations and rate accelerations were found in each protein (Figure 2A) . A similar pattern was observed in the structural genes: a similar number of sites were inferred as rate accelerations and as rate decelerations in the Gag, Pol, and Env proteins ( Table 1) .

The pattern of equal accelerations and decelerations led us to conjecture that epistatic interactions coupled with genetic drift may be the most likely explanation for the majority of observed rate shifts. This is based on the fact that (a) a priori, we do not expect any adaptation in the branch separating SIVcpzpts from other viruses, and thus this branch serves as a reference for what to expect under neutral or almost neutral evolution and (b) strong epistatic effects have already been reported in Env [35] , which indeed showed large and more or less equal numbers of accelerations and decelerations in most branches (apart from group O). After establishing a pattern that seemed to be most likely neutral evolution, we went on to focus on outliers in our data: specific proteins where the number of rate accelerations and rate decelerations were substantially different. We noted higher proportions of rate decelerations as compared to rate accelerations in the Vpu protein, in the lineages leading to group M and group O, and in the O-Rev protein. Conversely, we noted more rate accelerations in Nef in the lineage leading to group M. Finally, we observed slightly higher proportions of rate decelerations in Vif and accelerations in Vpr in the lineage leading to SIVgor and group O.

We were intrigued by the fact that most of the imbalances found in rate accelerations/decelerations occurred in accessory proteins, which are known to play a role in the response of the virus to the innate immune response, and we survey some of our findings on rate shifts in the context of the interaction between innate immune proteins and HIV proteins.

One such well-characterized interaction is demonstrated by the restriction factor Tetherin, a fast evolving protein that differs between chimpanzees, gorillas, and humans [40, 41] . In humans, a significant deletion in the cytoplasmic tail of Tetherin rendered it invulnerable to the SIV's Nef-based counteraction [21] , and in HIV-1 groups M and N, the Vpu protein adapted to counteract Tetherin through its transmembrane domain [22] . In gorillas, the Nef protein adapted to counteract its host's Tetherin but also maintained its ability to counteract the chimpanzee Tetherin [42, 43] .

Our results agree with the described anti-Tetherin adaptation events: a high proportion of group M rate decelerations were found in the Vpu protein of group M, most of them in the identified Tetherin-binding domain (Table 1 and Figure 3 ). In addition, in HIV-1 group M the Nef protein was identified with three rate accelerations, which could be interpreted as relaxation of the SIV-Nef anti-Tetherin activity. Numerous group O Nef rate acceleration events were found as well, possibly indicating the loss of chimpanzee/gorilla anti-Tetherin function. A preponderance of acceleration and deceleration events in Nef were also found in the lineages leading to the ancestor of SIVgor/group O, and to SIVcpzpts. Thus, our analysis pinpoints sites that may be responsible for the crucial changes in function of Vpu and Nef throughout SIVcpz and HIV evolution (Table S3 ). We move on to examine APOBEC3G (A3G), a broad-range antiviral protein that is counteracted by the Vif proteins of Lentiviruses [45, 46] . A3G variants of human, chimpanzee, and gorilla have been studied, and it has been found that the Vif recognition domain is conserved between human and chimpanzee A3G proteins, but slightly differs in gorillas [47] . Indeed, the SIVgor adaptation event to Viruses 2020, 12, 1312 7 of 11 the different host A3G has been demonstrated [48, 49] . In line with these results, our analysis shows no enrichment of rate decelerations in M-Vif, yet an excess of rate decelerations in the lineage of HIV-1 group O + SIVgor (Tables S2 and S3). O-Rev protein stood out for an excess of rate decelerations in the branch leading to group O (Table 1) . This is intriguing, since Rev is not known to be involved in any antiviral-related host activity. Some but not all of the decelerations identified are in genomic loci where rev rate decelerations overlap with env rate decelerations, and this might explain some of this pattern. It remains to be further investigated why the O-Rev protein experienced so many rate decelerations.

Our results tentatively indicate that accessory proteins may have an important role in enabling adaptation to new species in the case of HIV [12] . This suggests that a major common barrier for a host species jump in HIV/SIV is the stage where the virus must overcome intracellular host defenses. Our results further support a lineage-specific model of human adaptation: groups M and O underwent prominent rate shifts in different proteins. This model is supported by the fact that activity of group M and O proteins are indeed different (e.g., Vpu) [42, 43] , and by the fact that each group originated from an SIV from a different primate.

We were surprised to note that group O had almost twice the amount of rate-shift events as compared to any other lineage analyzed (Table 1) . Notably, group O is a non-pandemic strain that remained localized mainly to infections in west-central Africa [50, 51] . One possible explanation is that group O entered the human population later than group M, and accordingly, more sites in group O appear as conserved, since they had less time to evolve. We consider this explanation as unlikely, since (a) our phylogenetic model takes into account branch lengths and (b) in such a case, we would also expect less rate accelerations in group O than in group M, whereas we find a very large number of rate accelerations as well. Another possible interpretation may be the relative height of the species barrier a strain had to overcome, as chimpanzees from which group M originated are genetically closer to human than gorillas from which group O originated [52, 53] . A species barrier may potentially induce more entrenchment and may also lead to adaptation, as has been described previously [5, 54] .

This study has several limitations. First, the availability of fully sequenced SIVcpz, SIVgor, and non-HIV-M genomes is still low, reducing the statistical power of the analysis. Indeed, we noted that increasing the sample size of group M sequences that are much more available led to the detection of more rate-shifting sites. We expect that the availability of additional SIVgor sequences will increase the number of rate-shift events that are unique in that clade, possibly revealing more rate changing sites for this lineage. Second, the method that we utilized to identify rate shifts is calibrated to identify major changes in the evolutionary rates along a lineage. Accordingly, it cannot detect sites where a "content shift" occurred, i.e., the rates of substitution remained low in two complementary clades, but the amino acid itself has changed. This is partially demonstrated in Gag 30 , a previously found marker of HIV adaptation where the rate of evolution changed mildly, while the amino acid in this site changed between chimpanzee and human viruses ( Figure S1 ) [18] . Accordingly, we have detected this site but were inconclusive regarding the lineage where the rate shift occurred, and thus, it was not reported in this study. In addition, our method could not detect adaptation by amino acid insertion or deletion events, and we did not analyze for rate shifts at the RNA level and thus, for example, did not detect rate shifts at RNA structures of the virus.

We used a phylogenetic modeling approach to perform a large scale analysis of the constraints operating in all proteins of the HIV genome, across different lineages of HIV-1 and SIV. We inferred individual amino acids where rate shifts occurred and characterized the distribution of rate shifts across Viruses 2020, 12, 1312 8 of 11 different proteins and across the tree. We found that, in many cases, the number of rate accelerations and rate decelerations is similar and suggest that epistatic interactions are pervasive in the evolution of HIV-1 and SIV proteins, and that these likely led to many of the inferred rate shifts observed herein. Nevertheless, our results do suggest that at least some of the rate shifts that we detected may be due to selection and may be related to the cross-species transmission of the virus from apes into humans. Our results also implicate several of the accessory proteins of HIV as mediators that allow cross-species transmission events to occur. Our compiled list of the positions suggested as rate shifting can now serve as a basis for future functional studies that will allow a better and deeper understanding of the constraints operating on HIV-1 and SIV proteins.

Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/11/1312/s1. Figure S1 : Residues distribution for Gag30 along the HIV-1/SIV phylogeny. This position exhibits a pattern more similar to "content shift" than to "rate shift", Table S1 : Maximum log-likelihood (LL) values for the analysis of the nine HIV-1/SIVcpz/SIVgor proteins under the rate shift and null models, Table S2 : Rate shifts as percentage from total rate shifts for prominent branches, for rate decelerating sites (upper) and rate accelerating sites (lower), colored by intensity. Data presented in this table is not normalized by protein length, Table S3 : Rate deceleration events that may be related to previously reported species-specific adaptation events, File S1: Phylogenetic tree and sequence alignments used for identifying rate shifting sites. The zip file contains all nine alignments and constructed phylogeny used for identifying rate shifting sites, File S2: HIV/SIV sites identified as rate shifting. Positions are provided in HXB2 reference sequence coordinates. Branch number field refer to the branch number outputted by RASER (Penn et al. 2008 ); for ease of reading we provide branch labels for most branches, File S3: HIV/SIV sites identified as rate shifting when group M contains many more sequenced than group O. Positions are provided in HXB2 reference sequence coordinates. Branch number field refer to the branch number outputted by RASER (Penn et al. 2008 ); for ease of reading we provide branch labels for most branches.

AIDS-AIDS as a zoonosis: Scientific and public health implications

Human immunodeficiency viruses: SIV infection in wild gorillas

Origin of the HIV-1 group O epidemic in western lowland gorillas

Origin and biology of simian immunodeficiency virus in wild-living western gorillas

Modern-day SIV viral diversity generated by extensive recombination and cross-species transmission

Timing the ancestor of the HIV-1 pandemic strains

The molecular population genetics of HIV-1 group O

Dating the Age of the SIV Lineages That Gave Rise to HIV-1 and HIV-2

Hybrid origin of SIV in chimpanzees

Sequence Diversity among Chimpanzee Simian Immunodeficiency Viruses (SIVcpz) Suggests that SIVcpz Pts Was Derived from SIVcpz Ptt through Additional Recombination Events

Eastern chimpanzees, but not bonobos, represent a simian immunodeficiency virus reservoir

Gene loss and adaptation to hominids underlie the ancient origin of HIV-1

The evolution and genetics of virus host shifts

Investigating the zoonotic origin of the West African Ebola epidemic

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Recapitulating cross-species transmission of simian immunodeficiency virus SIVcpz to humans by using humanized BLT mice

Experimental Adaptive Evolution of Simian Immunodeficiency Virus SIVcpz to Pandemic Human Immunodeficiency Virus Type 1 by Using a Humanized Mouse Model

Adaptation of HIV-1 to its human host

Human-Specific Adaptations in Vpu Conferring Anti-tetherin Activity Are Critical for Efficient Early HIV-1 Replication In Vivo

Tetherin inhibits retrovirus release and is antagonized by HIV-1 Vpu

Nef proteins from simian immunodeficiency viruses are tetherin antagonists

Tetherin-driven adaptation of Vpu and Nef function and the evolution of pandemic and nonpandemic HIV-1 strains

The interferon-induced protein BST-2 restricts HIV-1 release and is downregulated from the cell surface by the viral Vpu protein

A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins

Evolutionary modeling of rate shifts reveals specificity determinants in HIV-1 subtypes

Maximum-likelihood phylogenetic analysis under a covarion-like model

an important process of protein evolution

Epistasis as the primary factor in molecular evolution

Epistasis in protein evolution

An epistatic ratchet constrains the direction of glucocorticoid receptor evolution

Amino acid coevolution induces an evolutionary Stokes shift

Contingency and entrenchment in protein evolution under purifying selection

A complex epistatic network limits the mutational reversibility in the influenza hemagglutinin receptor-binding site

Fitness epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region

Mapping mutational effects along the evolutionary landscape of HIV envelope

An algorithm for progressive multiple alignment of sequences with insertions

New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0

Improving the reliability of recombination prediction in HIV-1

Species-specific activity of HIV-1 Vpu and positive selection of tetherin transmembrane domain variants

BST-2/tetherin: A new component of the innate immune response to enveloped viruses

Ancient adaptive evolution of tetherin shaped the functions of Vpu and Nef in human immunodeficiency virus and primate lentiviruses

Lack of adaptation to human tetherin in HIV-1 group O and P

Domain assembly, surface accessibility and sequence conservation in full length HIV-1 Nef

Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts

The antiretroviral enzyme APOBEC3G is degraded by the proteasome in response to HIV-1 Vif

Distinct Domains within APOBEC3G and APOBEC3F Interact with Separate Regions of Human Immunodeficiency Virus Type 1 Vif

Convergence and divergence in the evolution of the APOBEC3G-Vif interaction reveal ancient origins of simian immunodeficiency viruses

Vif Proteins from Diverse Primate Lentiviral Lineages Use the Same Binding Site in APOBEC3G

Geographical distribution of HIV-1 group O viruses in Africa

The Two-Phase Emergence of Non Pandemic HIV-1 Group O in Cameroon

Dating of the human-ape splitting by a molecular clock of mitochondrial DNA

Molecular phylogeny of the hominoids: Inferences from multiple independent DNA sequence data sets

Preferential host switching by primate lentiviruses can account for phylogenetic similarity with the primate phylogeny

The authors would like to thank Osnat Penn, Eran Bacharach, and Tal Pupko for their contribution to the inception of this study, and Matthias Geyer for providing the Nef protein model structure.

The authors declare no conflict of interest.