key: cord-262585-5vjqrnwh
authors: Hraber, Peter; O’Maille, Paul E.; Silberfarb, Andrew; Davis-Anderson, Katie; Generous, Nicholas; McMahon, Benjamin H.; Fair, Jeanne M.
title: Resources to Discover and Use Short Linear Motifs in Viral Proteins
date: 2019-08-16
journal: Trends Biotechnol
DOI: 10.1016/j.tibtech.2019.07.004
sha: 
doc_id: 262585
cord_uid: 5vjqrnwh

Viral proteins evade host immune function by molecular mimicry, often achieved by short linear motifs (SLiMs) of three to ten consecutive amino acids (AAs). Motif mimicry tolerates mutations, evolves quickly to modify interactions with the host, and enables modular interactions with protein complexes. Host cells cannot easily coordinate changes to conserved motif recognition and binding interfaces under selective pressure to maintain critical signaling pathways. SLiMs offer potential for use in synthetic biology, such as better immunogens and therapies, but may also present biosecurity challenges. We survey viral uses of SLiMs to mimic host proteins, and information resources available for motif discovery. As the number of examples continues to grow, knowledge management tools are essential to help organize and compare new findings.

Viruses exploit host cellular processes to replicate, and have developed myriad ways to subvert host immune defenses. Molecular mimicry (see Glossary) is a common and effective strategy, enabling a pathogen to usurp host protein function by resemblance [1, 2]. Molecular mimicry varies over a continuum, from one extreme that includes sequence and structural similarity (i.e., orthologs) of entire proteins, to another extreme of chemical similarity at only a few localized sites, as is the case for short linear motifs (SLiMs). The growing body of literature on SLiMs indicates that some important virus-host interactions can be attributed to a few well-chosen AAs [3] [4] [5] [6] [7]. Rather than devote entire proteins to one function, SLiMs enable multifunctional viral proteins. Interactions between globular virus and host proteins have picomolar affinities, while SLiMs have micromolar binding affinities with globular host proteins [8]. Moderate binding affinity of SLiMs facilitates disruption of signaling interactions, rather than competing for stable formation of persistent protein complexes. Synthetic biology practitioners can benefit from an introduction to how SLiMs enable viral interference with host cell functions and to the computational resources available for SLiM analysis. Viral SLiMs are potentially useful in synthetic biology, to provide a toolkit for new functions, for example, to modulate immune responses or to complement and interact with newly developed adjuvants in a synergistic manner [9]. Research efforts to develop broad-spectrum antiviral compounds or design broadly cross-protective vaccine immunogens benefit directly from knowledge of gene products, protein functions, and motifs involved with viral immune interference. N-linked glycosylation of the PNG sequon is a well-known example used by viral glycoproteins as camouflage against immune recognition [10, 11].
The distribution of N-linked glycosylation sites has recently been recognized as essential for the design of immunogens to induce broadly cross-reactive immune protection against such challenging viruses as HIV-1 [12, 13] . Motifs associated with cellular trafficking (localization, transport, secretion, and sequestration) are readily edited to modify where expression products go, and change interaction profiles with other proteins [14] . In addition to motifs that stabilize the structure of immunogens, such as trimerization ('foldon') [15] [16] [17] [18] and dimerization [19, 20] domains, motifs that interact with cellular processes for innate antiviral pathways could be used to enhance immunogenicity. While SLiMs in eukaryotic proteins have been discussed extensively, SLiM involvement in viral immunomodulation remains less thoroughly explored, and suggests new opportunities for use in engineered biotechnology applications.

The ability to transfer genetic components across species, or to introduce such components de novo, enables new functions. While such functions are generally well intended, some risk also exists for harmful effects. Subject to the technical advances of synthetic biology, such effects are not necessarily a taxonomically relevant property. It may be necessary to evaluate risks of new functions by other means than taxonomy or even protein functional evaluations. Instead, new methods are needed that assess functions at a finer resolution than the gene, whether by computational analysis or functional phenotypic assessments [21] [22] [23]. SLiM analysis might help with such assessments.

Short linear motifs (SLiMs) are patterns of three to ten consecutive AAs used by eukaryotic cells for tasks that include: signaling, localization, degradation, and proteolytic cleavage.

Viruses use SLiMs to their advantage, including interference with antiviral innate immune pathways.

Viral SLiMs can tolerate mutations, evolve quickly to modify host interactions, and co-occur in a modular manner or involve multiprotein complexes.

SLiMs are useful in synthetic biology, where minor edits can alter target specificity, modulate persistence, reprogram interactions with cell-signaling domains, and alter protein function in myriad other ways.

Aside from possible beneficial uses, for example, to produce better immunogens and develop therapeutic interventions against infectious disease, SLiMs may help characterize new and emerging threats to global health.

Viral proteins can modulate immunity in several ways, which include: shutdown of host macromolecular synthesis, inhibiting antigen production or apoptosis, and interference with such processes as antigen presentation by MHC, natural killer (NK) cell function, antiviral cytokines, or interferon responses. Each of these processes involves coordination among multiple components in host cells. Viral interference with these functions is frequently attributed to entire proteins, but in some important cases has been localized to SLiMs.

Because of their compact size, SLiMs are modular, rapidly evolvable sequence elements. Different instances of a given SLiM can vary in sequence while maintaining the overall functional profile, that is, the regular expression for the sequence motif, where a few positions are invariant while other positions tolerate numerous substitutions. Thus, partial sequence matches are sufficient for transient binding interactions with target domains, for example, signal transduction proteins. This observation led to the proposal of ex nihilo SLiM evolution, that is, the evolution of a novel SLiM 'from nothing': the appearance of a new functional module from a previously nonfunctional region of protein sequence [24]. Because hosts' interaction networks are often conserved, SLiMs represent a significant vulnerability for opportunistic exploitation. These properties enable pathogens to acquire host-like SLiMs rapidly through ex nihilo convergent evolution, to rewire host interaction networks, and to acquire tropism and virulence traits needed for successful adaptation and propagation [25]. Over 200 motifs are known, with 2,400 validated instances, and many more motifs may await discovery [3, 26]. Focus on viral motifs may reveal practical utility, to broaden the repertoire of tools available to reprogram molecular function in synthetic biology.

One example of how viral proteins use SLiMs to subvert host cell function is illustrated by Epstein-Barr virus (EBV), which persists in resting memory B cells of nearly all (>95%) individuals throughout their adult lives [5]. Latent membrane protein 1 (LMP1) is central to EBV persistence. The cytoplasmic tail of this membrane-bound protein includes PxxPxP and PxQxT motifs that recruit signaling proteins (JAK3, that is, Janus kinase 3, and several TRAFs, tumor necrosis factor receptor-associated factors, respectively) [5]. Together, the motifs mimic the cytoplasmic domain of CD40 to activate nuclear factor-κB via intermediates, including a third motif YYD$ (where $ denotes the C terminus), the TRADD (tumor necrosis factor receptor-associated death domain) binding domain. The overall result is that LMP1 inhibits apoptosis and drives proliferation of infected B cells, to confer viral persistence [5]. Other examples of viral SLiM contributions to motif mimicry involved with immune function include: protein degradation, transcription, translation, and transport into and out of the nucleus [5].
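The motif notation used for LMP1 translates directly into regular expressions. The following is a minimal sketch, assuming that 'x' means any amino acid and that '$' anchors the C terminus, as stated above. The function name `find_motifs` and the toy sequence are hypothetical and chosen only for illustration; the toy sequence is not the real LMP1 cytoplasmic tail.

```python
import re

# Regular-expression translations of the LMP1 motifs named in the text.
# 'x' (any amino acid) becomes '.', and '$' anchors the C terminus.
MOTIFS = {
    "PxxPxP": r"P..P.P",
    "PxQxT": r"P.Q.T",
    "YYD$": r"YYD$",
}

def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    hits = {}
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead also reports overlapping instances.
        scanner = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]
    return hits

# Hypothetical toy sequence, NOT the real LMP1 cytoplasmic tail.
toy_tail = "MSGPHDPLPHSPSQATDDSGYYD"
for name, found in find_motifs(toy_tail).items():
    print(name, found)
```

Because only a few positions are fixed, short patterns such as these will also match many sequences by chance, which is why the filtering steps discussed later matter.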

Given the continued growth of this field [27] , established frameworks can manage and exploit this knowledge beyond catalogues of currently known motifs [5, 7, 28] or details on contributions of one viral protein (e.g., [14, 29] ). Work to use SLiMs in bioengineering can benefit from understanding viral protein function. This information is organized in viral knowledge bases, such as ViralZone (Box 1). Ontologies describe systematically the many different functional roles of viral proteins, including immune evasion. By promoting use of standard terms for relationships between concepts, an ontology arranges concepts into a framework that can be updated as knowledge grows. Protein function is captured broadly in such a framework, though the nuanced details of interactions with other molecules are not localized to domains or motifs.

GO is an authoritative resource for annotating functions of gene sequences [30, 31]. An example of interest is 'evasion or tolerance by virus of host immune response' [32] (www.ebi.ac.uk/QuickGO/term/GO:0030683, Figure 1). Concepts are hierarchically organized, and include a definition, synonyms, and lists of parents and children. Functional annotation in GO reflects the diverse effects of viral proteins on immune interference.

Figure 1. Adapted from www.ebi.ac.uk/QuickGO/term/GO:0030683 [32]. Most arrows indicate 'is a' relations, where the hierarchy is refined by specialization. Blue arrows indicate 'part of' relations, which relate to the symbiont process as parts to a whole.

Modulating autophagy is an example of recent advances in this research area [33, 34]. A growing number of reports describe how virus proteins and SLiMs therein modulate autophagy to promote various aspects of their life cycle [34, 35]. Both GO and ViralZone have developed concepts to detail autophagy processes, including positive and negative regulation of xenophagy, the selective autophagy of pathogens [33] [34] [35].

Glossary

Adjuvant: an additive to a vaccine that promotes nonspecific immune responses. When administered together with an antigen, it induces more potent responses than the antigen alone.

Autophagy: an evolutionarily conserved degradation system for maintaining cellular homeostasis and innate immunity to clear pathogens from cells.

Domain: structure-based modular subunit of a protein, often with a specific function. Domains are generally larger and associated with structural subunits, while motifs are shorter and associated with intrinsically disordered protein regions.

ELM: the eukaryotic linear motif resource (elm.eu.org), a repository of known SLiMs; includes annotation from primary literature and information about experimental assays used.

Glycosylation: post-translational modification by host transferases linking sugar molecules to side chains of either nitrogen (N-linked) in asparagine or oxygen (O-linked) in serine or threonine. Viral proteins like HIV-1 envelope can be heavily glycosylated, providing camouflage against immune recognition, shifting as the PNG sequon mutates.

GO: Gene Ontology, an ontology for genetic products of any and all organisms, providing a framework to annotate gene and protein functions in genetic sequence databases and bioinformatics analysis procedures. GO includes multiple dimensions to capture biological complexity in adequate depth: molecular function, cellular component, and biological process. GO development is conducted by a consortium of research communities and databases, which regularly solicits input and feedback from the broader research community.

Immune modulation: interference with an immune-related process by a pathogen.

Intrinsically disordered protein/domain: regions of a protein sequence that are predicted or experimentally shown not to form consistent structure, e.g., α helices or β sheets. Such regions tend to be more accessible for interactions with other proteins.

Molecular mimicry: structural similarity that enables repurposing or hijacking of molecular function by pathogens, such as viruses. Molecular mimicry varies in extent from entire globular proteins, to localized domains, down to short linear motifs.

Motif class: a regular expression that summarizes known variants to define a sequence motif.

Motif instance: a particular motif as found in a protein or translated genetic sequence from a specific organism, strain, or isolate.

Ontology: a representation of domain-specific knowledge organized as concepts, their properties, and their relationships.

PNG sequon: three AA motif N[^P][ST], that is, asparagine, then any AA except proline, then serine or threonine, recognized by glycosyltransferase as a potential N-linked glycosylation site.

Regular expression: a string of characters that concisely represents many alternative sequence variants; may include wildcards to represent any character, groupings of possible characters, repetition, negation, start and end of sequence, etc.

Short linear motif (SLiM): also known as MiniMotifs or MoRFs (molecular recognition features). Frequently represented as regular expressions, typically three to ten AAs long.

ViralZone: a knowledge base (https://viralzone.expasy.org) that documents viral families, genome architectures, proteins, host-protein interactions, and an ontology for the functions of viral proteins.

Box 1

A bridge that links literature reports to GO term annotation, ViralZone is an online knowledge base that contains 'textbook' information about viral taxonomy, replication, genome organization, and virion structure, and provides links to viral sequence data [62, 63]. Importantly, ViralZone staff collaborate with the GO Consortium to define entries for virus-specific molecular functions [63] [64] [65] [66]. ViralZone cross-references its keywords with GO Consortium terms and UniProt [67] identifiers. This makes it possible to search for viral proteins by their functional role.

ViralZone staff have developed GO concepts specifically for viruses, to represent the diversity of viral replication and processes involved with viral entry, replication, and egress [66]. ViralZone staff have also developed a detailed listing of virus-host interactions, with entries for 65 functions and 57 GO terms [64, 65]. Each entry ('keyword') has a unique identifier. Unlike Enzyme Commission (EC) numbers [68], ViralZone IDs are arbitrary numbers and do not indicate position in the concept hierarchy. Instead, organization of the keyword hierarchy is provided online. The web address https://viralzone.expasy.org/886 is an entry point into the ViralZone concept space (Figure I).

Figure I. Blue text indicates a link to more detail. Shown is the vertebrate host-virus interactions page [64]. Also available are summaries for invertebrate, plant, and bacterial host-virus interactions.

Understanding the functional roles of SLiMs can help identify related mechanisms or processes, or possibly identify knowledge gaps where SLiMs may be posited but not yet identified. An overview of SLiM functions may also help to prioritize which are of greatest potential for use or abuse when artificially added to modify protein function. Databases and discovery tools are also useful to identify known and new SLiMs.

Identification of shared structural features across divergent protein families led to analysis and identification of modular protein domains. Protein domains are used to categorize protein function, and the InterPro database [36] (www.ebi.ac.uk/interpro) aggregates information at this within-protein level. The identification of protein domains led to recognition of SLiMs as compact, small-scale functional modules [37] .

ELM is a database of eukaryotic motifs (Box 2), though its representation of viral-host interactions is not fully developed. At present, 264 ELM entries map to 648 GO terms. Most motifs map to multiple GO terms; the median is seven GO terms per motif and the maximum is 29 (MOD_Plk_1). The immune-associated LIG_IRF3_LxIS_1 motif is involved in signal transduction responses to pathogen-associated molecular patterns; this motif maps to 25 GO terms, but none occur in the ViralZone vocabulary. In total, only three ELM entries utilize GO terms from ViralZone: LIG_BH_BH3_1, LIG_HCF-1_HBM_1, and LIG_Rb_pABgroove_1. These three motifs all map to the most general GO term, GO:0019048 ('modulation by virus of host morphology or physiology', a synonym of 'virus-host interaction'). This underscores that the prevalent mode of ELM motif discovery and annotation does not emphasize host-virus interactions, but rather systems-level interactions within eukaryotic cells. Thus, better integration of ViralZone-GO-term vocabulary with ELM or another domain-level representation of viral SLiMs is needed to promote potential utility for biotechnology.

Box 2

(ii) DEG, degradation sites, part of polyubiquitination;

(iii) DOC, docking sites, involved in protein recruitment but not directly targeted by an active site;

(iv) LIG, ligand binding sites, primarily for protein-protein interactions;

(v) MOD, post-translational modification sites; and

(vi) TRG, targeting sites for subcellular localization.

ELM has also spun off several specialized databases: phospho.ELM for phosphorylation sites with experimental evidence [71], switches.ELM for conditional molecular switches, such as requiring that a site be modified [72], and iELM, with an emphasis on protein-protein interactions [73]. A detailed tutorial provides orientation for ELM use [74].

ELM documents each motif class with a concise description of its function. For example, one type of nuclear localization signal (NLS), TRG_NLS_Bipartite_1, the 'classic bipartite NLS', which binds to importin-α for nuclear pore transfer and is utilized by the PB2 protein of influenza A, is documented here: elm.eu.org/elms/TRG_NLS_Bipartite_1. The abstract and functional site descriptions summarize what is known about the motif.

Despite not using the ViralZone ontology, ELM documents other motif classes with GO terms that refer to 'viral', 'virus', 'immune', or 'immunity' (Table 1) . Because the focus is motif function in the host cell context, ELM does not directly indicate how viral immune interference results. Further, the relative lack of viral motifs in ELM does not indicate their absence in vivo, but rather the evidence-based requirement for ELM inclusion.

Related to the earlier observation, a review of how viruses use SLiMs to interfere with host cells [5] lists 52 examples that represent viral mimicry of host SLiMs (Table 2 ). Only 70% of these have corresponding ELM entries, though the SLiMs are known. The remaining 30% indicate that ELM does not fully capture all known viral motifs. This strong requirement by ELM for evidence-based motif classes and instances is not strictly a drawback. Indeed, the ELM creators are very aware that computational analysis alone is error prone and can yield misleading outcomes. In [38] , they discuss this issue in depth, and recommend a workflow for SLiM discovery that culminates in experimental validation, whether in vivo or in vitro. Working with viral-host systems adds layers of difficulty to experimental motif validation, so it should not be surprising or to the detriment of available information resources that viral SLiMs are less thoroughly documented.

Searching arbitrary sequences for motif instances is computationally straightforward. Box 3 provides an example of motif searching with ELM, which might facilitate comparative analysis of two related proteins from two species of herpes simplex virus (HSV). Resources such as InterPro and UniProt are able to perform similar assessments, but give broader, domain-level representations with less functional detail than the SLiM searches enabled by ELM. Reports in the primary literature take a different approach, by marking SLiMs in a protein alignment, which includes orthologues to mark conservation (e.g., Figure 3 in [39]). The ELM-generated report combines predicted SLiMs with information from annotated domains and local disorder predictions, for a perspective that complements the other approaches.
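A comparison of motif content between two related proteins can be sketched in a few lines. This is an illustration only, not the ELM search procedure: the three patterns are the PNG sequon from the Glossary and the two LMP1 motifs mentioned earlier, and the names `motif_classes_present`, `compare_motif_content`, and both toy sequences are hypothetical.

```python
import re

# Small illustrative subset of motif classes written as regular expressions.
MOTIF_CLASSES = {
    "PNG_sequon": r"N[^P][ST]",
    "PxQxT": r"P.Q.T",
    "PxxPxP": r"P..P.P",
}

def motif_classes_present(sequence):
    """Names of motif classes with at least one instance in the sequence."""
    return {name for name, pattern in MOTIF_CLASSES.items()
            if re.search(pattern, sequence)}

def compare_motif_content(seq_a, seq_b):
    """Partition motif classes into shared and sequence-specific sets."""
    a = motif_classes_present(seq_a)
    b = motif_classes_present(seq_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Hypothetical toy sequences, not real viral proteins.
protein_a = "MANGSPAQDTLL"
protein_b = "MANPSPAQDTLLPAAPGP"
print(compare_motif_content(protein_a, protein_b))
```

Motif classes found in only one of the two sequences are candidates for closer inspection, subject to the false positive caveats discussed next.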

Clearly, false positives are inevitable among SLiM search results. This makes it necessary to filter for the most significant and informative outcomes. This leads to consideration of in silico (computational) methods for SLiM evaluation. A highly recommended, authoritative review of SLiM discovery techniques, from an author of the SLiMSuite software package, discusses motif identification techniques in depth [40] .

Methods for SLiM discovery can be divided into two broad classes: (i) de novo discovery of new SLiMs, and (ii) instance prediction to find new occurrences of known SLiMs. There are currently at least eight software packages available to discover new SLiMs and 40 packages for SLiM instance detection (25 stand-alone programs or servers, plus two software suites that consist of multiple tools: SLiMSuite [40], which includes ten utilities, and MEME [41], which consists of five tools). Though MEME was developed for discovery of DNA sequence motifs, it generates ungapped, profile-based motifs using the expectation-maximization (EM) algorithm. No single method is inherently better than the rest; the choice of which to use depends on several factors, such as whether the input sequence data are one sequence, an alignment, or a collection of nonhomologous sequences [40]. To illustrate the diversity of motif discovery methods, this section mentions only a handful of the software tools available (Table 3). Readers seeking to learn more about the full set of alternatives are strongly encouraged to consult [40], particularly Tables 1, 4, and 5 therein. Another helpful resource (Table 1 in [38]) lists online motif discovery bioinformatics services.

Several alternative approaches for discovery of new motifs have been advanced. Edwards and Palopoli [40] review the alternatives in depth, discussing their merits and drawbacks. Briefly, they can be divided into alignment-based and alignment-free methods. An alignment-based approach looks for conserved sites among homologous sequences, but can be misled by high sequence conservation in globular domains. A program called SLiMPrints works around this with a specialized approach to model substitutions [42]. SLiMPrints uses a statistical model of relative local conservation, which looks for clusters of overly constrained sites in a window of about 30 AAs, using IUPred scores (intrinsically unstructured protein prediction; see later) to weigh sites in intrinsically disordered protein regions more heavily than sites in globular (ordered) regions [42].

In contrast, alignment-free methods look for enrichment of amino acid patterns in proteins that are expected by other means to perform similar motif-related roles, for example, by GO category annotations or protein-protein interaction (PPI) data, that is, via databases that capture experimental evidence for protein colocalization and functional interactions. An important caveat is that to assume such sequences are independent could yield spurious enrichment of shared patterns, so alignment-free methods need to compensate for evolutionary constraints at the domain level, rather than for full-length homologous proteins. The development of such corrections and their relative advantages are detailed in [40] . Some programs (e.g., SLiMDisc [43] , SLiMFinder [44, 45] , and DILIMOT [46, 47] ) produce regular expressions that compensate for phylogenetic relatedness, while others (MEME suite, GLAM2 [48] , and NestedMICA [49, 50] ) produce probabilistic profiles. For more discussion of these and issues of concern for computational motif discovery, see [40] .

Box 3

HSV-1 and HSV-2 virulence factor ICP34.5 assists in viral immune evasion by molecular mimicry. HSV-1 neurovirulence protein ICP34.5, encoded by the γ34.5 gene, initiates immune interference by binding and sequestering cellular proteins that would stimulate autophagy, translational arrest, and type I interferon responses. HSV-1 ICP34.5 binds TANK-binding kinase 1 (TBK1) to prevent type I interferon induction [75], Beclin-1 to prevent autophagy [76], and both PP1α and eIF2α to overcome translational arrest [77]. HSV-2 γ34.5 contains an intron not present in HSV-1, and up to four isoforms of HSV-2 ICP34.5 are known [78]. Full-length HSV-2 ICP34.5 has conserved PP1α and eIF2α-binding domains, but lacks TBK1 and Beclin-1 binding domains [79]. Additional HSV-1 motifs influence intracellular localization [80], virion maturation, and egress [81], not yet characterized in HSV-2. HSV-2 is recognized as more virulent than HSV-1, but both can cause neuropathology, including viral encephalitis and meningitis [82]. To attenuate virulence, ICP34.5 is routinely deleted or inactivated when making HSV-1 constructs for oncolytic therapy [82]. Both ICP34.5 proteins are the same length and share domain structures, and partially share SLiM compositions (Figure I). Identifying differences in SLiMs from each could provide clues for more detailed experimental investigations to understand ICP34.5 virulence determinants and host protein targets.

Filtering methods control high false positive rates from SLiM instance detection. Structural information, whether known or predicted, can be used for filtering. Box 3 illustrates how ELM filters results to exclude a region predicted to fold as globular protein. These predictions were made by SMART (simple modular architecture research tool) [51] and Pfam [52] domain matches, corroborated by GlobPlot [53]. Another widely used approach is to identify regions of local disorder, where protein structure is not clearly defined, making that region accessible to interact with other proteins. IUPred [54] is commonly used for this task, though the choice of parameter settings and how to interpret results varies. ELM results include an IUPred disorder score and a simple cutoff of 0.5 to define the disorder transition. Above this value, local protein structure is considered accessible for interaction with other proteins.
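A disorder-based filter of this kind can be sketched as follows. This is a minimal illustration, not the ELM implementation: it assumes per-residue disorder scores are already available (the values below are made up rather than IUPred output), and averaging the scores over the span of each motif instance is an assumption made here for simplicity. The function name `filter_by_disorder` is hypothetical.

```python
from statistics import mean

DISORDER_CUTOFF = 0.5  # cutoff named in the text for the disorder transition

def filter_by_disorder(instances, disorder_scores, cutoff=DISORDER_CUTOFF):
    """Keep motif instances whose mean per-residue disorder score exceeds the cutoff.

    instances: list of (0-based start, end-exclusive) spans.
    disorder_scores: one score per residue of the searched sequence.
    """
    kept = []
    for start, end in instances:
        # Averaging over the motif span is an illustrative assumption.
        if mean(disorder_scores[start:end]) > cutoff:
            kept.append((start, end))
    return kept

# Made-up scores for a ten-residue sequence: ordered start, disordered end.
scores = [0.1, 0.2, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.8, 0.7]
candidates = [(0, 3), (5, 9)]
print(filter_by_disorder(candidates, scores))
```

Other choices, such as requiring every residue in the instance to exceed the cutoff, would be stricter; as noted above, parameter settings and interpretation vary between studies.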

Scoring schemes filter for statistical enrichment of motif instances. An approach of filtering by homology [55] seems inappropriate for use to detect virus interactions with host proteins, as it may exclude nonhomologous regions with motifs that do interact, yielding false negatives. Regardless, failure to consider evolutionary relatedness among sequences being searched could introduce bias due to common ancestry, rather than independence, among sequences.

A simple approach to instance prediction is a stand-alone program called ShettiMotif [56] . It was used to scan 2251 protein sequences from 11 Poxviridae genomes (an average of 205 proteins per poxvirus) for low-complexity regions and regular expressions defined by PROSITE. The approach compared numbers of proteins per genome that carry each motif, and doubtlessly includes many motif instances that are not functional as SLiMs. Also, shorter motifs occur more frequently than longer motifs [3, 27] , partly due to chance alone. Regardless, systematic error may be considered a source of background noise across the large number of proteins in 11 viral proteomes, each having different host specificities, to enable somewhat meaningful comparisons, in such a 'statistical genomics' approach [3] . The comparisons could be more meaningful if false positive motif instances were reduced.

Becerra et al. developed another approach to instance counting [57] , which involves comparison with a null distribution from permuting primary sequence and testing for presence of the motif in the permuted sequences. A motif is considered rare and therefore significantly unlikely to occur by chance if it is present at or below some cutoff frequency. Restricting the sequence region that is used for permutation testing, such as by use of structural considerations, can further focus the search. Indeed, such a hybrid filtering approach was described recently and evaluated on the HIV-1 proteome [57] . Following methods described in an earlier study [58] , Becerra et al. used IUPred with a modified, window-based scoring procedure to identify intrinsically disordered protein regions, and tested for statistical rarity below 1% of 1000 shuffled variants. The approach further considered conservation above 70% in a set of aligned sequences, though combining three filtering criteria was too stringent and excluded all motif candidates [57] .
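The permutation test described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in [57]: the function names `shuffled_fraction` and `is_rare` are hypothetical, residues are shuffled uniformly across the whole input sequence, and the defaults of 1000 shuffles and a 1% cutoff follow the values quoted in the text.

```python
import random
import re

def shuffled_fraction(sequence, pattern, n_shuffles=1000, seed=0):
    """Fraction of shuffled variants of `sequence` that contain `pattern`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    regex = re.compile(pattern)
    residues = list(sequence)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # permute residues, preserving composition
        if regex.search("".join(residues)):
            hits += 1
    return hits / n_shuffles

def is_rare(sequence, pattern, cutoff=0.01, **kwargs):
    """True if the motif appears in fewer than `cutoff` of shuffled variants."""
    return shuffled_fraction(sequence, pattern, **kwargs) < cutoff

# Hypothetical toy sequence and motif, for illustration only.
toy = "MSGPHDPLPHSPSQATDDSGYYD"
print(shuffled_fraction(toy, r"P.Q.T"))
```

To restrict the test to a region of interest, such as a predicted disordered segment, one would pass only that subsequence, which mirrors the hybrid filtering idea described above.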

While algorithmic approaches seek to identify a broad range of SLiM types, more specialized resources have emerged to track the distribution of a particular SLiM in viral proteins. For example, iLIR@viral is a web resource dedicated to detecting LIR motif-containing proteins in viruses [59]. LC3-interacting regions (LIR motifs) are SLiMs that mediate protein-protein interactions involved in autophagy, as used by influenza A virus M2 protein to subvert autophagy and maintain virion stability [60].

Using curated text mining analysis and position-specific scoring matrices, iLIR@viral analyzed 16 609 reviewed viral sequences available from UniProt across 2569 individual viral species and found that 15 589 viral sequences contain LIR motifs. While many predicted instances may represent false positives, the enrichment of LIR motifs in viral sequences is consistent with viral adaptation to host xenophagy [35] . Curiously, ELM currently lists the LIR motif as a candidate, rather than an accepted motif class.

Embedding SLiMs into engineered constructs may enable specific effects on cellular immune processes, for applications that include targeted drug delivery, pathogen-specific adjuvants, potent and broadly effective immunogens, transformational medical countermeasures, and improved design of vectors for gene therapy. SLiM modularity may enable easy ways to reprogram protein function with a few localized modifications. To realize the potential utility of SLiMs in synthetic biology, more research is needed to expand and integrate our collection of knowledge on viral SLiMs (see Outstanding Questions).

Detecting SLiMs in variant sequences may help to identify functional innovation or changes in virulence, in a manner that does not rely strictly on functional assessment at the whole-gene level, to identify how sequence-specific variation may interact with host responses. This may be particularly useful and important to understand new variants and assess the risk that they may spread and cause harmful effects on human health or agricultural interests. Such knowledge is needed in an era where synthetic biology may introduce new risks for biological error and biological terror. Detecting and understanding SLiM variants can help to reduce such risks and identify newly emerging threats to global health and security. This matters because watch lists of harmful organisms, which aim to ensure public safety by preventing access to select known risks, may be inadequate [21] [22] [23].

SLiMs in viral proteins can interact in many different ways with host proteins to modulate immune responses. A motif may be necessary but not sufficient for any inferred function. The simplest case is where a viral SLiM interacts directly with a host protein to yield an immunomodulated phenotype. More elaborate cases are known, such as the multifunctional proteins E1A (adenovirus), Nef (HIV-1), and ICP34.5 (HSV). Computational prediction of SLiM classes and new instances is a process that involves experimental confirmation and validation. High-throughput methods for experimental assessment of protein interactions are useful to validate computational predictions [38, 61], and more assays are needed to evaluate functional and phenotypic effects of adding or deleting SLiMs.

What specific constraints limit SLiM evolvability?

What strategies are most effective to advance knowledge of viral immunomodulatory SLiMs in the design of vaccines and therapies to promote global health? For example, can some viral peptides be useful as adjuvants?

Signatures of pleiotropy, economy and convergent evolution in a domain-resolved map of human-virus protein-protein interaction networks. PLoS Pathog. 9, e1003778

Pathogen mimicry of host protein-protein interfaces modulates immunity

Use of host-like peptide motifs in viral proteins is a prevalent strategy in host-virus interactions

SLiMSearch 2.0: biological context for short linear motifs in proteins

How viruses hijack cell regulation

Attributes of short linear motifs

How pathogens use linear motifs to perturb host cell networks

Short linear motifs: ubiquitous and functionally diverse protein interaction modules directing cell regulation

Applications of immunomodulatory immune synergies to adjuvant discovery and vaccine development

Targeting host-derived glycans on enveloped viruses for antibody-based vaccine design

Structure and immune recognition of the HIV glycan shield

Completeness of HIV-1 envelope glycan shield at transmission determines neutralization breadth

Protein and glycan mimicry in HIV vaccine design

HIV-1 Nef: a master manipulator of the membrane trafficking machinery mediating immune evasion

Immunogenicity and protection efficacy of monomeric and trimeric recombinant SARS coronavirus spike protein subunit vaccine candidates

A fusion intermediate gp41 immunogen elicits neutralizing antibodies to HIV-1

Immunosilencing a highly immunogenic protein trimerization domain

Vaccination with soluble headless hemagglutinin protects mice from challenge with divergent influenza viruses

Human chemokine MIP1α increases efficiency of targeted DNA fusion vaccines

Dengue E protein domain III-based DNA immunisation induces strong antibody responses to all four viral serotypes

Options for synthetic DNA order screening, revisited

A transatlantic perspective on 20 emerging issues in biological engineering

Biodefense in the Age of Synthetic Biology

Short linear motifs - ex nihilo evolution of protein regulation

Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions

The present and the future of motif-mediated protein-protein interactions

A million peptide motifs for the molecular biologist

A review of functional motifs utilized by viruses

Hacking the cell: network intrusion and exploitation by adenovirus E1A

The GOA database: Gene Ontology annotation updates for 2015

QuickGO: a web-based tool for Gene Ontology searching

Exploring autophagy with Gene Ontology

Autophagy in negative-strand RNA virus infection

Autophagy during viral infection - a double-edged sword

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins

Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad

An eIF2α-binding motif in protein phosphatase 1 subunit GADD34 and its viral orthologs is required to promote dephosphorylation of eIF2α

Computational prediction of short linear motifs from protein sequences

MEME SUITE: tools for motif discovery and searching

SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions

SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent

SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins

SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs

Systematic discovery of new recognition peptides mediating protein interaction networks

DILIMOT: discovery of linear motifs in proteins

Discovering sequence motifs with arbitrary insertions and deletions

NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence

NestedMICA as an ab initio protein motif discovery tool

SMART: recent updates, new developments and status in 2015

InterPro in 2017 - beyond protein family and domain annotations

GlobPlot: exploring protein sequences for globularity and disorder

IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content

A computational strategy for the prediction of functional linear peptide motifs in proteins

A bioinformatics pipeline to search functional motifs within whole-proteome data: a case study of poxviruses

Prediction of virus-host protein-protein interactions mediated by short linear motifs

Intrinsic disorder in ubiquitination substrates

iLIR@viral: a web resource for LIR motif-containing proteins in viruses

A LC3-interacting motif in the influenza A virus M2 protein is required to subvert autophagy and maintain virion stability

High-throughput methods for identification of protein-protein interactions involving short linear motifs

ViralZone: a knowledge resource to understand virus diversity

ViralZone: recent updates to the virus knowledge resource

An integrated ontology resource to explore and study host-virus relationships

Representing virus-host interactions and other multi-organism processes in the Gene Ontology

The ins and outs of eukaryotic viruses: knowledge base and ontology of a viral infection

UniProt: the universal protein knowledgebase

Fifty-five years of enzyme classification: advances and difficulties

The eukaryotic linear motif resource ELM: 10 years and counting

ELM 2016 - data update and new functionality of the eukaryotic linear motif resource

Phospho.ELM: a database of phosphorylation sites - update 2011

The switches.ELM resource: a compendium of conditional regulatory interaction interfaces

iELM - a web server to explore short linear motif-mediated interactions

Exploring short linear motifs using the ELM database and tools

Control of TANK-binding kinase 1-mediated signaling by the γ(1)34.5 protein of herpes simplex virus 1

HSV-1 ICP34.5 confers neurovirulence by targeting the Beclin 1 autophagy protein

A conserved domain of herpes simplex virus ICP34.5 regulates protein phosphatase complex in mammalian cells

Up to four distinct polypeptides are produced from the γ34.5 open reading frame of herpes simplex virus 2

Herpes simplex virus 2 ICP34.5 confers neurovirulence by regulating the type I interferon response

An N-terminal arginine-rich cluster and a proline-alanine-threonine repeat region determine the cellular localization of the herpes simplex virus type 1 ICP34.5 protein and its ligand, protein phosphatase 1

Replication of herpes simplex virus 1 depends on the γ(1)34.5 functions that facilitate virus response to interferon and egress in the different stages of productive infection

The herpes simplex virus neurovirulence factor γ34.5: revealing virus-host interactions

ELM: the status of the 2010 eukaryotic linear motif resource