key: cord-0754721-pw60qx7c authors: Armstrong, John; Niemann, Heiner; Smeekens, Sjef; Rottier, Peter; Warren, Graham title: Sequence and topology of a model intracellular membrane protein, E1 glycoprotein, from a coronavirus date: 1984 journal: Nature DOI: 10.1038/308751a0 sha: 4cf61fb4f0b1a198d2c00e9ba2cfa36c3497bd04 doc_id: 754721 cord_uid: pw60qx7c

In the eukaryotic cell, both secreted and plasma membrane proteins are synthesized at the endoplasmic reticulum, then transported, via the Golgi complex, to the cell surface (1–4). Each of the compartments of this transport pathway carries out particular metabolic functions (5–8), and therefore presumably contains a distinct complement of membrane proteins. Thus, mechanisms must exist for localizing such proteins to their respective destinations. However, a major obstacle to the study of such mechanisms is that the isolation and detailed analysis of such internal membrane proteins pose formidable technical problems. We have therefore used the E1 glycoprotein from coronavirus MHV-A59 as a viral model for this class of protein. Here we present the primary structure of the protein, determined by analysis of cDNA clones prepared from viral mRNA. In combination with a previous study of its assembly into the endoplasmic reticulum membrane (9), the sequence reveals several unusual features of the protein which may be related to its intracellular localization.

The coronaviruses are a diverse class of enveloped RNA viruses of considerable medical and agricultural significance; they also provide a model for the study of persistent viral infections (see ref. 10 for review). In contrast to many enveloped viruses, the coronavirus mouse hepatitis virus (MHV) A59 buds inside the cell, into the lumen of the endoplasmic reticulum (11). The assembled virion then appears to travel, via the Golgi complex, to the cell surface. Of the two viral membrane proteins, the smaller one, E1, is necessary for formation of the envelope, and is restricted to internal cell membranes; apparently it only reaches the cell surface as part of the budded virion (12, 3). Thus, the E1 glycoprotein is potentially a convenient model for studying those features of a membrane protein that determine its arrest at a particular destination on the membrane transport pathway.

The mRNAs of MHV-A59 form a 'nested set': the seven RNAs share the 3' region of the positive-stranded genome, but extend to different lengths towards the 5' end (15–18). From each RNA, only the 5' gene is translated (19, 20). In addition, a noncoding 'leader' sequence of approximately 70 bases, from the 5' end of the genome, is common to the mRNAs (18, 21, 22). The E1 gene is second from the 3' end and is therefore translated from the second smallest mRNA, RNA 6 (refs 19, 20).
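The nested-set arrangement described above can be illustrated with a small data-structure sketch. It is only a schematic under stated assumptions: it treats each of the seven RNAs as starting one gene further towards the 3' end than the previous one, it uses placeholder labels (`gene1` to `gene5`) for the genes the text does not name, and it ignores the shared leader sequence. The names `SubgenomicRNA` and `nested_set` are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

# Gene order along the positive-stranded genome, 5' -> 3'.
# Only the last two names come from the text (E1 is second from the 3' end,
# the nucleocapsid gene is 3'-terminal); the other labels are placeholders.
GENE_ORDER = ["gene1", "gene2", "gene3", "gene4", "gene5", "E1", "nucleocapsid"]


@dataclass(frozen=True)
class SubgenomicRNA:
    number: int   # RNA 1 is genome length, RNA 7 is the smallest
    genes: tuple  # genes carried by this RNA, listed 5' -> 3'

    @property
    def translated_gene(self) -> str:
        # Only the 5'-most gene of each RNA is translated.
        return self.genes[0]


def nested_set(gene_order):
    """Build RNAs 1..n that share the 3' end and differ in their 5' extent."""
    return [
        SubgenomicRNA(number=i + 1, genes=tuple(gene_order[i:]))
        for i in range(len(gene_order))
    ]


rnas = nested_set(GENE_ORDER)
for rna in rnas:
    print(f"RNA {rna.number}: translates {rna.translated_gene}, "
          f"carries {len(rna.genes)} gene(s)")
```

In this schematic, RNA 6 carries both the E1 and nucleocapsid genes but yields only E1, which matches the statement that E1 is translated from the second smallest mRNA.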
The sequence of the 3'-terminal gene, encoding the viral nucleocapsid protein, has been determined previously (23, 24). Copy DNA clones spanning the E1 gene were prepared by two methods (23–25) and sequenced in the vectors M13mp8 (ref. …).

Two versions were found, in two different clones, for the sequence immediately upstream from the E1 initiator codon. The shorter one is shown in Fig. 1; in the second clone, an additional copy of the pentanucleotide ATCTA was found between nucleotides 65 and 66, making the sequence similar to that of the region adjacent to the nucleocapsid gene of another strain of MHV (29). This difference could represent a mutation; alternatively, it may reflect heterogeneity in the normal mRNA population. Indirect support for the latter possibility comes from the observation that an RNase T1 oligonucleotide from this region of RNA 6, corresponding to the shorter sequence, was recovered in markedly lower yield than those from the rest of the molecule (30). This site represents the point of fusion between the 5' leader sequence and the coding portion of the RNA. The fusion is thought to occur by 'jumping' of the viral RNA polymerase to particular sites on its genome-length, negative-stranded template; the resumption of transcription then produces each of the subgenomic mRNAs (22, 31, 32). Thus, it seems possible that the polymerase may jump to more than one point on the template for each mRNA, generating variable numbers of the repeated pentanucleotide AUCUA in the resulting transcript.

Figure 1 shows the amino acid sequence encoded by the E1 gene. The predicted molecular weight of the protein is 26,000, slightly higher than that observed by gel electrophoresis (19, 33) but consistent with the unusual electrophoretic behaviour of this (33), and other, hydrophobic proteins. Several features of the protein, when assembled into membranes in the virus (33), or in vitro (9), are reflected in the sequence.
First, in contrast to the majority of membrane proteins, E1 is known to lack a cleaved 'signal peptide' (9): the N-terminal region of the sequence contains no good candidate for a cleavage site (34). Second, the N-terminal region bears O-linked sugars (5, 36), which, uniquely among viral proteins so far studied, are the only known post-translational modification to E1. Assuming that the terminal Met is removed (37), the N-terminal sequence is Ser-Ser-Thr-Thr, which is identical to the O-glycosylated amino terminus of M-type glycophorin A (ref. 38). The O-linked sugars of E1 are themselves identical to those found in glycophorin (39). Third, most of the protein is resistant to proteolysis when assembled in the membrane. Only 2.5 kilodaltons of polypeptide from the N-terminus are cleavable on the luminal side of the membrane (or outside the virion) and 1.5 kilodaltons from the C-terminus on the cytoplasmic (or intra-virion) side (9), suggesting that the protein is largely buried in the membrane.

In the sequence, a run of 22 uncharged residues from positions 26 to 47 represents a potential membrane-spanning region; residues 1-25 correspond to the portion removable by protease. A further sequence of uncharged residues, positions 57-106, is sufficiently long to cross the membrane twice more. If this region is divided in two, and each half plotted as an α-helical 'wheel', all the polar side chains of both sections cluster within 140°. Thus, a plausible conformation for this region is two hairpinned helices in the membrane, with adjacent polar faces (Fig. 2a). There are no other long hydrophobic sequences, implying that the region from residues 107 to ~190 is either folded in the membrane to neutralize charges, or, more likely, is adjacent to the membrane but resistant to proteolysis. These features are summarized in Fig. 2b. Which, if any, of these various features might be responsible for the protein's intracellular localization?
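The two sequence analyses used in the topology argument, scanning for long uncharged runs and measuring how tightly polar side chains cluster on a helical wheel, can be sketched as follows. This is a minimal illustration, not the authors' procedure. The residue sets `CHARGED` and `POLAR`, the 20-residue default threshold and the function names are assumptions made for the example, and the test peptides are invented rather than taken from the E1 sequence. The only structural constant used is the standard α-helix geometry of 3.6 residues per turn, i.e. 100° per residue.

```python
DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per alpha-helical turn

# Assumed residue classes (one-letter codes); the paper does not list its criteria.
CHARGED = frozenset("DEKR")
POLAR = frozenset("STNQYHDEKR")


def uncharged_runs(sequence, min_length=20):
    """Return (start, end) positions, 1-based inclusive, of uncharged runs."""
    runs, start = [], None
    # A charged sentinel appended to the sequence closes a run at the C-terminus.
    for i, aa in enumerate(sequence + "K"):
        if aa in CHARGED:
            if start is not None and i - start >= min_length:
                runs.append((start + 1, i))
            start = None
        elif start is None:
            start = i
    return runs


def wheel_angles(sequence):
    """Angle of each residue on a helical wheel (degrees), first residue at 0."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0 for i in range(len(sequence))]


def polar_arc(sequence, polar=POLAR):
    """Smallest arc of the wheel (degrees) that contains every polar residue."""
    angles = sorted(
        angle for angle, aa in zip(wheel_angles(sequence), sequence) if aa in polar
    )
    if len(angles) < 2:
        return 0.0
    # Gaps between neighbouring polar positions, including the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])
    return 360.0 - max(gaps)


# Invented test peptides, not E1 sequence.
print(uncharged_runs("K" + "L" * 22 + "D" + "A" * 5))
print(polar_arc("SLLLTLLSLLLTLLLLLL"))
```

Applied to residues 57-81 and 82-106 of the real sequence in Fig. 1, a `polar_arc` value of 140° or less for each half would correspond to the clustering described in the text, although the result depends on which residues are counted as polar.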
We do not know, for example, whether the protein has an active 'signal' causing its arrest on the transport pathway, or, alternatively, if it lacks a signal for onward transport; nor do we know whether a sorting process might operate on one or the other side of the membrane. The availability of a cDNA clone for the protein presents the opportunity to investigate these questions by allowing expression of the cloned DNA and in vitro mutagenesis. This approach has already been applied to two other viral glycoproteins, to investigate the importance of their cytoplasmic domains for transport to the cell surface, yielding opposite conclusions (40, 41). An intrinsic problem with the method, however, is the difficulty of distinguishing specific effects due to alterations at the site of mutagenesis from a general structural disruption of the molecule. In this respect the E1 protein may be advantageous in that it provides the possibility of creating a more 'active' phenotype in the mutated molecule: specifically, particular alterations to the protein may result in its transport to the cell surface.

Fig. 2 a, Distribution of polar side chains in the hydrophobic regions of the E1 sequence. Residues … (1), 57-81 (2) and 82-106 (3) are plotted as α-helices and viewed end-on. Polar side chains are boxed; proposed hydrophilic faces of helices 2 and 3 are indicated. b, Possible topologies of the E1 protein across the membrane. Arrows indicate sites accessible to protease; broken arrows represent inefficient proteolysis (9).

We thank Willy Spaan for communicating results before publication, G. Heisterberg-Moutsis (G.B.F. Braunschweig) for help with oligonucleotide synthesis, Ben van der Zeijst for discussion and Annie Steiner for preparing the manuscript. J.A. was supported by fellowships from the Royal Society and the European Molecular Biology Organization, H.N. by the Deutsche Forschungsgemeinschaft, SFB 47 (Virologie), Teilprojekt B3, and P.R. by a short-term EMBO fellowship. Some of these results have been presented in preliminary form elsewhere (24, 25).