key: cord-0930387-8cpgv08l authors: Jouvenne, Patricia; Richardson, Christopher D.; Schreiber, Steven S.; Lai, Michael M.C.; Talbot, Pierre J. title: Sequence analysis of the membrane protein gene of human coronavirus 229E date: 1990-02-28 journal: Virology DOI: 10.1016/0042-6822(90)90115-8 sha: 87d5b80231b8956e498791ab3507f0f1ca529be8 doc_id: 930387 cord_uid: 8cpgv08l Abstract Human coronaviruses (HCV) are ubiquitous pathogens which cause respiratory, gastrointestinal, and possibly neurological disorders. To better understand the molecular biology of the prototype HCV-229E strain, the complete nucleotide sequence of the membrane protein (M) gene was determined from cloned cDNA. The open reading frame is preceded by a consensus transcriptional initiation sequence UCUAAACU, identical to the one found upstream of the N gene. The M gene encodes a 225-amino acid polypeptide with a molecular weight (MW) of 25,822, slightly higher than the apparent MW of 19,000–22,000 observed for the unprocessed M protein obtained after in vitro translation and immunoprecipitation. The M amino acid sequence presents a significant degree of homology (38%) with its counterpart of transmissible gastroenteritis coronavirus (TGEV). The M protein of HCV-229E is highly hydrophobic and its hydropathicity profile shows a transmembranous region composed of three major hydrophobic domains characteristic of a typical coronavirus M protein. About 10% (20 amino acids) of the HCV-229E M protein constitutes a hydrophilic and probably external portion. One N-glycosylation and three potential O-glycosylation sites are found in this exposed domain. Human coronaviruses (HCV) belong to either one of two antigenic groups, represented by the prototype strains 229E and OC43 (1) . They are responsible for as much as 25% of common colds (2, 3) and have been associated with gastrointestinal disorders (4) . 
Their possible involvement in neurological diseases was suggested by the observation of coronavirus-like particles in the brain of one multiple sclerosis (MS) patient (5), the isolation of coronaviruses from two MS brain tissues passaged in mice (6), and the detection of intrathecal antibodies to HCV-OC43 and HCV-229E in MS patients (7). However, the association of human coronaviruses with neurological diseases has not yet been confirmed. HCV-229E possesses a single-stranded, positive-sense RNA genome with a molecular weight of 5.8 × 10^6 and a poly(A) tail of about 70 nucleotides at the 3' end (8). As with other coronaviruses, six subgenomic RNAs are synthesized in infected cells (9). These appear to have lower molecular weights than viral RNAs synthesized in cells infected with murine hepatitis virus (MHV). At least four polypeptides have been found in purified HCV-229E virions: 160- to 200-kDa and 88- to 105-kDa glycoproteins, which may be analogous to the spike glycoprotein S (previously designated E2) of MHV (10); a 47- to 53-kDa polypeptide corresponding to the nucleocapsid protein N; and a 17- to 26-kDa M protein (previously designated E1) observed in both glycosylated and nonglycosylated forms (11) (12) (13) (14). One author also reported glycoproteins of 31 and 65 kDa (11). The nucleotide sequences of the genes encoding the nucleocapsid proteins, as well as the mRNA leader sequences of HCV-229E and HCV-OC43, have recently been determined (15, 16). As a continuation of these studies, we report the nucleotide sequence of the gene encoding the membrane protein M of HCV-229E. Its predicted amino acid sequence is compared with sequences determined for other coronaviruses.
0042-6822/90 $3.00. Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
Clones containing the sequence of the M protein gene were obtained from a cDNA library constructed with mRNA isolated from HCV-229E-infected L132 cells, and identified using a genome-specific probe (15). One clone, designated L8, was selected for sequencing since it contained a large 3.6-kb insert overlapping the 5' end of the N protein gene by 1.2 kb. The remaining 2.4-kb fragment was excised from an internal PstI site of clone L8 and subcloned into the pBluescript II vector (Stratagene). Unidirectional deletions of the 2.4-kb insert were created using exonuclease III, mung bean nuclease, and deoxythionucleotide derivatives (Stratagene). Both strands were sequenced by the plasmid sequencing technique (17), using T7 DNA polymerase. In vitro translation of poly(A)+ mRNAs isolated from HCV-229E-infected L132 cells was carried out in order to determine the molecular mass of the unprocessed viral polypeptides. The complete nucleotide sequence of the M protein gene of HCV-229E and its predicted amino acid sequence are presented in Fig. 1. The AUG codon is preceded by the consensus intergenic sequence UCUAAACU, which is identical to that upstream of the nucleocapsid protein-coding sequence (15; and [...] translation of poly(A)+ mRNAs from HCV-229E-infected cells were precipitated with a polyclonal antiserum prepared against purified HCV-229E virions. As shown in Fig. 2, six viral polypeptides were observed, which migrated with apparent molecular masses of 98, 58, 42, 28.5, 19, and 14 kDa, respectively. Although the identity of these proteins has not been firmly established, comparison with other coronaviruses suggests that p98 probably corresponds to S, p58 to N, and p19 to M. The nature of p42 and p28.5 is not known at this time. Thus, the molecular mass of M predicted from the nucleotide sequence is slightly higher than the molecular mass estimated by SDS-PAGE.
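The relationship described above between the consensus intergenic sequence UCUAAACU and the start of the open reading frame can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `orf_after_motif` and the short `TOY_RNA` string are hypothetical and chosen only for illustration, and the sketch assumes an uppercase RNA string translated with the standard genetic code. It locates the consensus sequence, takes the first AUG downstream of it, and translates until the first stop codon.

```python
from typing import Optional

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def orf_after_motif(rna: str, motif: str = "UCUAAACU") -> Optional[str]:
    """Translate the first AUG-initiated ORF found downstream of `motif`.

    Returns None if the motif or a downstream AUG is absent.
    Assumes `rna` contains only the uppercase letters A, C, G, and U.
    """
    motif_pos = rna.find(motif)
    if motif_pos < 0:
        return None
    start = rna.find("AUG", motif_pos + len(motif))
    if start < 0:
        return None
    peptide = []
    # Read complete codons only; stop at the first termination codon.
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical toy sequence, NOT the HCV-229E M gene sequence.
TOY_RNA = "GGUCUAAACUGAAAUGUCCAACGACUAAGG"
print(orf_after_motif(TOY_RNA))
```

Applied to a full cDNA-derived sequence, a function of this kind would return the predicted polypeptide, whose length and molecular weight could then be compared with the values estimated by SDS-PAGE.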
Other studies have shown that the mature M protein has a molecular mass of 23 to 26 kDa (12) (13) (14) and that virions also incorporate a nonglycosylated 20- to 22-kDa precursor of the M protein (12, 14). The latter observation is consistent with the identification of in vitro translated p19 as M. The lower apparent molecular mass of M estimated by SDS-PAGE is consistent with the unusual electrophoretic behavior of this and other hydrophobic proteins, as was observed for MHV (19). As in TGEV (20), there are three amino acid sequences characteristic of N-glycosylation sites in the predicted M protein sequence (Asn-5, Asn-190, and Asn-214), although only one (Asn-5) is found near the N-terminus, as compared to two for TGEV. Moreover, three potential O-glycosylation sites are located in the putatively external N-terminus of the polypeptide (Ser-2, Thr-7, and Thr-12). In addition, there is only one cysteine residue (Cys-6). Other coronavirus M proteins contain two (bovine coronavirus, BCV; Ref. (21)) [...] (23)) cysteine residues. This cysteine residue is probably important in forming interchain disulfide bridges, since M of HCV-229E has been shown to form oligomers under nonreducing conditions (14). No significant nucleotide sequence homology exists between the M genes of HCV-229E and other coronaviruses. The highest M amino acid homology (38%, or 100 of 262 residues) occurs between HCV-229E and TGEV, which were reported to be antigenically related (24). The antigenically distinct BCV, MHV, and IBV show amino acid homologies of 32, 30, and 28%, respectively. In contrast, a homology of 87% was found between the M proteins of BCV and MHV-A59 (21), which belong to another antigenic subgroup (24). On the other hand, a homology of 34% was found between the M proteins of TGEV and BCV (25), which belong to two different antigenic subgroups. Figure 3 illustrates the M regions common to both HCV-229E and TGEV.
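The two kinds of sequence analysis discussed above, scanning for potential N-glycosylation sites and expressing homology as a percentage of residues, can be sketched in Python as follows. The text does not state the exact criterion used to identify N-glycosylation sites; the sketch assumes the commonly used sequon Asn-X-Ser/Thr with X not proline. The function names and the toy peptide are hypothetical and are not taken from the HCV-229E M sequence; only the figures 100 and 262 come from the text.

```python
import re

# Assumed sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported independently.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


def percent_identity(identical: int, compared: int) -> float:
    """Identical residues expressed as a percentage of residues compared."""
    return 100.0 * identical / compared


# Hypothetical toy peptide: Asn-5 lies in a sequon (N-C-T),
# whereas Asn-10 is followed by Pro and is therefore excluded.
TOY_PEPTIDE = "MSSDNCTAPNPSG"
print(n_glycosylation_sites(TOY_PEPTIDE))

# Figures quoted in the text for HCV-229E versus TGEV: 100 of 262 residues.
print(round(percent_identity(100, 262)))
```

The second call reproduces the arithmetic behind the reported 38% homology (100/262 ≈ 38.2%).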
As with other coronaviruses, the M protein of HCV-229E is a highly hydrophobic membrane protein.
[...] is also the recipient of a University Research Scholarship from the Natural Sciences and Engineering Research Council of Canada. P. Jouvenne acknowledges studentship support from the Fonds de la recherche en santé du Québec.