key: cord-0814881-gezlsgyh
authors: McCuaig, Kimberly; Rosenberg, Madelaine; Nédellec, Patrick; Turbide, Claire; Beauchemin, Nicole
title: Expression of the Bgp gene and characterization of mouse colon biliary glycoprotein isoforms
date: 1993-05-30
journal: Gene
DOI: 10.1016/0378-1119(93)90716-g
sha: 28f87ecceb690e637ce9ac9135f11a6480f22f16
doc_id: 814881
cord_uid: gezlsgyh

Abstract

The biliary glycoprotein (BGP)-encoding gene is a member of the human carcinoembryonic antigen (CEA) gene family. We have now cloned several mouse Bgp cDNAs from an outbred CD-1 mouse colon cDNA library, as well as by reverse transcription-PCR amplification of colon RNA. The distinguishing features of the deduced Bgp protein isoforms are found in the two divergent N-terminal domains, the highly conserved internal C2-set immunoglobulin domains, and an intracytoplasmic domain of either 10 or 73 amino acids (aa). The cDNA structures suggest that these mRNAs are produced through alternative splicing of a Bgp gene and the usage of multiple transcriptional terminators. The Bgp deduced aa sequences are highly homologous to several well-characterized rat hepatocyte proteins such as the cell-CAM105/ecto-ATPase/pp120/HA4 proteins. Oligodeoxyribonucleotide probes representing the various cDNA isoform domains revealed predominant transcripts of 1.8, 3.1 and 4.0 kb on Northern analyses of mouse colon RNA; some of these bands are actually composed of several co-migrating transcripts. The transcripts encoding the long intracytoplasmic-tailed Bgp proteins are expressed at one-tenth the relative abundance of the shorter-tailed species. We have previously demonstrated that several mouse Bgp cDNAs, when transfected into eukaryotic cells, express BGP proteins at the cell surface and function in vitro as cell adhesion molecules, much like their human and rat counterparts.
The expression of the many Bgp isoforms at the surface of epithelial cells, such as those of the colon, suggests that these proteins play a determinant role, through self- or heterologous contact, in renewal and/or differentiation of their epithelia.

Abbreviations: ... gene family member; Cyt, intracytoplasmic tail; ICAM-1, intercellular adhesion molecule 1; Ig, immunoglobulin; kb, kilobase or 1000 bp; MHV, mouse hepatitis virus; MHVR, MHV receptor; NCA, nonspecific cross-reacting antigen; nt, nucleotide(s); oligo, oligodeoxyribonucleotide; ORF, open reading frame; PCR, polymerase chain reaction; PSG, pregnancy-specific glycoprotein; 5' or 3'UTR, 5' or 3' untranslated region; RIT, oligo specific for Cyt in the antisense orientation; RT, reverse transcription; TM, transmembrane.

Carcinoembryonic antigen (CEA; Gold and Freedman, 1965) is a human tumor marker used to evaluate recurrences of gastrointestinal, breast and lung cancers (Shuster et al., 1980). The CEA gene family, composed of at least 22 genes clustered on human chromosome 19q13.1-q13.2, is divided into two subgroups based on sequence comparisons and expression patterns (Khan et al., 1992): the CEA subgroup [CEA, NCA, BGP, CGM6 and a number of genes with as yet undefined gene products (designated CGMs; Thompson et al., 1992)] and the pregnancy-specific glycoprotein (PSG) subgroup (Khan et al., 1992). The CEA-related gene products exhibit structural features of the immunoglobulin (Ig) supergene family, with N-terminal domains resembling variable regions and internal domains homologous to C2-set constant domains (Williams and Barclay, 1988). BGPs (Svenberg, 1976; Svenberg et al., 1979; Hinoda et al., 1988; Barnett et al., 1989) are unique in this family in that all isoforms, produced through alternative splicing of a single gene (Barnett et al., 1989), bear either short (10 aa) or long (71 aa) intracytoplasmic tails (Cyts) (Hinoda et al., 1988; Barnett et al., 1989).
In contrast, CEA, NCA and CGM6 are linked to the membrane by a glycophospholipid anchor (Kolbinger et al., 1989; Berling et al., 1990). CEA, NCA and BGP are postulated to participate in intestinal tissue organization during embryonic development and function in vitro as intercellular adhesion molecules (Benchimol et al., 1989; Oikawa et al., 1989; Oikawa et al., 1992; Rojas et al., 1990; Zhou et al., 1990). The CEA and NCA glycoproteins are also involved in bacterial recognition and colonization (Leusch et al., 1991). PSGs are postulated to act as immunomodulators during pregnancy (Cerni et al., 1977). To pursue functional investigations, we have begun characterization of the mouse CEA-related gene family. In previous work, we have shown that mouse Bgp proteins (the cDNAs had previously been called mmCGM1a and mmCGM2 and have been renamed BgpA and BgpB), expressed at the surface of transfectant cells, function in vitro as adhesion molecules (Turbide et al., 1991; McCuaig et al., 1992). Moreover, Dveksler et al. (1991) have demonstrated that one of these mouse Bgp proteins (MHVR, whose coding sequence yields the same deduced aa sequence as BgpA) acts as the mouse hepatitis virus (MHV) receptor. Contrary to human BGP isoforms, which contain the same Ig-like N-terminal domain (Barnett et al., 1989), the previously reported mouse Bgp-like cDNAs each encode a distinct N-terminal domain, and internal C2-set domains that are conserved but not identical (Turbide et al., 1991; McCuaig et al., 1992). This finding implied either that a Bgp gene subfamily exists, or that complex transcriptional mechanisms are operative to account for these Bgp cDNAs. It therefore became important to define the number of Bgp isoforms and to study their expression in mouse tissues for future functional studies.
The data presented in this paper describe the characterization, the relative abundance and the differential expression of nine Bgp cDNA isoforms expressed in colon. The results suggest that a single mouse Bgp gene undergoes alternative splicing, and that different polyadenylation signals are used to produce a number of related proteins with possible complementary functions. However, the two classes of related Bgp cDNAs cloned from the outbred CD-1 mouse may represent allelic variants of this Bgp gene (Dveksler et al., 1993).

(a) Library cDNA clones

To characterize the mouse CEA-related gene family, we screened a CD-1 mouse colon cDNA library (Beauchemin et al., 1989). This yielded a large number of cDNA clones (392), some of which presented different restriction patterns or different cDNA sequences. To verify whether these cDNA clones were representative of other mouse CEA-related genes, clones 23, 32, 37, 58 and 64 were completely sequenced, while clone 132 was partially sequenced. As shown in Fig. 1, some of these clones (clones 23 and 132) overlapped perfectly within sequences encoding either the signal sequence, the N-terminal or the A2 domain of the BgpA cDNA (formerly mmCGM1a) (McCuaig et al., 1992), while others (clones 23, 32, 37, 58 and 64) were aligned with DNA coding for either the leader, the N-terminal, the A2, the TM, the Cyt domains or the 3'UTR sequence of another published cDNA clone, BgpB (formerly mmCGM2) (Turbide et al., 1991). Clone 23 extended the published BgpB cDNA 3'UTR (Turbide et al., 1991) by 119 nt and ended in a poly(A) tail. Another cDNA clone (clone 58) encoded a longer Cyt (73 aa) than the previously reported 10-aa Cyt domain (Turbide et al., 1991; McCuaig et al., 1992); this new mouse Cyt domain, however, closely resembled the cDNA sequence of human BGP (Barnett et al., 1989) and of a rat ecto-ATPase cDNA (Lin and Guidotti, 1989).
Clones 32 and 64 also encoded conserved but not identical A1 and B1 domains when compared to similar domains of the BgpA cDNA. cDNA clone 37 demonstrated high homology to the reported 3'UTR of the MHVR isolated from Balb/c mice (Dveksler et al., 1991), but extended this sequence by 199 nt to another poly(A) tail. The information from the cDNA sequences and computer-translated aa sequences compiled from the cDNA clones was indicative of multiple Bgp cDNA isoforms present in mouse colon. We therefore conducted a series of experiments using RT-PCR amplification technology (Frohman et al., 1988) ... present in this tissue. Total RNA from CD-1 mouse colon was reverse-transcribed using either a (dT)n adaptor primer or KM5, a primer located in the 3'UTR of BgpB and BgpD (see Fig. 3b).

... 3'UTR1 and 3'UTR2, 3' untranslated regions; An, positions of poly(A) tails. Methods: cDNA was synthesized as previously described (Beauchemin et al., 1989). The CD-1 mouse colon cDNA library was screened with cDNA restriction fragments corresponding to the EcoRI-SstI and AC&EcoRI restriction fragment of clone 46 (McCuaig et al., 1992) and with the oligo RIT (antisense: 5'-TGAGGGTTTGTGCTCTGTGAGATC) representing a region of the long Cyt conserved between human BGP (Barnett et al., 1989) and rat ecto-ATPase (Lin and Guidotti, 1989). Bgp cDNA restriction fragments were subcloned into unique sites of the BluescriptSK+ plasmid (Stratagene, La Jolla, CA) and overlapping DNA restriction fragments were sequenced on both strands with either the T7 or the T3 promoter primers or with internal primers by the dideoxy chain termination method (Sanger et al., 1977) using T7 DNA polymerase (Pharmacia, Montreal, Canada). Sequences were analyzed using the DNAsis, Prosis (Pharmacia) and the GCG sequence analysis system (Devereux et al., 1984) programs.
PCR amplifications were then performed on single-stranded cDNAs with combinations of primers located in the coding and non-coding regions, and the products were cloned and sequenced (Fig. 2a). Since we had observed that major sequence differences were found between the N-terminal domains (designated N1 and N2; Turbide et al., 1991; McCuaig et al., 1992), divergent primers within these two N-terminal domains (46N1, R46N1, CGM2N and RCGM2N) were synthesized, tested at medium and high hybridization stringency and shown not to cross-react (data not shown). The coding sequences of the two A1 domains (A1a, A1b), B1 domains (B1a, B1b) and the two A2 domains (A2a, A2b) differed by only 4, 1 and 3 nt, respectively (indicated in Fig. 3b). Common primers specific for each of these domains (KM7 and KM8 for A1 and 33-35 for A2) as well as a primer specific to the region encoding the long Cyt (RIT) were synthesized. The use of primers specific to the 5'UTR of BgpA (KM2) and to the N-terminal domains (46N1 or RCGM2N) resulted in the amplification of a 360-bp fragment only when KM2 was combined with a primer from the N1-terminal domain (46N1). This result indicates that the 5'UTR of the transcripts containing the N2 domain is different in the region of KM2 (Fig. 2a). Varying the Mg2+ concentrations of the amplification reaction for the KM2-RCGM2N primer pair did not produce a fragment. The identity of the KM2-46N1 fragment was confirmed by sequencing. The sequences of the cDNA clones suggested that splicing events occurred on the primary Bgp transcript(s), resulting in the deletion of the A1 and B1 domains (McCuaig et al., 1992).

Methods: Total RNA from CD-1 mouse colon was prepared by guanidinium isothiocyanate extraction and centrifugation. RT was performed using AMV reverse transcriptase (Pharmacia, Montreal, Canada) and 10 µg of total mouse colon RNA as template essentially as described previously (McCuaig et al., 1992).
The oligo primers used in this reaction were either a (dT)n primer containing restriction sites (5'-GACTCGAGTCGACGGTACCC(T)n) or oligo KM5 (antisense: 5'-TTGATACCTCACTCTCAGCCA). The amplification reactions were incubated at an annealing temperature of 44°C for 2 min, at an elongation temperature of 72°C for 3 min and at a denaturing temperature of 94°C for 1 min for 40 cycles in a total volume of 100 µl containing 20 mM Tris·HCl pH 8.8 (at 24°C)/10 mM KCl/2 mM MgSO4/10 mM ammonium sulfate/0.1% Triton X-100/0.1 µg BSA/0.2 mM of each dNTP/40 pmol of phosphorylated primers, using 2 units of Vent DNA polymerase (New England Biolabs), whose proofreading activity decreases possible amplification errors (Frohman et al., 1988; Mattila et al., 1991). The oligo primers used were: KM2: sense 5'-CCAAGTCCCGACAAGTAGTG; RCGM2N: antisense 5'-GTCTTATTAGTGCCTGTTATAC; CGM2N: sense 5'-GTAACAGGCACTAATAAGAC; 46N1: antisense 5'-CTTCATGGTGATCATTTGG; R46N1: sense 5'-CCAAATGATCACCATGAAG; KM7: antisense 5'-GGGTCACTTCGGTTGACACT; KM8: sense 5'-CCAGTGAGTGTCAACCGAAG; 33-35: antisense 5'-CCGGCATCTTCCCTCTTAATAGGGTCTATTCTG; RIT: antisense 5'-TGAGGGTTTGTGCTCTGTGAGATC; KM6: antisense 5'-GGCTCCAGGATCCACCTTTTCTTC. The PCR amplification products were blunt-ended (Sambrook et al., 1989) with T4 DNA polymerase (Boehringer-Mannheim) to degrade single-stranded DNA resulting from non-symmetrical amplification (Innis et al., 1990), separated by electrophoresis and cloned into the SmaI site of the BluescriptSK+ plasmid (Stratagene, La Jolla, CA, USA) for sequencing. A minimum of three independent clones were sequenced for each fragment resulting from every amplification reaction with different pairs of primers. Control samples omitting reverse transcription were included to verify that the RNA was not contaminated by genomic DNA. These fragments were also electrophoresed through 1% agarose gels and transferred to GeneScreen Plus membranes (NEN DuPont).
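As a small consistency check on the primer list above, the sketch below verifies that the antisense primer 46N1 and the sense primer R46N1 are exact reverse complements of each other, as their names suggest. The sequences are copied from the text; the function and variable names are illustrative assumptions and not part of the original methods.

```python
# Primer sequences (5'->3') as listed in the Methods text.
PRIMERS = {
    "46N1": "CTTCATGGTGATCATTTGG",   # antisense, N1 domain
    "R46N1": "CCAAATGATCACCATGAAG",  # sense, N1 domain
}

# Translation table for Watson-Crick complements of unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_reverse_complement_pair(a: str, b: str) -> bool:
    """True when primer b is exactly the reverse complement of primer a."""
    return reverse_complement(a) == b.upper()


if __name__ == "__main__":
    print(is_reverse_complement_pair(PRIMERS["46N1"], PRIMERS["R46N1"]))
```

A check of this kind only confirms that the two oligos target the same site on opposite strands; it says nothing about their specificity for N1 versus N2, which the text established by hybridization.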
Hybridizations were carried out as previously described (McCuaig et al., 1992) with 2 × 10^6 cpm/ml of [32P]oligos (Sambrook et al., 1989). The filters were washed to a final stringency corresponding to 5°C below the predicted melting temperature of each oligo.

... domains, amplifications were carried out between N-terminal primers (R46N1 or CGM2N) and oligos corresponding to the A2 domain (33-35), the long Cyt (RIT) or one of the 3'UTRs (KM6). As is summarized in Fig. 2a ... Similarly, amplifications using the N-terminal domain and the long Cyt-specific primers produced two DNA fragments. Sequencing confirmed that the shorter fragments carried a deletion of the A1 and B1 domains, while the longer fragments included these two domains. Amplifications conducted with primers located in the N-terminal coding domains and in the 3'UTR (KM6) confirmed the previous results. A greater number of clones (seven) representing these fragments were sequenced, one of which was shown to code for a long Cyt domain. Results from these amplifications were confirmed by Southern analyses. As can be seen in Fig. 2b, the 1350- and 1450-bp fragments amplified by either primer pair (R46N1-KM6 or CGM2N-KM6) hybridized to an oligo specific to the A1 domains (KM7) (Fig. 2b, panel A), while all the fragments hybridized to the 33-35 oligo defining the A2 domain (Fig. 2b, panel B). The bands hybridizing to oligo RIT (Fig. 2b, panel C) confirmed that the short-tailed cDNA isoforms have a counterpart encoding a long Cyt. These hybridizations also revealed a new 3.0-kb DNA fragment specifically amplified in primer combinations containing the N1 primer. This novel cDNA fragment will be the subject of future reports. This fragment is not due to contamination with genomic DNA since it was not present in PCR amplifications when the RT reaction was omitted (data not shown). The structures of the deduced isoforms are depicted in Fig. 3a. A minimum of nine cDNA isoforms are generated from the mouse Bgp gene(s).
Two of these cDNA isoforms (BgpA and MHVR) have an identical coding region but somewhat different 5' and 3'UTRs (the first 41 nt upstream from the ATG start codon are identical) (Dveksler et al., 1991; McCuaig et al., 1992). As was demonstrated by Southern analysis (Fig. 2b, panel C) and sequencing of the PCR products, cDNAs which lack a region encoding two C2-set domains, such as BgpB and BgpC (clones 1 and 23, respectively), can also encode a long Cyt to generate the BgpG and BgpH cDNA isoforms. Likewise, the Southern and sequencing analyses indicate that the regions encoding the long Cyt as well as three internal repeats are present in the two longest cDNA isoforms, designated BgpD and BgpF (Fig. 3b). The Bgp isoforms exhibit two variants of the signal sequence: four nt substitutions (Fig. 3b) lead to changes in two aa (Fig. 3c). The leader region is followed by an N-terminal domain. There are two possible N-terminal domains (N1 and N2) which exhibit both common and divergent areas: the nt sequences encoding the first 37 aa are identical except for one nt substitution, while the last two thirds of these domains exhibit numerous nt differences, many of which are located in the third position of a codon (Fig. 3b and 3c). As has been noted in the human CEA gene (Schrewe et al., 1990), the last nt of the N-terminal domain exon is paired with the first two nt of the following exon to form the first codon of the next domain. Because of this splicing position, different aa are encoded by the various Bgp isoforms, depending on whether the N-terminal domain is followed by the A1 or the A2 domain. If there is an A1 domain, the first aa in A1 is Pro (BgpA/D/E/F) and in A2, Glu. If there is no A1 domain, the first aa in A2 is a Gln (BgpB/C/G/H) (Fig. 3c). There are two variants for each of the internal C2-set Ig domains; however, there are few nt changes (Fig. 3b), and those that do occur lead to aa substitutions (Fig. 3c) and, in one case, a silent mutation (indicated by arrow in Fig. 3c). A linker region (Li) joining the A2 domain to the TM region is predicted by hydropathy computer analysis. Two Cyt domains are derived from essentially the same sequence of DNA. As reported for human BGP (Barnett et al., 1989), the mouse long Cyt is generated by the inclusion of a 53-bp exon (nt 1464-1517 in Fig. 3b) which shifts the ORF from what would otherwise encode a short cytoplasmic tail. The translated proteins with a short Cyt are produced by using the first in-frame stop codon (nt 1533-1535), while proteins bearing a long Cyt are generated by the use of a stop codon at nt 1667-1669 due to the shift in reading frame. These stop codons are followed by a 3'UTR of 1.25 (CytL) or 1.39 kb (CytS) (Figs. 1 and 3). The UTRs of these cDNAs are also variable. cDNA clone 23 contains a 103-bp 5'UTR. However, a longer 5'UTR has been demonstrated by primer extension analysis (P.N. and N.B., unpublished results). As indicated above, BgpA and MHVR share the first 41 nt upstream from the ATG start codon but diverge further upstream from this point, suggesting that there are a minimum of two 5'UTR exons, or that a more complicated exon pattern lies upstream from the signal sequence. Sequencing of cDNA library clones and PCR-amplified cDNA clones has indicated a polymorphic region in the 3'UTR (indicated at the bottom of Fig. 3b). Since these sequence differences occur in a highly repetitive region and are from outbred CD-1 mice, it is not presently known whether this variable region is due to stuttering during the RT reactions, errors during PCR amplifications, or a bona fide genetic polymorphism. Further downstream, cDNA library clones terminate at three different positions: clone 23 exhibits the shortest 3'UTR (nt 2114 in Fig. 3b), the MHVR cDNA (Dveksler et al., 1991) ends at nt 2710 (Fig. 3b) and clone 37 contains a further 199 nt ending with a poly(A) tail at nt position 2922.
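The reading-frame argument made above for the 53-bp exon can be illustrated with a few lines of arithmetic: an inserted segment changes the downstream frame whenever its length is not a multiple of three. The sketch below is illustrative only. The toy downstream sequence is hypothetical (it is not taken from Fig. 3b), and the code assumes for simplicity that the insertion begins at a codon boundary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shifts_reading_frame(insert_length_bp: int) -> bool:
    """An insertion shifts the frame unless its length is a multiple of 3."""
    return insert_length_bp % 3 != 0


def downstream_offset(insert_length_bp: int) -> int:
    """Number of downstream nt consumed to complete the last insert codon."""
    return (3 - insert_length_bp % 3) % 3


def codons(seq: str) -> list:
    """Split a sequence into complete codons, dropping any trailing partial one."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy sequence standing in for the region after the exon junction.
toy_downstream = "GGATGACCC"

exon_length = 53  # length of the alternatively spliced exon reported in the text
offset = downstream_offset(exon_length)

frame_without_exon = codons(toy_downstream)          # read in the original frame
frame_with_exon = codons(toy_downstream[offset:])    # read in the shifted frame

if __name__ == "__main__":
    print(shifts_reading_frame(exon_length), offset)
    print(frame_without_exon, frame_with_exon)
```

In the toy sequence the original frame meets a stop codon immediately, whereas the shifted frame reads through it, which mirrors how exon inclusion moves translation termination from the first in-frame stop codon to a later one.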
Several poly(A) consensus signals (double underlines in Fig. 3b) are located upstream from the various poly(A) tails, only one of which fits the perfect consensus sequence (AATAAA) (Proudfoot and Brownlee, 1976). The predicted aa sequences of the Bgp isoforms are presented in Fig. 3c and may be sorted into two classes, defined by their N-terminal domains. The first 37 aa of these domains are identical; however, in the following stretch of 19 aa, nine residues are not conserved; charged structures replace non-polar aa and vice versa (K→A, M→K, F→Q). The Cys residues of the C2-set domains, thought to be involved in intrachain disulfide bonding (Williams and Barclay, 1988), are well conserved in all isoforms (shading in Fig. 3c).

Northern analyses. Separation of mouse colon total RNA on a formaldehyde-agarose gel and hybridization with oligos. Positions of the 18S (1.86-kb) and 28S (4.71-kb) rRNAs as well as the three major transcripts quantitated in c are indicated. (c) Quantification. Autoradiograms of different exposure times of the above Northern blots were scanned by laser densitometry using a Bio-Imager scanner (Millipore, Montreal, Canada). The 4.0-kb band revealed by hybridization to oligo 46N1 was assigned a value of 1.0. All other readings were computed proportionally. n.d., not determined. Methods: 20 µg of total colon RNA, prepared by guanidinium isothiocyanate extraction and centrifugation, was electrophoresed through a 2.2 M formaldehyde-1.5% agarose gel and transferred to GeneScreen Plus membranes. Hybridizations were carried out as previously described (McCuaig et al., 1992) with the [32P]oligos (Sambrook et al., 1989) indicated above the autoradiograms. The filters were washed to a final stringency corresponding to 5°C below the predicted melting temperature of each oligo.

Northern analyses were performed on mouse colon transcripts with oligos representing different domains of the cDNAs (Fig. 4a and b) to quantify the relative abundance of the Bgp isoforms (Fig. 4c).
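The normalization described for the quantification (the 4.0-kb band detected by oligo 46N1 set to 1.0, all other readings computed proportionally) can be sketched as follows. The raw densitometry readings used here are hypothetical placeholders chosen only to make the arithmetic easy to follow; they are not the values measured in Fig. 4c, and the function name is an assumption.

```python
def normalize_to_reference(readings: dict, reference_key: tuple) -> dict:
    """Express every densitometry reading relative to one reference band."""
    reference = readings[reference_key]
    if reference <= 0:
        raise ValueError("reference reading must be positive")
    return {key: value / reference for key, value in readings.items()}


# Hypothetical raw scanner readings keyed by (oligo probe, transcript size).
raw_readings = {
    ("46N1", "4.0 kb"): 850.0,   # reference band, assigned 1.0 after scaling
    ("46N1", "1.8 kb"): 85.0,
    ("KM6", "3.1 kb"): 1275.0,
}

relative = normalize_to_reference(raw_readings, ("46N1", "4.0 kb"))

if __name__ == "__main__":
    for (probe, band), value in relative.items():
        print(f"{probe:>5} {band}: {value:.2f}")
```

Scaling to a single reference band makes readings from different probes comparable only if equal amounts of each labelled oligo were used, which is the condition the authors state for their hybridizations.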
Three major transcripts were identified with these probes. These were measured as 1.8, 3.1 and 4.0 kb relative to rRNA markers. In comparison, human BGP restriction fragments have been shown to hybridize to 2.2- and 3.9-kb transcripts from a variety of cell lines (Hinoda et al., 1988; Barnett et al., 1989). The nt sequences of the BgpD and BgpF isoforms reported in Fig. 3b are 2922 nt long, while the BgpB and BgpC or BgpG and BgpH isoforms are 1518 or 1569 nt long, respectively. The size discrepancy between the cDNA sequences and the transcript lengths measured by Northern analysis may be due to an additional 300 nt at the 5' end (P.N. and N.B., unpublished results). As well, poly(A) tails are thought to make transcripts of the human CEA gene family members about 400 nt longer than their DNA sequences (Beauchemin et al., 1987). This adjustment would be consistent with a size of 3622 nt for the longer cDNAs. The existence of an even longer 3'UTR has not been ruled out. The 4.0-kb transcripts and a strong hybridization signal at 3.1 kb with a relative abundance of 1.5 are detected with oligo KM6; the 3.1-kb length is compatible with cDNAs expressing L-N-A2-TM-CytS or CytL carrying a complete 3'UTR region. To quantify the number of transcripts co-migrating in the revealed bands, equal pmol of the labelled oligos were added to hybridization mixtures to ensure that the intensity of the bands would reflect the relative abundance of the transcripts. The most striking observation was that, although the cDNAs encoding the isoforms containing only two Ig domains (N and A2) could be readily cloned from a cDNA library or amplified by PCR, their abundance as 1.8-kb transcripts was minimal (Fig. 4c: 46N1, RCGM2N and KM7). The N1- and N2-terminal domains are preferentially expressed with three Ig C2-set constant domains as full-length transcripts.
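The size reconciliation given above for the longest isoforms is simple addition and can be written out explicitly. The three input values are those stated in the text (a 2922-nt cDNA sequence, about 300 additional nt at the 5' end and about 400 nt of poly(A) tail); the constant and function names are illustrative assumptions.

```python
# Values stated in the text for the longest isoforms (BgpD and BgpF).
CDNA_LENGTH_NT = 2922      # cDNA sequence reported in Fig. 3b
EXTRA_5PRIME_NT = 300      # additional 5' sequence suggested by primer extension
POLYA_TAIL_NT = 400        # approximate poly(A) contribution

NORTHERN_BAND_NT = 4000    # largest transcript measured on Northern blots


def expected_transcript_length(cdna_nt: int, extra_5prime_nt: int,
                               polya_nt: int) -> int:
    """Sum the cloned sequence and the portions not represented in the cDNA."""
    return cdna_nt + extra_5prime_nt + polya_nt


expected = expected_transcript_length(CDNA_LENGTH_NT, EXTRA_5PRIME_NT,
                                      POLYA_TAIL_NT)
unexplained = NORTHERN_BAND_NT - expected

if __name__ == "__main__":
    print(f"expected length: {expected} nt")
    print(f"difference from the 4.0-kb band: {unexplained} nt")
```

The remaining difference from the 4.0-kb band is within what gel-based sizing against rRNA markers could plausibly miss, and it is also compatible with the authors' remark that an even longer 3'UTR has not been ruled out.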
However, the transcripts encoding proteins with long Cyt are preferentially expressed without A1 and B1 domains and migrate as 1.8-kb rather than 4.0-kb mRNA. These are 1/10 to 1/3 as abundant as the short Cyt-encoding isoforms. As expected, hybridization with an oligo (3UTR-2) located in the distal region of the 3'UTR preferentially detects 4.0-kb transcripts. An oligo specific for the A2 domain (33-35) binds to both the 1.8- and 4.0-kb transcripts; however, quantification of the autoradiograms at different exposure times indicated exceedingly high values for this domain. Although the migration of these transcripts is similar to that of the 18S rRNA, the signal seen does not represent cross-hybridization to rRNA; when this oligo was used to probe non-Bgp-expressing cell lines, no signal was encountered. However, this A2 domain is very conserved when compared to a similar domain expressed in the mouse Psg cDNAs (Kodelja et al., 1989; Rebstock et al., 1990; Rudert et al., 1992). Thus, the oligo 33-35 may be detecting some of these PSG-encoding transcripts.

(1) The data presented in this report emphasize the multiplicity of the mouse Bgp isoforms and suggest mechanisms responsible for their generation. So far, nine similar but not identical cDNA isoforms have been cloned from CD-1 mouse colon (Turbide et al., 1991; McCuaig et al., 1992; this report) and from Balb/c liver (Dveksler et al., 1991). Alternative splicing is active in the generation of isoforms exhibiting different Cyt domains as well as isoforms lacking internal A1 and B1 domains. Northern analysis performed on mouse colon RNA with specific oligo probes and quantification of the autoradiograms have confirmed the diversity of the Bgp isoforms; three major transcripts (1.8, 3.1 and 4.0 kb) encode these isoforms, albeit in different relative amounts. The Bgp long Cyt-containing transcripts are approximately tenfold less abundant than the short Cyt-tailed isoforms.
This finding may reflect on the function of the long-tailed BGP proteins, since they contain consensus Tyr phosphorylation sites that have been conserved throughout evolution and are present in the mouse, the rat and the human BGP counterparts (Lin et al., 1989; Afar et al., 1992; Culic et al., 1992). Overexpression of these isoforms may be detrimental to cell growth and/or differentiation.

(2) In contrast to what has been described for human BGP isoforms (Barnett et al., 1989), there are, in fact, two classes of mouse Bgp isoforms. Sequence analyses performed on the cDNA isoforms from CD-1 mice have demonstrated that the N1-terminal domain is only associated with either the A1a or A2a domain, while the N2-terminal domain is joined to the A1b or A2b domain. These C2-set Ig domains differ from each other by only a few nt, while the N-terminal domains exhibit a greater number of changes. The A2 domain is also a required structural feature of the mouse colon Bgp isoforms, contrary to the human BGPb and BGPd isoforms (Barnett et al., 1989), which lack this domain, as do BGPg, BGPh, BGPx and BGPy (T. Barnett, personal communication). However, all isoforms examined in this report carry identical TM and long or short Cyt domains. The length of the 3'UTR varies but the sequence within remains invariable (except for a few nt modifications in a highly repetitive region), strongly suggesting that a single Bgp gene encompasses all exons. However, the generation of isoforms bearing a few nt modifications could also be explained by allelic variation. The present characterization was done on the outbred CD-1 mouse. A recent report suggests that expression of the two classes of isoforms (defined by their N-terminal domains) is due to allelic variation of the same gene. Balb/c mice do not express the N2-containing cDNA isoforms and, conversely, SJL/J mice lack the N1-bearing isoform transcripts and proteins (Dveksler et al., 1993).
The splicing versus allelic variation issue will be resolved with the characterization of the complete Balb/c Bgp gene (P.N. and N.B., in progress).

(3) The feature which renders the two isoform classes distinct lies in their N-terminal domains. The first 37 aa of these domains are identical, while the last third contains many modifications mostly concentrated in a central core. Bates et al. (1992) have recently constructed a three-dimensional model for human CEA based on the NMR structure of rat CD2 as well as X-ray crystal structures of human CD4 and Ig variable domains. The first two domains of CEA are predicted to have a rod-like appearance with numerous tightly packed β sheets and a few exposed loops on either side of the molecule. Interestingly, the most divergent aa residues (positions 72-90) within the mouse Bgp N-terminal domains would be localized in the C-C' loop, which is predicted to be exposed. On the other hand, some of the conserved 37 aa are positioned within another exposed loop. These loops are the most likely targets for interactions with other cells or ligands. In fact, two described properties of these mouse glycoproteins are dependent upon such interactions: two protein products of the Bgp cDNA isoforms (BgpA and BgpB) can function in vitro as cell adhesion molecules (Turbide et al., 1991; McCuaig et al., 1992), and one (MHVR or BgpA) has also been shown to bind to the MHV spike glycoprotein (Williams et al., 1990; Dveksler et al., 1991). Although other domains of the CEA/BGP proteins are necessary for intercellular adhesion, the N-terminal domain appears to be critical for this function (Oikawa et al., 1991).
Similarly, other members of the Ig supergene family (CD4, ICAM-1, CD2, poliovirus receptor) require that their N-terminal domains enter into contact with either viral envelope glycoproteins (Greve et al., 1989; Mendelsohn et al., 1989; Staunton et al., 1989; White and Littman, 1989; Koike et al., 1990; Pollard et al., 1991) or their respective ligands to be functionally activated (Staunton et al., 1990; Koike et al., 1991; Register et al., 1991). Human BGP mRNAs and proteins appear to be less abundant than CEA in colon but more abundant in liver (Hinoda et al., 1990; Drzeniek et al., 1991). Extensive expression analyses of these two protein entities have been hampered by the lack of specific anti-BGP antibodies, which have only recently been developed. However, the mouse Bgp mRNAs (Turbide et al., 1991; McCuaig et al., 1992) and proteins (M.R., P.N., S.J., C.T., D.F. and N.B., manuscript in preparation) are more abundant in colon than liver. Since a true CEA homolog or any of the phosphatidylinositol-linked CEA family members have yet to be identified in the mouse or rat models, it is tempting to speculate that the functions associated with CEA in the human may represent a recent evolutionary event (Stanners et al., 1992) and that these functions may be ensured in the mouse by other Bgp-related genes (P.N. and N.B., in preparation). The significance of the concurrent expression of the many Bgp isoforms on the surface of mouse tissues such as colon, intestine, liver and uterus (Turbide et al., 1991; McCuaig et al., 1992) may be that these proteins play a determinant role, through self- or heterologous contact, in renewal and/or differentiation of their epithelia.
Tyrosine phosphorylation of biliary glycoprotein, a cell adhesion molecule related to carcinoembryonic antigen
Carcinoembryonic antigens: alternative splicing accounts for multiple mRNAs that code for novel members of the carcinoembryonic antigen family
A predicted three-dimensional structure for the carcinoembryonic antigen
Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen
A mouse analogue of the human carcinoembryonic antigen
Carcinoembryonic antigen, a human tumor marker, functions as an intercellular adhesion molecule
Cloning of a carcinoembryonic antigen gene family member expressed in leukocytes of chronic myeloid leukemia patients and bone marrow
Immunosuppression by human placenta lactogen (HPL) and the pregnancy-specific β1-glycoprotein (SP1): inhibition of mitogen-induced lymphocyte transformation
Molecular cloning and expression of a new rat liver cell-CAM105 isoform
A comprehensive set of sequence analysis programs for the VAX
Identification of membrane antigens in granulocytes and colonic carcinoma cells by a monoclonal antibody specific for biliary glycoprotein, a member of the carcinoembryonic antigen family
Cloning of the mouse hepatitis virus (MHV) receptor: expression in human and hamster cell lines confers susceptibility to MHV
Several members of the mouse CEA-related glycoprotein family are functional receptors for murine coronavirus MHV-A59
Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer
The major human rhinovirus receptor is ICAM-1
Specific carcinoembryonic antigens of the human digestive system
Carcinoembryonic antigen is anchored to membranes by covalent attachment to a glycosylphosphatidylinositol moiety: identification of the ethanolamine linkage site
Molecular cloning of a cDNA coding for biliary glycoprotein I: primary structure of a glycoprotein immunologically crossreactive with carcinoembryonic antigen
Transcription of biliary glycoprotein I gene in malignant and non-malignant human liver tissues The pregnancy-specific glycoprotein family of the immunoglobulin superfamily: identification of new members and estimation of family size Identification of a carcinoembryonic antigen gene family in the rat The poliovirus receptor protein is produced both as membrane-bound and secreted forms Functional domains of the poliovirus receptor Expression of an NCA cDNA in NIH/3T3 cells yields a 1 IOK glycoprotein, which is anchored into the membrane via glycosylphosphatodylinositol Binding of Escherichia cnli and Salmonella strains to members of the carcinoembryonic antigen family: differential binding inhibition by aromatic a-glycosides of mannose Cloning and expression of a cDNA coding for rat liver plasma membrane ecto-ATPase: the primary structure of the ecto-ATPase is similar to that of the human biliary glycoprotein 1 Fidelity of DNA synthesis by the Thermococcus litoralis DNA polymerase. An extremely heat stable enzyme with proofreading activity Production of single-stranded DNA by asymmetric PCR mmCGMla: a mouse carcinoembryonic antigen gene family member, generated by alternative splicing, functions as an adhesion molecule Cellular receptor for poliovirus: molecular cloning, nucleotide sequence, and expression of a new member of the immunoglobulin superfamily Cell adhesion activity of non-specific cross-reacting antigen (NCA) and carcinoembryonic antigen (CEA) expressed on CHO cell surface: homophilic and heterophilic adhesion A specific heterotypic cell adhesion activity between members of carcinoembryonic antigen family W272 and NCA, is mediated by N-domains Homotypic and heterotypic Ca 2+-independent cell adhesion activities of biliary glycoprotein, a member of carcinoembryonic antigen family, expressed on CHO cell surface CD4-binding regions of human immunodeficiency virus envelope glycoprotein gp120 defined by proteolytic digestion 3' Non-coding region sequences in 
eukaryotic messenger RNA cDNA and gene analyses imply a novel structure for a rat carcinoembryonit antigen-related protein Human-murine chimeras of ICAM-identify amino acid residues critical for rhinovirus and antibody binding Biliary glycoprotein, a member of the immunoglobulin supergene family, functions in vitro as a Ca* +-dependent intercellular adhesion molecule Characterization of murine carcinoembryonic antigen gene family members Molecular Cloning. A Laboratory Manual DNA sequencing with chainterminating inhibitors Cloning of the complete gene for carcinoembryonic antigen: analysis of its promoter indicates a region conveying cell type-specific expression Immunologic approaches to diagnosis of malignancies The CEA family: a system in transitional evolution? A cell adhesion molecule, ICAM-I, is the major surface receptor for rhinoviruses The arrangement of the immunoglobulin-like domains of ICAM-I and the binding sites for LFA-1 and rhinovirus Carcinoembryonic antigen-like substances of human bile: isolation and partial characterization Purification and properties of biliary glycoprotein 1 (BGP 1). Immunochemical relationship to carcinoembryonic antigen Long-range chromosomal mapping of the carcinoembryonic antigen (CEA) gene family cluster A mouse carcinoembryonic antigen gene family member is a calciumdependent cell adhesion molecule Viral receptors of the immunoglobulin superfamily The immunoglobulin superfamily: domains for cell recognition Purification of the I IO-kilodalton glycoprotein receptor for mouse hepatitis virus (MHV)-A59 from mouse liver and identification of a nonfunctional homologous protein in MHV-resistant SJL/J mice Specificity of intercellular adhesion mediated by various members of the immunoglobulin supergene family