key: cord-0706709-6f4h3cck
authors: Kozak, Marilyn
title: Pushing the limits of the scanning mechanism for initiation of translation
date: 2002-10-16
journal: Gene
DOI: 10.1016/s0378-1119(02)01056-9
sha: 97c78798cc75c9da5716e5c6e7c61f8b703ccadf
doc_id: 706709
cord_uid: 6f4h3cck

Selection of the translational initiation site in most eukaryotic mRNAs appears to occur via a scanning mechanism which predicts that proximity to the 5′ end plays a dominant role in identifying the start codon. This ‘position effect’ is seen in cases where a mutation creates an AUG codon upstream from the normal start site and translation shifts to the upstream site. The position effect is evident also in cases where a silent internal AUG codon is activated upon being relocated closer to the 5′ end. Two mechanisms for escaping the first-AUG rule – reinitiation and context-dependent leaky scanning – enable downstream AUG codons to be accessed in some mRNAs. Although these mechanisms are not new, many new examples of their use have emerged. Via these escape pathways, the scanning mechanism operates even in extreme cases, such as a plant virus mRNA in which translation initiates from three start sites over a distance of 900 nt. This depends on careful structural arrangements, however, which are rarely present in cellular mRNAs. Understanding the rules for initiation of translation enables understanding of human diseases in which the expression of a critical gene is reduced by mutations that add upstream AUG codons or change the context around the AUG(START) codon. The opposite problem occurs in the case of hereditary thrombocythemia: translational efficiency is increased by mutations that remove or restructure a small upstream open reading frame in thrombopoietin mRNA, and the resulting overproduction of the cytokine causes the disease. This and other examples support the idea that 5′ leader sequences are sometimes structured deliberately in a way that constrains scanning in order to prevent harmful overproduction of potent regulatory proteins. The accumulated evidence reveals how the scanning mechanism dictates the pattern of transcription – forcing production of monocistronic mRNAs – and the pattern of translation of eukaryotic cellular and viral genes.

The scanning mechanism for initiation of translation postulates that the small (40S) ribosomal subunit enters at the 5′ end of the mRNA and migrates linearly, stopping when the first AUG codon is reached. Consistent with the postulated 5′ end-dependent entry of ribosomes, translation in vivo is strongly augmented by the m7G cap (Furuichi and Shatkin, 2000; Horikami et al., 1984; Lo et al., 1998; Neeleman et al., 2001) and ribosome binding in vitro is prevented by circularization of the mRNA (Kozak, 1979a; Konarska et al., 1981).

Perhaps because the scanning mechanism has been around for a while, the evidence for some basic points has been forgotten. One recent commentary even questions whether the 40S ribosomal subunit has anything to do with it (Mathews, 2002). The easiest answer is that the stop-scanning step is clearly mediated by pairing of the initiation codon with the anticodon in Met-tRNAi (Cigan et al., 1988a), and the 40S ribosomal subunit is the carrier of Met-tRNAi·eIF2. But the 40S subunit was already implicated by experiments done earlier.

The experiments that gave rise to the scanning model concerned unusual polysome-like complexes formed in the presence of edeine, an antibiotic which blocks recognition of the AUG codon (Kozak and Shatkin, 1978). Analysis of the rapidly sedimenting complexes revealed 40S ribosomal subunits distributed throughout the body of the mRNA.

Because control experiments showed that, even in the presence of edeine, ribosomes can enter only from the 5′ end, the simplest explanation was that 40S subunits enter at the 5′ end and then migrate into the interior of the mRNA; in the absence of edeine, the migration would stop when an AUG codon is reached. Independent experiments confirmed that edeine is targeted to the ribosome (Herrera et al., 1986), and use of a fractionated translation system confirmed that the edeine-induced complexes are formed by 40S but not 60S ribosomal subunits (Kozak and Shatkin, 1978; Kozak, 1979b). Subsequent experiments, with edeine omitted, showed that scanning can be interrupted by inserting a base-paired structure between the cap and the AUG codon; the resulting abortive complexes sediment around 40S (Kozak, 1989, 1998; Paraskeva et al., 1999).

We are not yet sure which initiation factors are associated with the 40S ribosomal subunit during the scanning phase. The only factors whose role in scanning has been defined clearly are the GTP-binding protein eIF2, which escorts Met-tRNAi onto the 40S subunit, and eIF5, which activates GTP hydrolysis by eIF2 (Asano et al., 2001; Das et al., 2001). By controlling the rate of GTP hydrolysis, eIF5 controls the fidelity of initiation, i.e. the fidelity of the stop-scanning step. Other protein factors have not yet been fitted in. (The voluminous literature on factors focuses on modifications – phosphorylation, cleavages – rather than on defining the initiation pathway. Basic questions, such as when each factor enters and leaves, have not yet been answered.) One untested possibility is that the large initiation factor eIF3, bound to the 40S ribosomal subunit, might form a clamp around the mRNA that is opened and closed by cycles of ATP hydrolysis. Scanning appears to be dependent on ATP hydrolysis (Kozak, 1980), thereby implicating eIF4A, an RNA-dependent ATPase which might control the hypothetical clamp. Some ideas about the function of other initiation factors are reviewed elsewhere (Dever, 2002; McCarthy, 1998; Pestova et al., 2001).

The strongest evidence that the scanning 40S ribosome/factor complex advances linearly is the position effect on selection of the start codon: initiation at the first potential start codon has been demonstrated in rigorous experimental tests (Cigan et al., 1988a; Kozak, 1983, 1995) and confirmed in many 'natural tests' wherein addition or removal of an AUG codon produces the expected shift in the site of initiation (see below). The aforementioned blockade caused by inserting a base-paired structure between the 5′ cap and the AUG START codon is further evidence that 40S ribosomal subunits traverse the leader sequence linearly, rather than hopping (discontinuous scanning) or entering directly at the AUG codon.
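
The position effect can be illustrated with a minimal Python sketch. It models only the first-AUG rule (it ignores context, secondary structure and reinitiation), and the sequences and the function name `first_aug` are invented here for illustration; they do not come from any of the cited experiments.

```python
def first_aug(mrna: str) -> int:
    """Return the 0-based index of the AUG nearest the 5' end, or -1 if none."""
    return mrna.upper().replace("T", "U").find("AUG")


# Hypothetical sequences, invented purely for illustration.
leader = "GCGCCACC"
coding = "AUGGCUAUGAAGUAA"                # AUG codons at offsets 0 and 6
wild_type = leader + coding
upstream_mutant = "GCAUGACC" + coding     # mutated leader that now contains an AUG
truncated = wild_type[len(leader) + 3:]   # shorter transcript lacking the first AUG

print(first_aug(wild_type))        # 8: the normal start site
print(first_aug(upstream_mutant))  # 2: initiation shifts to the new upstream AUG
print(first_aug(truncated))        # 3: the formerly internal AUG is now first
```

The same one-line rule accounts for both kinds of 'natural test': adding an AUG upstream moves the predicted start toward the 5′ end, and removing the first AUG exposes the next one downstream.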

Although the scanning mechanism predicts that translation should initiate at the AUG codon nearest the 5′ end of the mRNA, two ancillary mechanisms – reinitiation and context-dependent leaky scanning – enable additional initiation events at downstream AUG codons in some mRNAs. These well-defined mechanisms for escaping the first-AUG rule are discussed below. An additional escape mechanism might involve direct entry of ribosomes at an internal site in the mRNA. While there is evidence suggestive of direct internal initiation with picornavirus mRNAs, the evidence for internal ribosome entry sites (IRES) in cellular mRNAs is problematic (Kozak, 2001a). The absence of shared structural features among candidate cellular IRES elements makes it impossible to predict which mRNAs, if any, might use such a mechanism. Rather than attempting to summarize the extensive literature on internal initiation, I refer the reader to other detailed reviews on that subject (Dever, 2002; Hellen and Sarnow, 2001; Pestova et al., 2001).

The next section provides a terse summary of points that are easily explained by the scanning model. The bulk of the review then focuses on complicated examples and issues.

2. Constraints imposed by the scanning mechanism explain many common aspects of gene expression in higher eukaryotes

Many plant and animal viruses produce dicistronic or polycistronic mRNAs from which only the 5′ cistron can be translated (Table 1). All these viruses solve the problem of 'silent 3′ cistrons' by producing – via splicing or discontinuous transcription or an internal promoter – additional forms of mRNA in which the downstream cistron is repositioned closer to the 5′ end. The reason for the complicated pattern of splicing seen with human immunodeficiency virus type 1 (HIV-1), for example, is simply to produce mRNAs that allow downstream open reading frames (ORFs) to be translated. The broad range of viruses represented in Table 1 merits attention.

The same problem and same solution – post-transcriptional processing of polycistronic mRNAs – underlie the expression of many genes in Caenorhabditis elegans (Blumenthal et al., 2002; Hough et al., 1999). In mammalian cells, mRNAs that contain two full-length nonoverlapping cistrons are extremely rare and, as with the aforementioned viruses, actual translation of the 3′ cistron probably occurs from a second, monocistronic mRNA (Pardigol et al., 1998; Westerman et al., 2001) or from a second mRNA in which the two cistrons are fused into a single translation unit (Gray and Nicholls, 2000; Hänzelmann et al., 2002). A dicistronic transcript derived from the mouse Snurf-Snrpn locus barely supports translation of the second cistron, as discussed below in the section on reinitiation (Section 4). Recently discovered dicistronic transcripts produced from the mouse Hyal locus support translation of only the 5′ cistron (Shuttleworth et al., 2002). A few other reported dicistronic mRNAs await testing.

Table 1 (rows as recovered; columns: virus | translated 5′ cistron | silent downstream cistron | source of the shorter mRNA | references)
[leading cells not recovered] | Wold et al., 1995; Ziff, 1985
Parvovirus: adeno-associated | Capsid protein A | Capsid proteins B/C | Splicing | Muralidhar et al., 1994
Hepatitis B virus | Core protein | S proteins (envelope) | Promoter switch | Schaller and Fischer, 1991
Retrovirus: avian, murine | Gag (capsid) protein | Env protein | Splicing (e) | Pawson et al., 1977; Van Zaane et al., 1977
Retrovirus: human foamy | Gag (capsid) protein | Pol precursor | Splicing | Jordan et al., 1996
Lentivirus: HIV-1 | Tat | Rev and Nef (f) | Splicing | Schwartz et al., 1992
Alphavirus: Semliki Forest | Nonstructural proteins | Capsid protein | Internal promoter | Glanville et al., 1976; Strauss and Strauss, 1994
Calicivirus: feline (g) | Nonstructural proteins | Capsid protein | Independent replication | Carter, 1990; Neill et al., 1991
[leading cells not recovered] | Miller et al., 1985; Shih and Kaesberg, 1976
Tobacco mosaic virus | Replicase | Coat and movement proteins | Internal promoters | Grdzelishvili et al., 2000; Hunter et al., 1976
Potato virus X | 25 kDa movement protein | 12 and 8 kDa movement proteins (f) | ?? | Verchot et al., 1998
Carmovirus: turnip crinkle (g) | Replicase (p28/p88) | p8 and p9 movement proteins | Internal promoters | Li et al., 1998; Wang and Simon, 1997
[leading cells not recovered] | Fütterer et al., 1994

a The silent downstream cistron identified in the third column is expressed only upon being moved closer to the 5′ end via production of a second, shorter mRNA. Translation of most genes derived from these viruses follows straightforward predictions of the scanning mechanism, although occasional deviations have been reported. In rare instances where a 3′ cistron appears to be translated from a dicistronic mRNA (Grundhoff and Ganem, 2001; Kirshner et al., 1999; Nador et al., 2001; Stacey et al., 2000), the virus in question employs a complicated pattern of splicing and therefore the existence of an undetected monocistronic mRNA is not beyond the realm of reason. In some other cases only a small amount of the protein encoded by the 3′ cistron was produced, and the published RNA analyses were not sufficiently sensitive to rule out the presence of an additional subgenomic mRNA (Herbert et al., 1996).

b In some cases the listed example is arbitrary, i.e. with retroviruses, coronaviruses, closteroviruses, etc., there are additional polycistronic mRNAs wherein translation is restricted to the 5′ cistron.

c Whereas DNA viruses and retroviruses use conventional promoter-switching or splicing mechanisms to generate alternative forms of mRNA that allow translation of the downstream cistron, more complicated mechanisms underlie the production of subgenomic mRNAs by some RNA viruses.

d The presence of internal promoters that produce a shorter transcript for each downstream ORF is suggestive, but testing of translation is still needed for the mRNAs produced by cytomegalovirus and geminivirus.

e Whereas all retroviruses employ splicing to produce the subgenomic mRNA from which envelope protein (Env) is translated, some retroviruses also employ an internal promoter which is postulated to mediate expression of novel ORFs, such as the superantigen of mouse mammary tumor virus (Reuss and Coffin, 1998) and orf-x of the virus that causes lung cancer in sheep (Palmarini et al., 2002).

f See leaky scanning in Table 3 and Fig. 1.

g In place of the usual m7G cap, the 5′ end of these viral RNAs carries a covalently linked protein (VPg) or is unblocked. The need for a subgenomic mRNA even in these cases emphasizes that translation is 5′ end-dependent even when it is not cap-dependent.

h The full-length genomic mRNA supports translation of the 3′ cistron in vitro but the 3′ cistron is silent in vivo. The latter result is considered more reliable (Meulewaeter et al., 1992).

Notwithstanding the documented inability to translate the 3′ cistron in natural dicistronic mRNAs, synthetic dicistronic mRNAs – constructed by inserting a putative IRES element between two reporter genes – appear sometimes to allow translation of the downstream cistron. The interpretation that this occurs via direct internal initiation of translation has been questioned (Kozak, 2001a) and defended (Hellen and Sarnow, 2001) in other reviews.

The position effect, indicative of scanning, is seen when a mutation creates an AUG codon upstream from the normal start codon and translation shifts to the upstream site (Bergenhem et al., 1992; Cai et al., 1992; Gross et al., 1998; Harington et al., 1994; Liu et al., 1999; Lock et al., 1991; Mével-Ninio et al., 1996; Muralidhar et al., 1994; Wada et al., 1995). In the most stringent test of the rule, the first AUG codon was shown to be the exclusive site of initiation even when the second AUG was positioned just a few bases downstream from, and in the same optimal context as, the first (Kozak, 1995).

The position effect is seen also when removal of the first start codon activates initiation from the next AUG downstream. Some genes require production of two versions of the encoded protein, wherein the shorter version, initiated from an internal AUG codon, lacks the N-terminal domain of the longer isoform. The problem of how ribosomes can gain access to an internal start codon is solved by producing, via splicing or a downstream promoter, a second form of mRNA from which the first AUG START codon has been removed. Table 2 lists some examples. The N-terminally truncated isoform thus produced may reside in a different cellular compartment, or may function as an antagonist to the full-length protein (as seen with various transcription factors listed in Table 2), or may function in a surprising way. One such surprise was the discovery that a truncated form of tryptophanyl-tRNA-synthetase ('miniTrpRS') has angiostatic activity (Wakasugi et al., 2002).

The entries in Table 2 and some other examples (Aichem and Mutzel, 2001; Beuret et al., 1999; Falvey et al., 1995; Nagpal et al., 1992) are what I call natural tests of the position rule. Additional evidence comes from experimental manipulations wherein removal of the first AUG was shown to activate initiation from a downstream site (Cahana et al., 2001; Chenik et al., 1995; Tailor et al., 2001; Thoma et al., 2001) .

The scanning mechanism predicts that the 5′ untranslated region (5′ UTR, an unfortunate misnomer) is actually traversed by ribosomes. This explains why translation of the major coding domain is reduced when adventitious out-of-frame AUG codons occur upstream. The upstream AUG codons often create small ORFs (upORFs) which are indeed translated, as shown by detecting the encoded peptide (Hackett et al., 1986; Raney et al., 2000; Wang and Wessler, 2001) or by fusing a reporter gene to the upORF (Abastado et al., 1991; Donzé et al., 1995; Liu et al., 1999; Steel et al., 1996; Tanaka et al., 2001; Xu et al., 2001). The fusion test is the more reliable, as small peptides are usually degraded rapidly.
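
The bookkeeping implied by this paragraph (locating each upstream AUG, asking whether it is in frame with the main start codon, and finding where the small ORF terminates) can be sketched as follows. This is a minimal illustration, assuming the standard stop codons UAA, UAG and UGA; the function name `upstream_orfs` and the test sequence are hypothetical and not drawn from the cited studies.

```python
STOPS = {"UAA", "UAG", "UGA"}  # standard termination codons


def upstream_orfs(mrna: str, main_start: int) -> list:
    """List AUG codons that begin 5' of main_start, with the first in-frame stop of each."""
    rna = mrna.upper().replace("T", "U")
    found = []
    for i in range(main_start):
        if rna[i:i + 3] != "AUG":
            continue
        stop = None
        # Read codon by codon in the frame set by this upstream AUG.
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOPS:
                stop = j
                break
        found.append({
            "start": i,
            "stop": stop,  # None if no in-frame stop codon is present
            "in_frame": (main_start - i) % 3 == 0,
        })
    return found


# Hypothetical mRNA: a two-codon upORF (AUG CCC, then UGA) precedes the main AUG at index 18.
example = "GGAUGCCCUGAGGCCACCAUGGCUUAA"
print(upstream_orfs(example, main_start=18))
# [{'start': 2, 'stop': 8, 'in_frame': False}]
```

An upORF that terminates before the main start codon, as in this example, is the arrangement that leaves reinitiation possible; an out-of-frame upstream AUG whose reading frame runs past the main AUG would not.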

Even if upstream AUG codons are arranged in a way that allows reinitiation, there is a penalty because reinitiation is usually inefficient. This topic will be discussed at length in Section 4.

The hypothesis that the 5′ UTR is traversed by ribosomes explains why a highly structured 5′ leader sequence is so detrimental to translation. Vertebrate mRNAs characteristically have long, GC-rich – hence highly structured – leader sequences (Kozak, 1991a; Macleod et al., 1998), and the resulting difficulty in translation has been discovered over and over in the course of cloning. Even a short GC-rich 5′ UTR can inhibit profoundly, as illustrated in cases where a gene produces a mixture of mRNAs with different leader sequences, and the worst-translated mRNA was found to be the form with the shortest 5′ UTR (Jiang and Lucy, 2001; Yang et al., 1998). A stem-and-loop structure, stabilized in some cases by a repressor protein, is most inhibitory when its proximity to the 5′ end blocks ribosome binding (Goossen and Hentze, 1992; Kozak, 1989; Wang and Wessler, 2001). If the structure is far enough from the 5′ end to allow ribosome entry, the advancing 40S ribosome/factor complex apparently has some ability to disrupt base pairing, but this ability is notably less than that of 80S elongating ribosomes (Kozak, 1986a, 2001b; Lingelbach and Dobberstein, 1988; Paraskeva et al., 1999) and is curtailed in yeast (Koloteva et al., 1997).

While there are mechanisms for reducing the inhibitory effects of upstream AUG codons, as discussed below, no mechanism has yet been defined for modulating the inhibitory effects of secondary structure. Some studies suggest that secondary structure might be less inhibitory to translation in vivo than in vitro (Charron et al., 1998; Curnow et al., 1995; Hensold et al., 1997; Hoover et al., 1997; Morrish and Rumsby, 2001; Van der Velden et al., 2002). This could be due to production of an alternative transcript that simply eliminates the secondary structure – a reasonable possibility given that GC-rich domains often harbor promoter elements – or to modification of the translation machinery. Interpretation of in vivo tests of translation could also be complicated by effects of secondary structure on mRNA stability (Stefanovic et al., 1999). Whether and how translation of GC-rich leader sequences might improve in exponentially growing cells (Nielsen et al., 1995) remains an important open question.

The scanning mechanism rationalizes the occurrence of initiation at upstream ACG or CUG codons in some mRNAs. These alternative codons are usually too weak to actually substitute for the AUG START codon (reviewed by Kozak, 1999; for some exceptions see Falvey et al., 1995; Kiefer et al., 1994; Riechmann et al., 1999; Sadler et al., 1999). It is not uncommon, however, for initiation to occur at an upstream non-AUG codon in addition to the first AUG (see leaky scanning in Section 3). This is observed frequently with cellular genes that have highly structured, GC-rich leader sequences (Kozak, 1991b), perhaps because secondary structure slows scanning and thus allows more time for the mismatched codon to pair with Met-tRNAi.

With some viruses, the extra protein isoform initiated from an upstream non-AUG codon serves an essential function (Muralidhar et al., 1994; Portis et al., 1994, 1996). While the N-terminally-extended isoforms derived from some cellular genes also display distinct functions (Arnaud et al., 1999; Calkhoven et al., 2000; Spotts et al., 1997) or patterns of localization (Acland et al., 1990; Lock et al., 1991; Packham et al., 1997), it would not be surprising if some other upstream-initiated proteins turn out to be inadvertent byproducts generated in the course of slowly traversing a GC-rich leader sequence.

Table 2 Partial list of vertebrate genes that produce a second, shorter version of the encoded protein via a second form of mRNA in which an internal AUG codon becomes a functional start site upon elimination of the upstream AUG START
[leading cells not recovered] | Sun et al., 1995

a Production of long and short protein isoforms via this mechanism is seen also with genes from insects (Mével-Ninio et al., 1996), plants (Cunillera et al., 1997; Wimmer et al., 1997), yeast (Beltzer et al., 1988; Carlson et al., 1983; Chatton et al., 1988; Ellis et al., 1989; Gammie et al., 1999; Natsoulis et al., 1986; Wolfe et al., 1996) and viruses (Barbosa and Wettstein, 1988; Lambert et al., 1987; Liu and Roizman, 1991; Liu and Biegalke, 2002; Weimer et al., 1987; Welch et al., 1991; Wu et al., 1993b; Zheng et al., 1994).

b In these cases, the long and short protein isoforms have different functional effects. Other genes that resemble this pattern, producing long and short isoforms with contrasting functions, are not listed in the table because the AUG START codon for the shorter protein is carried on an alternative exon present only in the shorter mRNA (e.g. Koski et al., 1999; Molina et al., 1993). That arrangement does not illustrate the main point of the table, which is that a silent internal AUG codon in the longer mRNA can be activated simply by truncating the transcript.

c The long and short isoforms are targeted to different cellular compartments.

d The long and short isoforms are expressed in different tissues.

e The long and short forms of β1,4-galactosyltransferase appear to function identically. The main significance of the promoter switch, which eliminates the first AUG START codon, is that the shorter 5′ UTR supports translation more efficiently (Charron et al., 1998).

Because the ability of the scanning mechanism to explain the big picture is generally accepted, the remainder of this review directs attention, not to examples that can be seen readily to support the model, but to mRNAs that seem to be poorly designed for a scanning mode of initiation. The main point is that the scanning mechanism applies even in these difficult cases.

Understandably, such mRNAs are translated inefficiently and this brings out a second important point: some critical regulatory genes require protein synthesis to be inefficient. An earlier review raised awareness that genes that encode potent regulatory proteins – cytokines, growth factors, kinases, transcription factors – often produce mRNAs in which the 5′ leader sequence is GC-rich or burdened by upstream AUG codons (Kozak, 1991a). Some examples described herein validate the prediction that these encumbered 5′ sequences are nature's way of limiting the synthesis of potent proteins that would be harmful if overproduced.

I also suggested in earlier reviews that, when a cDNA sequence has so many upstream AUG codons as to challenge the applicability of the scanning mechanism, it is wise to ask whether the cDNA correctly reflects the structure of the mRNA. That advice is not changed by what is written here. Very often, cDNA sequences that appear incompatible with scanning have been found to derive from incompletely spliced transcripts or to have been misinterpreted in other ways (Kozak, 1996, 2000). In other cases, although an encumbered cDNA sequence is correct, it derives from a transcript that does not support translation (Hake and Hecht, 1993; Foo et al., 1994; Larsen et al., 2002; Lee et al., 2000). Only after one is certain of the mRNA structure should the mechanisms below be considered.

In mammals, the optimal context for recognition of the AUG START codon is GCCRCCaugG. Within this motif, the purine (R) in position −3 is the most highly conserved (see Section 6.1) and functionally the most important position. The importance of A or G (A is somewhat better than G) in position −3 was proved by mutagenesis experiments on a wide variety of genes (Kozak, 1986b; and see entries marked 'tested' in Table 3). The G in position +4 is also highly conserved and, especially in the absence of A in position −3, contributes strongly (Kozak, 1997). Adherence to the rest of the GCCRCCaugG motif varies, without major consequences as long as positions −3 and +4 conform; the upstream GCC motif can be seen to contribute, however, in the absence of other elements (Kozak, 1987b).

The aforementioned mutagenesis experiments define two extremes: (i) when the first AUG codon occurs in a strong context – ANNaugN or GNNaugG – all or almost all ribosomes stop and initiate at that point; (ii) when the first AUG resides in a very weak context, lacking both R in position −3 and G in position +4, some ribosomes initiate at that point but most continue scanning and initiate farther downstream. This leaky scanning enables the production of two separately initiated proteins from one mRNA, as documented below. It is harder to predict what happens at start sites that fall between the extremes, i.e. mRNAs in which the first potential start codon has the sequence YNNaugG, GNNaugY or GNNaugA. Leaky scanning is seen in some but not all such cases. A possible explanation suggested by studies with test transcripts (Kozak, 1990a) is that initiation might be restricted to the first AUG codon, despite a suboptimal context, when downstream secondary structure slows scanning and thus provides more time for codon/anticodon pairing. Suppression of leaky scanning via this mechanism requires a critical distance (13–15 nt, which corresponds to half the diameter of the ribosome) between the AUG codon and the downstream structured element. Table 3 lists some examples in which two proteins are produced from one mRNA via leaky scanning. The postulated link between context and leaky scanning has been tested in many of these cases by showing that, upon improving the context at the upstream site, initiation from the second site is reduced or abolished. (Whether the second AUG codon resides in a strong or weak context is not relevant; the ribosome reads the mRNA linearly and thus the decision to stop or to bypass the first AUG is not influenced by whether there is a better initiation site downstream.)
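
The three context classes described above can be written out as a small Python sketch. It encodes only the two-position rule stated in the text (position −3 and position +4, with the A of AUG counted as +1); it does not model downstream secondary structure, proximity to the 5′ end, or the wider GCCRCC motif. The function name `context_class`, the class labels and the test sequences are assumptions made for illustration.

```python
def context_class(mrna: str, aug_index: int) -> str:
    """Classify the context of the AUG at aug_index using positions -3 and +4 only."""
    rna = mrna.upper().replace("T", "U")
    if rna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG codon at the given index")
    # The A of AUG is +1, so -3 is three bases upstream and +4 follows the G.
    minus3 = rna[aug_index - 3] if aug_index >= 3 else None
    plus4 = rna[aug_index + 3] if aug_index + 3 < len(rna) else None

    if minus3 == "A" or (minus3 == "G" and plus4 == "G"):
        return "strong"        # ANNaugN or GNNaugG
    if minus3 == "G" or plus4 == "G":
        return "intermediate"  # GNNaugY, GNNaugA or YNNaugG
    return "very weak"         # neither R at -3 nor G at +4


print(context_class("GCCACCAUGG", 6))  # strong
print(context_class("GCCAUGA", 3))     # intermediate
print(context_class("CUUAUGC", 3))     # very weak
```

As the text notes, only the two extreme classes support a confident prediction; for the intermediate class the sketch deliberately returns a label rather than a verdict on whether scanning will be leaky.
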
The large number of genes that employ leaky scanning precludes discussion of the biological significance of the proteins thereby produced, but it merits noting that, for many of the viruses in Table 3 , replication requires production of both listed proteins. For some other viruses, the second protein is a virulence factor that weakens host defenses (Bridgen et al., 2001; Chen et al., 2001; Weber et al., 2002) . The biological importance of these downstream-initiated proteins shows that leaky scanning is a deliberately employed tool; it does not simply reflect sloppiness on the part of the translational machinery.

The long list of examples in Table 3 conforms to expectations in that the first start codon resides in a suboptimal context. There are, however, rare instances of leaky scanning despite a good context (R−3 and G+4) around the first AUG. This can happen when the first AUG codon is too close to the 5′ end to be recognized efficiently (Kaneda et al., 2000; Kozak, 1991c; Ruan et al., 1994; Sedman et al., 1990; Slusher et al., 1991; Spiropoulou and Nichol, 1993; Werten et al., 1999) or when the facilitating effect of G in position +4 is canceled by U in position +5 (Kozak, 1997; Sloan et al., 1999; Stallmeyer et al., 1999). Other occasional claims of leaky scanning despite a strong context at the first AUG codon were simply mistaken (Scherer et al., 1995); the shorter protein turned out to be translated from a second form of mRNA (Kogo and Fujimoto, 2000).

Notes to Table 3:

In all mRNAs here listed, the sequence flanking the first start codon deviates from the consensus sequence in position −3 and/or position +4, highlighted by underlining. When the postulated link between context and leaky scanning was tested (so marked in this column), mutations that improved the context at the first start site diminished access to the downstream start site. This test failed only with cucumber necrosis virus, where the short distance between the m7G cap and AUG#1 allowed some leaky scanning even when the context was optimized.

c In some cases the first and second AUG codons are in the same reading frame, generating long and short versions of the encoded protein which may function differently. In cases where the first and second start codons are in different reading frames, indicated by italicizing the second product, the extent of overlap between the two ORFs ranges from a few codons (peanut clump virus, southern bean mosaic virus) to 626 codons (turnip yellow mosaic virus).

d Access to the downstream initiation site via leaky scanning is augmented by a reinitiation shunt, as explained in the text (Section 4.3) and diagrammed in Fig. 1 for C/EBPβ mRNA.

e Mutations that eliminate AUG#1 usually increase production of the second, downstream protein. In rare cases where the expected increase was not seen (e.g. von Hippel-Lindau, turnip yellow mosaic virus), it might be because translation of the second protein was restricted at the level of elongation. For a similar reason, improving the context around AUG#1 occasionally fails to elevate production of the protein there initiated (Fajardo and Shatkin, 1990). These entries nevertheless satisfy the main prediction of the leaky scanning mechanism, which is that improving the context around AUG#1 prevents initiation from the second, downstream site (Fajardo and Shatkin, 1990; Iliopoulos et al., 1998).

f Whereas feline leukemia virus produces an N-terminally-extended, glycosylated form of Gag (gp80 gag) from the indicated weak AUG codon, the corresponding upstream start site in murine leukemia virus is ACCCUGG (Portis et al., 1994). When that site was experimentally ablated, however, revertants expressed the extended protein from a weak upstream AUG codon (UUUaugG) created by a point mutation. Those revertants were selected because the extra glycosylated form of Gag contributes to viral spread (Portis et al., 1996).

g In the mRNAs from baboon reovirus, influenza A virus, and southern bean mosaic virus, the indicated proteins derive from the first (weak) and fourth AUG codons. AUG#2 and AUG#3 initiate small ORFs that terminate before AUG#4. Thus, a combination of leaky scanning and reinitiation probably mediates access to the downstream start site.

At the opposite extreme, there are rare mammalian mRNAs in which, despite a very unfavorable context flanking the first AUG codon, translation appears to initiate exclusively at that site (Arai et al., 1991; Hickey and Roth, 1993; Leslie et al., 1992; McNeil et al., 1992; Plowman et al., 1990; Wu et al., 1993a). Leaky scanning might be suppressed in these and a few other cases because of downstream secondary structure, or because the wider context (C in positions −1, −2, −4, −5; G in position −6) compensates to some extent for the absence of R−3 and G+4, or for other unknown reasons.

The same principle that allows initiation from the first and second AUG codons when the first AUG is in a suboptimal context (Table 3 ) applies in cases where translation initiates at an upstream non-AUG codon in addition to the first AUG (Acland et al., 1990; Arnaud et al., 1999; Carroll and Derse, 1993; Fajardo et al., 1993; Florkiewicz and Sommer, 1989; Fütterer et al., 1996; Fuxe et al., 2000; Lock et al., 1991; Muralidhar et al., 1994; Packham et al., 1997; Saris et al., 1991; Spotts et al., 1997) . Recognition of an upstream ACG, CUG or GUG start codon requires a strong context (Portis et al., 1994) , despite which scanning is usually leaky because the initiator codon itself is weak.

Production of long and short protein isoforms via leaky scanning is harder to regulate – e.g. to achieve tissue-specific expression of one or the other form – than when a unique mRNA encodes each isoform, as in Table 2. There are hints, however, that dual initiation via leaky scanning might be regulable (Probst-Kepper et al., 2001; Spotts et al., 1997). This could conceivably be accomplished via proteins that stabilize downstream secondary structure or, perhaps, via a combination of leaky scanning and regulated reinitiation, if the mRNA also has small upORFs.

Among the examples in Table 3 are many plant viruses, indicating that the basic context rules extend to plant systems. Mutagenesis experiments confirm the functional importance of R in position −3 and G in position +4 in plant mRNAs (Jones et al., 1988; Lukaszewicz et al., 2000) and surveys of plant cDNA sequences confirm the conservation of those key positions (Pesole et al., 2000; Rogozin et al., 2001). Unlike mammalian mRNAs, however, plant mRNAs do not show a predominance of C in positions −1, −2, −4 and −5.
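The two key context positions can be checked mechanically. The following is a minimal sketch, not a method taken from the studies cited here: the function name `context_class` and its three labels are illustrative assumptions, and the rule looks only at position −3 (purine) and position +4 (G), ignoring the wider context, position +5 and downstream secondary structure that the text notes can also matter.

```python
PURINES = {"A", "G"}

def context_class(seq, aug_index):
    """Crude two-position check of the context around an AUG codon.

    seq       : RNA (or DNA) sequence; case is ignored and T is read as U.
    aug_index : 0-based index of the A of the AUG (position +1).
    The labels are this sketch's own shorthand (an assumption), based only
    on a purine at -3 and a G at +4.
    """
    seq = seq.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG codon at the given index")
    # There is no position 0: -3 lies three nucleotides before the A.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    # +4 is the nucleotide immediately after the G of the AUG.
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None
    has_purine = minus3 in PURINES
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "intermediate"
    return "weak"

# Sequences quoted in this article, written as flank + aug + next nucleotide.
for s, i in [("ACCaugG", 3), ("ACGaugC", 3), ("UUCaugC", 3), ("UACaugA", 3)]:
    print(s, context_class(s, i))
```

With this rule ACCaugG scores as strong, ACGaugC as intermediate, and UUCaugC and UACaugA as weak, which matches how those sites behave in the experiments described in the text. Sites with a pyrimidine at −3 and G at +4 are where such a simple rule is least reliable.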

The foregoing discussion pertains to mRNAs from plants and vertebrate animals. There is some evidence for context-dependent leaky scanning in fungi (Arst and Sheerins, 1996), but context effects on initiation have not yet been studied carefully in protozoa, insects, and various other systems. The observation that trans-splicing of mRNAs in C. elegans sometimes brings a purine into position −3 is interesting (Hough et al., 1999) but the significance awaits testing. With a number of yeast genes, there is a hint of leaky scanning when the usual A in position −3 is replaced by a pyrimidine (Gaba et al., 2001; Slusher et al., 1991; Vilela et al., 1998; Welch and Jacobson, 1999; Wolfe et al., 1994). Context effects were not evident, however, in other studies of translation in yeast (Cigan et al., 1988b). For whatever reason, leaky scanning is rare in yeast, apart from a few cases attributable to the first AUG codon residing too close to the 5′ end.

mRNAs that initiate translation from three sites provide a striking illustration of how far leaky scanning, alone or in combination with reinitiation, can be pushed. Fig. 1 shows some examples.

The predominant translation product obtained from c-myc mRNA is a 65 kDa 'long form 2' which initiates at the first AUG codon (Fig. 1A). A small amount of a longer isoform (68 kDa) derives from an upstream CUG codon which is a weak start site (i.e. very leaky) because the codon is not AUG. Although the first AUG codon has the required A in position −3, a small percentage of ribosomes bypass that site and initiate at the next AUG, producing a third (50 kDa) form of c-myc. This happens apparently because the context flanking the first AUG codon is good but not perfect. Thus, production of the 50 kDa isoform was eliminated when the upstream site was changed from ACGaugC to ACCaugG (Spotts et al., 1997).
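The logic of these c-myc experiments can be sketched as a simple flux calculation. This is an illustrative model only: the function name `leaky_scanning_yields` and all capture probabilities are hypothetical values chosen for the example, not measurements, and the model assumes strictly linear scanning with no reinitiation (it ignores the small out-of-frame ORF between the 65 and 50 kDa start sites).

```python
def leaky_scanning_yields(capture):
    """Fraction of scanning ribosomes that initiate at each start site.

    capture : per-site probabilities (5' to 3') that a scanning 40S subunit
              which reaches the site initiates there instead of bypassing it.
    Assumes linear scanning without reinitiation (a simplification).
    """
    yields = []
    remaining = 1.0  # fraction of ribosomes still scanning
    for p in capture:
        yields.append(remaining * p)   # ribosomes captured at this site
        remaining *= (1.0 - p)         # ribosomes that leak past it
    return yields

# Hypothetical capture probabilities for CUG, first AUG, second AUG.
wild_type = leaky_scanning_yields([0.1, 0.9, 1.0])
# Perfecting the context at the first AUG (ACGaugC -> ACCaugG).
optimized_aug1 = leaky_scanning_yields([0.1, 1.0, 1.0])
# Changing the upstream CUG to AUG in a strong context.
cug_to_aug = leaky_scanning_yields([1.0, 0.9, 1.0])

print(wild_type)
print(optimized_aug1)
print(cug_to_aug)
```

In this idealized model, setting the capture probability of the first AUG to 1 removes all initiation at the downstream site, and making the upstream site fully efficient removes initiation at both downstream sites, which is the pattern reported for c-myc.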

Fig. 1B depicts another example in which ribosomes initiate from three in-frame AUG codons. With the mRNA that encodes C/EBPβ, access to the far downstream site via leaky scanning is augmented by a reinitiation shunt, as explained in the legend to Fig. 1 and discussed further in Section 4. Translation of C/EBPα mRNA occurs by a mechanism similar to that depicted for C/EBPβ except that the first start site in C/EBPα is a CUG codon (Calkhoven et al., 2000), which generates a smaller amount of the longest protein (isoform A) than does the AUG codon in C/EBPβ. With c-myc, C/EBPβ and C/EBPα mRNAs, leaky scanning is biologically important because the long and short versions of the protein have opposing effects as regulators of transcription.

Fig. 1. Examples of 'maximally leaky' scanning wherein one mRNA produces three independently initiated proteins. Major (thick arrow) and minor (thin arrow) translation products are identified below their respective start codons. Sequences that cause the initiation site to be weak, and thus promote leaky scanning, are highlighted in red. Offset rectangles represent ORFs in different reading frames. (A) With c-myc mRNA, a leaky scanning mechanism was inferred from experiments in which optimizing the context around the first AUG codon suppressed production of the 50 kDa isoform, while changing the upstream CUG codon to AUG suppressed production of both the 65 and 50 kDa isoforms (Spotts et al., 1997). Access to the downstream start site might be more complicated than here depicted, as there is a small out-of-frame ORF between the 65 and 50 kDa start sites. (B) With C/EBPβ mRNA, a mutation that strengthens the first start codon (UUCaugC → ACCaugG) blocked production of all shorter isoforms, implicating a leaky scanning mechanism (Calkhoven et al., 2000). A small upORF (blue) superimposes another level of control, causing more ribosomes to bypass the start site for isoform B1 than would be expected from leaky scanning alone. Presumably because the AUG(START) codon for isoform B1 is positioned close to the termination site of the upORF, reinitiation at site B1 is inefficient and some ribosomes thus reach the far downstream start site for the 20 kDa isoform (LIP). As evidence for this reinitiation shunt, Calkhoven et al. (2000) showed that eliminating the AUG codon of the upORF abolished production of LIP and that strengthening or weakening the context around the upORF start codon caused corresponding changes in the yield of LIP. Although the smallest form of C/EBPβ can be generated in some situations by proteolysis (Dearth et al., 2001), the effects of the aforementioned mutations clearly implicate a translational mechanism. The LAP/LIP ratio shows tissue and stage specific variation (Dearth et al., 2001; Descombes and Schibler, 1991). (C) Whereas leaky scanning allows initiation at multiple sites within a single ORF in C/EBPβ and c-myc mRNAs, leaky scanning allows translation of three separate ORFs in the pregenomic mRNA of rice tungro bacilliform virus. These ORFs (not drawn to scale) have overlapping start and stop codons of the form AUGA. Translation via leaky scanning was inferred from the strong reduction (>13-fold) in translation of ORF2 and ORF3 when the start codon of ORF1 was changed from AUU to AUG (Fütterer et al., 1997) and from the inhibitory effect on expression of ORF3 when an adventitious AUG codon was inserted into ORF2. The 5′ leader sequence that precedes ORF1 has ten small upORFs which are not depicted here because that peculiar leader sequence, postulated to be translated by ribosome hopping (Fütterer et al., 1996), is not required for the leaky scanning mechanism that underlies translation of ORFs 1, 2 and 3. (D) The avian reovirus S1 mRNA supports translation of one structural and two nonstructural proteins (Bodelón et al., 2001). The depicted mechanism postulates that ORF1 has a dual function, encoding its own polypeptide (p10) and facilitating translation of ORF3 by shunting some ribosomes past the strong AUG(START) codon for ORF2. The absence of extraneous AUG codons in the 310 nt region between the end of ORF1 and the start of ORF3 is consistent with the idea that ORF3 might be translated by reinitiation. Some ribosomes would be expected to translate p17 (ORF2) by leaky scanning, engendered by the poor context at the start of ORF1. Improving the context at the start of ORF1 indeed increased production of p10 (Shmulevitz et al., 2002); unfortunately, the yield of p17, which would be expected to decrease, was not monitored. The observation that strengthening the context at the start of ORF1 had no effect on the yield of σC is not surprising because the reinitiation mechanism postulated to underlie translation of ORF3 would probably be limited by other features, such as the relatively large size of ORF1.

It is striking that leaky scanning can operate even when the second initiation site resides far downstream from the first. With synthetic transcripts designed to test the processivity of scanning, there was no reduction in initiation from the downstream site when the inter-AUG distance was expanded stepwise from 11 to 251 nt (Kozak, 1998). In some remarkable viral mRNAs, the second functional initiation site is more than 500 nt downstream from the first AUG (Herzog et al., 1995; Sivakumaran and Hacker, 1998).

The pregenomic mRNA of rice tungro bacilliform virus (Fig. 1C) provides the most dramatic illustration of these points. Use of a weak (non-AUG) codon to initiate ORF1 and an unfavorable context at the start of ORF2 (UACaugA) enables the majority of ribosomes to reach and initiate at the start of ORF3. The ORF3 polyprotein is thought to be a precursor from which coat protein, protease and reverse transcriptase are derived by proteolysis. The remarkable absence of AUG codons from the long (563 nt) coding domain of ORF1 and the presence of but one weak AUG codon within ORF2 underscore how carefully this mRNA is constructed to support translation via scanning. The careful construction includes minimizing the overlap between adjacent cistrons. Without that precaution, elongational occlusion might work against utilization of a far downstream start codon, as documented in other cases (Kozak, 1995) .

The avian reovirus RNA diagrammed in Fig. 1D offers another example of initiation from three sites in one mRNA. Additional experiments are needed to validate the postulated mechanism.

In contrast with the 'maximally leaky' mRNAs in Fig. 1, the mRNAs in Fig. 2 are minimally leaky: only a small fraction of ribosomes bypass the first AUG codon and initiate downstream. Here the leaky scanning mechanism has been pushed to the limits in the sense that there is (low-level) initiation from a second site despite the presence of a strong context around the first AUG codon. The explanation is that the context flanking the first AUG(START) codon is good but not perfect. The resulting low-level leaky scanning enables the viruses depicted in Fig. 2A,B to produce two proteins – one abundant, the second in small amounts – from a single mRNA. Experimental manipulations that support this interpretation are summarized in the legend to Fig. 2. A few other viral genes that might fit this category have been described (Chenik et al., 1995; Jayakar and Whitt, 2002). The hepatitis B virus example is noteworthy because, via the Rube Goldberg mechanism diagrammed in Fig. 2B, reverse transcriptase encoded by the P gene is initiated independently from a far downstream site, unlike most other reverse transcriptase genes which lack an independent start codon and therefore require frameshifting during translation of the preceding core gene.

Production of a second protein isoform via low-level leaky scanning is seen also with some cellular mRNAs. An interesting example is the production in rats of an osteogenic growth peptide (OGP) initiated from codon 85 in the histone H4 gene (Fig. 2C). The leaky scanning explanation was tested by showing that production of OGP increased upon deleting the upstream H4 start codon, and that production of OGP was suppressed upon changing the H4 start codon from a good (AGGAAGaugU) to a perfect (GCCACCaugG) context. A similar mechanism might operate with a few other cellular genes that produce a trace amount of a second protein isoform (Short and Pfarr, 2002; it is not clear whether leaky scanning or a change in splicing underlies the translational switch described by Land and Rouault, 1998).

The fourth example in Fig. 2 differs from the others in that a good-but-not-perfect context at the first start site serves, not to enable production of two proteins, but simply to modulate the yield of A2A-R from the second AUG. Examination of A2A-R genes from various organisms shows conservation of the overlapping ORF, with the upstream AUG codon always in a context that allows only low-level leaky scanning (Lee et al., 1999). Conservation of the structure supports the interpretation that this is a device contrived to limit the production of A2A-R protein.

Low-level leaky scanning caused by a not-quite-perfect context around the first AUG codon might occur with other mRNAs where it normally goes unnoticed because the downstream start site(s) are out-of-frame. Antigenic peptides recognized by cytotoxic T-lymphocytes (CTLs) might be produced in this way, as discussed in Section 5.4. A small degree of leaky scanning that normally goes unnoticed could become significant if a mutation that shifts the normal start codon out of frame moves a downstream AUG codon into the main reading frame. In some cases where low-level internal initiation was observed with such a mutated gene (e.g. Maser et al., 2001), the possibility that the downstream site is reached via a combination of leaky scanning and reinitiation – a mechanism such as that proposed for hepatitis B virus (Fig. 2B) – merits consideration.

Reinitiation occurs with mRNAs, such as those depicted in Fig. 3, that have small ORFs near the 5′ end. Our rudimentary understanding of what happens following translation of the first upORF may be summarized as follows.

When the 80S ribosome reaches the termination site of the upORF, the 60S ribosomal subunit is thought to be released (this has not actually been shown) while the 40S subunit remains bound to the mRNA, resumes scanning, and may initiate another round of translation at a downstream AUG codon. For the downstream reinitiation event to occur, the 40S subunit must reacquire Met-tRNAi and this appears to be an important point of control. Reacquisition of Met-tRNAi is promoted by lengthening the intercistronic domain (Abastado et al., 1991; Kozak, 1987c), which provides more time for Met-tRNAi to bind, or by increasing the concentration of eIF2 (Abastado et al., 1991; Hinnebusch, 1997). Genetic experiments also implicate eIF3 in the Met-tRNAi rebinding step (Garcia-Barrio et al., 1995). Another potential point of control is at the termination site of the upORF, where certain features – perhaps nearby secondary structure (Grant and Hinnebusch, 1994; Vilela et al., 1998) – might prevent the resumption of scanning or, in some other way, prevent reinitiation. This brief summary is based on studies carried out in yeast and mammals. Some studies of reinitiation in plants suggest that the intercistronic sequence may have effects beyond simply providing time for ribosomes to reacquire Met-tRNAi (Wang and Wessler, 1998).

Fig. 2. Examples of minimally leaky scanning in which a strong, but not quite perfect, context at AUG#1 causes most ribosomes to initiate there while allowing a low level of initiation downstream. With the depicted viral mRNAs (A,B), the predominant product of translation is the capsid protein initiated from AUG#1. Low-level leaky scanning generates a small but adequate amount of the indicated second protein. With bovine coronavirus, a mutation in position +4 (U → G, indicated in red) flanking AUG#1 strongly reduced translation from the downstream site (Senanayake and Brian, 1997), supporting the interpretation that the natural mRNA is slightly leaky because the context flanking AUG#1 is not a perfect match to the consensus sequence. With hepatitis B virus, ribosomes en route to the P start site (AUG#5) apparently bypass the weak AUG#2 by leaky scanning, while translation of the small ORF initiated at AUG#3 enables some ribosomes to miss the inhibitory AUG#4 (inhibitory because it resides in a strong context and overlaps the P ORF) and thus to reach AUG#5. Whereas the core protein start codon (AUG#1) here depicted resides in a context which allows a low level of leaky scanning, a slightly longer mRNA which encodes the pre-core protein has a stronger start codon (A in position −3) and polymerase cannot be translated from that form of mRNA (Fouillot and Rossignol, 1996). The publications on which the scheme shown here is based (Fouillot et al., 1993; Hwang and Su, 1998) also discuss some alternative possibilities vis-à-vis translation of polymerase. (C) The first AUG codon in rat histone H4 mRNA initiates translation of the full-length protein. The second AUG, 85 codons downstream and in the same reading frame, initiates production of a peptide which has growth-regulatory properties (Bab et al., 1999). (D) With rat A2A-R adenosine receptor mRNA, an overlapping upORF that initiates at an AUG codon in a strong context is used to minimize production of A2A-R protein. The overlapping arrangement precludes reinitiation but the not-quite-perfect context at the upstream start site allows low-level leaky scanning. This interpretation is supported by the observed ten-fold increase in translation of A2A-R in vivo when the start codon of the upORF was eliminated (Lee et al., 1999). Via a second promoter, the rat A2A-R gene produces some transcripts with additional upORFs, but no transcript has yet been found that lacks the inhibitory upORF discussed here. Here and in Fig. 3, the major coding domain is shaded gray. Small regulatory ORFs (blue rectangles) are not drawn to scale.

Some results obtained in early experiments with mammalian vectors were interpreted as evidence that ribosomes can scan backwards and thus reinitiate at an AUG codon positioned upstream from the termination site (Peabody et al., 1986), but recent experiments contradict that view (Kozak, 2001b). Indeed many studies have shown that the strongest inhibition is caused by an upORF that overlaps the start of the downstream cistron (Babik et al., 1999; Bates et al., 1991; Byrne et al., 1995; Cao and Geballe, 1995; Ghilardi et al., 1998; Hansen et al., 2002; Kos et al., 2002; Lee et al., 1999; Liu et al., 1999), which would not be the case if ribosomes could move backwards to reinitiate.

Fig. 3. Small upstream ORFs in eukaryotic mRNAs function in various ways to modulate translation. Only the 5′ end of each mRNA is depicted. (A) The presence of upORFs forces translation of the major ORF to occur by a reinitiation mechanism, which is usually inefficient. The extent of inhibition depends on the number and arrangement of upORFs and whether the context flanking the upstream start codon(s) allows some escape via leaky scanning. (B) Because reinitiation can occur only in the forward direction, an overlapping upORF strongly impairs translation of the major ORF. (C) Whereas type B mRNAs have a single in-frame start codon which is bypassed due to the overlapping upORF, type C mRNAs initiate from two in-frame start codons; the upORF serves to divert some ribosomes to the downstream start site. The depicted sequence is a simplified representation of GlyRS mRNA (Mudge et al., 1998). Translation of Bag-1 mRNA can also be fitted to this pattern: the first start site is an in-frame CUG codon which produces the 50 kDa form of Bag-1; the next start site (AUG#1, out-of-frame) initiates a small upORF within which the first in-frame AUG codon (AUG#2) resides, and that AUG is thereby skipped; the 36 kDa form of Bag-1 is produced from AUG#3 which is accessed by reinitiation following translation of the small upORF (Packham et al., 1997). Some other mRNAs that use an upORF to dodge one AUG codon in favor of another are described elsewhere (Mittag et al., 1997; Sarrazin et al., 2000). Note that the reinitiation shunt as here defined adheres to the linear scanning mechanism, unlike a shunt postulated to operate with cauliflower mosaic virus mRNA (Ryabova et al., 2000). (D) The common feature of mRNAs that use mechanism D is inhibition of translation in cis by a peptide encoded within the upORF. The amino acid sequence of the inhibitory peptide is different in each case (Morris and Geballe, 2000). In the column at the far right, asterisks indicate examples in which the translational control mechanism is regulated, e.g. via a change in concentration of eIF2 (GCN4) or arginine (CPA1) or polyamines (AdoMetDC) or, more commonly, via an alternative promoter that generates a simpler form of mRNA devoid of upORFs (c-mos, MDM2, IL-12; see text for other examples, e.g. in Alderete et al., 2001).

The size of the first ORF is a major limitation on reinitiation in eukaryotes: reinitiation can occur following the translation of a 'minicistron' (a small first ORF) but not following the translation of a full-length 5′ cistron. The long list of mRNAs that contain silent 3′ cistrons (Table 1) underscores the point. The only apparent exception occurs with cauliflower mosaic virus, where a protein encoded by the virus appears to promote reinitiation following the translation of a full-length first cistron (Park et al., 2001). The reason why reinitiation is usually restricted by the size of the first ORF is not known, but a possible explanation is that certain initiation factors dissociate from the ribosome only gradually during the course of elongation. If the elongation phase is brief – i.e. if the first ORF is a minicistron – the factors required for reinitiation would still be present when the 40S subunit resumes scanning. Although the postulated factors have not been identified, there is evidence for the idea that the duration of the elongation phase matters: when a short upORF which normally permits reinitiation was reconfigured to contain a pseudoknot that is known to slow elongation, reinitiation failed (Kozak, 2001b). That result makes it difficult to specify a cutoff size, i.e. one cannot say 'an upORF this long will allow reinitiation' while a longer ORF will not. The permissible size is likely to vary depending on features, such as secondary structure or codon usage, that affect the rate of elongation. As a rough guide, however, one may note that reinitiation often has been observed following translation of a 10 to 12 codon upORF, and that reinitiation was substantially reduced, but not abolished, when a 13 codon upORF was lengthened to 33 codons (Kozak, 2001b). In a different study, reinitiation occurred following a 24 codon upORF but not when the ORF was lengthened to 40 codons (Luukkonen et al., 1995).

Some naturally occurring upORFs that strongly inhibit translation, perhaps because their size precludes reinitiation, include a 36 codon upORF in mitochondrial uncoupling protein 2 mRNA (Pecqueur et al., 2001) , a 71 codon upORF in polyoma virus JC mRNA (Shishido-Hara et al., 2000) , and a 53 codon upORF in plant S-adenosylmethionine decarboxylase (AdoMetDC) mRNA (Hanfrey et al., in press) . That the size of the upORF might be what limits translation of AdoMetDC is suggested from the five-fold increase in translation observed when the upORF was shortened from 53 to 25 codons, but that result could also be explained in other ways. (The suggested interpretation is not contradicted by the fact that an alternative upORF in AdoMetDC mRNA caused little inhibition even when lengthened to 66 codons; the alternative upORF initiates from an AUG codon in a weak context which would allow it to be bypassed to some extent by leaky scanning. The 53 codon upORF, in contrast, has a strong start codon.)

With the mouse Snurf-Snrpn transcript, where the first cistron is 71 codons long, a very low level of reinitiation might account for translation of the downstream SNRPN cistron. A naturally occurring ATG-to-AGG mutation in the start codon of the upstream SNURF cistron was found to elevate translation of SNRPN >15-fold (Tsai et al., 2002), which implicates a scanning/reinitiation mechanism and rules out direct internal initiation.

From cDNA sequencing data, it is clear that many vertebrate mRNAs have small ORFs upstream from the start of the major coding domain, but an accurate count of genes in this class is difficult. The tallies that have been attempted (e.g. Pesole et al., 2001; Suzuki et al., 2000) are invariably flawed by inclusion of misinterpreted cDNA sequences, such as cDNAs in which a putative 5′ UTR with 'upstream' AUG codons turned out to be part of the coding domain or part of an intron that gets removed from the functional mRNA (Di Fruscio et al., 1998; Kozak, 1996, 2000; Kubu et al., 2000; Nishitani et al., 2001; Santamarina-Fojo et al., 2000; Wagner et al., 1998). Some transcripts with long, AUG-burdened leader sequences are not associated with polysomes (Hake and Hecht, 1993; Sanchez-Góngora et al., 2000) or not able to support protein synthesis (Foo et al., 1994; Larsen et al., 2002; Lee et al., 2000), emphasizing that not all cDNAs correspond to functional mRNAs.

A more fundamental complication vis-à-vis which genes to count is the propensity for a single gene to produce transcripts with different 5′ leader sequences, only some of which have upstream AUG codons (Anant et al., 2002; Aplan et al., 1991; Eerola et al., 2001; Huo and Scarpulla, 1999; Kawakubo and Samuel, 2000; Laurin et al., 2000; Perälä et al., 1994; Perrais et al., 2001; Sanchez-Góngora et al., 2000; Suva et al., 1989; Tanaka et al., 2001; Tsuda et al., 2000; Zimmermann et al., 2000). The significance of a particular form of RNA cannot always be deduced from its abundance, inasmuch as a minor transcript is sometimes the major functional mRNA (Andrea and Walsh, 1995; Babik et al., 1999; Ghilardi et al., 1998; Mitsuhashi and Nikodem, 1989; Nielsen et al., 1990) and incompletely processed transcripts are sometimes more abundant than the fully spliced, translatable mRNA (Boularand et al., 1995; Frost et al., 2000; Xie et al., 1991; Zachar et al., 1987). Translational regulation mediated by small upORFs is important, as discussed below, but equally important are non-translational mechanisms – use of alternative promoters or splice sites – that simplify the 5′ UTR by eliminating upORFs in certain tissues or at certain times when elevated synthesis of the protein is required (Aizencang et al., 2000; Anusaksathien et al., 2001; Arrick et al., 1994; Babik et al., 1999; Brown et al., 1999; Horiuchi et al., 1990; Landers et al., 1997; Lee et al., 2000; Nonaka et al., 1989; Phelps et al., 1998; Ren and Stiles, 1994; Steel et al., 1996; Teruya et al., 1990).

Because vertebrate mRNA leader sequences are often GC-rich (Section 2.4), secondary structure near the 5′ end might impair translation even more than the presence of upstream AUG codons. Thus, it is not surprising that eliminating upstream AUG codons does not improve translation in every case (Rao et al., 1988; Wood et al., 1996). In many cases, however, mutations targeted to the upstream AUG codons confirmed their role in restricting translation from downstream (Anant et al., 2002; Babik et al., 1999; Bates et al., 1991; Brown et al., 1999; Child et al., 1999a; Gereben et al., 2002; Ghilardi et al., 1998; Griffin et al., 2001; Harigai et al., 1996; Kos et al., 2002; Lee et al., 1999; Marth et al., 1988; Meijer et al., 2000; Pecqueur et al., 2001; Ren and Stiles, 1994; Steel et al., 1996; Tanaka et al., 2001; Tsai et al., 2002; Wang and Wessler, 1998; Wang and Rothnagel, 2001; Wera et al., 1995; Wu et al., 2002). This occurs by a variety of mechanisms, as summarized in Fig. 3 and discussed next.

While the efficiency of reinitiation varies, there is almost always a penalty – demonstrable by showing an increase in translation when the upORF is deleted – and the penalty can be severe. Thus, the simplest function of small upORFs is to limit production of the protein encoded in the full-length ORF by making downstream translation dependent on an inefficient reinitiation mechanism (Fig. 3A).

The best studied example is yeast GCN4, which initiates from the fifth AUG codon in the mRNA; the long leader sequence contains four small upORFs. In a series of classic experiments, Hinnebusch (1997) was able to reconstruct GCN4 regulation using only the first and fourth upORFs, and I will explain what happens in that simplified case. UpORF1 is always translated efficiently (it is the first AUG in the mRNA), after which ribosomes resume scanning and reinitiate, usually, at upORF4. UpORF4 is unusual in that its translation precludes further reinitiation events: thus, when upORF4 is translated, GCN4 is not. That is the situation in yeast cultures that have adequate nutrients. Starvation for amino acids, however, causes some ribosomes to bypass the inhibitory upORF4 and reinitiate instead farther downstream. This happens because starvation creates a pool of uncharged tRNAs which activate a protein kinase that phosphorylates, and thus partially inactivates, eIF2. When eIF2 levels fall, it takes longer for ribosomes to reacquire Met-tRNAi and thus become competent to reinitiate. The slower acquisition of competence means that some ribosomes, scanning in the reinitiation mode, will bypass the nearby upORF4 and can thus reach the downstream GCN4 start site.
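The timing argument in the simplified two-upORF case can be illustrated with a toy calculation. Everything quantitative here is an assumption made for the sketch: the function name `gcn4_yield`, the exponential form for reacquisition of Met-tRNAi, the rate constant, and the two scanning distances are hypothetical and are not taken from the GCN4 literature. The sketch also assumes that every competent ribosome initiates at the first AUG it meets and that translation of upORF4 precludes any further reinitiation, as described above.

```python
import math

def gcn4_yield(eif2_activity, d_uporf4=200.0, d_gcn4=350.0, rate=0.02):
    """Toy estimate of the fraction of reinitiating ribosomes that reach GCN4.

    eif2_activity : relative level of active eIF2 (1.0 = non-starved cells).
    d_uporf4      : hypothetical scanning distance (nt) from the upORF1
                    stop codon to the upORF4 start codon.
    d_gcn4        : hypothetical distance (nt) from the upORF1 stop codon
                    to the GCN4 start codon.
    rate          : hypothetical per-nucleotide rate of Met-tRNAi
                    reacquisition at full eIF2 activity.
    Assumes reacquisition is a memoryless (exponential) process whose rate
    scales with active eIF2.
    """
    k = rate * eif2_activity
    # Ribosomes still lacking Met-tRNAi at upORF4 scan past it.
    p_bypass_uporf4 = math.exp(-k * d_uporf4)
    # Of those, the fraction that becomes competent before the GCN4 AUG.
    p_competent_at_gcn4 = 1.0 - math.exp(-k * (d_gcn4 - d_uporf4))
    return p_bypass_uporf4 * p_competent_at_gcn4

fed = gcn4_yield(eif2_activity=1.0)
starved = gcn4_yield(eif2_activity=0.25)
print(f"non-starved: {fed:.3f}  starved: {starved:.3f}")
```

With these made-up parameters, lowering active eIF2 raises the fraction of ribosomes that bypass upORF4 and still reinitiate at GCN4, which is the qualitative behavior described for amino acid starvation. The direction of the effect depends on the chosen distances and rate, so the numbers should not be read as predictions.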

Three general lessons from the GCN4 story appear to carry over to mammals. (i) Fig. 3A lists some examples of mammalian mRNAs that are translated inefficiently due to small upORFs; many other examples were cited in Section 4.2. (ii) Experimental manipulations with C/EBPβ mRNA (Fig. 1B) support the interpretation that an AUG codon which follows the upORF too closely is skipped (presumably because ribosomes have not yet reacquired Met-tRNAi), allowing reinitiation to occur farther downstream. The same mechanism might be invoked to explain how an internal start codon is accessed in miniTrpRS mRNA (Wakasugi et al., 2002) and baculovirus IE0 mRNA (Theilmann et al., 2001), and how c-myb gets translated from a rearranged transcript generated by retrovirus insertion (Jiang et al., 1997). In each of these mRNAs, the first AUG codon that follows a small upORF must be bypassed to reach the functional start codon downstream. (iii) The third lesson from GCN4 pertains to regulation of reinitiation by manipulation of eIF2 levels. Although hints of this have been described with mammalian genes that encode C/EBP transcription factors (Calkhoven et al., 2000), macrophage receptor protein CD36 (Griffin et al., 2001) and activating transcription factor 4 (Harding et al., 2000), the point requires much more careful study.

The mRNAs discussed in connection with Fig. 3A have upORFs that terminate before the start of the major coding domain, thus allowing (inefficient) translation of the main ORF by reinitiation. In Fig. 3B, however, the upORF overlaps the start of the major coding domain. This precludes reinitiation and profoundly reduces the translational yield. Limited access to the main ORF in some of these mRNAs might be achieved by leaky scanning, as was discussed for A2A-R (Fig. 2D).
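The distinction between an upORF that terminates upstream and one that overlaps the main start codon can be checked directly on a sequence. The sketch below is an illustration under stated assumptions: the function name `classify_upstream_augs`, the category labels, and the two short test sequences are invented for this example, and the code considers only AUG start codons and the three standard stop codons.

```python
STOPS = {"UAA", "UAG", "UGA"}

def classify_upstream_augs(mrna, main_start):
    """Classify every AUG upstream of the main start codon.

    mrna       : mRNA sequence (T is read as U, case is ignored).
    main_start : 0-based index of the A of the main AUG.
    Returns a list of (aug_index, stop_end, kind), where stop_end is the
    index just past the first in-frame stop codon (None if absent).
    """
    mrna = mrna.upper().replace("T", "U")
    results = []
    for i in range(main_start):
        if mrna[i:i + 3] != "AUG":
            continue
        # Walk codon by codon from the upstream AUG to the first stop.
        stop_end = None
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOPS:
                stop_end = j + 3
                break
        if stop_end is None:
            kind = "no stop found"
        elif stop_end <= main_start:
            kind = "terminates upstream"   # reinitiation possible (Fig. 3A type)
        elif (main_start - i) % 3 == 0:
            kind = "in-frame extension"    # would lengthen the main protein
        else:
            kind = "overlapping"           # precludes reinitiation (Fig. 3B type)
        results.append((i, stop_end, kind))
    return results

# Invented toy sequences: the main AUG is at index 16 and 9, respectively.
print(classify_upstream_augs("GGAUGGCUUAACCCCCAUGGGG", 16))
print(classify_upstream_augs("GGAUGGCCCAUGGCUGAC", 9))
```

The first toy leader carries a three-codon upORF that ends before the main AUG, and the second carries an out-of-frame upORF whose stop codon lies beyond the main AUG. Whether an overlapping upORF actually silences the main ORF also depends on the context of its start codon, as discussed for A2A-R.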

mRNAs derived from the human thrombopoietin (TPO) gene have structures similar to that depicted in Fig. 3B and much can be learned from the TPO story, as outlined in Fig. 4. The normal gene produces a mixture of transcripts with different leader sequences, all of which translate TPO poorly because of an overlapping upORF (upORF7 in Fig. 4). Targeted mutagenesis (Ghilardi et al., 1998) confirmed that upstream AUG#7 is primarily responsible for blocking translation of TPO. This is because its near-optimal context (GCCGCCUCCaugG) prevents leaky scanning and the overlapping arrangement precludes reinitiation.

Various mutations that restructure the 5′ UTR in ways that increase production of TPO cause hereditary thrombocythemia. Translation of TPO normally initiates at AUG#8 in exon 3, but a splice-site mutation that causes deletion of exon 3 causes initiation to shift to a previously silent in-frame codon (AUG#9) in exon 4; this is diagrammed in the center of Fig. 4. The resulting truncated form of TPO lacks only the first four amino acids and appears to function normally. The problem – the cause of the pathology – is that the mutation greatly increases translation of TPO by removing the inhibitory upORF7. In two other families affected with hereditary thrombocythemia, production of TPO is elevated by mutations that restructure upORF7. In one case, deletion of a G residue shifts upORF7 into the same reading frame as TPO, thereby causing overproduction of an elongated form of TPO initiated from AUG#7 (Kondo et al., 1998). In the other case, a G → T mutation creates a terminator codon within upORF7 and this shortening of the ORF, which now terminates 31 nt before AUG#8, enables efficient reinitiation at AUG#8.

These insightful studies of TPO expression make two important points: (i) the bulk of the transcripts produced by the wild type gene are virtually untranslatable; and (ii) it is necessary for this potent cytokine to be translated poorly; overproduction results in disease. With TPO as precedent, one suspects that in other cases where – despite the production of alternative leader sequences – it is hard to find even one form of mRNA devoid of upstream AUG codons (e.g. Larsen et al., 2002; Lee et al., 1999; Pecci et al., 2001; Peterson and Morris, 2000; Wang et al., 1999), the goal is to ensure that translation is very, very inefficient. The wig-1 growth-regulatory gene might be another example: an overlapping upORF initiates from a strong AUG codon, while the wig-1 start codon itself is weak, and these distinctive features are conserved between the human and mouse genes (Hellborg et al., 2001).

Fig. 4. A low-level reinitiation mechanism normally prevents overproduction of TPO. Translational yields from various forms of TPO mRNA in transfected COS cells (far right column) are expressed relative to a control transcript that has a short, unencumbered 5′ UTR. P1 and P2 are alternative promoters; a cluster of arrows indicates that P2 produces staggered start sites. The TPO coding domain (horizontal black bar) begins at an AUG codon which is labeled #8 because, in the longest form of mRNA (line 1), it is preceded by seven AUG codons that initiate small upORFs. Superscript letters indicate whether each upstream AUG resides in a strong (S) or weak (W) context and horizontal blue lines depict the approximate length and arrangement of the upORFs. Vertical lines demarcate the boundaries of exons; carets depict the introns in alternatively spliced transcripts. Only the beginning of the TPO coding domain (exons 3–7) is shown. The key point is that the normal set of transcripts supports translation poorly because upORF7 overlaps the TPO start site. Various mutations (shown in red) that relieve this constraint elevate the translation of TPO, and this overproduction causes hereditary thrombocythemia. Among the normal set of mRNAs, the 'rare' transcript from promoter P1 (line 2) supports translation slightly better than the others, perhaps because the short distance between upORF2 and AUG#7 enables some reinitiating ribosomes to bypass AUG#7 and thus reach AUG#8. Because of the strong context at AUGs #1 and #2, upORFs 1 and 2 would be more effective than upORFs 5 and 6 in setting up this reinitiation shunt. The depicted scheme is based on experiments described by Ghilardi et al. (1998) and Wiestner et al. (1998). Additional mutations diagrammed near the bottom of the figure were described by , , and Kondo et al. (1998).

Whereas an overlapping upORF functions simply to down-modulate translation in the examples depicted in Fig. 3B, with the mRNAs in Fig. 3C the overlapping upORF qualitatively affects the protein output. Ribosomes that translate the upORF thereby miss the first in-frame AUG codon but proceed to reinitiate at another start codon downstream. If the upORF itself has a suboptimal initiation site (U in position −3 in the depicted example), leaky scanning will allow some production of the long protein isoform from the first in-frame AUG codon while the reinitiation shunt promotes production of the shorter protein isoform. The operation of a reinitiation shunt is most obvious when the upORF overlaps a potential start codon, as shown in Fig. 3C, but the same principle applies in cases (discussed in Section 4.3) where, although the upORF terminates prior to a potential downstream start codon, the intervening distance is too short to allow reinitiation.
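The distinction drawn here between an upORF that overlaps a start codon and one that terminates a measurable distance upstream of it can be made concrete with a small sequence scan. The following is only a minimal sketch: the function name `find_uporfs`, the category labels, and the toy sequences are hypothetical and are not taken from the cited studies. It reports the gap in nucleotides but deliberately does not apply any threshold for "too short to allow reinitiation", since no numerical cutoff is stated here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uporfs(mrna, main_start):
    """Describe each ORF that starts at an AUG upstream of the main start codon.

    main_start is the 0-based index of the A in the main AUG (supplied by
    the caller; this sketch does not try to infer it).
    """
    mrna = mrna.upper().replace("T", "U")
    found = []
    for i in range(main_start):
        if mrna[i:i + 3] != "AUG":
            continue
        # Read codon by codon until the first in-frame stop, or the end.
        stop = None
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                stop = j
                break
        in_frame = (main_start - i) % 3 == 0
        if stop is not None and stop + 3 <= main_start:
            # The upORF ends before the main AUG; report the spacing.
            kind, gap = "terminates_upstream", main_start - (stop + 3)
        elif in_frame:
            # Same frame and no intervening stop: an N-terminal extension.
            kind, gap = "in_frame_extension", None
        else:
            kind, gap = "overlaps_main_start", None
        found.append({"aug": i, "stop": stop, "kind": kind, "gap_nt": gap})
    return found


# Toy sequences (hypothetical, for illustration only).
print(find_uporfs("GGAUGAAAUAACCCCCAUGGCCUAA", 16))
print(find_uporfs("AUGCAUGGCUGAAAA", 4))
```

In the first toy sequence the upORF stops 5 nt before the main AUG; in the second it reads through the main AUG in a different frame, which is the overlapping arrangement described for upORF7.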

The fourth regulatory mechanism diagrammed in Fig. 3 is used only rarely. Mammalian AdoMetDC mRNA is the best studied example in which a small upORF encodes a peptide which functions in cis to inhibit downstream translation. The nascent peptide (MAGDIS) produced during translation of the upORF is thought to interact with ribosomes in a way that prevents completion of the termination process and thus prevents reinitiation (Law et al., 2001). The stalled ribosome, held at the termination site of the upORF, would also block other ribosomes from reaching the downstream start site via leaky scanning. Biologically, this mechanism is important because AdoMetDC is a key enzyme in polyamine biosynthesis and, at least in vitro, elevated polyamine levels stabilize the ribosome complex stalled at the end of the upORF (Law et al., 2001). In other words, elevated polyamine levels down-regulate translation of AdoMetDC. It is interesting to note parenthetically that antizyme, a protein that down-regulates polyamine levels, is also translated via a polyamine-sensitive mechanism. Elevated polyamine levels up-regulate production of antizyme by promoting a ribosomal frameshift needed to translate the full-length protein (Ivanov et al., 2000).

The foregoing examples illustrate how reinitiation operates as part of the normal translation mechanism in cases where upORFs are constitutively present in mRNAs. There are other cases in which a reinitiation mechanism kicks in only when a nonsense mutation is introduced in a way that truncates the coding domain. In effect, the normal AUG initiator codon becomes the start of an upORF, following which reinitiation occurs at a normally silent internal AUG codon (Chang and Gould, 1998; Ledley et al., 1990; Zoppi et al., 1993) . The N-terminally truncated protein thus produced sometimes retains enough function to mitigate the pathological effects of the nonsense mutation (Chang and Gould, 1998) . This potential rescue device often fails, however, because many mRNAs are rapidly degraded when a nonsense codon is introduced (Frischmeyer and Dietz, 1999; He and Jacobson, 2001) . The mRNA decay pathway that targets these abnormal mRNAs is activated in part by cis elements located in the coding domain (Gudikote and Wilkinson, 2002) , which might explain why normal upORF-containing mRNAs (e.g. those discussed in Figs. 3  and 4) are not rapidly degraded.

Initiation factor eIF2 plays a key role in translational control (Clemens, 2001; Dever, 2002) and mutations that perturb regulation of eIF2 have profound pathological consequences (Delépine et al., 2000; Han et al., 2001; Harding et al., 2001; Van der Knaap et al., 2002). Human genetic disorders have been traced also to disruption of regulatory mechanisms mediated by mRNA binding proteins (Cazzola and Skoda, 2000; Cazzola et al., 2002; Kaytor and Orr, 2001; Mikulits et al., 1999). Here, however, I focus on pathologies resulting from increases or decreases in translation caused directly by changes in mRNA structure. The preceding paragraph mentioned some cases in which an N-terminally truncated protein is produced, apparently by reinitiation, when a mutation introduces a premature nonsense codon. The effects of some other types of mutations can also be understood in light of the scanning mechanism, as outlined next and discussed elsewhere in more detail (Kozak, 2002).

Recent investigations have identified diseases that result from failure to produce one of the two protein isoforms derived from genes that encode certain transcription factors (Table 4). Because the second isoform often functions as a modulator, the transcriptional imbalance caused by these changes in translation can have serious consequences.

Hereditary diseases have been traced occasionally to point mutations that alter the context flanking the AUG START codon. The list includes α-thalassaemia caused by an A → C change in position −3 of the α-globin gene (Morlé et al., 1985), androgen insensitivity syndrome caused by a G → A mutation in position +4 of the androgen receptor gene (Choong et al., 1996), and ataxia with vitamin E deficiency caused by a C → T mutation in position −1 of the α-tocopherol transfer protein gene (Usuki and Maruyama, 2000). There is an interesting report of a somatic point mutation (G → C in position −3) in the BRCA1 gene in a highly aggressive case of sporadic breast cancer (Signori et al., 2001). In mice, a screen for mutations that cause defects in eye development uncovered an A → T change in position −3 of the Pax6 gene (Favor et al., 2001). Each of these mutations was shown to cause a decrease (generally two- to four-fold) in translation.
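The position numbering used for these mutations (the A of AUG is +1, the preceding nucleotide is −1) is easy to misread, so a short sketch of how positions −3 and +4 map onto a sequence may help. This is an illustration under stated assumptions: the function name `aug_context`, the three-way labels, and the toy sequences are hypothetical, and the scoring simply counts a purine in −3 and a G in +4 without attempting to predict the magnitude of any effect on translation.

```python
PURINES = {"A", "G"}


def aug_context(mrna, aug_index):
    """Report the nucleotides at positions -3 and +4 around an AUG.

    The A of AUG is +1 and the nucleotide just before it is -1, so -3 is
    aug_index - 3 and +4 is aug_index + 3 (the nucleotide right after the
    G of AUG).
    """
    mrna = mrna.upper().replace("T", "U")
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("aug_index does not point at an AUG codon")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None
    matches = int(minus3 in PURINES) + int(plus4 == "G")
    # Assumed three-way labels; illustrative only, not a quantitative model.
    label = ("weak", "adequate", "strong")[matches]
    return {"minus3": minus3, "plus4": plus4, "label": label}


# Toy leader (hypothetical) with the AUG at index 6.
print(aug_context("GCCACCAUGG", 6))   # purine in -3, G in +4
print(aug_context("GCCCCCAUGG", 6))   # an A -> C change in position -3
```

Comparing the two calls shows how a single substitution in position −3, of the kind listed above, changes the context while leaving the AUG codon itself intact.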

Not every mutation or polymorphism within the consensus motif can be explained simply, however. Other considerations, such as codon usage, might prevent an increase in translation even when the context is improved (i.e. translation might be limited at the level of elongation rather than initiation), and some mutations near the AUG codon might affect mRNA processing or stability rather than translation per se. A clinically relevant polymorphism in position −1 of annexin V appears to have an effect on translation which is inconsistent with the context rules (González-Conejero et al., 2002), but the effect was small and documented only by assaying translation in vitro, which is not always reliable (Section 6.2). A natural polymorphism in position −5 of the glycoprotein Ibα gene displayed a small effect on translation in vitro that was consistent with the rules (C worked better than T; Afshar-Kharghan et al., 1999) but, in the same study, mutations that changed position +4 from C to G did not augment translation. Testing mutations in position +4 is tricky, however, because the change in identity of the penultimate amino acid might affect protein stability in ways that obscure the effects on translation. The solution is to use an assay that directly monitors the initiation step of translation (Kozak, 1997).

The scanning mechanism predicts that a mutation which weakens or destroys the normal start codon should activate initiation from the next AUG downstream. In some hereditary diseases in which the AUG START codon is ablated, a truncated protein is indeed produced in this way but it does not function well enough to prevent the disease (Cahana et al., 2001; Huang et al., 1999; O'Neill et al., 2001) . In the case of a mutated vasopressin gene in which the G of the AUG START codon is deleted, the shorter signal peptide initiated from the second AUG codon is not recognized by signal peptidase (Beuret et al., 1999) . The resulting uncleaved vasopressin-precursor protein folds incorrectly, causing subsequent processing steps to fail, and therefore vasopressin never gets released from the endoplasmic reticulum. The second AUG is only four codons downstream from the first, but the processing defect caused by this slight shift in the site of initiation causes diabetes insipidus.

The scanning mechanism predicts that, when an out-of-frame AUG codon is introduced into the 5′ UTR, the adventitious upstream start codon should supplant the normal start site. A number of pathologies result from this kind of translational block. Sometimes the upstream AUG codon is created by a rare mutation (Cai et al., 1992; Liu et al., 1999). Other times it derives from a common polymorphism (Bergenhem et al., 1992; Endler et al., 2001; Kanaji et al., 1998; Kraft et al., 1998; Zysow et al., 1995). The reduction in translation is more or less severe depending on the context of the upstream AUG codon and whether reinitiation is possible.
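The prediction stated here can be illustrated by locating the 5′-proximal AUG before and after a point mutation. The sketch below is a simplification that considers position only: it ignores context, leaky scanning and reinitiation, all of which modulate the outcome as noted above. The helper names (`first_aug`, `substitute`, `describe_shift`) and the toy leader sequence are hypothetical.

```python
def first_aug(mrna):
    """Return the 0-based index of the 5'-proximal AUG, or -1 if none."""
    return mrna.upper().replace("T", "U").find("AUG")


def substitute(mrna, index, base):
    """Return a copy of mrna with the nucleotide at index replaced."""
    return mrna[:index] + base + mrna[index + 1:]


def describe_shift(wild_type, mutant, normal_start):
    """Compare the first AUG in the wild-type and mutant sequences.

    normal_start is the 0-based index of the normal start codon.
    """
    wt, mt = first_aug(wild_type), first_aug(mutant)
    upstream = 0 <= mt < normal_start
    return {
        "wild_type_first_aug": wt,
        "mutant_first_aug": mt,
        "new_upstream_aug": upstream,
        # Frame is only meaningful when a new upstream AUG exists.
        "in_frame": (normal_start - mt) % 3 == 0 if upstream else None,
    }


# Toy leader (hypothetical): the normal start codon is at index 10.
wild_type = "GGACGCCACCAUGGCU"
mutant = substitute(wild_type, 3, "U")  # a C -> U change creates an AUG at index 2
print(describe_shift(wild_type, mutant, 10))
```

Here the substitution creates an upstream AUG that is out of frame with the normal start codon, which is the arrangement the paragraph describes as a translational block.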

I have already explained how hereditary thrombocythemia is caused by mutations that elevate translation of TPO by restructuring or eliminating an inhibitory upORF (Fig. 4). Translation of proto-oncogenes is also elevated in some cases by eliminating small upORFs from the mRNA. The MDM2 oncogene is one example: whereas the normal mRNA has a long 5′ UTR that includes two upstream AUG codons, in tumor cells the use of a different promoter eliminates the upstream AUGs and thus increases translational efficiency 20-fold (Brown et al., 1999; Landers et al., 1997). In the case of oncogene GLI1, the upstream AUG codons that restrict translation in normal cells reside in an intron which is eliminated by more efficient splicing in basal cell carcinomas (Wang and Rothnagel, 2001). Translation of many other human or rodent oncogenes is restricted in normal cells by an encumbered (AUG-burdened or GC-rich) leader sequence (Arrick et al., 1991; Bates et al., 1991; Child et al., 1999a; Harigai et al., 1996; Hoover et al., 1997; Horvath et al., 1995; Manzella and Blackshear, 1990; Sarrazin et al., 2000); in some of these cases, a shorter 5′ UTR that better supports translation is produced in transformed cells (Arrick et al., 1994; Marth et al., 1988). For other oncogenes, although there are alternative leader sequences that might regulate expression in normal tissues (Link et al., 1992; Sasahara et al., 1998), there is no evidence that switching leader sequences contributes to tumorigenesis.

Table 4. Pathologies resulting from a change in mRNA structure which selectively abolishes production of the long or short form of a transcription factor

| Gene (species) | Translational mechanism that normally generates two protein isoforms | Disease-associated change in mRNA structure and translation | References |
| --- | --- | --- | --- |
| C/EBPα (human) | Two proteins from one mRNA via leaky scanning + reinitiation shunt | In acute myeloid leukemia, mutations near amino terminus eliminate production of longer isoform. | Pabst et al., 2001 |
| GATA1 (human) | Two proteins from one mRNA via leaky scanning | In Down syndrome-related leukemia, premature stop codon eliminates production of longer isoform. | Wechsler et al., 2002 |
| LEF1 (human) | Two proteins from two mRNAs (via two promoters) | In colon cancer, failure to activate downstream promoter prevents production of shorter (inhibitory) isoform. | Hovanes et al., 2001 |
| Rx/rax (mouse) | Two proteins from one mRNA via leaky scanning (a) | In eyeless mice, mutation of second AUG, leaving only the weak upstream start codon, results in inadequate yield. | Tucker et al., 2001 |

(a) Here the long and short isoforms appear to function identically; the significance of the second AUG START codon pertains to boosting the overall protein yield. The eyeless mouse serves as a spontaneous model for human anophthalmia.

Whereas removal of small upORFs elevates the translation of the aforementioned MDM2 and GLI1 oncogenes in tumor cells, addition of small upORFs shuts off the translation of some tumor suppressor genes. In the case of HYAL1, retention of an intron which contains eight upstream AUG codons renders the mRNA untranslatable in squamous cell carcinomas (Frost et al., 2000). A striking example of translational inactivation of a tumor suppressor gene is seen in some individuals with a predisposition to melanoma. In certain families, a point mutation (G → T) creates an upstream, out-of-frame AUG codon in the CDKN2 gene. The small upORF initiated from this new AUG codon overlaps the CDKN2 start codon, and the resulting inhibition of translation is profound.

Structural changes that attenuate the translation of viral mRNAs can contribute to the development of persistent infections. The 5′ leader sequence on bovine coronavirus mRNAs, for example, was found to evolve, by acquiring a small upORF, during the course of establishing a persistent infection (Hofmann et al., 1993). Shishido-Hara et al. (2000) speculate that human polyomavirus JC might cause persistent rather than acute infection because all capsid-protein encoding transcripts produced by the JC virus have a small upORF. With the related simian virus 40, in contrast, the upORF is sometimes eliminated by splicing, generating transcripts that better support translation of the major capsid protein. Attenuating effects caused by introducing an upstream AUG codon have been described also with other viruses (Petty et al., 1990; Slobodskaya et al., 1996).

A more drastic restructuring of mRNAs sometimes occurs during the establishment of persistent infections by the measles virus. Instead of the normal monocistronic mRNA for the fusion protein, the predominant transcript in some persistently infected cells was a dicistronic mRNA from which the F cistron, located at the 3′ end, could not be translated (Hummel et al., 1994). A similar problem encountered in studies with recombinant rhabdoviruses provides insight into the transcriptional defects that can generate untranslatable dicistronic mRNAs (Quiñones-Kochs et al., 2001).

In the case of a human parvovirus, productive infection is restricted to a subset of erythroid cells in which splicing generates a monocistronic mRNA for each of the major capsid proteins. In nonpermissive cells, a slight shift in the position of a splice site imposes an upstream ORF which is postulated to restrict translation of the capsid proteins (Brunstein et al., 2000) .

Translational twists sometimes generate antigens which, by stimulating the CTL response, are important in the host defense against tumor cells and viruses (Shastri et al., 2002) .

Leaky scanning is a likely explanation in several cases where the major ORF starts with an AUG codon in a suboptimal context and the CTL antigen derives from initiation at the next (out-of-frame) AUG (Aarnoudse et al., 1999; Bullock et al., 1997; Probst-Kepper et al., 2001; Rimoldi et al., 2000) . In one notable case, translation shifts upstream to an in-frame AUG codon created during insertion of a provirus, and the resulting novel N-terminal amino acid extension functions as a tumor rejection antigen (Wada et al., 1995) .

The scanning mechanism cannot explain the translation of CTL antigens for which the start codon resides far in the interior of the mRNA (Ronsin et al., 1999; Wang et al., 1996) . In these cases the antigenic peptide might be produced from an undetected alternative form of mRNA. Sensitive new assay techniques employed with some genes indeed reveal an array of alternative transcripts from which novel tumor antigens can be translated (Behrends et al., 2002) . In another study, a potent tumor rejection peptide, which maps to an internal AUG codon in the full-length cDNA, was expressed experimentally from a truncated cDNA wherein the start codon for the antigenic peptide was made the first AUG (Rosenberg et al., 2002) . Additional analyses are needed to determine whether, in the melanoma cells wherein this antigen is expressed naturally, a transcript similar to the experimentally truncated cDNA is produced via a downstream promoter or splice site.

6. Surveys and assays and problems therein

6.1. cDNA surveys

Surveys of mRNA/cDNA sequences differ in other details, but every survey confirms the presence of a purine in position −3 in most (≥90%) vertebrate mRNAs (Kozak, 1987a; Pesole et al., 2000; Rogozin et al., 2001; Sakai et al., 2001). The occasional survey that purports to challenge the context rules involves distortions, such as emphasizing the low percentage of cDNAs that have the full consensus sequence while ignoring the high percentage of cDNAs that have the critical purine in position −3 (Peri and Pandey, 2001). A major uncertainty pertaining to all cDNA surveys concerns the validity of the database. When I re-examined the entries in one study (Suzuki et al., 2000), I found numerous instances in which the AUG START codon had been misidentified; the corrected start sites adhered more closely to the consensus motif (Kozak, 2000). Some authors pre-emptively defend their conclusions on the grounds that the (unidentified) cDNA sequences used for their analysis derive from RefSeq, which is a curated database (Pruitt et al., 2000). But the entries in RefSeq are not without errors, some of which (e.g. misidentified start codons, mistaken claims of upstream AUG codons) can be traced by comparing curated GenBank entries NM_005493, NM_005502, NM_000282 and NM_003605 with results published elsewhere (Campeau et al., 2001; Nishitani et al., 2001; Nolte and Müller, 2002; Santamarina-Fojo et al., 2000).
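The dependence of a survey's outcome on correct start-site annotation can be shown with a toy tally. This is a hedged sketch only: the function name `purine_minus3_fraction` and the sequences are hypothetical, and real surveys involve curation steps that are not modeled here.

```python
def purine_minus3_fraction(entries):
    """Fraction of entries with A or G in position -3 of the annotated AUG.

    entries is an iterable of (sequence, start_index) pairs, where
    start_index is the 0-based index of the annotated start codon.
    """
    scored = hits = 0
    for seq, start in entries:
        seq = seq.upper().replace("T", "U")
        if start < 3 or seq[start:start + 3] != "AUG":
            continue  # skip entries that cannot be scored
        scored += 1
        hits += seq[start - 3] in "AG"
    return hits / scored if scored else float("nan")


# One toy cDNA (hypothetical) with AUG codons at index 6 and index 15.
seq = "GCCACCAUGGCUUCUAUGA"
print(purine_minus3_fraction([(seq, 15)]))  # start annotated at the downstream AUG
print(purine_minus3_fraction([(seq, 6)]))   # start annotated at the first AUG
```

The same sequence scores differently depending on which AUG is annotated as the start site, which is why errors in the underlying database propagate directly into survey statistics.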

Some cDNA surveys use misleading terminology, e.g. referring to upstream AUG codons as 'unused' (Peri and Pandey, 2001). Given that upstream AUG codons are used, as proved by detecting the encoded peptide or by fusing the upORF to a reporter gene, it is not anomalous to find a good context around some upstream AUG codons. All surveys tend to overestimate the incidence of upstream AUGs by scoring only the longest cDNA isoform, ignoring the existence of alternative transcripts that have shorter, unencumbered 5′ UTRs (Section 4.2). The significance of upstream AUG codons also tends to be misstated: the presence of small upORFs in vertebrate mRNAs which are thereby translated inefficiently (see the foregoing discussion of TPO, oncogenes, etc.) constitutes evidence for, rather than against, the scanning model. The first-AUG rule, which I cite as evidence for the scanning mechanism, derives not from statistical analysis of cDNA sequences but from the experimentally observed fact (Section 2.2) that translation shifts predictably upstream or downstream when an AUG codon is added or removed. In short, it makes more sense to use the scanning/context rules to evaluate cDNA sequences (Hatzigeorgiou, 2002) than to attempt the reverse.

While conclusions about translation derived from experimental studies are arguably more meaningful than those derived from statistics, the interpretation of experimental results can be complicated. In vivo assays avoid the problem of reaction-conditions-dictating-the-outcome (see next paragraph), but there are other potential traps. The usually-valid assumption that polysomal association identifies actively translated mRNAs is called into question by the recent discovery of mRNAs trapped on large polysomes from which there is no polypeptide production (Rüegsegger et al., 2001). The major problem when translation is studied in vivo is uncertainty about the structure of the mRNA. For example, a claim that IRES-mediated translation is developmentally regulated (Créancier et al., 2000, 2001) is premature, inasmuch as those studies monitored the amount but not the form of mRNA. When mRNA structure is examined, the developmentally regulated expression might be found to reflect activation of an internal promoter rather than activation of an IRES. Other studies wherein translation of an encumbered leader sequence appears to improve under certain conditions or in certain cell types require better analyses to rule out a possible change in structure of the 5′ UTR (Bernstein et al., 1995; Child et al., 1999b; Li et al., 2001; Zimmer et al., 1994). Some useful hints may be found in reports that describe the belated discovery of alternative forms of mRNA that were missed the first time around (Cortner and Farnham, 1990; Déjardin et al., 2000; Deng et al., 2002; Frost et al., 2000; Grundhoff and Ganem, 2001; Jordan et al., 1996; Kastner et al., 1990b; Kiss-László et al., 1995; Laurin et al., 2000; Peremyslov and Dolja, 2002; Zhang and Liu, 2000; Zheng et al., 1994).

In vitro translation assays pose a different set of problems. The commercial availability of in vitro translation kits is both a blessing (the systems are easy to use) and a curse, the latter because insufficient attention is paid to reaction conditions that can affect the selection of AUG start codons. When the magnesium concentration is too low, the first AUG codon may be bypassed despite an adequate context; when the magnesium concentration is too high, initiation may occur at upstream non-AUG codons that are not naturally used. One solution is to include control transcripts for which start-codon selection was determined in vivo, and to adjust the in vitro reaction conditions to give the same result (Kozak, 1990b). Some suppliers of translation kits make it possible to adjust the magnesium concentration, but there is little awareness of the need to do so and the use of coupled transcription/translation systems makes it difficult.

For whatever reason, in vitro translation results sometimes deviate significantly from what is seen in vivo vis-à-vis access to internal AUG codons (Grove et al., 1991; Land and Rouault, 1998; Meijer et al., 2000; Meulewaeter et al., 1992; Mitchelmore et al., 2002; Saucedo et al., 1999) and the degree of inhibition caused by small upORFs (Ghilardi et al., 1998; Harigai et al., 1996; Pecqueur et al., 1999, 2001; Tanaka et al., 2001; Wang and Wessler, 1998). The fidelity of initiation in vitro is clearly impaired, possibly due to degradation of the mRNA, in cases where extraneous, low molecular weight polypeptides are produced (Herbert et al., 1996; Liu and Biegalke, 2002; Lekven et al., 2001, Fig. 3C; Maser et al., 2001, Fig. 4b; Packham et al., 1997, Fig. 5).

The possibility that the input mRNA might undergo cleavage during incubation in vitro complicates attempts to study the expression of dicistronic mRNAs, as discussed in the next section. This type of artifact is not ruled out by finding that only certain downstream ORFs are translated (O'Connor and Brian, 2000). Extrapolating from what is seen when mRNAs are deliberately cleaved in vivo (Thoma et al., 2001), activation of internal start codons in vitro would depend on where the accidental cleavage occurs and whether the endolytic cleavage product persists long enough for a ribosome to engage the newly created 5′ end before exonucleases take over. This line of reasoning could explain the claim that an 'artificial IRES', consisting of a multiple cloning region and a portion of the Escherichia coli lacI gene, supports internal initiation of translation in starved yeast cells (Paz et al., 1999): starvation is likely to promote mRNA degradation, and the 'IRES' might fortuitously stabilize certain intermediates in degradation. The discovery that IRES elements are actually targeted by some ribonucleases (Elgadi and Smiley, 1999; Nadal et al., 2002) should be remembered.

Recent studies that use a primer-extension inhibition (toeprinting) assay to monitor the binding of ribosomes to mRNAs have the advantage of focusing directly on the initiation step, but care is needed to distinguish authentic initiation complexes from artifactual pauses in primer extension caused by base-paired structures or extraneous proteins bound to the mRNA. The complicating effects of mRNA secondary structure, which are prominent when avian reverse transcriptase is used for toeprinting, can be minimized by using a form of the enzyme derived from murine leukemia virus (Kozak, 1998) .

Attempts to explain the origin of multiple isoforms of eIF4G (Bradley et al., 2002) illustrate how challenging it can be to interpret translation assays. In vitro experiments presented in support of the idea that translation can initiate from five AUG codons, in a single form of eIF4G mRNA, might have been compromised by mRNA breakage; this would explain the production of an array of extraneous smaller polypeptides (Byrd et al., 2002, Fig. 3C , lanes 1, 3 and 5). Translation of some eIF4G isoforms from broken mRNAs could also explain the variability in yields noted throughout that study. When the endogenous eIF4G gene is expressed in vivo, access to certain downstream AUG codons might occur via alternative splicing or internal promoters; both mechanisms have been documented in studies of eIF4G by other investigators (Han and Zhang, 2002 , and references therein).

Thus, even though one could rationalize the production of at least three isoforms of eIF4G from one mRNA via established translational mechanisms (an overlapping upORF could shunt some 40S ribosomal subunits past the first in-frame AUG codon at position 275, and the unfavorable context at AUG 395 might allow some ribosomes to reach AUG 536 by leaky scanning), it would be premature to propose that solution. The in vitro experiments need to be repeated with careful attention to magnesium levels and with efforts to minimize mRNA breakage. The latter might be accomplished by lowering the temperature to 25 °C and limiting the window for initiation to 5 or 10 min. (Addition of edeine after the first 5 or 10 min, followed by another period of incubation, would allow polypeptides to be elongated without further initiation events.) The possible production of some eIF4G isoforms by proteolysis also needs to be ruled out, as this protein is notoriously susceptible to cleavage.

The study by Byrd et al. (2002) included experiments carried out with dicistronic transcripts, predicated on the belief that eIF4G mRNA contains IRES elements which allow direct internal initiation of translation. An eIF4G/EGFP (enhanced green fluorescent protein) fusion gene positioned at the 3′ end of a dicistronic transcript was translatable in vitro, but the aforementioned possibility of mRNA breakage complicates the interpretation. Indeed, the unexpected translation of EGFP from the 3′ position even without fusion to eIF4G (Byrd et al., 2002, Fig. 7A, lane 3) is most easily explained by mRNA cleavage. (The authors invoke reinitiation as the explanation, but reinitiation cannot occur following translation of a large 5′ cistron.) Fragmentation of the mRNA could explain why translation of the 3′ eIF4G/EGFP cistron persisted when translation of the 5′ cistron was blocked by a hairpin structure (Byrd et al., 2002, Fig. 7B). The hairpin test, widely used to test for internal initiation, is meaningless without evidence that the dicistronic input mRNA remains intact.

The in vivo tests of eIF4G translation (Byrd et al., 2002, Fig. 8) also require careful RNA analyses to document that the vector produces only the intended dicistronic mRNA; the quality of the Northern blot in that figure falls far short of what is required. To rule out the possibility that the 3′ cistron might be translated from an unintended monocistronic mRNA, a promoter-deletion control is needed: a control which shows that, upon deleting the promoter that precedes the 5′ cistron, expression of the 3′ cistron is also abolished. This test failed in studies with some other sequences, revealing that the candidate IRES actually harbors a cryptic promoter (Han and Zhang, 2002; Larsen et al., 2002).

The foregoing discussion of eIF4G translation alludes to only some of the problems associated with dicistronic vectors; a more detailed critique may be found elsewhere (Kozak, 2001a). Use of a certain popular vector which harbors an intron near the 5′ end (Jopling and Willis, 2001) increases the likelihood of producing an unintended monocistronic mRNA via splicing; the candidate IRES need contribute only a cryptic 3′ splice site. This indeed happens in some cases (Grundhoff and Ganem, 2001; Pinkstaff et al., 2001). Claims of IRES activity are problematic when supported by in vitro assays in which translation of the 3′ ORF is extremely weak (e.g. Deffaud and Darlix, 2000, Fig. 2; Lekven et al., 2001, Fig. 3B, lane 3; Maser et al., 2001, Fig. 4e). The simple idea that an IRES can be identified based on the ability to support translation of a 3′ cistron runs into trouble when, for example, the β-globin mRNA leader sequence, intended to serve as a negative control, turns out somehow to allow translation of a downstream cistron (Van der Velden et al., 2002). In another study, merely lengthening the intercistronic domain enabled substantial translation of the 3′ cistron (Gallie et al., 2000), perhaps by providing room for RNases to cleave and thus release a translatable 3′ fragment. Even with the paradigmatic IRES elements derived from picornaviruses, the ability to support internal initiation was found to depend inexplicably on the choice and arrangement of 5′ and 3′ reporter genes (Hennecke et al., 2001). These oddities, along with the notable inability to translate the 3′ cistron in natural dicistronic mRNAs (Table 1), are reason to worry about the validity of experiments that employ synthetic dicistronic constructs.

The proffered rationale for a cap-independent internal initiation mechanism is that it would enable certain mRNAs to be translated when eIF4E levels decline, but recent experiments presented in support of that idea used the 5′ UTR from poliovirus rather than 5′ UTRs from cellular mRNAs, such as eIF4G, that are supposedly regulated via 'a dynamic interplay between cap-dependent and cap-independent processes'. Even if the proffered rationale is valid, convincing evidence for direct internal initiation with particular mRNAs is needed. The widely used dicistronic assay has flaws, as outlined above. An alternative assay which involves circularization of the mRNA has been attempted with only one viral IRES element (Chen and Sarnow, 1995); the results await independent verification and extension to other sequences.

The scanning model provides a framework for understanding basic patterns of eukaryotic gene expression, such as the reliance on monocistronic mRNAs, and for understanding how translation is perturbed by mutations that restructure the 5′ UTR. A growing number of human diseases have been traced to such mutations. The scanning mechanism has been shown to operate not only with simple mRNAs that have a short 5′ UTR and initiate at the first AUG codon, but also with mRNAs that have complicated leader sequences and multiple start codons.

One often hears the suggestion that an alternative, IRES-mediated mechanism of initiation is required when a long leader sequence is encumbered by secondary structure or upstream AUG codons (Dever, 2002; Pestova et al., 2001). That view is not well taken. Scanning can occur over long distances, as evidenced by some bifunctional viral mRNAs in which the second start site is more than 500 nt downstream from the first (e.g. peanut clump virus, southern bean mosaic virus, rice tungro bacilliform virus; Table 3 and Fig. 1C). The structure-prone, GC-rich leader sequences on mammalian mRNAs strongly reduce translational efficiency but do not preclude operation of the scanning mechanism (Van der Velden et al., 2002). Upstream AUG codons also reduce translational efficiency, and that is why they are there. To postulate the need for an alternative mechanism is to miss the point that an encumbered leader sequence ensures that translation via scanning will be inefficient, and thus ensures against harmful overproduction of cytokines (Fig. 4) and other potent proteins.
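How heavily a leader sequence is encumbered by upstream AUG codons can be tallied mechanically. The following is a minimal sketch in Python, offered only as an illustration and not as a procedure used in any of the studies cited here. The function name `upstream_augs`, its arguments, and the toy sequence in the usage note are hypothetical. The sketch assumes that the position of the annotated start codon is already known, and it says nothing about secondary structure or about how efficiently any given AUG is recognized.

```python
# Illustrative sketch only: names and the toy sequence are hypothetical.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_augs(mrna: str, main_start: int):
    """List AUG codons located upstream of the annotated start codon.

    mrna       -- mRNA sequence written 5' to 3' (a DNA 'T' is read as 'U')
    main_start -- 0-based index of the A of the annotated AUG start codon

    Returns a list of (position, orf_length) tuples. orf_length is the number
    of codons from the upstream AUG up to, but not including, the first
    in-frame stop codon; it is None when no in-frame stop codon is found.
    """
    seq = mrna.upper().replace("T", "U")
    found = []
    pos = seq.find("AUG")
    while pos != -1 and pos < main_start:
        orf_length = None
        # Walk codon by codon in the frame set by this upstream AUG.
        for i in range(pos, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orf_length = (i - pos) // 3
                break
        found.append((pos, orf_length))
        pos = seq.find("AUG", pos + 1)
    return found
```

For the toy sequence `GCAUGGCUUAAGCCGCCACCAUGGCG`, with the annotated start codon at index 20, the call `upstream_augs(seq, 20)` reports one upstream AUG at index 2 that opens a two-codon upstream ORF. Whether such an element actually depresses translation has to be established experimentally.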

The high frequency of intron-containing cDNA sequences (Kozak, 1991a, 1996) might reflect another type of regulation. Inefficient or regulated removal of the first intron has been documented in some cases (Boularand et al., 1995; Frost et al., 2000; Van der Leij et al., 2002; Wang and Rothnagel, 2001; Xie et al., 1991; Zachar et al., 1987) and I suspect that additional examples might be found, miscategorized, among the aforementioned cDNAs that are postulated to require an alternative mechanism of initiation. Removal of the intron, or use of a cryptic promoter therein, would eliminate the upstream AUG codons that are barriers to scanning. Examples in which translation is prevented deliberately by splicing out the exon that contains the AUG START codon (Lin et al., 1998) or by other regulated splicing events (Rueter et al., 1999) underscore the point that not every transcript, hence not every cDNA, corresponds to a functional mRNA. Before postulating the need for a new mechanism to explain how a funny looking cDNA gets translated, one must be certain that it is translated.

The mechanisms discussed herein for escaping the first-AUG rule, within the constraints imposed by the scanning model, obviously cannot explain every report of initiation from an internal position. More information is needed to understand how N-terminally truncated versions of some proteins are produced apparently without truncation of the mRNA (Goss et al., 2002; Maser et al., 2001; Santagata et al., 2000; Scharnhorst et al., 1999; Vanhoutte et al., 2001). An IRES element was postulated in some of those cases, based on the dicistronic test, but in one study there were no accompanying analyses of RNA structure in vivo (Goss et al., 2002), and in another study the use of an in vitro translation system produced too little of the truncated protein to be convincing (Maser et al., 2001). Speculation about how some other interesting genes are translated (Klemke et al., 2001) also must be postponed pending a search for possible additional forms of mRNA. Although I listed the von Hippel-Lindau tumor suppressor gene as a possible example of leaky scanning (Table 3), definitive tests are needed to distinguish between that and other mechanisms for producing the short isoform (Iliopoulos et al., 1998).

Those of us with an interest in translation have a tendency to interpret every change in mRNA structure as a means to control translation, but transcriptional requirements (the need to turn on a gene in various tissues via whatever promoter works in each tissue) underlie most switches in 5′ leader sequences. In some cases the actual sequence of the 5′ UTR is dictated by the presence therein of transcriptional control elements (Akiri et al., 1998; Minami et al., 2001; Solecki et al., 1997; Yin and Blanchard, 2000; Yu et al., 2001; Zimmermann et al., 2000). Regulation of transcription is the major reason for the GC-rich domains near the 5′ end of many mammalian genes; the accompanying down-modulation of translation is an inevitable consequence, and arguably a useful one because, given the long half-life of most mammalian mRNAs, inefficient translation might be a necessity.

It merits repeating that, although the m7G cap strongly promotes ribosome binding, the scanning mechanism is not dependent on the presence of the cap. The essence of the scanning model is 5′ entry of ribosomes and position-dependent selection of the AUG START codon. Those key points hold with naturally uncapped mRNAs produced by some viruses (footnote g in Table 1) and with synthetic uncapped mRNAs used to study translation in vitro (Kozak, 1979a). The inclination to invoke internal initiation based on indirect criteria (absence of a cap, or the ability to be translated in extracts from poliovirus-infected cells) should be resisted. It is a mistake to think that, because archaeal mRNAs lack a 5′ cap, translation in that system cannot occur via scanning. The discovery in archaea of proteins similar to certain eukaryotic initiation factors (Kyrpides and Woese, 1998) is intriguing for other reasons but has no direct bearing on whether the start codon in archaeal mRNAs might be recognized via a prokaryotic- or eukaryotic-type mechanism. That interesting question, which bears on the evolutionary origin of scanning, awaits answering.

Fundamental questions about the molecular workings of the scanning mechanism also await answering. What drives migration of the 40S subunit during the scanning phase? How does the 40S subunit hold on at a terminator codon, in order to reinitiate? What prevents reinitiation when the size of the first ORF exceeds a certain length? We know nothing about how recognition of the start codon is aided by a purine in position −3 and G in position +4. There is no evidence for base pairing between the GCCRCC motif and rRNA (or for binding of rRNA to any other sequence in eukaryotic mRNAs). There is as yet no convincing evidence for recognition of GCCRCC by a trans-acting protein factor. It would be easy, and meaningless, simply to find proteins that bind an RNA fragment which contains the motif. Credible experiments would require controls based on what we know about the consensus sequence: that the purine (A > G) in position −3 plays a dominant role, and the full effect requires that the GCCRCC motif abut the AUG codon (Kozak, 1999, Fig. 1).
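The positional features of the consensus sequence named above can be checked for any candidate start codon with a few lines of code. The sketch below, in Python, is only an illustration: the function name `aug_context` and the keys of the returned dictionary are hypothetical, and the numbering follows the usual convention in which the A of the AUG codon is +1 and the nucleotide immediately preceding it is −1. It reports three features (a purine in position −3, G in position +4, and a GCCRCC motif that directly abuts the AUG) and deliberately assigns no score, since the text gives no quantitative weighting.

```python
# Illustrative sketch only: function name and dictionary keys are hypothetical.
PURINES = {"A", "G"}


def aug_context(mrna: str, aug_pos: int) -> dict:
    """Report consensus-sequence features around one AUG codon.

    mrna    -- mRNA sequence written 5' to 3' (a DNA 'T' is read as 'U')
    aug_pos -- 0-based index of the A of the AUG codon (position +1)
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_pos:aug_pos + 3] != "AUG":
        raise ValueError("no AUG codon at the given position")

    # Position -3 lies three nucleotides upstream of the A of the AUG.
    minus3 = seq[aug_pos - 3] if aug_pos >= 3 else None
    # Position +4 is the nucleotide immediately after the AUG.
    plus4 = seq[aug_pos + 3] if aug_pos + 3 < len(seq) else None
    # GCCRCC occupies positions -6 to -1 when it abuts the AUG codon.
    flank = seq[aug_pos - 6:aug_pos] if aug_pos >= 6 else None

    return {
        "purine_minus3": minus3 in PURINES,
        "g_plus4": plus4 == "G",
        "gccrcc_abuts_aug": (
            flank is not None
            and flank[:3] == "GCC"
            and flank[3] in PURINES
            and flank[4:] == "CC"
        ),
    }
```

Applied to the toy sequence `GCCGCCACCAUGGCG` with `aug_pos=9`, all three features are reported as present; applied to `UUUUUUUUUAUGCCC` with the same position, none is. A check of this kind could help in choosing mutant controls of the sort called for above, but it is no substitute for them.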

With so much effort being directed to searching for possible exceptions to the scanning mechanism, one can only wish that some enterprising soul would tackle these important questions.

Interleukin-2-induced, melanoma-specific T cells recognize CAMEL, an unexpected translation product of LAGE-1

Suppression of ribosomal reinitiation at upstream open reading frames in amino acid-starved cells forms the basis for GCN4 translational control

Subcellular fate of the Int-2 oncoprotein is determined by choice of initiation codon

Kozak sequence polymorphism of the glycoprotein (GP) Ibα gene is a major determinant of the plasma membrane levels of the platelet GP Ib-IX-V complex

Specific sequences in p120ctn determine subcellular distribution of its multiple isoforms involved in cellular adhesion of normal and malignant epithelial cells

Unconventional mRNA processing in the expression of two calcineurin B isoforms in Dictyostelium

Uroporphyrinogen III synthase. An alternative promoter controls erythroid-specific expression in the murine gene

Regulation of vascular endothelial growth factor (VEGF) expression is mediated by internal initiation of translation and alternative initiation of transcription

Abundant early expression of gpUL4 from a human cytomegalovirus mutant lacking a repressive upstream open reading frame

Apobec-1 transcription in rat colon cancer: decreased apobec-1 protein production through alterations in polysome distribution and mRNA translation associated with upstream AUGs

Identification of a brain-specific protein kinase Cζ pseudogene (ΨPKCζ) transcript

Tissue-specific and ubiquitous promoters direct the expression of alternatively spliced transcripts from the calcitonin receptor gene

Structural characterization of SIL, a gene frequently disrupted in T-cell acute lymphoblastic leukemia

Cloning and characterization of the gene encoding rabbit cardiac calsequestrin

A new 34-kilodalton isoform of human fibroblast growth factor 2 is cap dependently synthesized by using a non-AUG start codon and behaves as a survival factor

Inhibition of translation of transforming growth factor-β3 mRNA by its 5′ untranslated region

Enhanced translational efficiency of a novel transforming growth factor β3 mRNA in human breast cancer cells

Translational initiation competence, 'leaky scanning' and translational reinitiation in areA mRNA of Aspergillus nidulans

Multiple roles for the C-terminal domain of eIF5 in translation initiation complex assembly and GTPase activation

Biosynthesis of osteogenic growth peptide via alternative translational initiation at AUG 85 of histone H4 mRNA

Expression of murine IL-12 is regulated by translational control of the p35 subunit

The two proteins encoded by the cottontail rabbit papillomavirus E6 open reading frame differ with respect to localization and phosphorylation

Biosynthesis of human fibroblast growth factor-5

Novel products of the HUD, HUC, NNP-1 and α-INTERNEXIN genes identified by autologous antibody screening of a pediatric neuroblastoma library

Yeast LEU4 encodes mitochondrial and non-mitochondrial forms of α-isopropylmalate synthase

Mutation creates an open reading frame within the 5′ untranslated region of macaque erythrocyte carbonic anhydrase (CA) I mRNA that suppresses CA I expression and supports the scanning model for translation

The translational repression mediated by the platelet-derived growth factor 2/c-sis mRNA leader is relieved during megakaryocytic differentiation

Positive and negative regulation of myogenic differentiation of C2C12 cells by isoforms of the multiple homeodomain zinc finger transcription factor ATBF1

Mechanism of endoplasmic reticulum retention of mutant vasopressin precursor caused by a signal peptide truncation associated with diabetes insipidus

Alternative splicing of the imprinted candidate tumor suppressor gene ZAC regulates its antiproliferative and DNA binding activities

A global analysis of Caenorhabditis elegans operons

The avian reovirus genome segment S1 is a functionally tricistronic gene that expresses one structural and two nonstructural proteins in infected cells

The 2.2 kb E1b mRNA of human Ad12 and Ad5 codes for two tumor antigens starting at different AUG triplets

The human tryptophan hydroxylase gene. An unusual splicing complexity in the 5′-untranslated region

Mass spectrometric analysis of the N terminus of translational initiation factor eIF4G-1 reveals novel isoforms

The procaspase-8 isoform, procaspase-8L, recruited to the BAP31 complex at the endoplasmic reticulum

Bunyamwera bunyavirus nonstructural protein NSs is a nonessential gene product that contributes to viral pathogenesis

Role of two upstream open reading frames in the translational control of oncogene mdm2

Microarray identification of FMRP-associated brain mRNAs and altered mRNA translational profiles in fragile X syndrome

Identification of a novel RNA splicing pattern as a basis of restricted cell tropism of erythrovirus B19

Initiation codon scanthrough versus termination codon readthrough demonstrates strong potential for major histocompatibility complex class I-restricted cryptic epitope expression

Generation of multiple isoforms of eukaryotic translation initiation factor 4GI by use of alternate translation initiation codons

Translational control of mammalian serine hydroxymethyl-transferase expression

Targeted mutagenesis of Lis1 disrupts cortical development and LIS1 homodimerization

Two novel β-thalassemia mutations in the 5′ and 3′ noncoding regions of the β-globin gene

Translational control of C/EBPα and C/EBPβ isoform expression

Alternative translation initiation site usage results in two functionally distinct forms of the GATA-1 transcription factor

Structure of the PCCA gene and distribution of mutations causing propionic acidemia

Translational inhibition by a human cytomegalovirus upstream open reading frame despite inefficient utilization of its AUG codon

The secreted form of invertase in Saccharomyces cerevisiae is synthesized from mRNA encoding a signal sequence

Translation of equine infectious anemia virus bicistronic tat-rev mRNA requires leaky ribosome scanning of the tat CTG initiation codon

Transcription of feline calicivirus RNA

Translational pathophysiology: a novel molecular mechanism of human disease

A novel deletion of the L-ferritin iron-responsive element responsible for severe hereditary hyperferritinaemia-cataract syndrome

Phenotype-genotype relationships in complementation group 3 of the peroxisome-biogenesis disorders

The increased level of β1,4-galactosyltransferase required for lactose biosynthesis is achieved in part by translational control

The yeast VAS1 gene encodes both mitochondrial and cytoplasmic valyl-tRNA synthetases

Structure of the GM2A gene: identification of an exon 2 nonsense mutation and a naturally occurring transcript with an in-frame deletion of exon 2

Initiation of protein synthesis by the eukaryotic translational apparatus on circular RNAs

A novel influenza A virus mitochondrial protein that induces cell death

Translation initiation at alternate in-frame AUG codons in the rabies virus phosphoprotein mRNA is mediated by a ribosomal leaky scanning mechanism

Translational control by an upstream open reading frame in the HER-2/neu transcript

Cell type-dependent and -independent control of HER-2/neu translation

A novel missense mutation in the amino-terminal domain of the human androgen receptor gene in a family with partial androgen insensitivity syndrome causes reduced efficiency of protein translation

Two distinct protein isoforms are encoded by ntk, a csk-related tyrosine protein kinase gene

Alternative transcription and splicing of the human porphobilinogen deaminase gene result either in tissue-specific or in housekeeping expression

tRNA met functions in directing the scanning ribosome to the start site of translation

Mutational analysis of the HIS4 translational initiator region in Saccharomyces cerevisiae

Initiation factor eIF2α phosphorylation in stress responses and apoptosis

Tissue- and development-specific alternative RNA splicing regulates expression of multiple isoforms of erythroid membrane protein 4.1

Identification of the serum-responsive transcription initiation site of the zinc finger gene Krox-20

Fibroblast growth factor 2 internal ribosome entry site (IRES) activity ex vivo and in transgenic mice reveals a stringent tissue-specific regulation

c-myc internal ribosome entry site activity is developmentally controlled and subjected to a strong translational repression in adult transgenic mice

The Arabidopsis thaliana FPS1 gene generates a novel mRNA that encodes a mitochondrial farnesyldiphosphate synthase isoform

Alternatively spliced human type 1 angiotensin II receptor mRNAs are translated at different efficiencies and encode two receptor isoforms

Eukaryotic translation initiation factor 5 functions as a GTPase-activating protein

The S4 genome segment of baboon reovirus is bicistronic and encodes a novel fusion-associated small transmembrane protein

Expression and function of CCAAT/enhancer binding protein β (C/EBPβ) LAP and LIP isoforms in mouse mammary gland, tumors and cultured mammary epithelial cells

rlk/TXK encodes two forms of a novel cysteine string tyrosine kinase activated by Src family kinases

Characterization of an internal ribosomal entry segment in the 5′ leader of murine leukemia virus env RNA

A novel subgenomic murine leukemia virus RNA transcript results from alternative splicing

EIF2AK3, encoding translation initiation factor 2-α kinase 3, is mutated in patients with Wolcott-Rallison syndrome

Transcriptional regulation of the interleukin-6 gene of human herpesvirus 8 (Kaposi's sarcoma-associated herpesvirus)

ERα gene expression in human primary osteoblasts: evidence for the expression of two receptor proteins

A liver-enriched transcriptional activator protein, LAP, and a transcriptional inhibitory protein, LIP, are translated from the same mRNA

Gene-specific regulation by general translation factors

Genomic structure of Unp, a murine gene encoding a ubiquitin-specific protease

Control of start codon choice on a plant viral RNA encoding overlapping genes

The first and third uORFs in RSV leader RNA are efficiently translated: implications for translational regulation and viral RNA packaging

Identification of eight novel 5′-exons in cerebral capillary malformation gene-1 (CCM1) encoding KRIT1

Picornavirus internal ribosome entry site elements target RNA cleavage events induced by the herpes simplex virus virion host-shutoff protein

Nucleotide sequence and expression of the small (S) RNA segment of Maguari bunyavirus

Amino-terminal extension generated from an upstream AUG codon increases the efficiency of mitochondrial import of yeast N2,N2-dimethylguanosine-specific tRNA methyltransferases

Gastric cancers overexpress DARPP-32 and a novel isoform, t-DARPP

A common C → T polymorphism at nt 46 in the promoter region of coagulation factor XII is associated with decreased factor XII activity

Translation of bicistronic viral mRNA in transfected cells: regulation at the level of elongation

A 31-amino acid N-terminal extension regulates c-Crk binding to tyrosine-phosphorylated proteins

The rat hepatic leukemia factor (HLF) gene encodes two transcriptional activators with distinct circadian rhythms, tissue distributions and target preferences

Molecular characterization of Pax6 2Neu through Pax6 10Neu : an extension of the Pax6 allelic series and the identification of two possible hypomorph alleles in the mouse Mus musculus

Human basic fibroblast growth factor gene encodes four polypeptides: three initiate translation from non-AUG codons

Identification of a new isoform of the human estrogen receptor-alpha (hER-α) that is encoded by distinct transcripts and that is able to repress hER-α activation function 1

A testis-specific promoter in the rat vasopressin gene

Translational stop codons in the precore sequence of hepatitis B virus pre-C RNA allow translation reinitiation at downstream AUGs

Translation of the hepatitis B virus P gene by ribosomal scanning as an alternative to internal initiation

Upstream organization of and multiple transcripts from the human folylpoly-γ-glutamate synthetase gene

Nonsense-mediated mRNA decay in health and disease

HYAL1 LUCA-1 , a candidate tumor suppressor gene on chromosome 3p21.3, is inactivated in head and neck squamous cell carcinomas by aberrant splicing of pre-mRNA

Viral and cellular mRNA capping: past and prospects

Splicing in a plant pararetrovirus

Position-dependent ATT initiation during plant pararetrovirus rice tungro bacilliform virus translation

Rice tungro bacilliform virus open reading frames II and III are translated from polycistronic pregenomic RNA by leaky scanning

Translation of p15.5 INK4B , an N-terminally extended and fully active form of p15 INK4B , is initiated from an upstream GUG codon

Physical evidence for distinct mechanisms of translational control by upstream open reading frames

The role of 5′-leader length, secondary structure and PABP concentration on cap and poly(A) tail function during translation in Xenopus oocytes

The two forms of karyogamy transcription factor Kar4p are regulated by differential initiation of transcription, translation, and protein turnover

GCD10, a translational repressor of GCN4, is the RNA-binding subunit of eukaryotic translation initiation factor-3

The mRNA structure has potent regulatory effects on type 2 iodothyronine deiodinase expression

A single-base deletion in the thrombopoietin (TPO) gene causes familial essential thrombocythemia through a mechanism of more efficient translation of TPO mRNA

Thrombopoietin production is inhibited by a translational mechanism

Hereditary thrombocythaemia in a Japanese family is caused by a novel point mutation in the thrombopoietin gene

Initiation of translation directed by 42S and 26S RNAs from Semliki Forest virus in vitro

A common polymorphism in the annexin V Kozak sequence (−1C>T) increases translation efficiency and plasma levels of annexin V, and decreases the risk of myocardial infarction in young patients

Position is the critical determinant for function of iron-responsive elements as translational regulators

Attenuated APC alleles produce functional protein from internal translation initiation

Characterization of the cis-acting elements controlling subgenomic mRNAs of Citrus tristeza virus: production of positive- and negative-stranded 3′-terminal and positive-stranded 5′-terminal RNAs

Effect of sequence context at stop codons on efficiency of reinitiation in GCN4 translational control

Diverse splicing mechanisms fuse the evolutionarily conserved bicistronic MOCS1A and MOCS1B open reading frames

An imprinted, mammalian bicistronic transcript encodes two independent proteins

Mapping of the tobacco mosaic virus movement protein and coat protein subgenomic RNA promoters

A link between diabetes and atherosclerosis: glucose regulates expression of CD36 at the level of translation

The vitamin D receptor gene start codon polymorphism: a functional analysis of Fok I variants

Cloning and expression of two human p70 S6 kinase polypeptides differing only at their amino termini

Mechanisms governing expression of the v-FLIP gene of Kaposi's sarcoma-associated herpesvirus

T-cell receptor sequences that elicit strong down-regulation of premature termination codon-bearing transcripts

Mapping and expression of southern bean mosaic virus genomic and subgenomic RNAs

Synthesis in vitro of a seven amino acid peptide encoded in the leader RNA of Rous sarcoma virus

Utilization of an alternative transcription initiation site of somatic cytochrome c in the mouse produces a testis-specific cytochrome c mRNA

Heme-regulated eIF2α kinase (HRI) is required for translational regulation and survival of erythroid precursors in iron deficiency

Regulation of gene expression by internal ribosome entry sites or cryptic promoters: the eIF4G story

Human ubiquitin-activating enzyme, E1. Indication of potential nuclear and cytoplasmic subpopulations using epitope-tagged cDNA constructs

Abrogation of upstream open reading frame-mediated translational control of a plant S-adenosylmethionine decarboxylase results in polyamine disruption and growth perturbations

Mouse Atf5: molecular cloning of two novel mRNAs, genomic organization, and odorant sensory neuron localization

Functionality of alternative splice forms of the first enzymes involved in human molybdenum cofactor biosynthesis

Regulated translation initiation controls stress-induced gene expression in mammalian cells

Diabetes mellitus and exocrine pancreatic dysfunction in Perk−/− mice reveals a role for translational control in secretory cell survival

Characterization of two cis-regulatory regions in the murine β1,4-galactosyltransferase gene

A cis-acting element in the BCL-2 gene controls expression through translational mechanisms

Subcellular relocalization of a long-chain fatty acid CoA ligase by a suppressor mutation alleviates a respiration deficiency in Saccharomyces cerevisiae

Translation initiation start prediction in human cDNAs with high accuracy

Upf1p, Nmd2p, and Upf3p regulate the decapping and exonucleolytic degradation of both nonsense-containing mRNAs and wild-type mRNAs

Human wig-1, a p53 target gene that encodes a growth inhibitory zinc finger protein

Internal ribosome entry sites in eukaryotic mRNA molecules

Composition and arrangement of genes define the strength of IRES-driven translation in bicistronic mRNAs

The conserved 5′-untranslated leader of Spi-1 (PU.1) mRNA is highly structured and potently inhibits translation in vitro but not in vivo

Detection of the ORF3 polypeptide of feline calicivirus in infected cells and evidence for its expression from a single, functionally bicistronic subgenomic mRNA

An altered ribosomal protein in an edeine-resistant mutant of Saccharomyces cerevisiae

Translation of the second gene of peanut clump virus RNA 2 occurs by leaky scanning in vitro

Characterization of the gene encoding human platelet glycoprotein IX

Translational regulation of yeast GCN4. A window on factors that control initiator-tRNA binding to the ribosome

Translation initiation and assembly of peripherin in cultured cells

A translation-attenuating intraleader open reading frame is selected on coronavirus mRNAs during persistent infection

Molecular basis for the dual mitochondrial and cytosolic localization of alanine:glyoxylate aminotransferase in amphibian liver cells

Pim-1 protein expression is regulated by its 5′-untranslated region and translation initiation factor eIF-4E

Characterization of the infections of permissive and nonpermissive cells by host range mutants of vesicular stomatitis virus defective in RNA methylation

Translational regulation of complement protein C2 expression by differential utilization of the 5′-untranslated region of mRNA

Multiple elements in the 5′ untranslated region down-regulate c-sis messenger RNA translation

Caenorhabditis elegans mRNAs that encode a protein similar to ADARs derive from an operon containing six genes

β-Catenin-sensitive isoforms of lymphoid enhancer factor-1 are selectively expressed in colon cancer

Rh mod syndrome: a family study of the translation-initiator mutation in the Rh50 glycoprotein gene

GTP hydrolysis controls stringent selection of the AUG start codon during translation initiation in Saccharomyces cerevisiae

Genomic structure of the locus encoding protein 4.1. Structural basis for complex combinatorial patterns of tissue-specific alternative RNA splicing

Restriction of fusion protein mRNA as a mechanism of measles virus persistence

Messenger RNA for the coat protein of tobacco mosaic virus

Multiple 5′-untranslated exons in the nuclear respiratory factor 1 gene span 47 kb and contribute to transcript heterogeneity and translational efficiency

Translational regulation of hepatitis B virus polymerase gene by termination-reinitiation of an upstream minicistron in a length-dependent manner

pVHL 19 is a biologically active product of the von Hippel-Lindau gene arising from internal translation initiation

Conservation of polyamine regulation by translational frameshifting from yeast to mammals

Identification of two additional translation products from the matrix (M) gene that contribute to vesicular stomatitis virus cytopathology

Decreased RIZ1 expression but not RIZ2 in hepatoma and suppression of hepatoma tumorigenicity by RIZ1

Variants of the 5′-untranslated region of the bovine growth hormone receptor mRNA: isolation, expression and effects on translational efficiency

Minimal truncation of the c-myb gene product in rapid-onset B-cell lymphoma

Both codon context and leader length contribute to efficient expression of two overlapping open reading frames of a cucumber necrosis virus bifunctional subgenomic mRNA

Expression of bacterial chitinase protein in tobacco leaves using two photosynthetic gene promoters

N-myc translation is initiated via an internal ribosome entry segment that displays enhanced activity in neuronal cells

Expression of human foamy virus reverse transcriptase involves a spliced pol mRNA

A common genetic polymorphism (46 C to T substitution) in the 5′-untranslated region of the coagulation factor XII gene is associated with low translation efficiency and decrease in plasma factor XII level

Production of three distinct mRNAs of 150 kDa oxygen-regulated protein (ORP150) by alternative promoters: preferential induction of one species under stress conditions

Two distinct estrogen-regulated promoters generate transcripts encoding the two functionally different human progesterone receptor forms A and B

Transient expression of human and chicken progesterone receptors does not support alternative translational initiation from a single mRNA as the mechanism generating two receptor isoforms

Human RNA-specific adenosine deaminase (ADAR1) gene specifies transcripts that initiate from a constitutively active alternative promoter

RNA targets of the fragile X protein

Molecular cloning of the human p120 ctn catenin gene (CTNND1): expression of multiple alternatively spliced isoforms

Competition between nuclear localization and secretory signals determines the subcellular fate of a single CUG-initiated form of FGF3

Expression of the open reading frame 74 (G-protein-coupled receptor) gene of Kaposi's sarcoma (KS)-associated herpesvirus: implications for KS pathogenesis

Splicing of cauliflower mosaic virus 35S RNA is essential for viral infectivity

Two overlapping reading frames in a single exon encode interacting proteins -a novel way of gene usage

Murine phospholipid hydroperoxide glutathione peroxidase: cDNA sequence, tissue expression, and mapping

A positive-strand RNA virus with three very different subgenomic RNA promoters

Caveolin-1 isoforms are encoded by distinct mRNAs

The position dependence of translational regulation via RNA-RNA and RNA-protein interactions in the 5′-untranslated region of eukaryotic mRNA is a function of the thermodynamic competence of 40S ribosomes in translational initiation

Binding of ribosomes to linear and circular forms of the 5′-terminal leader fragment of tobacco mosaic virus RNA

Familial essential thrombocythemia associated with one-base deletion in the 5′-untranslated region of the thrombopoietin gene

Upstream open reading frames regulate the translation of multiple mRNA variants of the estrogen receptor alpha

Independent promoters regulate the expression of two amino terminally distinct forms of latent transforming growth factor-β binding protein-1 (LTBP-1) in a cell type-specific manner

Inability of circular mRNA to attach to eukaryotic ribosomes

Migration of 40S ribosomal subunits on messenger RNA when initiation is perturbed by lowering magnesium or adding drugs

Role of ATP in binding and migration of 40S ribosomal subunits

Analysis of ribosome binding sites from the s1 message of reovirus. Initiation at the first and second AUG codons

Translation of insulin-related polypeptides from messenger RNAs with tandemly reiterated copies of the ribosome binding site

Influences of mRNA secondary structure on initiation by eukaryotic ribosomes

Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes

An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs

At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells

Effects of intercistronic length on the efficiency of reinitiation by eucaryotic ribosomes

Circumstances and mechanisms of inhibition of translation by secondary structure in eucaryotic mRNAs

Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes

Evaluation of the fidelity of initiation of translation in reticulocyte lysates from commercial sources

An analysis of vertebrate mRNA sequences: intimations of translational control

Structural features in eukaryotic mRNAs that modulate the initiation of translation

A short leader sequence impairs the fidelity of initiation by eukaryotic ribosomes

Adherence to the first-AUG rule when a second AUG codon follows closely upon the first

Interpreting cDNA sequences: some insights from studies on translation

Recognition of AUG and alternative initiator codons is augmented by G in position þ 4 but is not generally affected by the nucleotides in positions þ5 and þ6

Primer-extension analysis of eukaryotic ribosome-mRNA complexes

Initiation of translation in prokaryotes and eukaryotes

Do the 5 0 untranslated domains of human cDNAs challenge the rules for initiation of translation (or is it vice versa

New ways of initiating translation in eukaryotes?

Constraints on reinitiation of translation in mammals

Emerging links between initiation of translation and human diseases

Migration of 40S ribosomal subunits on messenger RNA in the presence of edeine

Significant impact of the þ 93 C/T polymorphism in the apolipoprotein(a) gene on Lp(a) concentrations in Africans but not in Caucasians: confounding effect of linkage disequilibrium

Identification of the translational initiation codon in human MAGED1

Genomic organization and biosynthesis of secreted and cytoplasmic forms of gelsolin

Archaeal translation initiation revisited: the initiation factor 2 and eukaryotic initiation factor 2B ab-d subunit families

The molecular biology of coronaviruses

A transcriptional repressor encoded by BPV-1 shares a common carboxy-terminal domain with the E2 transactivator

The genetics of bovine papillomavirus type 1

Targeting of a human iron-sulfur cluster assembly enzyme, nifs, to different subcellular compartments is regulated through alternative AUG utilization

Translational enhancement of mdm2 oncogene expression in human tumor cells containing a stabilized wild-type p53 protein

Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus

Genomic organization of the mouse peroxisome proliferator activated receptor b/g gene. Alternative promoter usage and splicing yield transcripts exhibiting differential translational efficiency

The hormone-sensitive lipase gene is transcribed from at least five alternative first exons in mouse adipose tissue

Polyamine regulation of ribosome pausing at the upstream open reading frame of Sadenosylmethionine decarboxylase

Mutation eliminating mitochondrial leader sequence of methylmalonyl-CoA mutase causes mut o methylmalonic acidemia

Molecular cloning and characterization of the mouse peroxiredoxin V gene

The 5 0 untranslated regions of the rat A 2A adenosine receptor gene function as negative translational regulators

Zebrafish wnt8 encodes two Wnt8 proteins on a bicistronic transcript and is required for mesoderm and neurectoderm patterning

The human galactose-1-phosphate uridyltransferase gene

Translational control of cell fate: availability of phosphorylation sites on translational repressor 4E-BP1 governs its proapoptotic potency

Cell-to-cell movement of turnip crinkle virus is controlled by two small open reading frames that function in trans

A translationally regulated Tousled kinase phosphorylates histone H3 and confers radioresistance when overexpressed

A 30-kDa alternative translation product of the CCAAT/enhancer binding protein a message: transcriptional activator lacking antimitotic activity

Intron-exon structure of the MET gene and cloning of an alternatively-spliced Met isoform reveals frequent exon-skipping of a single large internal exon

An extended RNA/RNA duplex structure within the coding region of mRNA does not block translational elongation

Characterization of the 5 0 untranslated region of the human c-fgr gene and identification of the major myelomonocytic c-fgr promoter

The promoter, transcriptional unit, and coding sequence of herpes simplex virus 1 family 35 proteins are contained within and in frame with the U L 26 open reading frame

Initiation of translation from a downstream in-frame AUG codon on BRCA1 can generate the novel isoform protein DBRCA1(17aa)

The retinoblastoma interacting zinc finger gene RIZ produces a PR domain-lacking product through an internal promoter

Mutation of the CDKN2A 5 0 UTR creates an aberrant initiation codon and predisposes to melanoma

The human cytomegalovirus UL35 gene encodes two proteins with different functions

RNA polymerase I-promoted HIS4 expression yields uncapped, polyadenylated mRNA that is unstable and inefficiently translated in Saccharomyces cerevisiae

Two isoforms of murine hck, generated by utilization of alternative translational initiation codons, exhibit different patterns of subcellular localization

The human AQP4 gene: definition of the locus encoding two water channel polypeptides in brain

Cloning and origin of the two forms of chicken vitamin D receptor

In vivo evaluation of the context sequence of the translation initiation codon in plants

Alternative splicing and promoter usage generates an intracellular stromelysin 3 isoform directly translated as an active matrix metalloproteinase

Efficiency of reinitiation of translation on human immunodeficiency virus type 1 mRNAs is determined by the length of the upstream open reading frame and by intercistronic distance

Expression of NFAT-family proteins in normal human T cells

An alternative promoter in the mouse major histocompatibility complex class II I-Ab gene: implications for the origin of CpG islands

Regulation of rat ornithine decarboxylase mRNA translation by its 5 0 -untranslated region

Translational activation of the lck proto-oncogene

An alternative mode of translation permits production of a variant NBS1 protein from the common Nijmegen breakage syndrome allele

Lost in translation

Molecular biology of luteoviruses

Posttranscriptional control of gene expression in yeast. Microbiol

Isolation, characterization, and transcription of the gene encoding mouse mast cell protease 7

Translational control of the Xenopus laevis connexin-41 5 0 -untranslated region by three upstream open reading frames

Human MxB protein, an interferon-a-inducible GTPase, contains a nuclear targeting signal and is localized in the heterochromatin region beneath the nuclear envelope

Subgenomic RNAs mediate expression of cistrons located internally on the genomic RNA of tobacco necrosis virus strain A

The three dominant female-sterile mutations of the Drosophila ovo gene are point mutations that create new translation-initiator AUG codons

Posttranscriptional control via iron-responsive elements: the impact of aberrations in hereditary disease

Synthesis of subgenomic RNAs by positivestrand RNA viruses

Synthesis of brome mosaic virus subgenomic RNA in vitro by internal initiation on (2)-sense genomic RNA

Transforming growth factor-b 1 -mediated inhibition of the flk-1/KDR gene is mediated by a 5 0 -untranslated region palindromic GATA site

Characterization of two novel nuclear BTB/POZ domain zinc finger isoforms. Association with differentiation of hippocampal neurons, cerebellar granule cells, and macroglia

Regulation of expression of the alternative mRNAs of the rat a-thyroid hormone receptor gene

Differential translational initiation of lbp mRNA is caused by a 5 0 upstream open reading frame

Identification of murine p120 cas isoforms and heterogeneous expression of p120 cas isoforms in human tumor cell lines

Inducibility and negative autoregulation of CREM: an alternative promoter directs the expression of ICER, an early response repressor

a-Thalassaemia associated with the deletion of two nucleotides at position 22 and 23 preceding the AUG codon

Upstream open reading frames as regulators of mRNA translation

The 5 0 UTR of protein kinase C 1 confers translational regulation in vitro and in vivo

Complex organisation of the 5 0 -end of the human glycine tRNA synthetase gene

Mapping of the polycistronic RNAs of tomato leaf curl geminivirus

Site-directed mutagenesis of adeno-associated virus type 2 structural protein initiation codons: effects on regulation of synthesis and biological activity

Specific cleavage of hepatitis C virus RNA genome by human RNase P

Expression of Kaposi's sarcoma-associated herpesvirus G protein-coupled receptor monocistronic and bicistronic transcripts in primary effusion lymphocytes

RAR-b4, a retinoic acid receptor isoform is generated from RAR-b2 by alternative splicing and usage of a CUG initiator codon

The HTS1 gene encodes both the cytoplasmic and mitochondrial histidine tRNA synthetases of S. cerevisiae

Translation of a nonpolyadenylated viral RNA is enhanced by binding of viral coat protein or polyadenylation of the RNA

Nucleotide sequence and expression of the capsid protein gene of feline calicivirus

Translational discrimination of mRNAs coding for human insulin-like growth factor II

Growthdependent translation of IGF-II mRNA by a rapamycin-sensitive pathway

Full-sized RanBPM cDNA encodes a protein possessing a long stretch of proline and glutamine within the Nterminal region, comprising a large protein complex

Human O-GlcNAc transferase (OGT): genomic structure, analysis of splice variants, fine mapping in Xq13

Tissue-specific initiation of murine complement factor B mRNA transcription

Downstream ribosomal entry for translation of coronavirus TGEV gene 3b

Generation from a single gene of two mRNAs that encode the mitochondrial and peroxisomal serine: pyruvate aminotransferase of rat liver

ABCD1 translation-initiator mutation demonstrates genotype-phenotype correlation for AMN

CCAAT/enhancer-binding protein mRNA is translated into multiple proteins with different transcription activation potentials

Dominant-negative mutations of CEBPA, encoding CCAAT/enhancer binding protein-a (C/EBPa), in acute myeloid leukemia

Mammalian cells express two differently localized Bag-1 isoforms generated by alternative translation initiation

Spliced and prematurely polyadenylated Jaagsiekte sheep retrovirus-specific RNAs from infected or transfected cells

Ribosomal pausing and scanning arrest as mechanisms of translational regulation from cap-distal iron-responsive elements

HCC-2, a human chemokine: gene structure, expression pattern, and biological activity

A plant viral "reinitiation" factor interacts with the host translational machinery

Genetic manipulation of arterivirus alternative mRNA leader-body junction sites reveals tight regulation of structural protein expression

The size of Rous sarcoma virus mRNAs active in cell-free translation

Starved Saccharomyces cerevisiae cells have the capacity to support internal initiation of translation

Effect of upstream reading frames on translation efficiency in simian virus 40 recombinants

Promoter choice influences alternative splicing and determines the balance of isoforms expressed from the mouse bcl-X gene

Functional organization of the human uncoupling protein-2 gene, and juxtaposition to the uncoupling protein-3 gene

Uncoupling protein 2, in vivo distribution, induction upon oxidative stress, and evidence for translational regulation

The exon structure of the mouse a2(IX) collagen gene shows unexpected divergence from the chick gene

Identification of the subgenomic mRNAs that encode 6-kDa movement protein and Hsp70 homolog of Beet yellows virus

A reassessment of the translation initiation codon in vertebrates

Characterization of human mucin gene MUC4 promoter

p76 MDM2 inhibits the ability of p90 MDM2 to destabilize p53

Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs

Structural and functional features of eukaryotic mRNA untranslated regions

Molecular mechanisms of translation initiation in eukaryotes

Human myeloid zinc finger gene MZF produces multiple transcripts and encodes a SCAN box protein

Systematic movement of an RNA plant virus determined by a point substitution in a 5 0 leader sequence

Coupled transcriptional and translational control of cyclin-dependent kinase inhibitor p18 INK4c expression during myogenesis

Hepatocyte-nuclear factor 3b gene transcripts generate protein isoforms with different transactivation properties on the glucagon gene

Internal initiation of translation of five dendritically localized neuronal mRNAs

The amphiregulin gene encodes a novel epidermal growth factor-related protein with tumor-inhibitory activity

Characterization of hypersensitive sites, protein-binding motifs, and regulatory elements in both promoters of the mouse porphobilinogen deaminase gene

Identification of a sequence in the unique 5 0 open reading frame of the gene encoding glycosylated gag which influences the incubation period of neurodegenerative disease induced by a murine retrovirus

The glycosylated gag protein of MuLV is a determinant of neuroinvasiveness: analysis of second site revertants of a mutant MuLV virus lacking expression of this protein

An alternative open reading frame of the human macrophage colony-stimulating factor gene is independently translated and codes for an antigenic peptide of 14 amino acids recognized by tumorinfiltrating CD8 T lymphocytes

Introducing RefSeq and LocusLink: curated human genome resources at the NCBI

Rat phospholipid-hydroperoxide glutathione peroxidase. cDNA cloning and identification of multiple transcription and translation start sites

Mechanisms of loss of foreign gene expression in recombinant vesicular stomatitis viruses

In vitro translation of the upstream open reading frame in the mammalian mRNA encoding S-adenosylmethionine decarboxylase

The 5 0 untranslated sequence of the c-sis platelet-derived growth factor 2 transcript is a potent translational inhibitor

A single gene encodes two isoforms of the p70 S6 kinase: activation upon mitogenic stimulation

Posttranscriptional mRNA processing as a mechanism for regulation of human A 1 adenosine receptor expression

Expression of the smoothelin gene is mediated by alternative promoters

Mouse mammary tumor virus superantigen expression in B cells is regulated by a central enhancer within the pol gene

Non-AUG initiation of AGAMOUS mRNA translation in Arabidopsis thaliana

Efficient simultaneous presentation of NY-ESO-1/LAGE-1 primary and nonprimary open reading frame-derived CTL epitopes in melanoma

Presence of ATG triplets in 5 0 untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon

A non-AUG-defined alternative open reading frame of the intestinal carboxyl esterase mRNA generates an epitope recognized by renal cell carcinoma-reactive tumor-infiltrating lymphocytes in situ

Identification of BING-4 cancer antigen translated from an alternative open reading frame of a gene in the extended MHC class II region using lymphocytes from a patient with a durable complete regression following immunotherapy

Cell-specific translational regulation of S-adenosylmethionine decarboxylase mRNA. Influence of the structure of the 5 0 transcript leader on regulation by the upstream open reading frame

Block of HAC1 mRNA translation by long-range base pairing is released by cytoplasmic splicing upon induction of the unfolded protein response

Regulation of alternative splicing by RNA editing

Continuous and discontinuous ribosome scanning on the cauliflower mosaic virus 35S RNA leader is controlled by short open reading frames

A complex translational program generates multiple novel proteins from the latently expressed Kaposin (K12) locus of Kaposi's sarcoma-associated herpesvirus

Correlation between sequence conservation of the 5 0 untranslated region and codon usage bias in Mus musculus genes

COT kinase proto-oncogene expression in T cells

N-terminal RAG1 frameshift mutations in Omenn's syndrome: internal methionine usage leads to partial V(D)J recombination activity and reveals a fundamental role in vivo for the N-terminal domains

Complete genomic sequence of the human ABCA1 gene: analysis of the human and mouse ATP-binding cassette A promoter

The pim-1 oncogene encodes two related protein-serine/threonine kinases by alternative initiation at AUG and CUG

Negative and translation termination-dependent positive control of FLI-1 protein synthesis by conserved overlapping 5 0 upstream open reading frames in Fli-1 mRNA

Normal developing rat brain expresses a platelet-derived growth factor B chain (c-sis ) mRNA truncated at the 5 0 end

Multiple murine double minute gene 2 (MDM2) proteins are induced by ultraviolet light

Transcriptional control of hepadnavirus gene expression

Internal translation initiation generates novel WT1 protein isoforms with distinct biological properties

Caveolin isoforms differ in their N-terminal protein sequence and subcellular distribution

Identification and functional analysis of the turnip yellow mosaic tymovirus subgenomic promoter

A second major native von Hippel-Lindau gene product, initiated from an internal translation start site, functions as a tumor suppressor

An amino-terminal domain of Mxi1 mediates anti-myc oncogenic activity and interacts with a homolog of the yeast transcriptional repressor SIN3

Mechanism of translation of monocistronic and multicistronic human immunodeficiency virus type 1 mRNAs

Mechanisms of synthesis of virion proteins from the functionally bigenic late mRNAs of simian virus 40

Translation initiation at a downstream AUG occurs with increased efficiency when the upstream AUG is located very close to the 5 0 cap

Cloning, expression, and nucleotide sequence of rat liver sterol carrier protein 2 cDNAs

Bovine coronavirus I protein synthesis follows ribosomal scanning on the bicistronic N mRNA

Ubiquitinactivating enzyme (E1) isoforms in lens epithelial cells: origin of translation, E2 specificity and cellular localization determined with novel site-specific antibodies

Cloning and characterization of liverspecific isoform of Chk1 gene from rat

Producing nature's gene-chips: the generation of peptides for display by MHC class I molecules

Preferential ribosomal scanning is involved in the differential synthesis of the hepatitis B viral surface antigens from subgenomic transcripts

Translation of the RNAs of brome mosaic virus: the monocistronic nature of RNA1 and RNA2

Analysis of capsid formation of human polyomavirus JC (Tokyo-1 strain) by a eukaryotic expression system: splicing of late RNAs, translation and nuclear transport of major capsid protein VP1, and capsid assembly

Sequential partially overlapping gene arrangement in the tricistronic S1 genome segments of avian reovirus and Nelson Bay reovirus: implications for translation initiation

Translational regulation of the JunD messenger RNA

Characterization of the murine hyaluronidase gene region reveals complex organization and cotranscription of Hyal1 with downstream genes, Fus2 and Hyal3

Polyoma virus has three late mRNAs: one for each virion protein

A somatic mutation in the 5 0 UTR of BRCA1 gene in sporadic breast cancer causes down-modulation of translation efficiency

The 105-kDa polyprotein of southern bean mosaic virus is translated by scanning ribosomes

The two subunits of human molybdopterin synthase: evidence for a bicistronic messenger RNA with overlapping reading frames

Poliovirus neurovirulence correlates with the presence of a cryptic AUG upstream of the initiator codon

mRNA leader length and initiation codon context determine alternative AUG selection for the yeast gene MOD5

CHOP-dependent stress-inducible expression of a novel form of carbonic anhydrase VI

The promoters for human and monkey poliovirus receptors

Characterization of two bifunctional Arabidopsis thaliana genes coding for mitochondrial and cytosolic forms of valyl-tRNA synthetase and threonyl-tRNA synthetase by alternative use of two in-frame AUGs

A small highly basic protein is encoded in overlapping frame within the P gene of vesicular stomatitis virus

Identification of downstream-initiated c-myc proteins which are dominant-negative inhibitors of transactivation by full-length c-myc proteins

Leaky scanning is the predominant mechanism for translation of human papillomavirus type 16 E7 oncoprotein from E6/ E7 bicistronic mRNA

Human molybdopterin synthase gene: identification of a bicistronic transcript with overlapping reading frames

Elements in the murine c-mos messenger RNA 5 0 -untranslated region repress translation of downstream coding sequences

Regulatory role of the conserved stem-loop structure at the 5 0 end of collagen a1(I) mRNA

The alphaviruses: gene expression, replication, and evolution

Characterization of human MAPRE genes and their proteins

Calspermin gene transcription is regulated by two cyclic AMP response elements contained in an alternative promoter in the calmodulin kinase IV gene

Structure of the 5 0 flanking region of the gene encoding human parathyroid-hormonerelated protein (PTHrP)

Statistical analysis of the 5 0 untranslated region of human mRNA using "oligo-capped" cDNA libraries

Evidence for the existence of a coat protein messenger RNA associated with the top component of each of three tymoviruses

Truncated forms of the dual function human ASCT2 neutral amino acid transporter/retroviral receptor are translationally initiated at multiple alternative CUG and GUG codons

The 5 0 -untranslated region of the mouse glial cell line-derived neurotrophic factor gene regulates expression at both the transcriptional and translational levels

Testis-specific transcription initiation sites of rat farnesyl pyrophosphate synthetase mRNA

The baculovirus transcriptional transactivator ieO produces multiple products by internal initiation of translation

Generation of stable mRNA fragments and translation of Ntruncated proteins induced by antisense oligonucleotides

Transcriptional regulation of the interferon-g-inducible tryptophanyl-tRNA synthetase includes alternative splicing

Evidence for translational regulation of the imprinted Snurf-Snrpn locus in mice

Cancerassociated alternative usage of multiple promoters of human GalCer sulfotransferase gene

The eyeless mouse mutation (ey1 ) removes an alternative start codon from the Rx/rax homeobox gene

Expression patterns of the multiple transcripts from the folylpolyglutamate synthetase gene in human leukemias and normal differentiated tissues

Ataxia caused by mutations in the atocopherol transfer protein gene

Mutations in each of the five subunits of translation initiation factor eIF2B can cause leukoencephalopathy with vanishing white matter

Structural and functional genomics of the CPT1B gene for muscle-type carnitine palmitoyltransferase I in mammals

Ribosomal scanning on the highly structured insulin-like growth factor II-leader 1

Opposing roles of Elk-1 and its brain-specific isoform, short elk-1, in nerve growth factor-induced PC12 differentiation

Identification of Rauscher murine leukemia virus-specific mRNAs for the synthesis of gag-and env-gene products

In vivo translation of the triple gene block of potato virus X requires two subgenomic mRNAs

The yeast transcription factor genes YAP1 and YAP2 are subject to differential control at the levels of both translation and mRNA stability

Rejection antigen peptides on BALB/c RLF1 leukemia recognized by cytotoxic T lymphocytes: derivation from the normally untranslated 5 0 region of the c-Akt proto-oncogene activated by long terminal repeat

Genomic architecture and transcriptional activation of the mouse and human tumor susceptibility gene TSG101: common types of shorter transcripts are true alternative splice variants

A human aminoacyl-tRNA synthetase as a regulator of angiogenesis

Analysis of the two subgenomic RNA promoters for turnip crinkle virus in vivo and in vitro

Inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize R gene

Role of mRNA secondary structure in translational repression of the maize transcriptional activator Lc

Utilization of an alternative open reading frame of a normal gene in generating a novel human cancer antigen

Post-transcriptional regulation of the GLI1 oncogene by the expression of alternative 5 0 untranslated regions

A novel, testis-specific mRNA transcript encoding an NH 2 -terminal truncated nitric-oxide synthase

RNA diversity has profound effects on the translation of neuronal nitric oxide synthase

Bunyamwera bunyavirus nonstructural protein NSs counteracts the induction of alpha/beta interferon

Genomic organization of human MXI1, a putative tumor suppressor gene

Acquired mutations in GATA1 in the megakaryoblastic leukemia of Down syndrome

Infectious TYMV RNA from cloned cDNA: effects in vitro and in vivo of point substitutions in the initiation codons of two extensively overlapping ORFs

Expression of the hepatitis B virus core gene in vitro and in vivo

Cytomegalovirus assembly protein nested gene family: four 3 0 -coterminal transcripts encode four in-frame overlapping proteins

An internal open reading frame triggers nonsense-mediated decay of the yeast SPT10 mRNA

Deregulation of translational control of the 65-kDa regulatory subunit (PR65a) of protein phosphatase 2A leads to multinucleated cells

The short 5 0 untranslated region of the betaA3/A1-crystallin mRNA is responsible for leaky ribosomal scanning

The human achaete scute homolog 2 gene contains two promoters, generating overlapping transcripts and encoding two proteins with different nuclear localization

An activating splice donor mutation in the thrombopoietin gene causes hereditary thrombocythaemia

Cloning and characterization of two novel thyroid hormone receptor b isoforms

A 6700 MW membrane protein is encoded by region E3 of adenovirus type 2

The glyoxysomal and plastid molecular chaperones (70-kDa heat shock protein) of watermelon cotyledons are encoded by a single gene

Analysis and mapping of a family of 3 0 -coterminal transcripts containing coding sequences of human cytomegalovirus open reading frames UL93 through UL99

Evidence that AGUAUAUGA and CCAAGAUGA initiate translation in the same mRNA in region E3 of adenovirus

E3 transcription unit of adenovirus

Interplay of heterogeneous transcriptional start sites and translational selection of AUGs dictate the production of mitochondrial and cytosolic/nuclear tRNA nucleotidyltransferase form the same gene in yeast

Mechanisms leading to and the consequences of altering the normal distribution of ATP(CTP):tRNA nucleotidyltransferase in yeast

The 5 0 -untranslated region of the N-methyl-D-aspartate receptor NR2A subunit controls efficiency of translation

Characterization of an upstream open reading frame in the 5 0 untranslated region of PR-39, a cathelicidin antimicrobial peptide

Identification and characterization of a new mammalian mitogen-activated protein kinase kinase, MKK2

Alternative transcriptional initiation as a novel mechanism for regulating expression of a baculovirus trans activator

Expression of a mitogen-responsive gene encoding prostaglandin synthase is regulated by mRNA splicing

Inhibition of corticotropin releasing hormone type-1 receptor translation by an upstream AUG triplet in the 5 0 untranslated region

CCAAT/ enhancer binding protein 1 is preferentially up-regulated during granulocyte differentiation and its functional versatility is determined by alternative use of promoters and differential splicing

Identification of the start sites for the 1.9-and 1.4-kb rat transforming growth factor-b1 transcripts and their effect on translational efficiency

DNA methylation represses the expression of the human erythropoietin gene by two different mechanisms

An ERCC1 splicing variant involving the 5 0 -UTR of the mRNA may have a transcriptional modulatory function

Molecular identification and characterization of A and B forms of the glucocorticoid receptor

Evidence that a regulatory gene autoregulates splicing of its transcript

Identification of a new form of AQP4 mRNA that is developmentally expressed in mouse brain

Identification of a noncanonical signal for transcription of a novel subgenomic mRNA of mouse hepatitis virus: implication for the mechanism of coronavirus RNA transcription

Novel short transcripts of hepatitis B virus X gene derived from intragenic promoter

Splicing in adenovirus and other animal viruses

Tissue specific expression of the retinoic acid receptor-b2: regulation by short open reading frames in the 5 0 -noncoding region

Analysis of the CC chemokine receptor 3 gene reveals a complex 5 0 exon organization, a functional role for untranslated exon 1, and a broadly active promoter with eosinophil-selective elements

Complete testicular feminization caused by an amino-terminal truncation of the androgen receptor with downstream initiation

C/T polymorphism in the 5 0 untranslated region of the apolipoprotein(a) gene introduces an upstream AUG and reduces in vitro translation

Research in my laboratory is supported by grant GM33915 from the National Institutes of Health.