PII: 0092-8674(95)90368-2
Cell, Vol. 81, 27-40, April 7, 1995, Copyright © 1995 by Cell Press

Mutations in the Proteolytic Enzyme Calpain 3 Cause Limb-Girdle Muscular Dystrophy Type 2A

Isabelle Richard,* Odile Broux,* Valérie Allamand,* Françoise Fougerousse,* Nuchanard Chiannilkulchai,* Nathalie Bourg,* Lydie Brenguier,* Catherine Devaud,* Patricia Pasturaud,* Carinne Roudaut,* Dominique Hillaire,* Maria-Rita Passos-Bueno,† Mayana Zatz,† Jay A. Tischfield,‡ Michel Fardeau,§ Charles E. Jackson,‖ Daniel Cohen,# and Jacques S. Beckmann*#

*Généthon, 1, rue de l'Internationale, 91000 Evry, France
†Departamento de Biologia, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-900, Brazil
‡Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana 46202-5251
§Institut National de la Santé et de la Recherche Médicale Unité 153, Centre National de la Recherche Scientifique Unité A614, 17, rue du Fer à Moulin, 75005 Paris, France
‖Henry Ford Hospital, Detroit, Michigan 48202
#Fondation Jean Dausset, Centre d'Etudes du Polymorphisme Humain, 27, rue Juliette Dodu, 75010 Paris, France

Summary

Limb-girdle muscular dystrophies (LGMDs) are a group of inherited diseases whose genetic etiology has yet to be elucidated. The autosomal recessive forms (LGMD2) constitute a genetically heterogeneous group with LGMD2A mapping to chromosome 15q15.1-q21.1. The gene encoding the muscle-specific calcium-activated neutral protease 3 (CANP3) large subunit is located in this region. This cysteine protease belongs to the family of intracellular calpains. Fifteen nonsense, splice site, frameshift, or missense calpain mutations cosegregate with the disease in LGMD2A families, six of which were found in La Réunion island patients. A digenic inheritance model is proposed to account for the unexpected presence of multiple independent mutations in this small inbred population.
Finally, these results demonstrate an enzymatic rather than a structural protein defect causing a muscular dystrophy, a defect that may have regulatory consequences, perhaps in signal transduction.

Introduction

The term limb-girdle muscular dystrophy (LGMD) was first proposed by Walton and Nattrass (1954) as part of a classification of muscular dystrophies. LGMD is characterized by progressive symmetrical atrophy and weakness of the proximal limb muscles and elevated serum creatine kinase. The symptoms usually begin during the first two decades of life, and the disease gradually worsens, often resulting in loss of walking ability 10 or 20 years after onset (Bushby, 1994). Yet the precise nosological definition of LGMD still remains unclear. Consequently, various neuromuscular diseases such as facioscapulohumeral and Becker muscular dystrophies and, especially, spinal muscular atrophies have occasionally been classified under this diagnosis. These issues highlight the difficulty in undertaking an analysis of the molecular and genetic defect(s) involved in this pathology.

Both autosomal dominant and recessive transmission have been reported, the latter being more common with an estimated prevalence of 10^-5 (Emery, 1991). The localization of a gene on chromosome 15 (LGMD2A, MIM 253600; Beckmann et al., 1991) has provided proof for the genetic basis of one form of recessive LGMDs. Subsequent genetic analyses confirmed this chromosome 15 localization (Young et al., 1992; Passos-Bueno et al., 1993). The latter group also demonstrated genetic heterogeneity of this disease, while a recent study localized the LGMD2B gene to chromosome 2 (Bashir et al., 1994). Yet there is evidence that at least one other locus is involved, since genetic heterogeneity was demonstrated in the inbred Indiana Amish LGMD2 kindreds (Allamand et al., 1995a).
The nonspecific nosological definition, the relatively low prevalence, and genetic heterogeneity of this disorder limit the number of families that can be used to restrict the genetic boundaries of the LGMD2A interval. No cytogenetic abnormalities have been reported. Immunogenetic studies of the dystrophin-associated protein complex (Matsumara et al., 1993) and cytoskeletal or extracellular matrix proteins (e.g., Tomé et al., 1994) failed to demonstrate any deficiency. In addition, there is no known specific physiological feature or animal model to suggest a candidate gene. Thus, there was no alternative to a positional cloning strategy.

Detailed genetic and physical maps of the 15q15.1-q21.1 LGMD2A region were established. Construction and analysis of a 10-12 Mb yeast artificial chromosome (YAC) contig (Fougerousse et al., 1994) permitted the mapping of 33 polymorphic markers within this interval and the narrowing of the LGMD2A region to between D15S514 and D15S222. Furthermore, extensive analysis of linkage disequilibrium suggested a likely position for the gene in the proximal part of the contig (Allamand et al., 1995b). cDNA selection (Tagle et al., 1993) for muscle-expressed sequences encoded by this interval led to the identification of five known genes and 10 newly expressed sequences, all potential candidate genes (Chiannilkulchai et al., 1995). One of these, previously cloned by Sorimachi et al. (1989), appeared also to be a functional candidate gene, as it encodes a muscle-specific protein, CANP3 (named for calpain large polypeptide L3), which belongs to the calpain family (or calcium-activated neutral protease [CANP]; EC 3.4.22.17).

Calpains are nonlysosomal intracellular cysteine proteases (Murachi, 1989; Suzuki and Ohno, 1990; Croall and Demartino, 1991). The mammalian calpains include two ubiquitous proteins, CANP1 and CANP2, as well as two stomach-specific proteins (Sorimachi et al., 1993a) and CANP3. The ubiquitous enzymes consist of heterodimers with distinct large subunits associated with a common small subunit (Murachi, 1989), all of which are encoded by different genes (Ohno et al., 1989). The association of tissue-specific large subunits with a small subunit has not yet been demonstrated. The large subunits of calpains can be subdivided into four domains (Ohno et al., 1984). Domains I and III, whose functions remain unknown, show no homology with known proteins. The former, however, might be important for the regulation of the proteolytic activity (Imajoh et al., 1986).

Table 1. PCR Primers for the Localization of the CANP3 Gene

Primer Name | Primer Sequence (5'-3') | Position within the cDNA | PCR Product Size, cDNA (bp) | PCR Product Size, Genomic DNA (bp) | Annealing Temperature (°C)
CANP3-in2.a | ATGGAGCCAACAGAACTGAC | 341-360 | 108 | 1758 | 58
CANP3-in2.m | GTATGACTCGGAAAAGAAGGT | 428-448 | | |
CANP3-in13.a | TAAGCAAAAGCAGTCCCCAC | 1893-1912 | 64 | 1043 | 58
CANP3-in13.m | TTGCTGTTCCTCACTTTCCTG | 1936-1956 | | |
CANP3-6a3.a | GTTTCATCTGCTGCTTCGTT | 2342-2361 | 130 | 818 | 56
CANP3-6a3.m | CTGGTTCAGGCATACATGGT | 2452-2471 | | |
CANP3-ex1ter.a | TTCTTTATGTGGACCCTGAGTT | 218-239 | 76 | 76 | 55
CANP3-ex1ter.m | ACGAACTGGATGGGGAACT | 275-293 | | |
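The product sizes in Table 1 can be cross-checked from the primer coordinates alone. The sketch below is illustrative only: it assumes that each cDNA product runs from the first base of the `.a` primer to the last base of the `.m` primer, and that both primers of a pair lie in exons, so that the difference between the genomic and cDNA product sizes approximates the intronic sequence lying between them. The function and variable names are hypothetical and not part of the original work.

```python
# Primer pairs from Table 1.
# Each entry: (pair name, first cDNA position of the .a primer,
#              last cDNA position of the .m primer, genomic product size in bp).
PRIMER_PAIRS = [
    ("CANP3-in2", 341, 448, 1758),
    ("CANP3-in13", 1893, 1956, 1043),
    ("CANP3-6a3", 2342, 2471, 818),
    ("CANP3-ex1ter", 218, 293, 76),
]


def cdna_product_size(first_pos: int, last_pos: int) -> int:
    """Length of the amplified cDNA segment, counting both end positions."""
    if last_pos < first_pos:
        raise ValueError("last_pos must not precede first_pos")
    return last_pos - first_pos + 1


def intronic_bp(cdna_size: int, genomic_size: int) -> int:
    """Intronic sequence between the primers (assumes both primers are exonic)."""
    return genomic_size - cdna_size


def summarize(pairs=PRIMER_PAIRS):
    """Return {pair name: (cDNA product size, inferred intronic bp)}."""
    summary = {}
    for name, first_pos, last_pos, genomic_size in pairs:
        cdna_size = cdna_product_size(first_pos, last_pos)
        summary[name] = (cdna_size, intronic_bp(cdna_size, genomic_size))
    return summary


if __name__ == "__main__":
    for name, (cdna_size, intron_size) in summarize().items():
        print(f"{name}: cDNA product {cdna_size} bp, intronic {intron_size} bp")
```

Under these assumptions the computed cDNA sizes agree with the tabulated values, and the CANP3-ex1ter pair, whose cDNA and genomic products are the same size, appears to amplify a segment that contains no intron.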
Domain II shows similarity with other cysteine proteases, which share histidine, cysteine, and asparagine residues at their active sites (Sorimachi et al., 1989). Domain IV comprises four EF hand structures that are potential calcium-binding sites. In addition, three unique regions with no known homology are present in the muscle-specific CANP3 protein, namely NS, IS1, and IS2, the latter containing a nuclear translocation signal (Sorimachi et al., 1989). These regions may be important for the muscle-specific function of CANP3.

We have determined the genomic organization of the human CANP3 gene, which consists of 24 exons and extends over 40 kb, 35 kb of which have been sequenced. A systematic screening of this gene in LGMD families led to the identification of 15 different mutations, establishing that mutational events in CANP3 are responsible for LGMD2A. Finally, this demonstrates a case of muscular dystrophy resulting from an enzymatic rather than a structural defect.

Figure 1. Genomic Organization of the CANP3 Gene
(A) Genomic structure of the CANP3 gene. The CANP3 gene covers a 40 kb region of which 35 kb were sequenced. Introns and exons are drawn to scale, exons being indicated by numbered vertical bars. Intron 1 is the largest; it and intron 8 remain to be fully sequenced. Positions of intragenic microsatellites are indicated by asterisks. Closed and cross-hatched arrows indicate the orientation of Alu and MER2 repeat sequences, respectively.
(B) EcoRI restriction map. An EcoRI (E) restriction map was established with cosmids from this region. The location of the CANP3 gene is indicated by a closed bar. The sizes of the corresponding fragments are indicated and underlined when determined by sequence analysis.
(C) Cosmid map of the CANP3 gene region. Cosmids, from a library constructed by subcloning YAC 774G4, are presented as lines. Overlaps are based on sequence-tagged site information from a cosmid contig established by I. R. et al. (unpublished data). Dots on lines indicate positive sequence-tagged sites, which are boxed in rectangles. A minimum of three cosmids covers the entire gene.

Figure 2. Sequence of the Human CANP3 cDNA and Flanking 5' and 3' Genomic Regions
(A) and (C) show the polyadenylation signal and putative CAAT and TATAA sites, which are boxed. The putative Sp1 (position -477 to -472), MEF2-binding sites (position -364 to -343), and CArG box (position -685 to -672) are in bold. The Alu sequence present in the 5' region is underlined. (B) shows the corresponding amino acids, indicated below the sequence. The coding sequence between the ATG initiation codon and the TGA stop codon is 2466 bp, encoding an 821 amino acid protein. The adenine in the first methionine codon has been assigned position 1. Locations of introns within the CANP3 gene are indicated by arrowheads. Nucleotides that differ from the previously published ones are indicated by asterisks.

Results

Localization of CANP3 within the LGMD2A Interval

cDNA capture using YACs from the LGMD2A interval allowed the identification of 15 positional candidate genes. CANP3 was one of two transcripts that showed muscle-specific expression as evidenced by Northern blot analysis (Chiannilkulchai et al., 1995). The CANP3 gene had previously been localized to chromosome 15 (Ohno et al., 1989). Primers (Table 1; Figure 1) designed from different parts of the published human cDNA sequence (Sorimachi et al., 1989) were used to position the gene in a region previously defined as 15q15.1-q21.1 (Richard et al., 1994) and, more precisely, on three YACs (774G4, 926G10, and 923G7) localized in this region, between D15S512 and D15S488, in a candidate region suggested by linkage disequilibrium studies (Allamand et al., 1995b). These primers were also used to screen a cosmid library derived from YAC 774G4 (I. R. et al., unpublished data). Five cosmids were identified.
Experiments with different primer pairs (Table 1) established that these cosmids cover all CANP3 exons except exon 1 and that a second group of four cosmids contains this exon (Figure 1C). A minimal set of three overlapping cosmids (2G8-2B11-1F11) covers the entire gene. DNA from these cosmids was used to construct an EcoRI restriction map of this region (Figure 1B).

The CANP3 Gene Sequence

Most of the sequences were obtained through shotgun sequencing of partial digests of cosmid 1F11 subcloned in M13 and Bluescript vectors and by walking with internal primers. The sequence assembly was in agreement with the restriction map of the cosmids. Sequences of exon 1 and adjacent regions were obtained by sequencing cosmid DNA or polymerase chain reaction (PCR) products from human genomic DNA. The first and eighth introns are still not fully sequenced, but there is evidence that they may be between 10 and 16 kb and about 2 kb in length, respectively (based on hybridization of restriction fragments; data not shown). The entire gene spans more than 40 kb.

The determined CANP3 sequence completes the published human cDNA sequence (Sorimachi et al., 1989). It contains the missing 129 bases corresponding to the N-terminal 43 amino acids (Figure 2B). It also differs from the published sequence at 11 positions, three of which occur at third base positions of codons and preserve the encoded amino acid sequence. The other eight differences lead to changes in amino acid composition (Figure 2). As these different exons were sequenced repeatedly on at least 15 distinct genomes, we are confident that the sequence shown in Figure 2 represents an authentic sequence rather than minor polymorphic variants. Furthermore, these modifications increase the local similarity with the rat CANP3 amino acid sequence (Sorimachi et al., 1989), although the overall similarity is still 94%.

Figure 3. Alignments of Amino Acid Sequences of the Muscle-Specific Calpains
The human CANP3 protein is shown on the first line. The three muscle-specific sequences (NS, IS1, and IS2) are underlined, and the three active site residues are indicated by asterisks. The second line corresponds to the rat sequence (SwissProt accession number P16259). The third and fourth lines show the deduced amino acid sequences encoded by pig and cow expressed sequence tags (GenBank/EMBL accession numbers U05678 and U07858, respectively). Amino acids that are conserved among all known calpains are boxed in black. A period indicates that the same amino acid is present in the homologous sequence, and letters refer to the variant amino acid. Positions of missense mutations are given as encircled numbers above the corresponding amino acid.

The ATG numbered 1 in Figure 2B is presumed to be the translation initiation site based on homology with the rat CANP3 and is within a sequence with five nucleotides out of eight in common with the Kozak consensus sequence (Kozak, 1984). Putative CAAT and TATA boxes were observed 590 or 324 (CAAT) and 544 or 33 bp (TATA) upstream of the initiating ATG codon (Bucher, 1990). PCR amplification on muscle cDNA libraries excluded the downstream TATA site since it was contained within the amplified products (data not shown). A GC box binding the Sp1 protein (Dynan and Tjian, 1983) was identified at position -477. Consensus sequences corresponding to potential muscle-specific regulatory elements were identified (Figure 2A). These include a myocyte-specific enhancer-binding factor 2 (MEF2)-binding site (Gosset et al.,
1989), a CArG box (serum response elements observed in muscle-specific genes and in genes inducible by growth hormones; Minty and Kedes, 1986), and four E boxes (contact sites for basic-helix-loop-helix proteins found in members of the MyoD family; Blackwell and Weintraub, 1990). The functional significance of these putative transcription factor-binding sites in the regulation of CANP3 gene expression remains to be established.

Two AAUAAA motifs were identified, 520 and 777 bp downstream of the TGA stop codon. The sequencing of a partial CANP3 cDNA containing a poly(A) tail demonstrated that the first AAUAAA is the polyadenylation signal. The latter is embedded in a region well conserved with the rat CANP3 sequence and is followed after 4 bp by a GT cluster, present in most genes 3' of the polyadenylation site (Birnstiel et al., 1985). The 3' untranslated region of the CANP3 mRNA is 565 bp long. Considering the most likely promoter location, the predicted length of the cDNA should therefore be approximately 3550 bp.

The DNA and protein sequences of the human CANP3 gene show a high degree of conservation with other calpains (Figure 3), in particular rat (SwissProt accession numbers J05121 and P16259), bovine (GenBank accession number U07858), and porcine (GenBank accession number U05678) homologous sequences. High local similarities between the human and rat DNA sequences are even observed at the 5' (75%) or in different parts of the 3' untranslated regions (over 60%) (data not shown), suggestive of evolutionary pressures on these untranslated regions.

Genomic Organization of the CANP3 Gene

A comparison of the published CANP3 human cDNA (Sorimachi et al., 1989) with the corresponding genomic sequence led to the identification of 24 exons ranging in length from 12 bp (exon 12) to 309 bp (exon 1), with a mean size of 100 bp (see Figure 1A). The size of introns ranges from 86 bp to about 10-16 kb for intron 1.
The intron-exon boundaries (Table 2) exhibit close adherence to 5' and 3' splice site consensus sequences (Shapiro and Senapathy, 1987). When the genomic sequence was submitted to GRAIL analysis (Uberbacher and Mural, 1991), 11 exons were correctly recognized, four were not identified, six were inadequately defined, and two were too small to be recognized (data not shown).

As already noted, the CANP3 gene has three unique sequence blocks, NS (amino acid residues 1-61), IS1 (residues 267-329), and IS2 (residues 578-653). It is interesting to note that each IS sequence, as well as the nuclear translocation signal (residues 595-600) inside IS2, is essentially flanked by introns (Figure 4).

Table 2. Sequences at the Intron-Exon Junctions

Exon (Size) | Splice Donor Site | Score (%) | Intron | Score (%) | Splice Acceptor Site
Exon 1 (309 bp) | ...CTCCGgtgagt... | 88.5 | Intron 1 | 99.0 | ...tttttgtttcacagGAAAT...
Exon 2 (70 bp) | ...GCTAGgtagga... | 83.5 | Intron 2 | 90.0 | ...gtgtctgcctgcagGGGAC...
Exon 3 (119 bp) | ...TCCAGgtgagg... | 92 | Intron 3 | 81.5 | ...acgcttctgtgcagTTCTG...
Exon 4 (134 bp) | ...GCTAAgtaagc... | 82 | Intron 4 | 81.5 | ...atcctctctctaagGCTCC...
Exon 5 (169 bp) | ...TTGATgtaagt... | 87 | Intron 5 | 79.5 | ...ccatcgggcctcagGATGG...
Exon 6 (144 bp) | ...CCCGGgtgtgt... | 77.5 | Intron 6 | 91 | ...ttactgctctacagACAAT...
Exon 7 (84 bp) | ...ATGAGgtaagc... | 94 | Intron 7 | 78.5 | ...tctgtgtgcttaagGTCCC...
Exon 8 (86 bp) | ...GATAGgtaggt... | 89 | Intron 8 | 91.5 | ...cattttcccaccagATGGA...
Exon 9 (78 bp) | ...TTCTGgtgagt... | 88 | Intron 9 | 92 | ...ttccaacctctcagGATGT...
Exon 10 (161 bp) | ...CCCAGgtggga... | 80 | Intron 10 | 68.5 | ...ttctgggggtgcagATACT...
Exon 11 (170 bp) | ...ACGAGgtgtgt... | 85.5 | Intron 11 | 86 | ...tgtttcttctcaagGTTCC...
Exon 12 (12 bp) | ...AAGAGgtatag... | 70 | Intron 12 | 87 | ...tccccatctctcagATGCA...
Exon 13 (209 bp) | ...TCTGAgtgagt... | 76.5 | Intron 13 | 97 | ...tgtattcctcacagGGAAG...
Exon 14 (37 bp) | ...CAGTGgtgagt... | 89 | Intron 14 | 93.5 | ...cttttcttatgcagAAAAA...
Exon 15 (18 bp) | ...CCAAGgtaggt... | 89 | Intron 15 | 87 | ...cctcctctctccagCCCAT...
Exon 16 (114 bp) | ...CACAGgtgtct... | 80 | Intron 16 | 88 | ...ttgtgcctccacagCCACA...
Exon 17 (78 bp) | ...GAGATgtgagt... | 84 | Intron 17 | 92.5 | ...cccttcctcctcagGACAT...
Exon 18 (58 bp) | ...CAAACgtgagt... | 83 | Intron 18 | 90 | ...ctccatccccccagACAAG...
Exon 19 (65 bp) | ...TGGATgtatcc... | 56 | Intron 19 | 88 | ...cctccctcctccagACAGA...
Exon 20 (69 bp) | ...GGCAGgtggga... | 80 | Intron 20 | 94 | ...ttttctattgccagAAATA...
Exon 21 (79 bp) | ...CGCAGgtgctg... | 66 | Intron 21 | 91 | ...ggtcccctccacagGATTC...
Exon 22 (117 bp) | ...GTTCAgtaagt... | 79 | Intron 22 | 93.5 | ...gcattctttcacagGAGCT...
Exon 23 (59 bp) | ...TGGAGgtaaag... | 81 | Intron 23 | 79 | ...gggacttctttcagTGGCT...
Exon 24 (27 bp) | | | | |

A score expressing adherence to the consensus was calculated for each site according to Shapiro and Senapathy (1987). Sequences of exons and introns are in uppercase and lowercase, respectively. Sizes of exons are given in parentheses.
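The exon sizes listed in Table 2 can be turned into cumulative coding coordinates, which makes it easy to place a cDNA position in its exon and codon. The following is a minimal sketch, assuming that the tabulated sizes count coding nucleotides only, starting from the A of the ATG (position 1); this assumption is consistent with their sum, 2466 bp, which equals the coding length given in the legend of Figure 2. The helper names `exon_of` and `codon_of` are hypothetical, and the positions used in the usage lines are taken from Table 4 purely as illustrative inputs.

```python
from bisect import bisect_left
from itertools import accumulate

# Exon sizes in bp, exons 1 to 24, as listed in Table 2
# (assumed to be coding nucleotides counted from the ATG).
EXON_SIZES = [309, 70, 119, 134, 169, 144, 84, 86, 78, 161, 170, 12,
              209, 37, 18, 114, 78, 58, 65, 69, 79, 117, 59, 27]

# Last coding position of each exon (cumulative sum of the sizes).
EXON_ENDS = list(accumulate(EXON_SIZES))


def exon_of(position: int) -> int:
    """Return the 1-based exon number containing a coding cDNA position."""
    if not 1 <= position <= EXON_ENDS[-1]:
        raise ValueError("position lies outside the coding sequence")
    return bisect_left(EXON_ENDS, position) + 1


def codon_of(position: int) -> int:
    """Return the 1-based codon number containing a coding cDNA position."""
    if position < 1:
        raise ValueError("position must be positive")
    return (position - 1) // 3 + 1


if __name__ == "__main__":
    print("coding length:", EXON_ENDS[-1])
    for pos in (328, 945, 1079, 1715, 2362):
        print(f"position {pos}: exon {exon_of(pos)}, codon {codon_of(pos)}")
```

With these sizes, exon 6 ends at position 945, so a change at position 946-1 falls on the last intronic base before exon 7, which agrees with the acceptor splice site mutation being listed between exons 6 and 7.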
The exon-intron organization of the human CANP3 is similar to that reported for the chicken CANP (the only other large subunit calpain gene whose genomic structure is known; Emori et al., 1986).

Four microsatellite sequences were identified (see Figure 1A). Two of them are in the distal part of the first intron: an (AT)14 and a previously identified nonpolymorphic mixed-pattern microsatellite, D15S498 (Fougerousse et al., 1994). A (TA)7(CA)4(GA)13 was identified in the second intron, and genotyping of 64 unrelated Centre d'Etudes du Polymorphisme Humain (CEPH) individuals revealed two alleles (with frequencies of 0.10 and 0.90). The fourth microsatellite is a mixed (CA)n(TA)m repeat present in the ninth intron. The latter and the (AT)14 repeat have not been investigated for polymorphisms.

One MER2 repeat and 14 members of the Alu family were identified in the CANP3 gene (see Figure 1A), which has, thus, on average one Alu element per 2.5 kb.

Figure 4. Distribution of the Mutations along CANP3 Protein
(A) Positions of the 23 introns are indicated by vertical bars. Numbers written below refer to the corresponding amino acids.
(B) Schematic representation of the CANP3 protein, its four domains (I, II, III, and IV), and muscle-specific sequences (NS, IS1, and IS2). Domain II is the cysteine protease domain, and domain IV is the Ca2+-binding domain (four EF hands). The three active site residues are underlined. The positions of missense mutations are indicated by dots. The effects of nonsense and frameshift mutations are illustrated as truncated lines, representing the extent of protein synthesized. The out-of-frame sequence is shown by hatched lines. Names of the families carrying these mutations are indicated to the left of the lines.

Expression of the CANP3 Gene

The pattern of tissue specificity was investigated by Northern blot hybridization. There is no evidence for the existence of an alternatively spliced form of CANP3, although this cannot be excluded. A transcript of about 3.4-3.6 kb was detected in skeletal muscle mRNA (Chiannilkulchai et al., 1995). This size therefore further supports the position -544 as the functional TATA box.

Table 3. PCR Primers for the Analysis of the CANP3 Gene

Primer Name | Primer Sequence (5'-3') | Amplified Region | PCR Product Size, Genomic DNA (bp) | Annealing Temperature (°C)
CANP3-pro.a | TTCAGTACCTCCCGTTCACC | Promoter | 296 | 59
CANP3-pro.m | GATGCTTGAGCCAGGAAAAC | | |
CANP3-ex1.a | CTTTCCTTGAAGGTAGCTGTAT | Exon 1 | 438 | 60
CANP3-ex1.m | GAGGTGCTGAGTGAGAGGAC | | |
CANP3-ex2.a | ACTCCGTCTCAAAAAAATACCT | Exon 2 | 239 | 57
CANP3-ex2.m | ATTGTCCCTTTACCTCCTGG | | |
CANP3-ex3.a | TGGAAGTAGGAGAGTGGGCA | Exon 3 | 354 | 58
CANP3-ex3.m | GGGTAGATGGGTGGGAAGTT | | |
CANP3-ex4.a | GAGGAATGTGGAGGAAGGAC | Exon 4 | 292 | 59
CANP3-ex4.m | TTCCTGTGAGTGAGGTCTCG | | |
CANP3-ex5.a | GGAACTCTGTGACCCCAAAT | Exon 5 | 325 | 56
CANP3-ex5.m | TCCTCAAACAAAACATTCGC | | |
CANP3-ex6.a | GTTCCCTACATTCTCCATCG | Exon 6 | 315 | 57
CANP3-ex6.m | GTTATTTCAACCCAGACCCTT | | |
CANP3-ex7.a | AATGGGTTCTCTGGTTACTGC | Exon 7 | 333 | 56
CANP3-ex7.m | AGCACGAAAAGCAAAGATAAA | | |
CANP3-ex8.a | GTAAGAGATTTGCCCCCCAG | Exon 8 | 321 | 58
CANP3-ex8.m | TCTGCGGATCATTGGTTTTG | | |
CANP3-ex9.a | CCTTCCCTTCTTCCTGCTTC | Exon 9 | 173 | 56
CANP3-ex9.m | CTCTCTTCCCCACCCTTACC | | |
CANP3-ex10.a | CCTCCTCACCTGCTCCCATA | Exon 10 | 251 | 56
CANP3-ex10.m | TTTTTCGGCTTAGACCCTCC | | |
CANP3-ex11.a | TGTGGGGAATAGAAATAAATGG | Exon 11 | 355 | 57
CANP3-ex11.m | CCAGGAGCTCTGTGGGTCA | | |
CANP3-ex12.a | GGCTCCTCATCCTCATTCACA | Exon 12 | 312 | 61
CANP3-ex12.m | GTGGAGGAGGGTGAGTGTGC | | |
CANP3-ex13.a | TGTGGCAGGACAGGACGTTC | Exon 13 | 337 | 60
CANP3-ex13.m | TTCAACCTCTGGAGTGGGCC | | |
CANP3-ex14.a | CACCAGAGCAAACCGTCCAC | Exon 14 | 230 | 61
CANP3-ex14.m | ACAGCCCAGACTCCCATTCC | | |
CANP3-ex15.a | TTCTCTTCTCCCTTCACCCT | Exon 15 | 225 | 57
CANP3-ex15.m | ACACACTTCATGCTCTCTACCC | | |
CANP3-ex16.a | CCGCCTATTCCTTTCCTCTT | Exon 16 | 331 | 56
CANP3-ex16.m | GACAAACTCCTGGGAAGCCT | | |
CANP3-ex17.a | ACCTCTGACCCCTGTGAACC | Exon 17 | 270 | 61
CANP3-ex17.m | TGTGGATTTGTGTGCTACGC | | |
CANP3-ex18.a | CATAAATAGCACCGACAGGGA | Exon 18 | 258 | 59
CANP3-ex18.m | GGGATGGAGAAGAGTGAGGA | | |
CANP3-ex19.a | TCCTCACTCTTCTCCATCCC | Exon 19 | 159 | 57
CANP3-ex19.m | ACCCTGTATGTTGCCTTGG | | |
CANP3-ex20.a | GGGGATTTTGCTGTGTGCTG | Exon 20-21 | 333 | 61
CANP3-ex20.m | ATTCCTGCTCCCACCGTCTC | | |
CANP3-ex22.a | CACAGAGTGTCCGAGAGGCA | Exon 22 | 282 | 57
CANP3-ex22.m | GGAGATTATCAGGTGAGATGCC | | |
CANP3-ex23.a | CAGAGTGTCCGAGAGGCAGGG | Exon 22-23 | 608 | 61
CANP3-ex23.m | CGTTGACCCCTCCACCTTGA | | |
CANP3-ex24.a | GGGAAAACATGCACCTTCTT | Exon 24 | 375 | 58
CANP3-ex24.m | TAGGGGGTAAAATGGAGGAG | | |
CANP3-pA.a | ACTAACTCAGTGGAATAGGG | Polyadenylation signal | 413 | 56
CANP3-pA.m | GGAGCTAGGATAGCTCAAT | | |

Mutation Screening

CANP3 fulfils both positional and functional criteria to be a candidate gene for LGMD2. We therefore systematically screened 38 LGMD families for the presence of nucleotide changes in CANP3 using a combination of heteroduplex (Keen et al., 1991) and direct sequence analyses.

PCR primers were designed to specifically amplify the exons and splice junctions as well as the regions containing putative CAAT and TATA boxes and the polyadenylation signal (Table 3).
PCR products made on DNA of LGMD patients were then subjected either to heteroduplex analysis or direct sequencing, depending on whether the mutation, based on haplotype analysis, was expected to be heterozygous or homozygous, respectively. It was occasionally necessary to clone the PCR products to identify the mutations precisely (i.e., for microdeletions or insertions and for some heterozygotes). Disease-associated mutations are summarized in Table 4, and their position along the protein is shown in Figure 4. Each mutation was confirmed by heteroduplex analysis, sequencing of both strands in several members of the family, or enzymatic digestion when the mutation modified a restriction site. Segregation analyses of the mutations, performed on DNAs from all available members of the families, confirmed that these sequence variations are on parental chromosomes carrying a LGMD2A mutation. To assess the possibility that missense substitutions might be polymorphisms, their presence was systematically tested in a control population: none of the mutations was seen among 120 control chromosomes from the CEPH reference families.

Table 4. CANP3 Mutations in LGMD2A Families
Exon | Families (Réunion Haplotype) | Nucleotide Position | Nucleotide Change | Amino Acid Position | Effect of Mutation | Restriction Site Change | Protein Domain | Mutation
2 | B519* | 328 | CGA→TGA | 110 | Arg→stop | -- | II | R110X
4 | M42 | 545 | CTG→CAG | 182 | Leu→Gln | -- | II | L182Q
4 | M1394, M2888 | 550 | CAA→CA | 184 | Frameshift | -- | II | 550ΔA
5 | M35, M37 | 701 | GGG→GAG | 234 | Gly→Glu | -- | II | G234E
6 | M32 | 945 | CGG→CG | 315 | Frameshift | Without SmaI | II | 945ΔG
6-7 | R11*, R14, R16*, R17, R19*, R21*, R26*, R27 (I, II) | 946-1 | G→A |  | Aberrant splicing | -- | II | 946-1 G→A
8 | M2407* | 1061 | GTG→GGG | 354 | Val→Gly | -- | II | V354G
8 | M1394 | 1079 | TGG→TAG | 360 | Trp→stop | Without BstNI, without EcoRI, without ScrFI | II | W360X
11 | M2888 | 1468 | CGG→TGG | 490 | Arg→Trp |  | III | R490W
13 | R12* (VII) | 1715 | CGG→CAG | 572 | Arg→Gln | Without MspI | III | R572Q
19 | R27 (VIII) | 2069-2070 | Deletion AC | 690 | Frameshift |  | IV | 2069ΔAC
21 | R14, R17 (III) | 2230 | AGC→GGC | 744 | Ser→Gly | Without AluI | IV | S744G
22 | A*, B501*, M32 | 2306 | CGG→CAG | 769 | Arg→Gln | -- | IV | R769Q
22 | B505 | 2313-2316 | Deletion AGAC | 771-772 | Frameshift | -- | IV | 2313ΔAGAC
22 | B505, R14 (IV) | 2362-2363 | AG→TCATCT | 788 | Frameshift | -- | IV | 2362AG→TCATCT
The first letter of the family code refers to the origin of the population (B, Brazil; M, metropolitan France; R, Isle of La Réunion; A, Amish; the corresponding haplotypes for La Réunion island families are numbered in parentheses). Families that are homozygous for the mutation are indicated by asterisks. Positions are numbered on the basis of the cDNA and protein sequences starting from ATG and the first methionine residue, respectively. The mutated nucleotides are underlined.

Chromosome 15 Ascertained Families
The initial screening for causative mutations was performed on families from the La Réunion island (Beckmann et al., 1991), from the Old Order Amish of northern Indiana (Young et al., 1992), and from Brazil (Passos-Bueno et al., 1993).
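The numbering convention stated in the Table 4 legend (cDNA positions counted from the A of the ATG, residues counted from the first methionine) can be checked mechanically for the point substitutions. The following is a minimal sketch, assuming the standard genetic code; the function name `name_substitution` is hypothetical and the routine only handles single-base changes within one codon, not the frameshift or splice site entries.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def name_substitution(cdna_position: int, ref_codon: str, alt_codon: str) -> str:
    """Return a one-letter mutation name such as 'R110X' for a point substitution.

    cdna_position is 1-based, with position 1 being the A of the ATG.
    """
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    codon_number = (cdna_position - 1) // 3 + 1  # residue number, Met = 1
    offset = (cdna_position - 1) % 3             # 0, 1 or 2 within the codon
    differing = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    if differing != [offset]:
        raise ValueError("codon change does not match the stated cDNA position")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon].replace("*", "X")  # X marks a stop codon
    return f"{ref_aa}{codon_number}{alt_aa}"


if __name__ == "__main__":
    # Three substitutions taken from Table 4.
    for position, ref, alt in [(328, "CGA", "TGA"), (1061, "GTG", "GGG"), (2230, "AGC", "GGC")]:
        print(position, f"{ref}->{alt}", name_substitution(position, ref, alt))
```

The consistency check inside the function (the changed base must sit at the offset implied by the cDNA position) is a convenient way to catch transcription errors when a mutation table is retyped.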
La Réunion Island Families
Genealogical studies and the geographic origin of the families from La Réunion were suggestive of a single founder effect. Genetic analyses are, however, inconsistent with this hypothesis as the families present haplotype heterogeneity, with at least six different carrier chromosomes (Allamand et al., 1995b). In the course of this work, distinct mutations corresponding to six haplotypes have been identified (Table 4). Yet these mutations are absent in some of the minor haplotypes. Thus, some of them must still contain at least one other unidentified mutation.

In family R14, exons 13, 21, and 22 showed evidence for sequence variation upon heteroduplex analysis (Figure 5). Sequencing of the associated PCR products revealed a polymorphism in exon 13, an A→G mutation in exon 21 transforming Ser-744 to glycine in the second EF hand (see Figure 4), and an AG→TCATCT frameshift mutation in exon 22 causing premature termination at nucleotide 2400 where an in-frame stop codon occurs (see Figure 4). The exon 21 mutation and the polymorphism in exon 13 form a haplotype that is also encountered in family R17.

Affected individuals in family R12 are homozygous over the entire LGMD2A interval (Allamand et al., 1995b).
Sequencing of the PCR products of exon 13 revealed a G→A transition at base 1715 of the cDNA resulting in a substitution of glutamine for Arg-572 (Figure 6) inside domain III, a residue that is highly conserved throughout all known calpains. This mutation, detectable by loss of an MspI restriction site, is present only in this family and in no other examined LGMD2A families or unrelated controls (data not shown).

In family R27, heteroduplex analysis followed by sequencing of the PCR products of an affected child revealed a 2 bp deletion in exon 19 (see Figure 5; Table 4). One AC out of three is missing at this position of the sequence, producing a stop codon at position 2069 of the cDNA sequence (see Figure 4).

Since both mutation detection enhancer heteroduplex analysis and direct sequencing failed to reveal the mutations in the two major haplotypes (I and II), an attempt was made to uncover aberrant splicing of the muscle-specific calpain mRNA from these patients by taking advantage of the illegitimate transcription in lymphocytes (Chelly et al., 1989). The PCR products corresponding to exons 3-9 showed the presence of a band about 400 bases longer than expected. Fine PCR mapping localized the mutation at the junction of exons 6 and 7, while sequencing of the corresponding products (Figure 7) revealed an acceptor splice site mutation (AG→AA), resulting in the utilization of an alternative site within intron 6, 391 bp upstream of exon 7. The same mutation was seen in haplotypes I and II (Table 4), even though they differ in 14 out of 28 markers. It is still unclear whether these are related to one another or a coincidence of two independent recurrent mutational events. The presence of these mutations was confirmed on genomic DNA. The net result is presumably the loss of CANP3 activity since they lead to premature termination by a stop codon that is adjacent to the aberrant splice site (Figure 7).

Figure 5. Representative Mutations Identified by Heteroduplex Analysis
Examples of mutations detected by heteroduplex analysis. Lanes C represent control samples. Exon numbers (E) are indicated below the gels (E4, E5, E6, E8, E11, E19, E21, E22). Pedigree B505 displays the segregation of two different mutations in exon 22.

Figure 6. Sequence of Homozygous Mutations
Sequences from a healthy control are shown above the mutant sequences present in exons 2 (A), 8 (B), 13 (C), and 22 (D). Asterisks indicate the position of mutated nucleotides. The consequences on codon and amino acid residues are indicated at the left of each panel together with the name of the family.
(A) Exon 2: normal AATCCCCGATTTA; mutant AATCCCTGATTTA; CGA→TGA, Arg110Stop.
(B) Exon 8: normal AGCTGGTGCGGCT; mutant AGCTGGGGCGGCT; GTG→GGG, Val354Gly.
(C) Exon 13: normal TCCTCCGGGTCTT; mutant TCCTCCAGGTCTT; CGG→CAG, Arg572Gln.
(D) Exon 22: normal CCATGCGGTACGC; mutant CCATGCAGTACGC; CGG→CAG, Arg769Gln.

Figure 7. Sequence Analysis of the Junction between Exons 6 and 7 of CANP3 of Illegitimately Transcribed mRNAs
The corresponding sequences from a normal muscular CANP3 cDNA (above) or from a LGMD2A patient (below) are shown, together with the corresponding amino acids (normal exon 6/exon 7 junction: Pro Thr Arg/Thr Ile Ile; splice site mutation: Pro Thr Arg/Glu Stop). The asterisk indicates the G→A transition responsible for the loss of the normal acceptor site leading to the use of a cryptic splice site within intron 6. This aberrant splicing results in the presence in the mRNA of an additional 391 bp intronic sequence, represented by an arrow, and of a stop codon as indicated above the sequence.

Amish Families
As expected, owing to multiple consanguineous links, the examined northern Indiana Amish LGMD2A patients were homozygous for the carrier haplotype (Allamand et al., 1995b). A G→A missense mutation was identified at nucleotide 2306 within exon 22 (see Figure 6), transforming Arg-769 to glutamine. This residue, which is conserved throughout all members of the calpain family in all species, is located in domain IV of the protein within the third EF hand at the helix-loop junction (see Figure 4). This mutation was encountered in a homozygous state in all patients from 10 chromosome 15-linked Amish families. We also verified that this nucleotide change was not present in patients from the six southern Indiana Amish LGMD families for which the chromosome 15 locus was excluded by linkage analyses (Allamand et al., 1995a), thus confirming the genetic heterogeneity of this disease in this genetically related isolate.

Brazilian Families
As a result of consanguineous marriages, two Brazilian families (B501 and B519) are homozygous for extended LGMD2A carrier haplotypes (Allamand et al., 1995b). Affected individuals from family B501 were shown to have the same exon 22 mutation as the Amish LGMD2A patients (see Figure 6), but embedded in a completely different haplotype (Allamand et al., 1995b).
In family B519, the patients carry a C→T transition in exon 2 in a homozygous state, replacing Arg-110 with a TGA stop codon (see Figure 6), thus presumably leading to a very truncated and inactive protein (see Figure 4).

Table 5. CANP3 Polymorphisms
Location | Families | Nucleotide Position | Nucleotide Change | Amino Acid Position | Restriction Site Change
Promoter | M42 | -408 | T→C | -- | --
1 | M31 | 96 | ACT→ACC | 32 | DdeI→SecI
3 | M37, M40 | 495 | TTC→TTT | 165 | Without MboII
13 | R14, R17 | 1668 | ATC→ATT | 556 | --
Intron 22 | R11, R19, R20, M35, M37 | 2380 + 12 | Deletion A | -- | --
See Table 4 legend for details.

Analysis of Other LGMD Families
Having validated the role of CANP3 in the chromosome 15 ascertained families, we next examined by heteroduplex analysis LGMD families for which linkage data were not informative. These included one Brazilian (B505) and 10 metropolitan French pedigrees. Additional mutations were uncovered, enabling the ascertainment of eight more LGMD2A families.

Heteroduplex bands were revealed for exons 1, 3, 4, 5, 6, 8, 11, 22, and the promoter region of one or more patients (see Figure 5). Of all sequence variants, 10 were identified as pathogenic mutations (five missense, one nonsense, and four frameshift mutations) and four as polymorphisms. Altogether, two morbid alleles were identified in five families, one in three, and none in three others (Table 4). Identical mutations were uncovered in apparently unrelated families. The mutations shared by families M35 and M37 or by M2888 and M1394, respectively, are likely to be the consequence of independent events since they are embedded in different marker haplotypes. In contrast, it is likely that the point mutation present in exon 22 of the Amish and in the M32 kindreds corresponds to the same mutational event, as both chromosomes share a common four-marker haplotype (D15S779-D15S512-D15S782-D15S780) around CANP3 (unpublished data), possibly reflecting a common ancestor.
The same holds true for the exon 22 substitution shared by families B505 and R14.

In addition to the polymorphisms present in exon 13 in families R14 and R17 (position 1668) and in the intragenic microsatellites, four additional neutral variations were detected (Table 5).

Discussion
Several lines of evidence implicate the CANP3 gene in the etiology of LGMD2A. This gene is localized inside the 3 Mb LGMD2A interval, in the region suggested by linkage disequilibrium studies (Allamand et al., 1995b). Southern blot experiments (Ohno et al., 1989) and sequence-tagged site screening (data not shown) suggest that there is but one copy per genome of this member of the calpain family. Transcription studies suggested that it is an active gene rather than a pseudogene, and its muscle-specific pattern of expression is consistent with the phenotype of this disorder (Sorimachi et al., 1989; Chiannilkulchai et al., 1995).

A minimum of 17 independent mutational events were identified in families from different ethnic and geographic groups. These represent 15 different mutations, distributed throughout the gene (Figure 4), that cosegregate with the disease in LGMD2A families. The discovery of two nonsense, one splice site, and five frameshift mutations in CANP3 supports the hypothesis that a deficiency of this gene product causes LGMD2A. All eight mutations result in a premature in-frame stop codon, leading to the production of truncated and presumably inactive proteins (Figure 4).

Evidence for the morbidity of the missense mutations comes from several sources: their relatively high incidence among LGMD2A patients, the failure to observe these mutations in control chromosomes, and the occurrence of mutations at evolutionarily conserved residues, in regions of documented functional importance, or both (Sorimachi et al., 1989). Of the seven missense mutations, four change an amino acid that is conserved in all known members of the calpain family in all species (Figure 3).
Two of the remaining mutations affect less conserved amino acid residues, but are located in important functional domains: V354G is four residues before the asparagine at the active site; S744G within the second EF hand may impair the calcium-dependent regulation of calpain activity or the interaction with a small subunit (Figure 4). Several missense mutations change a hydrophobic residue to a polar one or vice versa (Table 4), possibly disrupting higher-order structures.

The Réunion Paradox
Only four different mutations were identified by heteroduplex and sequence analyses in the families of La Réunion, despite the fact that all exons were scanned. Eventually, the use of illegitimate transcription allowed us to uncover aberrant splicing, leading to the recognition of the same splice site mutation in haplotypes I and II (Table 4). Thus, five different mutations have been identified so far in this population. But given that none of them was found on the remaining carrier haplotypes, there should be at least one other (as yet unidentified) mutation.

The presence of at least six different mutations among patients from the La Réunion island demonstrates the validity of the suggestion based on haplotype analysis of a multifounder effect (Allamand et al., 1995b). This is, however, an unexpected result given the multiple consanguineous links in these families. Indeed, the affected patients of La Réunion all belong to a small genetic isolate, presumed to derive from a single ancestor who immigrated to this island in the 1670s. These patients were all thus expected to carry the same LGMD2A mutation. The occurrence of multiple independent events in other small populations is not unprecedented (e.g., Bach et al., 1994; Rodius et al., 1994; Heinisch et al., 1995), although no satisfactory explanation has been put forward so far.
Thus, the presence of multiple mutations in the La Réunion population needs to be reconciled with the reported low prevalence of this disease, a problem we refer to as the Réunion paradox.

To begin, the global prevalence of LGMD2A could simply be much higher than initially presumed, but this seems highly unlikely. Another hypothesis assumes that heterozygous healthy individuals could benefit from a selective advantage (as in β-thalassemia), resulting in a high local prevalence. This hypothesis, however, also seems unlikely since one would not expect such selective pressures to lead to these results in such a limited timespan (i.e., 320 years at most).

We therefore speculate that this condition, which has thus far been considered a monogenic disorder, may reflect a more complex inheritance pattern, in which expression of the calpain mutations would be dependent on genetic background (nuclear or mitochondrial). Consider, for instance, a digenic model: only in the presence of specific alleles at a permissive second unlinked locus (e.g., a compensatory, partially redundant, regulatory, or modifier gene) will there be expression of calpain mutations. Since one would need mutations at both loci to be affected, the disease prevalence would remain low. Under this model, members of the La Réunion island community would, as a result of genetic drift, have a disease-associated allele at the hypothesized second locus at high frequency (or even fixed in this small population), conditions that would explain the apparent complete penetrance of the calpain mutations. The complete penetrance of this disease in the Amish and in the other described LGMD2A pedigrees would also be under control of the second locus.

If this model is true, there may be fewer selective pressures against the appearance of CANP3 mutations, as a result of the conditional penetrance.
In other words, the frequency of the calpain variants in the overall population can be much higher than initially deduced, based on the estimates of the prevalence of the disease under a simple monogenic model. Assume, for instance, recessive, independent, and fully penetrant expression at two autosomal loci, with similar frequencies for both deleterious alleles. Considering the reported prevalence (10⁻⁵; Emery, 1991) and the estimated genetic heterogeneity, 1 in 25 persons would be a carrier for a deleterious calpain allele (and 1 in 625 would carry deleterious alleles at both loci). The frequency of calpain variants could thus approach that of the cystic fibrosis gene. If we assume the allele frequency at the second locus to be, respectively, 0.1 or 0.2, this would give us one LGMD2A carrier per 58 or 116 individuals. But, since expression of the trait would require the simultaneous presence of both sets of deleterious alleles, the overall prevalence remains, as expected, on the order of 10⁻⁵.

Under this model, some of the families for which the role of the LGMD2A locus was previously excluded, based on linkage analyses assuming simple monogenic inheritance, might be authentic LGMD2A families, reflecting differential segregation of these two unlinked genes. The digenic inheritance model thus predicts that in a number of kindreds, there will be healthy individuals with two mutant calpain genes. To test the validity of this model, we are currently performing a systematic screening for the presence of calpain mutations in DNA from affected probands from all small nuclear families that cannot be identified as LGMD2A on the basis of linkage analyses. We will then assess the cosegregation of the mutations in these pedigrees, searching for asymptomatic carriers of two mutant calpain genes.

An alternative possibility assumes that the calpain mutations are pathogenic only in a specific mitochondrial context.
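The carrier-frequency arithmetic of the digenic model discussed above can be made concrete with a short sketch. It assumes Hardy-Weinberg proportions, two unlinked loci, and fully penetrant recessive expression at both, so that prevalence = q_calpain² × q_second². The function name is hypothetical, and so is the prevalence value 7.4e-7: the text does not state which share of the overall 10⁻⁵ prevalence is attributed to LGMD2A, and this value was chosen here only because it yields carrier rates close to the figures quoted.

```python
from math import sqrt


def calpain_carrier_frequency(prevalence: float, q_second: float) -> float:
    """Heterozygote frequency at the calpain locus under a two-locus recessive model.

    Assumes affected individuals are homozygous at both loci, so that
    prevalence = q_calpain**2 * q_second**2 (Hardy-Weinberg, unlinked loci).
    """
    if not 0.0 < q_second <= 1.0:
        raise ValueError("q_second must lie in (0, 1]")
    q_calpain = sqrt(prevalence) / q_second
    if not 0.0 < q_calpain < 1.0:
        raise ValueError("inputs imply an impossible calpain allele frequency")
    return 2.0 * q_calpain * (1.0 - q_calpain)


# Hypothetical LGMD2A-attributable prevalence (an assumption, not from the text).
ASSUMED_PREVALENCE = 7.4e-7

if __name__ == "__main__":
    for q_second in (0.1, 0.2):
        carriers = calpain_carrier_frequency(ASSUMED_PREVALENCE, q_second)
        print(f"q_second={q_second}: about 1 carrier per {1.0 / carriers:.0f} individuals")
```

Under these assumptions, doubling the allele frequency at the second locus halves the calpain allele frequency needed to account for a given prevalence, which is why the two quoted carrier rates differ by a factor of two.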
It is worthwhile remembering that the mitochondria are central suppliers of energy and could therefore influence the fate of muscle cells. The mitochondrial model predicts a monogenic inheritance pattern within nuclear families (all sibs share the same mitochondrial genome). In other pedigrees, however, we should fail to get expression of the disease in individuals carrying calpain mutations in a nonpermissive mitochondrial genetic background.

It is important to stress that the observations in favor of the proposed departure from simple monogenic inheritance are not unprecedented. Digenic inheritance has been reported for retinitis pigmentosa (Kajiwara et al., 1994), and there are a number of additional descriptions of traits for which expression is clearly influenced by the presence of unlinked mutations (e.g., Oppenheim et al., 1990).

The concomitant involvement of two or more loci in the LGMD2A phenotype may have some immediate practical consequences. Since the current estimates for genetic heterogeneity are based on a simple monogenic model, these would be truly inadequate under digenic or more complex inheritance models. In addition, if one of the latter turns out to be true, its impact on genetic counseling will also need to be carefully evaluated.

Clearly, the mechanism underlying the results reported here may have major bearing on our comprehension of simple genetic traits. It may imply that one may occasionally face similar situations in other inbred human populations, such as Finland. The proposed di- or oligogenic model could also explain some of the failures to reproduce known or expected human phenotypes in mouse gene knockout experiments, since the genetic backgrounds allowing for expression of the corresponding traits may not have been present. One particularly convincing example is the work of Rudnicki et al. (1993) on MyoD and Myf-5 transgenic mice, where only double knockout animals express a pathological phenotype.
Another example is the Igf1 knockouts, which show variable survival rates depending on the genetic background (Liu et al., 1993). One prediction of this model is that the mouse CANP3 knockouts may also fail to have a pathological phenotype.

CANP3 and LGMD2A
Identification of CANP3 as a defective gene in LGMD2A suggests a novel pathological mechanism leading to a muscular dystrophy in which this condition is caused by mutations affecting an enzyme and not a structural component of muscle tissue. This result is to be contrasted with all other known muscular dystrophies, such as Duchenne and Becker (Bonilla et al., 1988), severe childhood autosomal recessive (Matsumara et al., 1992), Fukuyama (Matsumara et al., 1993), merosin-deficient congenital muscular dystrophies (Tomé et al., 1994), and primary adhalin deficiencies (Roberds et al., 1994).

The understanding of the LGMD2A phenotype needs to take into account the fact that there is likely to be no active CANP3 protein in several patients, a loss compatible with the recessive manifestation of this disease. Simple models in which this protease would be involved in the degradation or destabilization of structural components of the cytoskeleton, extracellular matrix, or dystrophin complex must therefore be ruled out. Furthermore, there are no signs of such alterations by immunohistochemical studies on LGMD2 muscle biopsies (Matsumara et al., 1993; Tomé et al., 1994). Likewise, since LGMD2A myofibers are apparently not different from others that are dystrophic, it seems unlikely that this calpain plays a role in myoblast fusion, as proposed for ubiquitous calpains (Wang et al., 1989).

Alternative hypotheses must therefore be put forward.
One could imagine that CANP3 participates in the activation of an enzyme or other protein involved in muscle metabolism, or it could be responsible for the catabolism of protein compounds in the muscle; the absence of such activity would result in a toxic accumulation of these compounds and eventually in the degeneration of muscle fibers. We favor another hypothesis wherein the CANP3 protein plays an active role in transduction of a signal, a hypothesis that takes the previously described properties of the ubiquitous calpain and of CANP3 into account.

Indeed, studies of CANP3 expression (Sorimachi et al., 1993b) reported that its mRNA is abundant (its expression being 10-fold higher than that of the ubiquitous calpains) and specific to fully differentiated myotubes and myofibers. Yet attempts to detect this protein failed (Sorimachi et al., 1993b). This is interpreted as a consequence of its extremely rapid turnover mediated by autocatalysis, possibly reflecting the need for precise regulation of its activity. When the CANP3 protein was expressed and measured in transfected COS and L8 myoblast cells, it was shown to have a nuclear localization (Sorimachi et al., 1993b), possibly mediated by the nuclear translocation signal in the IS2 region. This result suggests that this cellular compartment could be its natural site of action. The calpains have been proposed as having a regulatory rather than a degradative role, mediated by restricted proteolysis of specific proteins (Wang et al., 1989; Suzuki and Ohno, 1990; Croall and Demartino, 1991). In fact, ubiquitous calpains have been reported to regulate transcription by specific cleavage of c-Jun and c-Fos (Hirai et al., 1991) and by controlling NF-κ