Original Article

Examples and Outlook of Family-Based Cohort Studies

Jae Woong Sull, Sue Kyung Park, Heechoul Ohrr, Sun Ha Jee
Department of Preventive Medicine, Seoul National University College of Medicine; Department of Preventive Medicine, Yonsei University College of Medicine; Institute for Health Promotion, Graduate School of Public Health, Yonsei University

Introduction

[1]. . (statistical power) (heterogeneity) . . . (initiation) (progression) . (severe case) (recall bias) [1]. . (unrelated subjects) (adjustment) [2]. imprinting (parent-of-origin) effect . . , . 1948 (Framingham cohort) . (Framingham study) . 1 ( 1). “ ” (family-based cohort population-representative family) .

Table 1. Examples of family-based study
No. | Country | Starting year | Study name | Study phenotype | No. of subjects | Study design
1 | USA | 1948 | Framingham SHARe | 8 phenotypes group | 15,876 | Community-based, longitudinal, family-based cohort
2 | EU 8 countries, and USA | 1999 | International Multi-Center ADHD Genetics Project | ADHD | 2,835 | Parent-offspring trios
3 | Australia | 1991 | The Victorian Family Heart Study (VFHS) | CVD | 2,911 (767 families) | Family-based cohort
4 | USA | 1995 | The Amish Family Diabetes Study | Diabetes | 617 | Family study
5 | USA | 1991 | The NIMH Alzheimer’s Disease Study | Alzheimer’s Disease | 1,439 (437 families) | Family-based cohort
6 | USA, Canada, and Australia | 1995 | The Breast Cancer Family Registry | Cancer | 11,950 families | Family registry
Sources: 1: Cupples et al.[3], 2: Kuntsi et al.[4], 3: Ellis et al.[5], 4: Pollin et al.[6], 5: Bertram et al.[7], 6: John et al.[8]

Examples of Family-Based Cohorts and Family Studies

1. Framingham Study (The Framingham Heart Study)

1948 28 62 5,209 ( 2336 , 2873 ) , . 3 2 (systematic sample) . 1,644 . 2 . 1971 5,124 2 . 4 . 1980 . 1948 (2 ) . 1990 330 . 330 3,041 - , 2,796 , 1,595 . 2000 330 1,399 Affymetrix 100K GeneChip (genotyping) . (Genotype) (cleaning) 1,345 (278 1948 , 1,087 ) . 2002 1948 , 3 2005 4,095 . (phenotype) 8 , (arterial stiffness), , , , , (subclinical atherosclerosis), , , , . 1,731 . , 100 linkage . Linkage (candidate gene approach) . 2003 1739 ESR1 c.454-397T>C polymorphism . CC (genotype) CT TT (genotype) 2 (p value=0.004), 3 (p value=0.001)[9]. Whole-genome wide association . 2006 694 116,204 SNPs (genotype) . (quantitative traits) PBAT “conditional mean model” (multiple testing) 2 (two-stage testing) . , screening 2 , (genotype) , 2 SNP (power) . 2 1 SNP , 2 SNP . , 1 2 screening , , , 50 SNP 100 power SNP 2 , 50 100 . ( , p-value 5×10^-4 ) power over-correction unrelated population - . FBAT 2 SNP rs7566605 P-value 0.0026 . SNP . KORA 3,996 SNP Linear regression p-value 0.008 . Maywood African-American 866 PBAT SNP p-value 0.009 [10]. 2007 100K GeneChip 17 [3]. 17 (phenotype) 17 . linkage , FBAT generalized estimating equation (GEE) . 2 . linkage association . monocyte chemoattractant protein-1 1 LOD score=4.96 OR10J1 OR10J3 rs4128725 rs2494250 SNP . factor VII 13 rs561241 SNP SNP . 2007 9,300 550,000 SNPs Whole-genome wide association .

2. International Multi-Center ADHD Genetics (IMAGE) Project

IMAGE project 8 12 (Attention Deficit Disorder with Hyperactivity ADHD) 958 (parent-child trios) . 8 . 1,400 [4]. ADHD 3-10%, 2-4% (neurodevelopmental) . 6-17 ADHD , .

Table 2. Results of Framingham 100K GeneChip
Phenotype working group | Trait | SNP rs ID | Chr | GEE P-value | FBAT P-value | In/near gene
Select biomarkers | Monocyte chemoattractant protein-1 | rs2494250 | 1 | 1.0 × 10^-14 | 3.5 × 10^-8 | FCER1A, OR10J3
Select biomarkers | Monocyte chemoattractant protein-1 | rs4128725 | 1 | 3.7 × 10^-12 | 3.3 × 10^-8 | OR10J1
Kidney/Endocrine | Cystatin C | rs1158167 | 20 | 8.5 × 10^-9 | 0.006 | CST9L/CST9
Diabetes | Fasting plasma glucose | rs2722425 | 8 | 2.0 × 10^-8 | 0.005 | ZMAT4
Neurology | Total Cerebral Brain Volume (ATCBV) | rs1970546 | 20 | 4.0 × 10^-8 | 0.005 | CDH4
Hemostatic factors | Factor VII | rs561241 | 13 | 4.5 × 10^-16 | 3.4 × 10^-4 | F7
Source: Cupples et al.[3]

2006 51 1,038 single-nucleotide polymorphisms (SNP) [11]. 3 haplotype . (Multiple testing) type 1 error SNP haplotype . WHAP 5 SNP sliding window . WHAP (http://pngu.mgh.harvard.edu/~purcell/whap/). TPH2 haplotype ADHD 206 (transmitted) (pseudo-control) 151 (transmitted) haplotype ADHD (p-value = 0.004). 600,000 tag SNP genome-wide association scan .

3. The Victorian Family Heart Study (VFHS)

1991 1996 2,911 VFHS . 2,911 (40-70 ) (18-30 ) 767 . (Caucasian) . . VFHS Genome-wide association , (height) Genome-wide linkage 3 (LOD-score 3.14)[5]. 400 microsatellite marker 22 X 10cM (resolution) genome-wide linkage . 3 . 3 (region) 1-2cM (resolution) fine-mapping linkage 3 78cM .

4. The Amish Family Diabetes Study

The Amish Family Diabetes 2 1995 . 2 (18 ) . 1995 1997 727 . 2004 (lipid) 617 (serum lipid level) linkage [6]. 617 28 , 3 69 . Linkage 373 microsatellite markers 9.7 centimorgans (average density) genome-wide linkage . 4 .

Table 3. Haplotype analysis using 5-SNP sliding window method and analyses using UNPHASED
Gene | Marker window | P-value | Transmitted | Non-transmitted | OR | Haplotype-specific P-value
NET1 | 16-17-18-19-20 | 0.005 | 119 | 95 | 1.25 | 0.101
TPH2 | 36-37-38-39-40 | 0.007 | 206 | 151 | 1.36 | 0.004
PER2 | 3-4-5-6-7 | 0.016 | 188 | 160 | 1.18 | 0.133
ADRB2 | 4-5-6-7-8 | 0.024 | 137 | 98 | 1.40 | 0.011
HTR1E | 9-10-11-12-13 | 0.031 | 15 | 8 | 1.88 | 0.144
MAOA | 12-13-14-15-16 | 0.033 | 167 | 133 | 1.26 | 0.050
CHRNA4 | 11-12-13-14-15 | 0.046 | 14 | 3 | 7.12 | 0.008
Source: Brookes et al.[11]

LDL 2p23, 3p25, 19p13 LOD-score . whole-genome wide association .

5. The NIMH Alzheimer’s Disease Study

1991 1997 1,439 . 2 437 . 994 , 411 . 10 . 2005 linkage 9q22 3 19 SNP . FBAT conditional logistic regression . UBQLN1 SNP [7].

6. The Breast Cancer Family Registry

The Breast Cancer Family Registry , , 3 2003 9 11,950 . 6,126 population-based case , 2,990 population-based control , 1,647 clinic-based control , 1,187 clinic-based control [8]. 2006 ATM . odds ratio=5.5 [12].

Statistical Methods Commonly Used in Family-Based Cohort Studies

1. Linkage Analysis

linkage . Linkage (disease locus) (recombination) . (recombination) (crossing-over) . , Linkage (segregate) (no linkage) . , (recombination) (θ) . 50% (recombination) .
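
The recombination fraction θ and the LOD scores quoted for the studies above are connected by a simple likelihood ratio. The sketch below illustrates that relationship under a strong simplifying assumption: all meioses are phase-known and fully informative, so recombinants can be counted directly. The function name `lod_score` and the example counts are hypothetical. The studies reviewed here report multipoint analyses, which this two-point toy calculation does not reproduce.

```python
import math


def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """Two-point LOD score at recombination fraction `theta`.

    Assumes phase-known, fully informative meioses (an illustrative
    simplification): LOD = log10[L(theta) / L(theta = 0.5)].
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    # Log-likelihood under linkage at the given theta.
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    # Log-likelihood under no linkage (theta = 0.5, i.e. 50% recombination).
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants observed in 20 meioses.
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_score(t, 2, 20))
print(f"max LOD = {lod_score(best_theta, 2, 20):.2f} at theta = {best_theta:.2f}")
```

At θ = 0.5 the score is zero by construction, which corresponds to the no-linkage case described above. The maximum falls at the observed recombinant proportion.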
(marker) (disease gene) (recombination rate) . linkage [13]. marker locus restriction fragment length polymorphisms (RFLP), variable number of tandem repeats (VNTR), microsatellite ( : CA repeats), single nucleotide polymorphism (SNP) microsatellite SNPs . VFHS 400 microsatellite 10cM (resolution) whole genome linkage . , VFHS (resolution) fine-mapping linkage . Linkage . (extended families) (nuclear families) (pairs of sibs) linkage . linkage (parametric) linkage (non-parametric) linkage . Linkage Merlin . Merlin web-site (http://www.sph.umich.edu/csg/abecasis/Merlin/).

Table 4. Multipoint linkage analysis peaks with LOD >= 2.0 (p < 0.0012) in the Amish Family Diabetes Study
Chromosome location | Distance (cM) | Closest marker(s) | Trait | LOD (P-value) | Positional candidate genes
2p23 | 32 | D2S312/D2S220 | LDL-C | 2.17 (0.0008) | APOB, LPIN1, ABCG5, ABCG8
3p25 | 25 | D3S1263 | LDL-C | 2.47 (0.0004) | PPARG
11q23 | 135 | D11S1345 | LNTG | 2.03 (0.001) | APOC3, APOA1, APOA4, APOA5
19p13 | 27 | D19S221 | LDL-C | 2.15 (0.0008) | LDLR
 | 39 | D19S433 | | 2.23 (0.0007) |
LDL-C: serum low density lipoprotein cholesterol; LNTG: ln-transformed TG.
Candidate genes: ABCG5: ATP-binding cassette, subfamily G, member 5; ABCG8: ATP-binding cassette, subfamily G, member 8; APOA1: apolipoprotein A1; APOA4: apolipoprotein A4; APOA5: apolipoprotein A5; APOB: apolipoprotein B; APOC3: apolipoprotein C3; LDLR: low density lipoprotein receptor; PPARG: peroxisome proliferator-activated receptor gamma.
Source: Pollin et al.[6]

2. TDT Analysis and FBAT

(association) case-parents trio Transmission Disequilibrium Test (TDT) (extended family-based association study data) Family based association test (FBAT) package . TDT trios ( , ) . TDT . marker allele (transmitted) 50% . 1 general model M1 (allele) (transmitted) M2 (allele) (transmitted) . , M1 50% . M1 allele (risk allele) ( 1). (example) 1 (dad) M1 allele (transmitted) , M1 allele (M2 allele) (transmitted) . b cell (count) . (mom) M1 allele(M3) , M1 allele(M4) . d cell (count) .
trios a,b,c,d cell (count) . M1 50% . b cell c cell . McNemar chi-square test: $\chi^2 = (b - c)^2 / (b + c)$ [14].

Fig. 1. Example of TDT analysis. Panels: a) general model, b) example 1, c) example 2. Each panel shows a trio (parental genotypes M3M4 and M1M2, offspring genotype M1M3) beside a 2×2 table of transmitted versus not-transmitted parental alleles (M1 versus not M1) with cells a, b, c, d. Null hypothesis (H0): the probability of transmitting the M1 allele is 0.5. McNemar’s chi-square test: $\chi^2 = (b - c)^2 / (b + c)$, degrees of freedom (d.f.) = 1. If $\chi^2 > 3.84$, then reject H0.

FBAT TDT trios (extended family-based association study data) . TDT (binary trait) FBAT (quantitative trait) . (missing) FBAT . FBAT website (http://www.biostat.harvard.edu/~fbat/fbat.htm) [15]. Illumina Affymetrix Gene Chip Whole-Genome association study . TDT PLINK [16].

3. Analysis of Imprinting (Parent-of-Origin) Effects

Imprinting (parent-of-origin) effect screening Cordell STATA . Cordell (maternal transmission) (paternal transmission) TDT [17]. 5 Cordell imprinting effect . bipolar disorder 344 TDT . 5 SNP rs789024 (transmission) p-value 0.003 , (transmission) p-value 0.00007 , (transmission) p-value 0.78 . SNP bipolar disorder [18]. , PLINK SNP (GWA ) Imprinting effect screening [16]. imprinting effect Weinberg log-linear . imprinting (genotype effect) (genotype effect) (adjustment) imprinting [19].

Table 5. Transmission of alleles to bipolar I offspring, stratified by parent-of-origin of alleles
Stratified TDT | rs789024: T | U | P-value | rs1893157: T | U | P-value
Paternal | 77 | 35 | 0.00007 | 52 | 19 | 0.00009
Maternal | 60 | 57 | 0.78 | 24 | 23 | 0.88
Total | 137 | 92 | 0.003 | 76 | 42 | 0.002
T: transmitted, U: untransmitted. Source: Mulle et al.[18]

Advantages, Disadvantages, and Problems of Family-Based Cohort Studies

1. Advantages

. , population stratification . Population stratification (allele frequency) - (false positive) . , linkage association . linkage association [2]. , (heritability) . , familial risk .
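
The McNemar-type TDT statistic from the TDT section, and its parent-of-origin stratification reported in Table 5, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `tdt` is hypothetical, the counts are the transmitted (T) and untransmitted (U) values printed in Table 5 for rs789024, and the p-value uses the asymptotic chi-square distribution with 1 d.f. It is not the STATA or PLINK procedure cited in the text.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return the TDT chi-square statistic and its asymptotic 1-d.f. p-value.

    `transmitted` and `untransmitted` are counts of the allele of interest
    among heterozygous parents (cells b and c of the 2x2 TDT table).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 d.f.: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# T/U counts for rs789024 as printed in Table 5 (Mulle et al.[18]).
rs789024 = {"Paternal": (77, 35), "Maternal": (60, 57), "Total": (137, 92)}

for origin, (t, u) in rs789024.items():
    chi2, p = tdt(t, u)
    print(f"{origin}: chi2 = {chi2:.2f}, p = {p:.2g}")
```

Run separately on paternal and maternal transmissions, the statistic exceeds the 3.84 threshold only for the paternal stratum, and the resulting p-values round to those listed in Table 5 (0.00007, 0.78, and 0.003).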
epigenetic study (unrelated individual) (information) . , , , imprinting . Imprinting imprinting [20] , imprinting .

2. Disadvantages and Problems

- [21]. , - [15]. , TDT main effect . , , main effect [22]. , “ ” (GEE ) , genetic correlation environmental correlation .

Directions and Outlook for Family-Based Cohort Studies

Linkage (candidate gene) , genome-wide association (GWA) . GWA SNP (multiple testing) . - multi-stage . multi-stage , - data set screening testing [23]. GWA . - . , - population substructure . Wellcome Trust Case Control Consortium 7 Genome-wide association population structure [24]. GWA - [15]. imprinting . , IMAGE project GWA . GWA [25]. GWA .

Summary

. population structure , imprinting (association) linkage . IMAGE project genome-wide association . linkage TDT , imprinting effect . . - genome-wide association . genome-wide association . genome-wide association genome-wide imprinting . .

References

1. Collins FS. The case for a US prospective cohort study of genes and environment. Nature. 2004 May 27;429(6990):475-7.
2. Gauderman WJ, Conti DV. Commentary: Models for longitudinal family data. Int J Epidemiol. 2005;34:1077-9.
3. Cupples LA, Arruda HT, Benjamin EJ, D’Agostino RB Sr, Demissie S, DeStefano AL, Dupuis J, Falls KM, Fox CS, Gottlieb DJ, et al. The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med Genet. 2007;8 Suppl 1:S1.
4. Kuntsi J, Neale BM, Chen W, Faraone SV, Asherson P. The IMAGE project: methodological issues for the molecular genetic analysis of ADHD. Behav Brain Funct. 2006 Aug 3;2:27.
5. Ellis JA, Scurrah KJ, Duncan AE, Lamantia A, Byrnes GB, Harrap SB. Comprehensive multi-stage linkage analyses identify a locus for adult height on chromosome 3p in a healthy Caucasian population. Hum Genet. 2007 Apr;121(2):213-22.
6. Pollin TI, Hsueh WC, Steinle NI, Snitker S, Shuldiner AR, Mitchell BD. A genome-wide scan of serum lipid levels in the Old Order Amish. Atherosclerosis. 2004 Mar;173(1):89-96.
7. Bertram L, Hiltunen M, Parkinson M, Ingelsson M, Lange C, Ramasamy K, Mullin K, Menon R, Sampson AJ, Hsiao MY, Elliott KJ, Velicelebi G, Moscarillo T, Hyman BT, Wagner SL, Becker KD, Blacker D, Tanzi RE. Family-based association between Alzheimer’s disease and variants in UBQLN1. N Engl J Med. 2005 Mar 3;352(9):884-94.
8. John EM, Hopper JL, Beck JC, Knight JA, Neuhausen SL, Senie RT, Ziogas A, Andrulis IL, Anton-Culver H, et al. The Breast Cancer Family Registry: an infrastructure for cooperative multinational, interdisciplinary and translational studies of the genetic epidemiology of breast cancer. Breast Cancer Res. 2004;6(4):R375-89.
9. Shearman AM, Cupples LA, Demissie S, Peter I, Schmid CH, Karas RH, Mendelsohn ME, Housman DE, Levy D. Association between estrogen receptor alpha gene variation and cardiovascular disease. JAMA. 2003 Nov 5;290(17):2263-70.
10. Herbert A, Gerry NP, McQueen MB, Heid IM, Pfeufer A, Illig T, Wichmann HE, Meitinger T, Hunter D, Hu FB, Colditz G, Hinney A, Hebebrand J, Koberwitz K, Zhu X, Cooper R, Ardlie K, Lyon H, Hirschhorn JN, Laird NM, Lenburg ME, Lange C, Christman MF. A common genetic variant is associated with adult and childhood obesity. Science. 2006 Apr 14;312(5771):279-83.
11. Brookes K, Xu X, Chen W, Zhou K, Neale B, Lowe N, Anney R, Franke B, Gill M, Ebstein R, et al. The analysis of 51 genes in DSM-IV combined type attention deficit hyperactivity disorder: association signals in DRD4, DAT1 and 16 other genes. Mol Psychiatry. 2006 Oct;11(10):934-53.
12. Bernstein JL, Teraoka S, Southey MC, Jenkins MA, Andrulis IL, Knight JA, John EM, Lapinski R, Wolitzer AL, Whittemore AS, West D, Seminara D, Olson ER, Spurdle AB, Chenevix-Trench G, Giles GG, Hopper JL, Concannon P. Population-based estimates of breast cancer risks associated with ATM gene variants c.7271T>G and c.1066-6T>G (IVS10-6T>G) from the Breast Cancer Family Registry. Hum Mutat. 2006 Nov;27(11):1122-8.
13. Thomas DC. Statistical Methods in Genetic Epidemiology. Oxford University Press. 2004;15-17.
14. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993 Mar;52(3):506-16.
15. Laird NM, Lange C. Family-based designs in the age of large-scale gene-association studies. Nat Rev Genet. 2006 May;7(5):385-94.
16. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep;81(3):559-75.
17. Cordell HJ, Barratt BJ, Clayton DG. Case/pseudocontrol analysis in genetic association studies: A unified framework for detection of genotype and haplotype associations, gene-gene and gene-environment interactions, and parent-of-origin effects. Genet Epidemiol. 2004 Apr;26(3):167-85.
18. Mulle JG, Fallin MD, Lasseter VK, McGrath JA, Wolyniec PS, Pulver AE. Dense SNP association study for bipolar I disorder on chromosome 18p11 suggests two loci with excess paternal transmission. Mol Psychiatry. 2007 Apr;12(4):367-75.
19. Weinberg CR. Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am J Hum Genet. 1999 Jul;65(1):229-35.
20. Le Stunff C, Fallin D, Bougnères P. Paternal transmission of the very common class I INS VNTR alleles predisposes to childhood obesity. Nat Genet. 2001 Sep;29(1):96-9.
21. Hopper JL, Bishop DT, Easton DF. Population-based family studies in genetic epidemiology. Lancet. 2005 Oct 15-21;366(9494):1397-406.
22. Weinberg CR, Umbach DM. A hybrid design for studying genetic influences on risk of diseases with onset early in life. Am J Hum Genet. 2005 Oct;77(4):627-36.
23. Van Steen K, McQueen MB, Herbert A, Raby B, Lyon H, Demeo DL, Murphy A, Su J, Datta S, Rosenow C, Christman M, Silverman EK, Laird NM, Weiss ST, Lange C. Genomic screening and replication using the same data set in family-based association testing. Nat Genet. 2005 Jul;37(7):683-91.
24. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007 Jun 7;447(7145):661-78.
25. Sung J, Cho SI. Strategy Considerations in Genome Cohort Construction in Korea. Korean J Prev Med. 2007 Mar;40(2):95-101.

Abstract

Examples and outlook of family-based cohort study

Jae Woong Sull 1,4), Sue Kyung Park 2), Heechoul Ohrr 3), Sun Ha Jee 1,4)

Family-based designs are commonly used in genetic association studies to identify and locate genes that underlie complex diseases. In this paper, we review two examples of genome-wide association studies using family-based cohort studies: the Framingham Heart Study and the International Multi-Center ADHD Genetics Project. We also review statistical methods for family-based designs, including the transmission disequilibrium test (TDT), linkage analysis, and imprinting effect analysis. In addition, we evaluate the strengths and limitations of the family-based cohort design. Despite the costs and difficulties of carrying out this type of study, a family-based cohort study can play a very important role in genome-wide studies. First, the design is free from biases due to population heterogeneity or stratification. Moreover, family-based designs provide the opportunity to conduct joint tests of linkage and association. Finally, family-based designs also allow access to epigenetic phenomena such as imprinting. The family-based cohort design should be given careful consideration in planning new studies for genome-wide strategies.

Key words: Family-based cohort study, transmission disequilibrium test (TDT), linkage study