Gene extinction and allelic origins in complex genealogies

Proc. R. Soc. Lond. B 219, 241-251 (1983)
Printed in Great Britain

By Elizabeth A. Thompson
Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, 16 Mill Lane, Cambridge CB2 1 U.K.

With the increasing emphasis on data analysis in mathematical genetics, problems of parametrizing genealogical structure have become of practical importance. A complete specification of the genetic effects of genealogical structure is provided by the probabilities of genetically distinct states of gene identity by descent. Although this provides a direct parametrization for the joint distribution of traits on a set of related individuals, it is an unwieldy tool in the analysis of large and complex genealogies. Probabilities of joint descent of founder genes and likely ancestries of alleles provide alternative characterizations of relationship and have direct application in practical problems. Joint extinction probabilities of founder genes can also be derived as ancestral likelihoods: evolutionarily, the most significant characteristic of a genealogical structure must be its effect on the survival and extinction of genes.

1. Population structure

Genetic variability is the basis of evolution, and much of the evolution of higher organisms, and especially of man, may have taken place within small isolated groups of individuals, within which short-term history may have had long-term consequences. An analysis of the structure of such groups is an important part of an understanding of the role of detailed genealogical history in the determination of current genetic distributions.
Over the last few years the emphasis in mathematical genetics has moved from analyses of genetic models of evolutionary processes towards methods for the analysis of data, and thus towards more detailed descriptions of small-scale phenomena. In a small population or population sample, it is the genealogical structure which provides the essential link between observable characteristics of individuals and genetic models for the determination of such observations.

I shall restrict discussion of population structure to the context of a single Mendelian autosomal locus. That is, for the particular characteristic of interest, the type of an individual is determined by the types of the two genes that he carries, one of which he received from his father and the other from his mother; to each of his offspring he will pass on a randomly chosen one of these two genes. A gene in an individual will refer to one of these two homologous genes, and a trait will be an observable characteristic of individuals determined by the unordered pair of types of these two genes. Of course, evolutionary processes involve very much more than the segregation of discrete Mendelian autosomal genes: but, whatever the ramifications of DNA sequences and complex multi-locus systems, it remains the fact that much of the normal variation observed within populations consists of variants without marked selective effects, not closely linked to other markers, and segregating according to Mendel's first law. Such traits are involved in many of the open questions of data analysis.

In principle, a genealogy is a graph with some special characteristics. Everyone has precisely two parents, and a specification of the parents of all individuals past and present is the genealogy (see figure 1 and table 1). In practice, only the parents of some limited set of individuals can be specified. Genealogical relationships are thus defined relative to some set of individuals with unspecified parents: the founders of the genealogy. These may be actual immigrants to an isolated population, or they may be designated founders in a purely artificial sense. Although this specification of offspring-mother-father triplets is the genealogy, it is of little use as it stands. A useful parametrization of structure must relate to relevant genetic events, such as the survival or ancestry of certain genes, and methods of parametrization must provide methods of data analysis. As an example I shall use the small genealogy of figure 1, which shows useful complex features but is still easily analysed. Six individuals (13, 14, 15, 16, 18 and 19) are assumed to constitute the current population; three of them (14, 15 and 16) are supposed to carry a certain type of gene of interest (table 1).

Figure 1. Genealogy used for the purposes of example throughout this paper. Males are denoted by squares and females by circles. Oblique strokes denote current and current carrier individuals (see table 1).

Table 1. Specification of the example genealogy of figure 1
(0 denotes an unspecified parent; sex 1 = male, 2 = female)

individual  mother  father  sex  comment
     1        0       0      1   founder
     2        0       0      2   founder
     3        0       0      1   founder
     4        2       1      2   -
     5        4       3      2   -
     6        0       0      1   founder
     7        0       0      2   founder
     8        2       1      1   -
     9        7       6      2   -
    10        7       6      2   -
    11        2       1      1   -
    12        9       8      1   -
    13        9       8      1   current
    14       10      11      2   current carrier
    15       10      11      1   current carrier
    16       14      13      1   current carrier
    17        0       0      1   founder
    18        5      17      2   current
    19       18      12      2   current

2. Gene identity by descent

Specified genes in a set of individuals are said to be identical by descent if all are received by repeated segregation from a single gene in some common ancestor. In this paper, identity of genes will always refer to identity by descent rather than identity of type. The genes of n specified individuals may be considered as an ordered set of n unordered pairs of genes. The 2n genes fall into disjoint subsets, the genes within any subset being identical. However, many of the partitions of the 2n genes are genetically equivalent, because the two genes within an individual act as an unordered pair in the determination of traits. By treating as equivalent any partitions obtained from each other by interchanging the two genes of some subset of individuals, one obtains equivalence classes that are genetically distinct states of gene identity (Thompson 1974). The number of equivalence classes increases rapidly with n, although not as quickly as the number of partitions. For n = 6 there are 4213597 partitions in 198091 genetically distinct gene identity states.

For convenience of example and reference, consider here two summary statistics of the probabilities of gene identity states. The kinship coefficient, $\psi(B_1, B_2)$, between two (not necessarily distinct) individuals $B_1$ and $B_2$ is the probability that a gene randomly chosen from $B_1$ is identical to a gene independently selected from $B_2$. The inbreeding coefficient of an individual is the kinship coefficient between his parents, or the probability that he is autozygous: that is, that he carries two identical genes. If the two parents of an individual share no ancestors (relative to the specified genealogy), the individual has zero probability of autozygosity.
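The counts of partitions and of genetically distinct states can be checked directly, at least for small n. The Python sketch below is an illustration, not the enumeration method of Thompson (1974): it computes Bell numbers (the number of partitions of 2n labelled genes, which gives 4213597 when 2n = 12) and counts states by brute force, identifying partitions that differ only by interchanging the two genes within individuals. Brute force is feasible only for very small n, so the figure of 198091 states for n = 6 is not re-derived here.

```python
from itertools import product


def bell_number(m):
    """Number of partitions of m labelled objects, built row by row with the Bell triangle."""
    row = [1]
    for _ in range(m - 1):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):          # add `first` to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition              # or give it a block of its own


def canonical(blocks):
    """Order-free, hashable form of a partition."""
    return frozenset(frozenset(block) for block in blocks)


def identity_state_count(n):
    """Partitions of the 2n genes of n individuals, counted up to interchanging the
    two genes within any subset of individuals (brute force, small n only)."""
    genes = [(i, j) for i in range(n) for j in (0, 1)]   # gene j of individual i
    seen, states = set(), 0
    for blocks in set_partitions(genes):
        if canonical(blocks) in seen:
            continue
        states += 1                                       # a new genetically distinct state
        for flips in product((0, 1), repeat=n):           # mark its whole equivalence class
            seen.add(canonical([[(i, j ^ flips[i]) for i, j in block] for block in blocks]))
    return states


print(bell_number(12))                            # 4213597 partitions when n = 6
print([identity_state_count(n) for n in (1, 2)])  # [2, 9]
```

The value 9 for two individuals is the number of states referred to below in the discussion of table 3.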
Between two such individuals there are only three possible states of gene identity: the individuals have $i$ genes in common with probability $k_i$, $i = 0, 1, 2$ ($k_0 + k_1 + k_2 = 1$). Their kinship coefficient is

$$\psi(B_1, B_2) = \tfrac{1}{2}k_2(B_1, B_2) + \tfrac{1}{4}k_1(B_1, B_2), \qquad (1)$$

for when the individuals have 1 (2) gene(s) in common there is probability 1/4 (1/2) that the genes chosen from each will be identical to each other. More generally, $\psi$ is a linear combination of gene identity state probabilities.

The genealogy of individuals determines a probability of each of the possible states. The converse is not true; state probabilities do not uniquely determine a genealogy. For example, uncle, half-sib and grandparent all have the same state probabilities (table 2). However, the genealogical relationship affects the joint probability distribution of observable genetic traits only through the state probabilities:

$$P(\text{data} \mid \text{genealogy}) = \sum_{\text{states}} P(\text{data} \mid \text{state})\, P(\text{state} \mid \text{genealogy}). \qquad (2)$$

Further, any probability statement about types of future joint descendants of the individuals depends on the ancestral genealogy only through these current state probabilities. In this precise sense, the genetic consequences of genealogical relationship are summarized by the state probabilities that the genealogy determines.

Table 2. Probabilities of gene identity states between a pair of non-inbred relatives
($k_i$ = P(i genes in common))

relationship                    k2     k1     k0     kinship, ψ
parent-offspring                0      1      0      1/4
full-sib                        1/4    1/2    1/4    1/4
uncle, half-sib, grandparent    0      1/2    1/2    1/8
double-first-cousin             1/16   6/16   9/16   1/8
quadruple-half-first-cousin     1/32   14/32  17/32  1/8

Table 3. Probabilities of states of gene identity by descent for individuals 16 and 19 of figure 1

state                                            2^10 × probability
all four genes identical                           1
both autozygous, with distinct genes               3
only 16 autozygous; 1 gene shared with 19         34
only 16 autozygous; 0 genes shared with 19        90
only 19 autozygous; 1 gene shared with 16         10
only 19 autozygous; 0 genes shared with 16        18
neither autozygous; 2 common genes                10
neither autozygous; 1 common gene (k1)           336
neither autozygous; 0 common genes               522
total                                           1024

Although directly related to trait distributions, the set of gene identity state probabilities has two major disadvantages as a parametrization of a genealogy. Not only is there a large number of possible states, but the set of states of positive probability is not easily recognizable from the genealogy. Even between two individuals, each of whom may be autozygous, there are in general 9 states, and, for example, these all have positive probability for the two individuals 16 and 19 of figure 1 (see table 3). On the other hand, only 1794 of the 198091 states between six individuals have non-zero probability for the six current individuals of figure 1. This can be determined essentially only by counting, and a set of 1794 probabilities is in any case an unwieldy specification of their joint relationship.

The other disadvantage is more serious. Genealogical relationships cannot be characterized as probability distributions on the set of gene identity states, for not all distributions are attainable, even in the limit. In general the true space of state probabilities is unknown.
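As a check on tables 2 and 3, the kinship coefficient can be written as an explicit linear combination of the nine state probabilities of table 3, with weights equal to the probability, within each state, that randomly chosen genes from the two individuals are identical. In the Python sketch below, the state order follows table 3 as reconstructed here, which is an assumption. The sketch recovers $\psi(16, 19) = 7/64$ and inbreeding coefficients 1/8 for 16 and 1/32 for 19, and confirms, for every row of table 2, both equation (1) and the restriction (3) given in the next paragraph.

```python
from fractions import Fraction as Fr

# Table 3: 2**10 x probability of each condensed identity state for individuals 16 and 19,
# in the order listed in the table.
TABLE3 = [1, 3, 34, 90, 10, 18, 10, 336, 522]

# Probability, within each state, that a random gene of the first individual is
# identical by descent to an independently chosen random gene of the second.
KINSHIP_WEIGHT = [1, 0, Fr(1, 2), 0, Fr(1, 2), 0, Fr(1, 2), Fr(1, 4), 0]


def kinship(state_probs):
    """Kinship coefficient as a linear combination of the nine state probabilities."""
    return sum(w * p for w, p in zip(KINSHIP_WEIGHT, state_probs))


def inbreeding(state_probs):
    """Inbreeding coefficients (first, second): total probability of the autozygous states."""
    d = state_probs
    return d[0] + d[1] + d[2] + d[3], d[0] + d[1] + d[4] + d[5]


delta = [Fr(c, 2**10) for c in TABLE3]
print(kinship(delta), *inbreeding(delta))    # 7/64 1/8 1/32

# Table 2 as (k2, k1, k0, kinship). For non-inbred pairs only the last three states
# can occur, and kinship() reduces to equation (1).
TABLE2 = {
    "parent-offspring": (0, 1, 0, Fr(1, 4)),
    "full-sib": (Fr(1, 4), Fr(1, 2), Fr(1, 4), Fr(1, 4)),
    "uncle, half-sib, grandparent": (0, Fr(1, 2), Fr(1, 2), Fr(1, 8)),
    "double-first-cousin": (Fr(1, 16), Fr(6, 16), Fr(9, 16), Fr(1, 8)),
    "quadruple-half-first-cousin": (Fr(1, 32), Fr(14, 32), Fr(17, 32), Fr(1, 8)),
}
for name, (k2, k1, k0, psi) in TABLE2.items():
    assert k0 + k1 + k2 == 1, name
    assert kinship([0] * 6 + [k2, k1, k0]) == psi, name    # equation (1)
    assert k1 ** 2 >= 4 * k0 * k2, name                    # restriction (3)
```

Full sibs and double first cousins attain equality in (3), which shows that the restriction is sharp.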
In the simplest case of relationships between two non-inbred individuals, Thompson (1976) has shown that

$$k_1^2 \ge 4 k_0 k_2. \qquad (3)$$

Further, given any specified dyadic-rational $(k_0, k_1, k_2)$ satisfying (3), a genealogy providing these $k_i$ can be constructed. It is of interest that both the restriction and the construction derive from a consideration of the cross-parental kinship coefficients. Any dyadic-rational value in [0, 1] is attainable as a kinship coefficient between two individuals in some genealogical structure. Kinship coefficients thus seem to provide a more natural parametrization of relationship. However, they are an insufficient summary of relationship: half-sibs, double-first-cousins and quadruple-half-first-cousins all have the same coefficient of kinship (table 2), but different $k_i$ values and hence different joint distributions of genetic traits.

3. Descent probabilities

Despite their inadequacies, kinship coefficients are the one universally recognized summary of genealogical structure. They are also readily computed, for they satisfy a simple recursive equation. Provided $B_1$ is neither $B_2$ nor a direct ancestor of $B_2$,

$$\psi(B_1, B_2) = \tfrac{1}{2}\{\psi(M_1, B_2) + \psi(F_1, B_2)\}, \qquad (4)$$

where $M_1$ and $F_1$ are the parents of $B_1$, since when a gene of $B_1$ is randomly selected it is a gene of $M_1$ ($F_1$) with probability 1/2 (1/2). Further,

$$\psi(B_1, B_1) = \tfrac{1}{2}\{1 + \psi(M_1, F_1)\}, \qquad (5)$$

for the genes chosen 'with replacement' from $B_1$ are the same gene with probability 1/2, and are the two distinct genes of $B_1$ (one from $M_1$ and one from $F_1$) also with probability 1/2.

Karigl (1981) has extended the definition of kinship coefficients to arbitrary numbers of genes. He defines $\psi(B_1, B_2, \ldots, B_n)$ to be the probability that genes chosen one from each of the n individuals $B_1, B_2, \ldots, B_n$ ($n > 1$) are all identical.
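Equations (4) and (5) translate directly into a memoized recursion. The Python sketch below encodes table 1, assuming that 0 marks an unspecified parent. It relies on the fact that every child in table 1 is numbered after its parents, so that recursing on the later-numbered individual always satisfies the proviso of (4). Founders are taken to be unrelated and non-inbred. The results agree with the state probabilities of table 3: $\psi(16, 19) = 7/64$, and the inbreeding coefficients of 16 and 19 are $\psi(13, 14) = 1/8$ and $\psi(12, 18) = 1/32$.

```python
from fractions import Fraction
from functools import lru_cache

# Table 1: individual -> (mother, father); (0, 0) marks a founder.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (2, 1), 5: (4, 3), 6: (0, 0), 7: (0, 0),
    8: (2, 1), 9: (7, 6), 10: (7, 6), 11: (2, 1), 12: (9, 8), 13: (9, 8),
    14: (10, 11), 15: (10, 11), 16: (14, 13), 17: (0, 0), 18: (5, 17), 19: (18, 12),
}
HALF = Fraction(1, 2)


@lru_cache(maxsize=None)
def kinship(a, b):
    """psi(a, b) from equations (4) and (5); founders are unrelated and non-inbred."""
    if a == b:
        m, f = PEDIGREE[a]
        if (m, f) == (0, 0):
            return HALF                                  # a founder with itself
        return HALF * (1 + kinship(m, f))                # equation (5)
    if a < b:
        a, b = b, a   # parents are numbered before children, so a is not an ancestor of b
    m, f = PEDIGREE[a]
    if (m, f) == (0, 0):
        return Fraction(0)                               # founder vs. a non-descendant
    return HALF * (kinship(m, b) + kinship(f, b))        # equation (4)


print(kinship(16, 19))                   # 7/64
print(kinship(13, 14), kinship(12, 18))  # inbreeding of 16 and of 19: 1/8 1/32
```

Exact fractions are used so that results can be compared with the dyadic values in the tables.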
These generalized kinship coefficients satisfy generalizations of (4) and (5), and are related to probabilities of gene identity states by generalizations of (1). Since gene identity state probabilities are uniquely determined by a sufficient set of these generalized multiple kinships, the latter provide an equivalent parametrization of genealogical structure. This parametrization is less directly related to joint distributions of genetic traits (equation (2)), but is more closely related to the original genealogical specification in terms of individuals and their two parents, and to the ancestry of genes.

However, multiple kinships are rather strict in insisting on identity of all of a large number of genes, and rather loose in allowing identity to any ancestral gene. An alternative generalization is to define $g_S(B_1, B_2, \ldots, B_n)$ to be the probability that genes chosen from each of the n individuals are all descended from some gene in a specified set of founder genes S (not necessarily from the same gene within this set). Provided $B_1$ is distinct from individuals $B_2, \ldots, B_n$ (if any) and is not an ancestor of any of them, clearly

$$g_S(B_1, B_2, \ldots, B_n) = \tfrac{1}{2}\{g_S(M_1, B_2, \ldots, B_n) + g_S(F_1, B_2, \ldots, B_n)\}. \qquad (6)$$

Further, if $B_1 = \cdots = B_r$ ($1 < r \le n$) is distinct from and not ancestral to any of the other $(n - r)$ individuals (not themselves necessarily distinct),

$$g_S(B_1, B_2, \ldots, B_n) = \left(\tfrac{1}{2}\right)^{r-1}\left\{g_S(B_1, B_{r+1}, \ldots, B_n) + \left(2^{r-1} - 1\right) g_S(M_1, F_1, B_{r+1}, \ldots, B_n)\right\}, \qquad (7)$$

since the probability that the same gene is selected from $B_1$ on each of r occasions is $(1/2)^{r-1}$, and if two different genes are selected they consist of a random gene from each of the parents $M_1$ and $F_1$ of $B_1$. (The functions $g_S$ are, like $\psi$, symmetric in their arguments.) These probabilities may thus be computed readily for arbitrarily specified founder sets S, $g_S$ satisfying simple boundary conditions where individuals $B_i$ have genes specified to be in S. Computationally, the number n is limited but the number of genes in S is not. Thus we can compute probabilities that specified current genes descend from various ancestral sets and, more important, the dependence between descent from certain ancestors to different current individuals.

Consider, for example, the genealogy of figure 1. The joint descent probabilities to various of the current individuals from various ancestral sets are given in table 4.

Table 4. Descent probabilities from founder genes to current individuals
(The notation 1(2) denotes that both genes of individual 1 are included in set S, 6(1) that one gene of 6 is included, and so on. Probabilities are given as a pair i, j, denoting i/2^j.)

                     current set
ancestral set, S     {14,15,16,19}  {14,15,16}  {16,19}  {16}  {19}
{1(1)}               17,14          5,9         1,6      1,3   3,5
{6(1)}               3,12           5,9         3,8      1,3   1,4
{6(2)}               3,10           3,7         1,5      1,2   1,3
{6(1), 7(1)}         21,12          9,8         5,7      1,2   1,3
{6(1), 1(1)}         61,13          11,8        3,6      1,2   5,5
{6(1), 1(2)}         177,13         43,9        25,8     3,3   1,2

Note that descent of a given gene can only increase the probability of descent of the same gene to a relative, and only decrease the probability of descent of other genes. Joint descent to 16 and 19 from the couple (6, 7) has probability 5/2^7, whereas the product of the separate probabilities is only 1/2^2 × 1/2^3 = 4/2^7. On the other hand, descent to 16 and 19 from (1, 2) makes descent to 14 and 15 from (6, 7) less probable. In this small genealogy with only four generations interactions are small, but on a large and complex genealogy Thompson (1983) has shown joint descent probabilities more than 100 times the product of separate values.

4. Inferring ancestral types of genes

The descent probabilities of the previous section have direct practical application in inferring the ancestral origins of certain alleles (that is, genes of a certain type) in the current population. We shall denote the particular allele of interest by $a_1$ and a gene of any other type by $a_2$. Suppose we have some number of individuals $(B_1, \ldots, B_n)$ known to carry $a_1$, and consider a set S of hypothesized ancestral copies of this allele. Then $g_S(B_1, B_2, \ldots, B_n)$ is the probability that a randomly chosen gene in each of these current individuals derives from the ancestral set S, and comparing these probabilities for alternative sets S provides relative likelihoods of these sets as the ancestral $a_1$ copies.

There are two major oversimplifications here. First, only descent to individuals carrying the $a_1$ allele is considered; any information on its non-descent to other individuals is not included. In figure 1 descent to the assumed carriers (14, 15 and 16) is symmetric between couples (1, 2) and (6, 7) (see table 4), but the fact that 18 and 19 are not carriers must make (6, 7) the more likely founder carriers. Further, analysis of descent only to carriers must bias the analysis towards the conclusion of more ancestral copies, and hypotheses involving different numbers of original copies are not comparable. Secondly, not only are data on current non-carriers disregarded, but so is information on types of ancestors. For example, individuals carrying two copies of the $a_1$ allele may have decreased survival probabilities: often, traits of interest in large and complex genealogies are of this recessive type. Ancestors then have some reduced (perhaps even zero) probability of having carried two such alleles. In a complex genealogy over many generations inclusion of this fact can alter inferences.

Against these disadvantages there are two advantages.
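The recursions (6) and (7), with their founder boundary conditions, can be sketched in the same way as (4) and (5). In the Python sketch below (an illustration, not the author's program), the hypothetical argument S maps each founder to the number of its genes in S, mirroring the notation 6(1), 1(2) of table 4. At each step the recursion acts on the highest-numbered non-founder in the list, which, given the numbering of table 1, is never an ancestor of the others. Because each hypothesized set S costs only one short recursion, many alternative sets can be compared rapidly. The sketch reproduces, for example, the table 4 entries 5/2^7, 1/2^2 and 1/2^3 for descent from {6(1), 7(1)} to {16, 19}, {16} and {19}.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache

# Table 1: individual -> (mother, father); (0, 0) marks a founder.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (2, 1), 5: (4, 3), 6: (0, 0), 7: (0, 0),
    8: (2, 1), 9: (7, 6), 10: (7, 6), 11: (2, 1), 12: (9, 8), 13: (9, 8),
    14: (10, 11), 15: (10, 11), 16: (14, 13), 17: (0, 0), 18: (5, 17), 19: (18, 12),
}
HALF = Fraction(1, 2)


def descent_probability(S, individuals):
    """g_S(B_1, ..., B_n): probability that one gene drawn from each listed individual
    descends from the founder genes in S. S maps founder -> number (0, 1 or 2) of its
    genes included in S."""

    @lru_cache(maxsize=None)
    def g(bs):
        counts = Counter(bs)
        non_founders = [b for b in counts if PEDIGREE[b] != (0, 0)]
        if not non_founders:
            # Boundary conditions: draws from different founders are independent.
            p = Fraction(1)
            for founder, r in counts.items():
                c = S.get(founder, 0)
                same = HALF ** (r - 1)        # all r draws hit the same gene of this founder
                p *= same * Fraction(c, 2) + (1 - same) * (1 if c == 2 else 0)
            return p
        b = max(non_founders)                 # not an ancestor of anyone else in the list
        r = counts[b]
        rest = [x for x in bs if x != b]
        m, f = PEDIGREE[b]
        if r == 1:                            # equation (6)
            return HALF * (g(tuple(sorted(rest + [m]))) + g(tuple(sorted(rest + [f]))))
        same = HALF ** (r - 1)                # equation (7)
        return same * g(tuple(sorted(rest + [b]))) + (1 - same) * g(tuple(sorted(rest + [m, f])))

    return g(tuple(sorted(individuals)))


S_67 = {6: 1, 7: 1}                               # the ancestral set {6(1), 7(1)}
print(descent_probability(S_67, [16, 19]))         # 5/128
print(descent_probability(S_67, [16]) * descent_probability(S_67, [19]))  # 1/32
```

Memoizing on the sorted tuple of individuals treats the arguments of $g_S$ as a multiset, which is what the symmetry of $g_S$ permits.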
Although for simplicity the genes of S were specified above as being genes of founders, in fact, provided S does not involve individuals who are ancestors of each other, they may be any ancestral genes. Hence descent of a particular allele may be traced down the genealogy by hypothesizing ancestral sets S at different generations. The second advantage is the ease of computation: many alternative hypotheses may be assessed very rapidly.

These advantages are apparent in a re-analysis of the data of Kidd et al. (1980) on the ancestry of propionic acidaemia in a Mennonite-Amish genealogy. The two disadvantages also apply, but not with strong force, since individuals with two copies of the allele can be without clinical symptoms, and few individuals among the ancestors have a priori a high probability of carrying two genes identical by descent. Thompson (1983) shows that patterns of joint descent, jointly between current carrier parents of affected individuals and jointly between alternative founder carriers, are important in a quantitative assessment of the possible hypotheses.

Table 5. Part of the ancestral likelihood for the pedigree of figure 1 under the data of table 1
(Carriers are known to carry one a1 gene, the other current individuals none. Founders 3 and 17 are here taken as the most likely combination a2a2, and the marginal likelihood for the other two founder couples is tabulated, the full function being given by symmetry between the two members of each couple. Figures in brackets give the likelihood when no ancestor can have carried two a1 genes. The numbers, each divided by 2^15, give the exact probability of the data under the ancestral combination.)

                   couple (6, 7)
couple (1, 2)      a1a1×a1a1  a1a1×a1a2  a1a1×a2a2  a1a2×a1a2   a1a2×a2a2    a2a2×a2a2
a1a1 × a1a1        0          0          0          0           0            0
a1a1 × a1a2        0          60         160        250         560          1200
a1a1 × a2a2        0          192        384        480         768          1152
a1a2 × a1a2        0          300        480        720 (240)   1140 (475)   2016 (560)
a1a2 × a2a2        0          784        896        1330 (665)  1260 (1260)  1232 (1232)
a2a2 × a2a2        0          1920       1536       2688 (896)  1408 (1408)  0 (0)

If the above is an oversimplified approximation, what is the full solution? Suppose that, for a given combination of types of original founder genes, one could compute the probability of the data observed on current individuals of all types, under a given genetic model, perhaps involving information about varying viability of types of ancestors. This would then be a likelihood for that combination of founder types. In principle this can be done using the method of Cannings et al. (1978), the basis of which is the following. Define a cutset of individuals to be a set who together divide the genealogy. For present purposes it will be sufficient to consider cutsets dividing a current set of individuals from a set of ancestors including all the founders of the genealogy; for example, individuals {12, 13, 18, 10, 11} in figure 1. If the types of the genes carried by such a set of individuals are specified, genetic events in sets of individuals on different sides of the cutset are statistically independent. Data on individual 4, for example, convey no information about the traits of 12 and 15, and vice versa.
We then consider probabilities of data below a given cutset, conditional on each possible combination of types of genes in individuals in the cutset, and work sequentially back through the genealogy from one cutset to the next, incorporating parent-offspring segregation probabilities and any other information on traits of individuals or types of ancestral genes. Finally we obtain the probability of all data observed on the genealogy given each possible (ordered) combination of founder gene types, or, equivalently, the likelihoods of every possible founder combination simultaneously.

Table 5 shows part of the ancestral likelihood function for the example genealogy. Note that couple (6, 7) are indeed the more likely ancestral $a_1$ carriers, it being most likely that both members of the couple are so. Note also the ordering of the likelihoods, which may be unexpected a priori. The different orderings between rows and between columns show the necessity for joint inferences on the two couples, even in this small example. The effect of excluding possible $a_1a_1$ ancestors is here to reduce the likelihood of $a_1a_2 \times a_1a_2$ founder couples; this is not surprising, since the other ancestors of the current population consist mainly of offspring of these couples. In a larger genealogy the decreased likelihood of any such ancestral couples can have varied effects on inferences about original founders.

In a large complex genealogy, this approach may not be computationally feasible. At each stage all possible combinations of types of genes for all members of the current cutset must be considered.
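For a genealogy as small as that of figure 1, the ancestral likelihood can be checked without the cutset method of Cannings et al. (1978), by summing directly over the genotypes of the seven unobserved non-founders (4, 5, 8, 9, 10, 11 and 12). The Python sketch below does this by brute force. It codes a genotype as its number of $a_1$ genes and assumes the data stated in the caption of table 5 (carriers $a_1a_2$, other current individuals $a_2a_2$, founders 3 and 17 $a_2a_2$). The helper `table5_cell` is a hypothetical name introduced here; it returns $2^{15}$ times the likelihood and reproduces, for example, the table 5 entries 2688 (896), 1536, 1920 and 1232. The $3^7$ terms of the sum grow exponentially with the number of unobserved individuals, which is why sequential summation over cutsets is needed on larger genealogies.

```python
from fractions import Fraction
from itertools import product

# Table 1: individual -> (mother, father); (0, 0) marks a founder.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (2, 1), 5: (4, 3), 6: (0, 0), 7: (0, 0),
    8: (2, 1), 9: (7, 6), 10: (7, 6), 11: (2, 1), 12: (9, 8), 13: (9, 8),
    14: (10, 11), 15: (10, 11), 16: (14, 13), 17: (0, 0), 18: (5, 17), 19: (18, 12),
}
# Data as described in table 5: number of a1 genes in each current individual.
OBSERVED = {13: 0, 14: 1, 15: 1, 16: 1, 18: 0, 19: 0}
FOUNDERS = [b for b, parents in PEDIGREE.items() if parents == (0, 0)]
HIDDEN = [b for b in PEDIGREE if b not in OBSERVED and b not in FOUNDERS]  # 4, 5, 8, ..., 12
N_OFFSPRING = len(PEDIGREE) - len(FOUNDERS)                                # 13 segregations


def transmission4(child, mother, father):
    """4 x P(child has `child` a1 genes | parents have `mother`, `father` a1 genes).
    A parent with c copies of a1 passes a1 with probability c/2."""
    if child == 2:
        return mother * father
    if child == 0:
        return (2 - mother) * (2 - father)
    return mother * (2 - father) + (2 - mother) * father


def likelihood(founders, no_a1a1_ancestors=False):
    """P(data | founder genotypes), by brute-force summation over hidden genotypes.
    With no_a1a1_ancestors=True, hidden a1a1 genotypes get weight zero."""
    total = 0
    for hidden in product((0, 1, 2), repeat=len(HIDDEN)):
        if no_a1a1_ancestors and 2 in hidden:
            continue
        g = {**founders, **OBSERVED, **dict(zip(HIDDEN, hidden))}
        weight = 1
        for b, (m, f) in PEDIGREE.items():
            if (m, f) != (0, 0):
                weight *= transmission4(g[b], g[m], g[f])
        total += weight
    return Fraction(total, 4 ** N_OFFSPRING)


def table5_cell(couple_12, couple_67, **options):
    """2**15 x likelihood for genotypes (founder 1, founder 2) and (founder 6, founder 7),
    given as a1 copy numbers, with founders 3 and 17 fixed as a2a2."""
    (g1, g2), (g6, g7) = couple_12, couple_67
    founders = {1: g1, 2: g2, 3: 0, 6: g6, 7: g7, 17: 0}
    return likelihood(founders, **options) * 2 ** 15


print(table5_cell((0, 0), (1, 1)))                          # 2688
print(table5_cell((0, 0), (1, 1), no_a1a1_ancestors=True))  # 896
```

The bracketed option excludes $a_1a_1$ only among the unobserved non-founders; founder genotypes are whatever the caller specifies, so the bracketed table entries correspond to calls without $a_1a_1$ founders.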
Although it is sometimes possible to work sequentially through large genealogies of isolated populations with cutsets of no more than 9 or 10 individuals, this is not always feasible, and determining the optimal cutset sequence is in general an unsolved problem. Further, the number of founders may be prohibitive. In many cases it will be necessary to consider jointly only some subset of the founders, under some (probabilistic) assumptions about the types of the remainder. One population for which this can be done is the small isolated population of Tristan da Cunha, where eleven early founders contribute 80% of current genes. Here Thompson (1978) has shown that inferences can be made about the joint types of genes in founders living before 1827. The multiple complex paths of relationship increase computational difficulty, but provide the information required. Such inferences are limited to simple genetic traits. Nonetheless, the power to make inferences about the types of genes seven generations ago indicates that, conversely, these types can affect current trait distributions. This example of the Tristan da Cunha population is considered further below.

5. Gene survival and gene extinction

This analysis of ancestry in terms of joint likelihoods on sets of founders leads to an alternative characterization of genealogical structure. For the essence of evolution in a population is gene survival: the number and variety of distinct surviving genes. So instead of ancestry let us consider gene survival, or equivalently extinction. Just as descent probabilities have an interpretation as approximate ancestral likelihoods, so the complete ancestral likelihoods provide extinction probabilities. Consider a trait for which there are just two alleles, $a_1$ and $a_2$, and a specified combination of alleles among original founders.
Then the probability of extinction of (at least) those founder genes labelled $a_1$ is the probability that, given the ancestral combination, the population now consists entirely of individuals with two $a_2$ genes. But this is also the ancestral likelihood of the particular combination of ancestral $a_1$ and $a_2$ genes, given this current population. Working backwards from a current population in which all individuals are assumed to carry two $a_2$ genes, we can therefore compute simultaneously the extinction probabilities of all combinations of founder genes.

Again a joint analysis is important. Survival of the genes of a given founder over a specified genealogy decreases the survival probabilities of genes in other founder individuals who share descendants with the first. In particular, therefore, survival decreases survival of spouse genes, and, indeed, survival of one gene in a founder decreases the survival probability of the other. Similarly, extinction of some genes decreases extinction probabilities of others: some genes must survive in an extant population. The six current individuals of figure 1 carry at least three and at most nine distinct genes, although there are six founders to the genealogy. Not all four genes of either original couple can be extinct, nor both those of 17. Although there is little interaction between the two couples, since the population consists mainly of their grandchildren, survival of a gene of 3 decreases the probability of survival of all four genes of (1, 2) from [illegible] to [illegible].
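Extinction probabilities can be computed with the same machinery as table 5: label the founder genes of interest $a_1$ and all others $a_2$, and evaluate the likelihood of a current population in which everyone is $a_2a_2$. The Python sketch below (brute force over the seven unobserved non-founders, under the coding of table 1) does this and converts extinction probabilities into joint survival probabilities by inclusion-exclusion; the gene labels (founder, k) are a convention introduced here. Under these assumptions it confirms that neither all four genes of a founder couple nor both genes of 17 can be lost, and it gives the probability that both genes of 6 survive as 15/64 unconditionally and 2/15 given that both genes of 7 survive. These values are computed by the sketch and are offered only as an illustration of the kind of interaction described here and in the next sentence, not as figures taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, product

# Table 1: individual -> (mother, father); (0, 0) marks a founder.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (2, 1), 5: (4, 3), 6: (0, 0), 7: (0, 0),
    8: (2, 1), 9: (7, 6), 10: (7, 6), 11: (2, 1), 12: (9, 8), 13: (9, 8),
    14: (10, 11), 15: (10, 11), 16: (14, 13), 17: (0, 0), 18: (5, 17), 19: (18, 12),
}
CURRENT = [13, 14, 15, 16, 18, 19]
FOUNDERS = [b for b, parents in PEDIGREE.items() if parents == (0, 0)]
HIDDEN = [b for b in PEDIGREE if b not in CURRENT and b not in FOUNDERS]
N_OFFSPRING = len(PEDIGREE) - len(FOUNDERS)


def transmission4(child, mother, father):
    """4 x P(child has `child` labelled genes | parents' counts); a parent carrying
    c labelled genes passes one on with probability c/2."""
    if child == 2:
        return mother * father
    if child == 0:
        return (2 - mother) * (2 - father)
    return mother * (2 - father) + (2 - mother) * father


def extinction(genes):
    """P(no founder gene in `genes` is carried by any current individual). Genes are
    labelled (founder, k), k in {0, 1}. This is the ancestral likelihood of 'these genes
    are a1, all others a2' given a current population that is entirely a2a2."""
    founders = {b: sum(1 for f, _ in genes if f == b) for b in FOUNDERS}
    current = {b: 0 for b in CURRENT}
    total = 0
    for hidden in product((0, 1, 2), repeat=len(HIDDEN)):
        g = {**founders, **current, **dict(zip(HIDDEN, hidden))}
        weight = 1
        for b, (m, f) in PEDIGREE.items():
            if (m, f) != (0, 0):
                weight *= transmission4(g[b], g[m], g[f])
        total += weight
    return Fraction(total, 4 ** N_OFFSPRING)


def survival(genes):
    """P(every gene in `genes` survives), by inclusion-exclusion over extinction events."""
    genes = list(genes)
    return sum((-1) ** k * extinction(subset)
               for k in range(len(genes) + 1) for subset in combinations(genes, k))


SIX, SEVEN = [(6, 0), (6, 1)], [(7, 0), (7, 1)]
both_6 = survival(SIX)
both_6_given_7 = survival(SIX + SEVEN) / survival(SEVEN)
print(both_6, both_6_given_7)   # 15/64 2/15
```

Because the founders' genes enter only through their counts, the same function evaluates the extinction probability of any labelled subset of founder genes.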
Survival of both genes of 7 decreases the survival probability of both genes of 6 from [illegible] to [illegible]. The extent to which survival or extinction of certain subsets of founder genes precludes survival or extinction of other disjoint subsets provides a measure of the structure of the genealogy with respect to the limited paths for descent of genes.

How many genes do survive in a small isolated population? Questions about the exact numbers of genes are not precisely the same as those about the fate of (at least) a certain labelled set of genes. However, answers to the latter, which are provided by the ancestral likelihoods, can be transformed to provide the required probability distributions. To turn finally to a real example again, the eleven early founders of the Tristan da Cunha population provided 22 potential genes, but not all can be present now. Thomas & Thompson (1983) have shown that with probability 0.994 between 10 and 18 genes survive, these being made up of between 4 and 6 of the six genes in the three founder females and of between 6 and 13 of the sixteen genes in the eight founder males. Although interactions are generally small in this expanding population, survival of some founder genes does reduce survival probabilities for others; note that 6 (female genes) + 13 (male genes) > 18 (total genes). Such analyses emphasize just how rapid the loss of variability can be in a small isolated population, and just how crucial certain segregations can be in determining the current genetic constitution.

References

Cannings, C., Thompson, E. A. & Skolnick, M. H. 1978 Probability functions on complex pedigrees. Adv. appl. Prob. 10, 26-61.
Karigl, G. 1981 A recursive algorithm for the calculation of identity coefficients. Ann. hum. Genet. 45, 299-305.
Kidd, J. R., Wolf, B., Hsia, Y. E. & Kidd, K. K. 1980 Genetics of propionic acidemia in a Mennonite-Amish kindred. Am. J. hum. Genet. 32, 236-245.
Thomas, A. & Thompson, E. A. 1983 The number of genes on Tristan da Cunha. (Submitted.)
Thompson, E. A. 1974 Gene identities and multiple relationships. Biometrics 30, 667-680.
Thompson, E. A. 1976 A restriction on the space of genetic relationships. Ann. hum. Genet. 40, 201-204.
Thompson, E. A. 1978 Ancestral inference. II. The founders of Tristan da Cunha. Ann. hum. Genet. 42, 239-253.
Thompson, E. A. 1983 A recursive algorithm for inferring gene origins. Ann. hum. Genet. 47, 143-152.

Discussion

A. W. F. Edwards (Caius College, Cambridge University, U.K.). How does one prove that every dyadic ratio for a kinship coefficient corresponds to some genealogical relationship?

Elizabeth A. Thompson. The form of equations (4) and (5) shows that this must be so, and the fact has been known for a very long time. However, the nicest proof I know is a constructive demonstration given only recently by Dr G. Karigl. Expressing the required kinship coefficient as a binary expansion, the sequence of zeros and ones can be used to define an explicit sequence of outbreeding and backcrossing which produces the required result. Although such a genealogy is unlikely in human populations, this neatly proves the theoretical result.

At the meeting, Professor Felsenstein, Professor Hill, Professor Bodmer and Professor Kingman raised questions of complexity of genealogies, inaccuracies in genealogies, complex genetic models and the approximations it is then necessary to introduce into computations. In principle, the methods of obtaining ancestral likelihoods apply to arbitrarily complex genetic models on arbitrarily complex genealogies.
In practice, there are of course computational limitations, although really quite complex situations can be considered. In my paper I have covered what might be referred to as 'the theory of exact computations on genealogies'. The next stage, which requires both theoretical and practical work, is a study of approximate computations. By how much are results altered by omitting apparently uninformative sections of genealogy? How dependent are results on certain critical links in a genealogy, and how can we determine which they are? What is the expected gain in using linked loci to increase the power to make inferences? How much is lost by having only phenotypic rather than genotypic data? Although some work has been done in this area, these remain important open questions.