THE MATHEMATICAL THEORY OF 
 PROBABILITIES
 
 
 THE MACMILLAN COMPANY 
 
 NEW YORK BOSTON CHICAGO 
 
 DALLAS SAN FRANCISCO 
 
 MACMILLAN & CO., Limited
 
 LONDON BOMBAY CALCUTTA 
 MELBOURNE 
 
 THE MACMILLAN CO. OF CANADA, Ltd.
 
 TORONTO
 
 THE MATHEMATICAL THEORY 
 
 OF 
 
 PROBABILITIES 
 
 AND ITS APPLICATION TO
 
 FREQUENCY CURVES AND STATISTICAL 
 
 METHODS 
 
 ARNE FISHER. 
 
 TRANSLATED FROM THE DANISH 
 BY 
 
 CHARLOTTE DICKSON, B.A.
 
 (COLUMBIA) 
 
 MATHEMATICAL ASSISTANT IN THE DEPARTMENT OF DEVELOPMENT

 AND RESEARCH OF
 
 THE AMERICAN TELEPHONE AND TELEGRAPH COMPANY 
 
 AND 
 
 WILLIAM BONYNGE, B.A.
 
 (BELFAST) 
 
 WITH INTRODUCTORY NOTES 
 
 BY 
 
 M. C. RORTY 
 
 AND 
 
 F. W. FRANKLAND, F.I.A., F.A.S., F.S.S.
 
 VOLUME I 
 
 Mathematical Probabilities, Frequency Curves, Homograde and
 Heterograde Statistics 
 
 SECOND EDITION GREATLY ENLARGED 
 
 NEW YORK 
 
 THE MACMILLAN COMPANY 
 
 1922
 
 Copyright, 1915 and 1922, 
 By ARNE FISHER

 Set up and electrotyped. Published November, 1915.
 Second Edition, greatly enlarged, May, 1922.

 PRINTED IN THE UNITED STATES OF AMERICA
 
 INTRODUCTORY NOTE TO THE SECOND EDITION. 
 
 Mr. Fisher has requested that an introduction be written to 
 this, the second edition of his work on probabilities, which shall 
 indicate some of the practical applications of the mathematical 
 theory with which his treatise deals. 
 
 The writer has only a limited knowledge of mathematical 
 technique — yet it has so happened that in twenty-five years of 
 active work as engineer, statistician and executive he has had 
 frequent occasion to call upon the skill of trained mathematicians 
 for the solution of practical problems involving frequency curves 
 and probabilities. Among such mathematicians none has been
 more helpful, or quicker to perceive the possibility of making 
 valuable applications of higher mathematics to business problems, 
 than Mr. Fisher himself. For this reason it is a duty as well as a
 privilege to outline, at his request, certain actual practical expe- 
 riences with mathematical applications and to indicate such 
 possible applications for the future. 
 
 The writer's initial experience with frequency curves and 
 probabilities was in the years 1902 and 1903, when it became
 evident, in analyzing various problems in telephone traffic, that 
 certain peak loads, which were superimposed upon the normal 
 seasonal, weekly, and daily fluctuations, could be accounted for 
 only by the laws of chance. Recourse was, therefore, had to the 
 formulae then available for approximate summations of the terms
 of the binomial expansion, and from these a series of curves was
 drawn which indicated for any given normal hourly traffic (as
 indicated by studies of seasonal, weekly, and daily variations) the 
 probability that any given short period load would be equalled or 
 exceeded. Practical experience with these curves soon showed that, 
 in spite of minor errors, they were close enough to the real facts 
 to make them of primary importance in traffic studies of all kinds, 
 and particularly in the development of mechanical switching
 devices. Their use for such purposes has now become a commonplace
 in telephone engineering.
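
The peak-load calculation described above can be sketched numerically. What follows is only a minimal illustration and not the writer's actual curves: it assumes a hypothetical group of independent sources, each busy with the same probability, and sums the upper terms of the binomial expansion exactly instead of using the approximate formulae of the period. The function name and all figures are assumptions chosen for illustration.

```python
from math import comb


def prob_load_at_least(n_sources: int, p_busy: float, load: int) -> float:
    """Probability that at least `load` of `n_sources` independent sources
    are busy at once, each being busy with probability `p_busy`.

    This is the upper tail of the binomial expansion (q + p)^n.
    """
    if load <= 0:
        return 1.0  # a load of zero or less is always equalled or exceeded
    if load > n_sources:
        return 0.0  # more simultaneous calls than sources is impossible
    q_idle = 1.0 - p_busy
    return sum(
        comb(n_sources, k) * p_busy**k * q_idle ** (n_sources - k)
        for k in range(load, n_sources + 1)
    )


# Hypothetical figures: 100 independent subscribers, each busy 5% of the time.
for load in (5, 10, 15):
    print(load, prob_load_at_least(100, 0.05, load))
```

Tabulating such values for a range of normal traffic levels would give one curve per level, which is the kind of chart the note describes.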
 
 As a by-product of the preceding application there have been 
 other interesting uses of the same probability curves. Effective 
 studies have been made of the decrease in the total stocks of small 
 machine parts that could be made possible by standardizing and
 reducing the number of types of screws, bolts, nuts, etc. The
 curves can also be applied directly to every line of business and
 every type of operation where prompt service must be given and 
 where the demand arises from a large number of independent 
 sources, and is, therefore, subject to peak loads determined by the
 laws of chance, which may be superimposed upon other "normal" 
 peak loads varying with the days of the week, the hours of the 
 day, etc. 
 
 Entirely separate applications of frequency curves are those 
 necessary in actuarial work. These are relatively well known.
 But it is less generally known that one of the most important of
 business problems, that of depreciation, can be treated effectively
 only when approached on an actuarial basis with a full understanding
 of the frequency curves which govern the displacement,
 year by year, of the physical units involved.
 
 A still further use of frequency curves and the theory of probabil- 
 ities, which is of immediate practical importance, is in connection 
 with sampling operations. The theory of sampling has already 
 been well developed, but adequate efforts have not yet been made 
 by mathematicians to reduce the processes of sampling to de- 
 pendable simple rules that can be applied by business executives 
 and statisticians untrained in higher mathematics. In census 
 work, and in statistical and other reports made by business or- 
 ganizations, the waste of money, that could be avoided by an 
 intelligent application of the theory of sampling, is very great. 
 Not only can many reports and analyses be made much more 
 cheaply and quickly by sampling processes, but they can also be 
 made more accurately. Many important items of information can 
 be determined only by trained specialists. In such cases the only 
 procedure, that does not involve prohibitive expense in large census 
 operations, is to tie such items, by a sampling process, to other 
 items which are susceptible of exact enumeration by relatively
 unskilled enumerators, and then to compute the totals for the 
 special items from the relations of such items to the items which 
 are completely enumerated. 
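
The procedure of tying a specially determined item to a completely enumerated one can be sketched as a simple ratio estimate. This is an illustrative reading of the paragraph above, not a method taken from the text; the function name and the figures in the usage note are hypothetical.

```python
def ratio_estimate_total(sample_pairs, enumerated_total):
    """Estimate the total of a special item from a sample.

    sample_pairs     -- one (special_value, enumerated_value) pair per
                        sampled unit, the special value being determined
                        by a trained specialist
    enumerated_total -- total of the easily counted item over the whole
                        census, obtained by complete enumeration
    """
    special_sum = sum(special for special, _ in sample_pairs)
    enumerated_sum = sum(counted for _, counted in sample_pairs)
    if enumerated_sum == 0:
        raise ValueError("sample contains none of the enumerated item")
    # Apply the sample ratio of special to enumerated items to the full count.
    return enumerated_total * special_sum / enumerated_sum
```

For instance, if specialists find 5 special cases among 20 enumerated units in the sample, and the complete enumeration counts 1000 such units, `ratio_estimate_total([(2, 10), (3, 10)], 1000)` gives an estimated total of 250.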
 
 All of the preceding are in the field of immediate practicalities.
 When we come to the future, one of the most promising uses of 
 mathematics is in the development of logical processes. It is not
 going too far to say that all business, and most engineering opera- 
 tions are fundamentally based on probabilities. The business man 
 is always dealing in degrees of uncertainty, and even the engineer
 has only occasionally a definite set of conditions upon which to
 base his computations. Where the problem is primarily a financial
 one, he must balance the cost of overbuilding against the cost of
 underbuilding; and, if he combines business judgment with en- 
 gineering skill, he will multiply the amount of each possible loss 
 by the probability of its occurring, and will ordinarily choose, 
 among all possible plans, the plan which involves the minimum 
 probable loss. Here it is not inappropriate to interject the idea 
 that the most practical logic must always be in terms of probabil- 
 ities, and that a logic which deals, or pretends to deal, in certainties 
 only is not alone useless, but is also harmful and misleading, when 
 difficult problems are to be approached. Such problems can 
 rarely, if ever, be solved except through the cumulation toward a 
 certainty of many small probabilities established from uncorrelated,
 or only partially correlated, viewpoints.
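
The rule of multiplying each possible loss by its probability and choosing the plan of minimum probable loss can be written out as a short sketch. The plan names, probabilities and loss amounts below are hypothetical and serve only to show the arithmetic.

```python
def expected_loss(outcomes):
    """Sum of probability times loss over a plan's possible outcomes.

    outcomes -- iterable of (probability, loss) pairs for one plan
    """
    return sum(probability * loss for probability, loss in outcomes)


def plan_with_minimum_expected_loss(plans):
    """Return the name of the plan whose expected loss is smallest.

    plans -- mapping from plan name to its list of (probability, loss) pairs
    """
    return min(plans, key=lambda name: expected_loss(plans[name]))


# Hypothetical case: demand turns out low with probability 0.6, high with 0.4.
plans = {
    # Underbuilding costs nothing if demand is low, 50,000 if it is high.
    "small plant": [(0.6, 0.0), (0.4, 50_000.0)],
    # Overbuilding wastes 20,000 if demand is low, nothing if it is high.
    "large plant": [(0.6, 20_000.0), (0.4, 0.0)],
}

for name, outcomes in plans.items():
    print(name, expected_loss(outcomes))
print("chosen:", plan_with_minimum_expected_loss(plans))
```

With these assumed figures the small plant carries an expected loss of 20,000 and the large plant 12,000, so the larger plant would be chosen even though low demand is the more likely outcome.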
 
 A final suggestion which is to-day speculative, but may assume 
 important practical aspects in the near future, is with respect to 
 the applications of frequency curves and probabilities to physical
 and cosmic mathematics. In such mathematics we are forced to 
 assume that all of our measures must arise out of the things meas- 
 ured. When we deal with physical velocities, it would seem that 
 our only measures of velocity can arise out of the velocities them- 
 selves. Similar considerations hold true with respect to funda- 
 mental measures of physical extension. Under these circum- 
 stances we may talk in terms of infinite space and of infinite time, 
 but we can hardly talk in terms of infinites when we are dealing 
 with the dimensions of atomic structure and the velocities of 
 material particles. In these cases it seems very highly probable 
 that we are dealing with frequency distributions which we must 
 measure and define in terms derived from such distributions them- 
 selves. With respect to such measures some of our frequency 
 curves may have infinite "tails," but it is more probable that the
 frequency forms are such that they can be completely defined in 
 finite terms. Along this same line, we may even risk a closing 
 speculation that the relative proportions of organized matter and 
 space in the stellar universe are determined through the opera- 
 tions of the laws of chance in establishing heterogeneities in what 
 is otherwise a homogeneous void-filling medium. 
 
 M. C. RORTY. 
 
 New York City, 
 March 2, 1922.
 
 PREFACE TO THE SECOND EDITION. 
 
 At the time when the first edition of this little book was published
 in 1916, I expected to issue a second volume shortly after, dealing
 with frequency curves and frequency surfaces as well as the related
 problem of co-variation (correlation). The manuscript for
 this volume was completed and printing had already commenced
 on some of the chapters, when a series of misfortunes, not necessarily
 unexpected, overtook the work. A major part of the manuscript
 while in transit to a friend in Denmark for review and corrections
 went down with a Danish vessel when torpedoed by an
 outlaw German submarine. A duplicate copy was for some reason
 or other withheld by the British military censor and not returned
 to the writer until long after the termination of the world war.
 My third and final copy of the manuscript, which I had submitted
 to an American friend for critical review, was also lost in transit.
 The veritable nemesis which seems to have followed my efforts is,
 however, only a verification of the all prevailing laws of chance,
 which every serious minded student must face with unperturbed
 attitude. In fact, the above misfortunes have, after all, only
 made me more determined to complete another collection of notes,
 which I eventually hope to put into proper shape for publication.
 
 In the meantime the first edition has been out of print for more 
 than two years, and when the publisher asked me to prepare a 
 new edition I took advantage of this opportunity to add several 
 chapters on frequency functions and their application to het- 
 erograde statistical series so as to give a complete treatment of 
 statistical functions involving one variable. The book is, there- 
 fore, twice its original size and contains the major part of what I 
 originally intended for a second volume. 
 
 The reader will readily notice that my treatment of the subject
 is based throughout upon the principles of the classical probability
 theory as founded by Bernoulli, De Moivre and above all by the
 great Laplace and his disciple, Poisson. I am of the opinion that
 these principles and their further extension by the Scandinavian
 statisticians and actuaries, Gram, Thiele, Westergaard, Charlier,
 Wicksell and Jorgensen, offer as yet the best and also the most
 powerful tools for the treatment of collected statistical data by
 means of mathematical methods. In the way of adumbration and
 economy of thought the Laplacean methods stand unsurpassed
 in the whole realm of mathematical statistics. I have, therefore,
 in this volume limited my investigations to a systematic treatment
 along these lines. I hope, however, in the forthcoming
 second volume to treat the methods of Pearson, Edgeworth,
 Kapteyn, Bachelier and Knibbs and show their relation to
 Laplace's theory.
 
 The reason why the Laplacean doctrine of frequency curves has
 been ignored until comparatively recent years and has remained
 more or less obscure is perhaps due to the fact that for more than
 a century it remained a theory pure and simple and was used but
 sparingly in practical calculations.
 
 Any statistical theory, in order to be of use in practical work,
 must be arranged in such a manner that it is readily adaptable to
 numerical computations. Advanced mathematical computation
 has not been given its due reward and proper attention in our
 ordinary academic instruction. A high grade mathematical
 computer is indeed a "rare bird," much more so in fact than a
 good mathematician. To arrange and plan the numerical work
 in connection with the theoretical formulae so that the detailed and
 painstaking work is reduced to a minimum, and at the same time
 afford the proper means for checking and counterchecking, is by
 no means an easy task and often requires as much ingenuity as
 the actual development of the theoretical formulae. While Gauss
 has always been acknowledged as one of the world's greatest
 computers and in addition to his extensive work in pure mathematics
 also did much practical work in surveying, physics, and in financial
 and actuarial investigations, Laplace during his entire career
 remained a pure mathematician and apparently failed to grasp the
 paramount attributes required by a successful computer. His
 attempt to inject himself into public life, as for instance when he
 secured for himself an appointment as minister of the interior,
 must be regarded as a dismal failure as admitted in Napoleon's
 memorandum on his dismissal.
 
 The failure of Laplace to recognize fully the all-important phase
 of numerical computations in all observations on statistical mass
 phenomena is in my opinion the main reason why the Gaussian
 theory of observations and the allied subject on the theory of least
 squares have hitherto supplanted the admittedly superior theory
 of the great Frenchman. Gauss in addition to his theory furnished
 an essentially useful and elegant method for performing the necessary
 numerical calculations, while Laplace left this decidedly
 important aspect out of consideration altogether. It remained in
 reality to Charlier to furnish the Laplacean doctrine with a practical
 method for computing the various statistical parameters.
 And in the meantime the Gaussian methods reigned supreme
 while Laplace's great work was neglected.
 
 The careful reader will readily notice that in the treatment of
 frequency curves I have allowed the semi-invariants, originally
 introduced in the theory of statistics by Thiele, to occupy a central
 position. In my opinion the semi-invariants represent a more
 powerful tool than the method of moments. I have also tried to
 rescue from oblivion the important and original memoir by the
 Danish actuary, Gram, and give to him and the French mathematician,
 Hermite, their due recognition as the earliest investigators
 of skew frequency distributions. Gram was perhaps the
 first investigator to make proper use of the orthogonal functional
 properties of the Laplacean normal frequency curve and its derivatives.
 By means of an application of the orthogonal properties of
 the Hermite polynomials and their close relation to the theory of
 integral equations, the whole theory of frequency distribution
 can be presented in a decidedly compact form; and I deem no
 apology necessary for having introduced in my treatment of
 frequency curves some of the more elementary theorems of integral
 equations, that youngest branch of higher analysis, which at
 present occupies a central position in advanced mathematics.
 
 The most recent investigations along those lines have been
 made by the Swedish astronomer, Charlier, and his disciples, 
 Jorgensen and Wicksell. Unfortunately these investigations have 
 hitherto not received adequate and systematic treatment in Eng- 
 lish and American texts on statistics, and it is my hope that 
 the following pages may be of service in opening the eyes of 
 English speaking statisticians to the practical utility of these 
 methods. 
 
 The examples have all been selected so as to give a complete and 
 detailed illustration of the application of the theory to essentially 
 practical problems. I have, on the other hand, purposely refrained
 from giving the customary exercises, so-called, usually found in 
 statistical texts, especially those in German and English. 
 
 Although I have been a close student of and have read most of 
 the published statistical text-books in about seven languages for 
 the last ten years, I regret to state that I have found little or no
 practical value in such trick exercises, which as a rule have but
 slight relation to problems occurring in daily life. 
 
 Since the appearance of the first edition of this book in 1916
 a number of excellent statistical texts have been issued. Among
 these I may mention a new edition of Yule's well-known elementary
 text, a greatly enlarged edition of Bowley's Elements of
 Statistics, the new treatise by Caradog Jones, an enlarged German
 translation of Charlier's Grunddragen, a very lucid Swedish text by
 Wicksell, the scholarly and broadly planned Statistikens Teori i
 Grundrids (in Danish) by Westergaard, and last but not least, the
 thesis by Jorgensen, Frekvensflader og Korrelation.¹
 
 Although an extended residence in the United States has perhaps
 improved my barbaric Dano-English, I fear that I must still
 apologize to the reader for my shortcomings in rhetoric and grammar.
 Most of the serious defects have, I hope, been overcome by
 the diligent efforts of my co-editor and translator, Miss C. Dickson,
 mathematical assistant in the department of Development and
 Research of the American Telephone and Telegraph Company.
 Miss Dickson's work has indeed been much beyond that of mere
 translation. Her knowledge of the mathematical theory of probabilities
 has enabled her to suggest to me several improvements in
 my Danish notes.
 
 I am also under great obligations to a number of friends and
 colleagues who have assisted me in the preparation of this volume.
 I am especially indebted to Mr. E. C. Molina, the well-known
 probability expert of the American Telephone and Telegraph
 Company. Mr. Molina's extensive knowledge of the works
 of the old French masters, especially of those of Laplace,
 has been of the greatest value to me, and I can truthfully
 say that I have nowhere met a mathematician so thoroughly
 acquainted with the intricacies of the Théorie Analytique as
 Mr. Molina.
 
 My thanks are also due to Mr. F. L. Hoffman, the Statistician
 of the Prudential Insurance Company, for the interest he took in
 my work along those lines while I was employed as a computer in
 his department. To Messrs. M. C. Rorty and D. R. Belcher of the
 American Telephone and Telegraph Company, I beg leave to
 express my best thanks for their kind advice and encouragement
 in the preparation of this volume.

 ¹ As a pure probability text we may mention G. Castelnuovo's Calcolo delle
 Probabilità (Milano, 1919) as an exceptionally lucid and rigorous treatise.
 The recently issued Treatise on Probability by J. M. Keynes is briefly discussed
 in paragraph 138 of this book. A. F.
 
 It is indeed impossible to adequately express in a mere formal
 preface my obligations to Mr. Rorty in this matter. His introductory
 note I regard as one of the highest rewards I have received in
 this field of endeavor where one must usually be content with the
 appreciation of one's peers. In this connection it is of interest to
 note that Mr. Rorty is the pioneer investigator in the application
 of the mathematical theory of probabilities to telephone engineering,
 which has been further developed in recent years by Molina of
 America, Erlang and Johannsen of Denmark, Holm of Sweden,
 Odell and Grinsted of Great Britain. The pioneer work by Mr.
 Rorty in this eminently practical field antedates the earliest work
 by Erlang in Tidsskrift for Matematik by nearly five years.
 
 Last, but not least, I wish to convey my sincerest thanks to my 
 Scandinavian compatriots, Westergaard, Charlier, Jorgensen, 
 Wicksell and Guldberg from whose works I have drawn so freely. 
 To these gentlemen and to the works of the late Messrs. Gram and 
 Thiele of Copenhagen I really owe anything of value which may 
 be contained in this work. 
 
 Arne Fisher. 
 New York, 
 
 April, 1922.
 
 INTRODUCTORY NOTE TO THE FIRST EDITION.
 
 I feel it a great honor to have been asked by my friend and 
 colleague, Mr. Arne Fisher, of the Equitable Life Assurance 
 Society of the United States, to write an introductory note to 
 what appears to me the finest book as yet compiled in the English 
 language on the subject of which it treats. As an Examiner 
 myself in Statistical Method for a British Colonial Government,
 it has been to me a heart-breaking experience, when implored by
 intending candidates for examination to recommend a text-book
 dealing with Mr. Fisher's subject matter, that it has heretofore
 been impossible for me to recommend one in the English language
 which covers the whole of the ground. Until comparatively
 recent years the case was even worse. While in French, in Italian,
 in German, in Danish, and in Dutch, scientific works on statistics
 were available galore, the dearth of such literature in the English
 language was little short of a national or racial scandal. With
 such works as those of Yule and Bowley, in recent years, there
 has been some possibility for the English-speaking student to
 acquire part of the knowledge needed. But it is hardly necessary
 to point out what a very large amount of new ground is covered 
 by Mr. Fisher's new book as compared with such works as I have 
 referred to. 
 
 Despite my professional connection with statistical and actuarial
 work of a technical character my own personal interest in
 Mr. Fisher's book is concentrated principally on the metaphysical
 basis of the Probability-theory, and it is with regard to this
 aspect of the subject alone that I feel qualified to comment on his
 achievement. With all the controversy that has gone on through
 many decades among metaphysicians and among writers on logic
 interested especially in the bases of the theories of probability and
 induction, between the pure empiricists of the type of J. S. Mill
 and John Venn (at all events in the earliest edition of his work)
 on the one hand, and the (partly) a priori theorists who base their
 doctrine on the foundation of Laplace on the other hand, it has
 been a source of intense satisfaction to me, as in the main a disciple
 of the latter group of theorists, to note the masterly way in
 which Mr. Arne Fisher disentangles the issues which arise in the
 keen and sometimes almost embittered controversy between these
 two schools of thought. It has always seemed to the present
 writer as if the very foundations of Epistemology were involved
 in this controversy. The impossibility of deriving the corpus of
 human knowledge exclusively from empirical data by any logically
 valid process, an impossibility which led Immanuel Kant
 to the creation of his epoch-making philosophical system, is
 hardly anywhere made more evident than in what seems to the
 present writer the unsuccessful effort of thinkers like John Venn
 to derive from such purely empirical data the entire Theory of
 Probability. The logical fallacy of the process is analogous to
 that perpetrated by John Stuart Mill in endeavoring to base the
 Law of Causality on what he termed an "inductio per simplicem
 enumerationem." Probably there is nowhere a more trenchant
 and conclusive exposure of the unsoundness of this point of view,
 than in the Right Honorable Arthur James Balfour's monumental
 work "A Defense of Philosophic Doubt." It is therefore
 satisfactory to find that Mr. Fisher emphasizes, quite at the
 beginning of his treatise, that an a priori foundation for
 "Probability" judgments is indispensable.
 
 Hardly less gratifying, from the metaphysical point of view, 
 is Mr. Fisher's treatment of the celebrated quaestio vexata of
 Inverse Probabilities and his qualified vindication of Bayes' 
 Rule against its modern detractors. 
 
 Aside altogether from metaphysics, it is particularly satis- 
 factory to note the full and clear way in which the author treats 
 the Lexian Theory of Dispersion and of the "Stability" of sta- 
 tistical series and the extension of this theory by recent Scandi- 
 navian and Russian investigators, — a branch of the science which 
 has till the appearance of this new work not been adequately
 covered in English text-books. 
 
 It may of course be a moot question whether the preference
 given by our author to Charlier's method of treating "Frequency
 Curves" over the method of Professor Karl Pearson is well
 advised. But whatever the experts' verdict may be on debatable
 questions like these, the scientific world is to be congratulated on
 Mr. Fisher's presentment of a new and sound point of view, and
 he emphatically is to be congratulated on the production of a
 text-book which for many years to come will be invaluable both
 to students and to his confreres who are engaged in extending
 the boundaries of this fascinating science.
 
 F. W. Frankland, 
 
 Member of the Actuarial Society of America, 
 Fellow of the Institute of Actuaries of Great
 Britain and Ireland, and Fellow of the
 Royal Statistical Society of London. 
 New York, 
 October 1, 1915.
 
 PREFACE TO THE FIRST EDITION.

 "Probability" has long ago ceased to be a mere theory of games
 of chance and is everywhere, especially on the continent, regarded
 as one of the most important branches of applied mathematics.
 This is proven by the increasing number of standard text-books in
 French, German, Italian, Scandinavian and Russian which have
 appeared during the last ten years. During this time the research
 work in the theory of probabilities has received a new impetus
 through the labors of the English biometricians under the leadership
 of Pearson, the Scandinavian statisticians Westergaard,
 Charlier and Kiær, the German statistical school under Lexis, and
 the brilliant investigations of the Russian school of statisticians.
 
 Each group of these investigations seems, however, to have 
 moved along its own particular lines. The English schools have 
 mostly limited their investigations to the field of biology as published
 in the extensive memoirs in the highly specialized journal,
 Biometrika. The Scandinavian scholars have produced researches 
 of a more general character, but most of these researches are un- 
 fortunately contained in Scandinavian scientific journals and are 
 for this reason out of reach to the great majority of readers who 
 are not familiar with any of the allied Scandinavian languages. 
 This applies in a still greater degree to the Russians. German 
 scholars of the Lexis school have also contributed important 
 memoirs, but strangely enough their researches are little known 
 in this country or in England, a fact which is emphasized through 
 the belated English discussion on the theory of dispersion as developed
 by Lexis and his disciples. The same can also be said with
 regard to the Italian statisticians. 
 
 In the present work I have attempted to treat all these modern 
 researches from a common point of view, based upon the mathe- 
 matical principles as contained in the immortal work of the great 
 Laplace, "Théorie analytique des Probabilités," a work which
 despite its age remains the most important contribution to the
 theory of probabilities to our present day. Charlier has rightly
 observed that the modern statistical methods may be based upon
 a few condensed rules contained in the great work of Laplace.
 This holds true despite the fact that many modern English
 writers of late have shown a certain distrust, not to say actual
 hostility, towards the so-called mathematical probabilities as
 defined by the French savant, and have in their place adopted the
 purely empirical probability ratios as defined by Mill, Venn and
 Chrystal. It is quite true that it is possible to build a consistent
 theory of such ratios, as for instance is done by the Danish
 astronomer and actuary, Thiele. The theory, however, then 
 becomes purely a theory of observations in which the theory of 
 probability takes a secondary place. The distrust in the so-called 
 mathematical a priori probabilities of Laplace I believe, however, 
 to be unfounded, and the criticism to which that particular kind 
 of probabilities is subjected by a few of the modern English 
 writers is, I believe, due to a misapprehension of the true nature 
 of the Bernoullian Theorem. This renowned theorem remains 
 to-day the cornerstone of the theory of statistics, and upon it I 
 have based the most important chapters of the present work. 
 Following the beautiful investigations of Tschebycheff and
 Pizetti in their proofs of Bernoulli's Theorem and the closely 
 related theorem of large numbers by Poisson I have adopted the 
 methods of the Swedish astronomer and statistician, Charlier, 
 in the discussion of the Lexian dispersion theory. 
 
 The theory of frequency curves is treated from various points 
 of view. I have first given a short historical introduction to the 
 various investigations of the law of errors. The Gaussian 
 normal curve of error was by the older school of statisticians 
 held to be sufficient to represent all statistical frequencies, and 
 actual observed deviations from the normal curve were attributed 
 to the limited number of observations. Through the original
 memoirs of Lexis and the investigations of Thiele the fallacy of
 such a dogmatic belief was finally shown. The researches of
 Thiele, and later of Pearson, developed later the theory of skew
 curves of error. As recently as 1905 Charlier finally showed
 that the whole theory of errors or frequency curves may be
 brought back to the principles of Laplace. I have treated this
 subject by the methods of both Pearson and Charlier, although I
 have given the methods of the latter a predominant place, because
 of their easy and simple application in the practical computations
 required by statistical work. The mathematical theory of correlation,
 which is treated in an elementary manner only, is based
 upon the same principles.
 
 The statistical examples serve as illustrations of the theory, and
 it will be noted that it is possible to solve all the important statistical
 problems presenting themselves in daily work on the basis
 of a theory of mathematical probabilities instead of on a direct
 theory of statistical methods. I have here again followed Charlier
 in dividing all statistical problems into two distinct groups,
 namely, the homograde and the heterograde groups.
 
 In treating the ])hil()S()i)hical side of the subject I have naturally 
 not gone into much detail. However, I have tried to emphasize 
 the two diametrically o})posite standpoints, namely the principle 
 of what von Kries has called the principle of "cogent reason," 
 and the principle which Boole has aptly termed "the equal 
 distribution of ignorance." These two principles are clearly illus- 
 trated in the case of the so-called inverse probabilities. As far as 
 pure theory is concerned, the theory of "inverse probabilities" 
 is rigorous enough. It is only when making practical applications 
 of the rule of inverse probabilities (the so-called Bayes' Rule) 
 that many writers have made a fatal mistake by tacitly assuming 
the principle of "insufficient reason" as the only true rule of computation.
This leads to paradoxical results as illustrated by the
 practical problem from the region of actuarial science in Chapter 
 VI in this book. 
 
In a work of this character I have naturally made an extended
use of the higher mathematical analysis. However, the reader
who is not versed in these higher methods need not feel alarmed
 on this account, as the elementary chapters are arranged in such a 
 way that the more difficult paragraphs may be left out. I have 
 in fact divided the treatise into two separate parts. The first 
 part embraces the mathematical probabilities proper and their 
applications to homograde statistical series. This part, I think,
constitutes what is usually given as a course in vital statistics in
many American colleges. I hardly deem it worth while to give a
 detailed discussion on the collection and arrangement of the sta- 
 tistical data as to various frequency distributions. The mere 
 graphical and serial representation of frequency functions by 
means of histographs and frequency columns is so simple and
 evident that a detailed description seems superfluous. The fitting 
 of the various curves to analytical formulas and the determination 
 of the various parameters seem to me of much greater impor- 
 tance. The theory of curve fitting which is treated in the second 
 volume is founded upon a more advanced mathematical analysis 
and is for this reason out of reach to the average American student
 who desires to learn only the rudiments of modern statistical 
 methods. Practical statisticians, on the other hand, will derive 
 much benefit from these higher methods. It is a fact generally 
 noted in mathematics that the practical application of a difficult 
 theory is much simpler than that of a more elementary theory. 
 This is amply proven by the appearance of an excellent little 
 Scandinavian brochure by Charlier: "Grunddragen af den mate- 
 matiske Statistikken." ("Rudiments of Mathematical Statis- 
 tics.") I have always attempted to adapt theory to actual 
 practical problems and requirements rather than to give a purely 
 mathematical abstract discussion. In fact it has been my aim 
 to present a theory of probabilities as developed in recent years 
which would prove of value to the practical statistician, the
 actuary, the biologist, the engineer and the medical man, as 
 well as to the student who studies mathematics for the sake of 
 mathematics alone. 
 
 The nucleus of this work consisted of a number of notes written 
in Danish on various aspects of the theory of probabilities, collected
from a great number of mathematical, philosophical and
economic writings in various languages. At the suggestion of
my former esteemed chief, Mr. H. W. Robertson, F.A.S., Assistant
Actuary of the Equitable Life Assurance Society of the
United States, I was encouraged to collect these fragmentary
notes in systematic form. The rendering in English was done
by myself personally with the assistance of Mr. W. Bonynge.
With his assistance most of the idiomatic errors due to my
barbaric Dano-English have been eliminated. The notes stand,
however, in the main as a faithful reproduction of my original
English copy. Although the resulting "Dano-English" may
have its great shortcomings as to rhetoric and grammar, I hope
to have succeeded in expressing what I wanted to say in such
 a manner that my possible readers may follow me without 
 difficulty. 
 
I gladly take the opportunity of expressing my thanks to a
number of friends and colleagues who in various ways have assisted
me in the preparation of this work. My most grateful
thanks are due to Mr. F. W. Frankland, Mr. H. W. Robertson
 and Mr. Wm. Bonynge not only for reading the manuscript and 
 most of the proofs, but also for the friendly help and encourage- 
 ment in the completion of this volume. The introductory note 
 by Mr. Frankland, coming from the pen of a scholar who for the 
 most of a life-time has worked with statistical-mathematical 
subjects and who has taken a special interest in the philosophical
 and metaphysical aspects of the probability theory, I regard as 
 one of the strong points of the book. My debts to Messrs. 
 Frankland and Robertson as well as to Dr. W. Strong, Associate 
 Actuary of the Mutual Life Insurance Company, are indeed of 
 such a nature that they cannot be expressed in a formal preface. 
 My thanks are also due to Mr. A. Pettigrew in correcting the 
 first rough draught of the first three chapters at a time when my 
knowledge of English was most rudimentary, to Mr. M. Dawson,
Consulting Actuary, and Mr. R. Henderson, Actuary of the Equitable
Life, for reading a few chapters in manuscript and making
certain critical suggestions, to Professors C. Grove and W. Fite, of
Columbia University, for numerous technical hints in the working
 out of various mathematical formulas in Chapter VI, to Miss 
 G. Morse, librarian of the Equitable Library, in the search of 
 certain bibliographical material. Last but not least I wish to 
 express my sincerest thanks to several of my Scandinavian com- 
 patriots for allowing me to quote and use their researches on 
 various statistical subjects. I want in this connection especially 
 to mention Professor Charlier, of Lund, and Professors Wester- 
 gaard and Johannsen, of Copenhagen. 
 
To The Macmillan Company and The New Era Printing Company
I beg leave to convey my sincere appreciation of their very
courteous and accommodating attitude in the manufacture of
 this work. Their spirit has been far from commercial in this — 
 from a pure business standpoint — somewhat doubtful under- 
 taking. 
 
 Arne Fisher. 
 
 New York, 
 October, 1915.
 
TABLE OF CONTENTS.

PART I.

MATHEMATICAL PROBABILITIES AND HOMOGRADE STATISTICS.

Chapter I.
Introduction: General Principles and Philosophical Aspects.

1. Methods of Attack
2. Law of Causality
3. Hypothetical Judgments
4. Hypothetical Disjunctive Judgments
5. General Definition of the Probability of an Event
6. Equally Likely Cases
7. Objective and Subjective Probabilities

Chapter II.
Historical and Bibliographical Notes.

8. Pioneer Writers
9. Bernoulli, de Moivre and Bayes
10. Application to Statistical Data
11. Laplace and Modern Writers

Chapter III.
The Mathematical Theory of Probabilities.

12. Definition of Mathematical Probability
13. Example 1
14. Example 2
15. Example 3
16. Example 5
17. Example 6

Chapter IV.
The Addition and Multiplication Theorems in Probabilities.

18. Systematic Treatment by Laplace
19. Definition of Technical Terms
20. The Theorem of the Complete or Total Probability, or the Probability of "Either Or"
21. Theorem of the Compound Probability or the Probability of "As Well As"
22. Poincaré's Proof of the Addition and Multiplication Theorem
23. Relative Probabilities
24. Multiplication Theorem
25. Probability of Repetitions
26. Application of the Addition and Multiplication Theorems in Problems in Probabilities
27. Example 12
28. Example 13
29. Example 14
30. Example 15
31. Example 16
32. Example 17
33. Example 18. De Moivre's Problem
34. Example 19
35. Example 20. Tchebycheff's Problem

Chapter V.
Mathematical Expectation.

36. Definition, Mean Values
37. The Petrograd (St. Petersburg) Problem
38. Various Explanations of the Paradox. The Moral Expectation

Chapter VI.
Probability a Posteriori.

39. Bayes's Rule. A Posteriori Probabilities
40. Discovery and History of the Rule
41. Bayes's Rule (Case I)
42. Bayes's Rule (Case II)
43. Determination of the Probabilities of Future Events Based upon Actual Observations
44. Examples on the Application of Bayes's Rule
45. Criticism of Bayes's Rule
46. Theory versus Practice
47. Probabilities expressed by Integrals
48. Example 24
49. Example 25. Bing's Paradox
50. Conclusion

Chapter VII.
The Law of Large Numbers.

51. A Priori and Empirical Probabilities
52. Extent and Usage of Both Methods
53. Average a Priori Probabilities
54. The Theory of Dispersion
55. Historical Development of the Law of Large Numbers

Chapter VIII.
Introductory Formulas from the Infinitesimal Calculus.

56. Special Integrals
57. Wallis's Expression of π as an Infinite Product
58. De Moivre-Stirling's Formula

Chapter IX.
Law of Large Numbers. Mathematical Deduction.

59. Repeated Trials
60. Most Probable Value
61. Simple Numerical Examples
62. The Most Probable Value in a Series of Repeated Trials
63. Approximate Calculation of the Maximum Term, T
64. Expected or Probable Value
65. Summation Method of Laplace. The Mean Error
66. Mean Error of Various Algebraic Expressions
67. Tchebycheff's Theorem
68. The Theorems of Poisson and Bernoulli proved by the Application of the Tchebycheffian Criterion
69. Bernoullian Scheme
70. Poisson's Scheme
71. Relation between Empirical Frequency Ratios and Mathematical Probabilities
72. Application of the Tchebycheffian Criterion

Chapter X.
The Theory of Dispersion and the Criterions of Lexis and Charlier.

73. Bernoullian, Poisson and Lexis Series
74. The Mean and Dispersion
74a. Mean or Average Deviation
75. The Lexian Ratio and Charlier Coefficient of Disturbancy

Chapter XI.
Application to Games of Chance and Statistical Problems.

76. Correlate between Theory and Practice
77. Homograde and Heterograde Series. Technical Terms
78. Computation of the Mean and the Dispersion in Practice
79. Westergaard's Experiments
80. Charlier's Experiments
81. Experiments by Bonynge and Fisher

Chapter XII.
Continuation of the Application of the Theory of Probabilities to Homograde Statistical Series.

82. General Remarks
83. Analogy between Statistical Data and Mathematical Probabilities
84. Number of Comparison and Proportional Factors
85. Child Births in Sweden
86. Child Births in Denmark
87. Danish Marriage Series
88. Stillbirths
89. Coal Mine Fatalities
90. Reduced and Weighted Series in Statistics
91. Secular and Periodical Fluctuations
92. Cancer Statistics
93. Application of the Lexian Dispersion Theory in Actuarial Science. Conclusion
 
PART II.

FREQUENCY CURVES AND HETEROGRADE STATISTICS.

Chapter XIII.
The Theory of Errors and Frequency Curves and Its Application to Statistical Series. General Remarks.

94. General Remarks. The Hypotheses of Elementary Errors
95. Application to Statistical Series. Definitions
96. Compound Frequency Curves
97. Early Writers
98. Laplace and Gauss
99. Quetelet's Studies
100. Opperman, Gram, and Thiele
101. Modern Investigations

Chapter XIV.
The Mathematical Theory of Frequency Curves.

102. Frequency Distributions
103. Parameters Considered as Symmetric Functions
104. Semi-Invariants of Thiele
105. The Fourier Integral Equation
106. Frequency Function as the Solution of an Integral Equation
107. The Normal or Laplacean Probability Function
108. Hermite's Polynomials
109. Orthogonal Functions
110. The Frequency Function Expressed as a Series
111. Derivation of Gram's Series
112. Absolute Frequencies
113. Coefficients Expressed by Semi-Invariants
114. Change of Origin and Unit
 
PART III.

PRACTICAL APPLICATIONS OF THE THEORY.

Chapter XV.
The Numerical Determination of the Parameters.

115. General Remarks
116. Remarks on Criticisms
117. Charlier's Computation Scheme
118. Comparison between Observed Data and Theoretical Values
119. Principle of Method of Least Squares
120. Gauss' Solution of Normal Equations
121. Arithmetical Application of Method

Chapter XVI.
Logarithmically Transformed Frequency Functions.

122. Transformation of the Variate
123. The General Theory of Transformation
124. Logarithmic Transformation
125. The Mathematical Zero
126. Logarithmically Transformed Frequency Series
127. Parameters Determined by Least Squares
128. Application to Graduation of Mortality Tables
129. Formation of Observation Equations
130. Additional Examples

Chapter XVII.
Frequency Curves and their Relation to the Bernoullian Series.

131. The Bernoullian Series
132. Poisson's Exponential
133. The Law of Small Numbers

Chapter XVIII.
Poisson-Charlier Frequency Curves for Integral Variates.

134. Charlier's B Curve
135. Numerical Examples
136. Transformation of the Variate
137. Bernoullian Series Expressed as B Curves
138. Remarks on Mr. Keynes' Criticisms
 
 PART I 
 
 MATHEMATICAL PROBABILITIES AND 
HOMOGRADE STATISTICS

CHAPTER I.

INTRODUCTION: GENERAL PRINCIPLES AND PHILOSOPHICAL ASPECTS.
 
1. Methods of Attack. The subject of the theory of probabilities
may be attacked in two different ways, namely in a
philosophical, and in a mathematical manner. At first the
subject originated as isolated mathematical problems from games
of chance. The pioneer writers on probability such as Cardano,
Galileo, Pascal, Fermat, and Huyghens treated it in this way.
The famous Bernoulli was, perhaps, the first to view the subject
from the philosopher's point of view. Laplace wrote his well-known
"Essai Philosophique des Probabilités," wherein he terms
the whole science of probability as the application of common
sense. During the last thirty years numerous eminent philosophical
scholars such as Mill, Venn, and Keynes of England,
Bertrand and Poincaré of France, Sigwart, von Kries and Lange
of Germany, Kroman of Denmark, and several Russian scholars
have written on the philosophical aspect.
 
 In the ordinary presentation of the elements of the theory of 
 probability as found in most English text-books, the treatment 
 is wholly mathematical. The student is given the definition of 
 a mathematical probability and the elementary theorems are 
then proved. We shall, in the following chapter, depart from
this rule and first view the subject, briefly, from a philosophical
standpoint. What the student may thus lose in time we hope
he may gain in obtaining a broader view of the fundamental
 principles underlying our science. At the same time, the reader 
 who is unacquainted with the science of philosophy or pure logic, 
 need not feel alarmed, since not even the most elementary 
 knowledge of the principles of formal logic is required for the 
 understanding of the following chapter. 
 
 2. Law of Causality. — In a great treatise on the Chinese civiliza- 
 tion, Oscar Peschel, the German geographer and philosopher, 
makes the following remarks: "Since our intellectual awakening,
since we have appeared on the arena of history as the creators
and guardians of the treasures of culture, we have sought after
only one thing, of the presence of which the Chinese had no
idea, and for which they would give hardly a bowl of rice. This
invisible thing we call causality. We have admired a vast
number of Chinese inventions, but even if we seek through their
 huge treasures of philosophical writing we are not indebted to 
 them for a single theory or a single glance into the relation 
 between cause and effect." 
 
 The law of causality may be stated broadly as follows: Every- 
 thing that happens, and everything that exists, necessarily 
 happens or exists as the consequence of a previous state of things. 
 This law cannot be proven. It must be taken, a priori, as an 
 axiom; but once accepted as a truth it does away with the belief 
 of a capricious ruling power, and even if the strongest disbeliever 
 of the law may deny its truth in theory he invariably applies it 
 in practice during his daily occupation in life. 
 
 All future human activity is more or less influenced by past 
 and present conditions. Modern historical writings, as for 
 instance the works of the brilliant Italian historian, Ferrero, 
always seek to connect past events with present social and
 economic conditions. Likewise great and constructive statesmen 
 in trying to shape the destinies of nations always reckon with 
 past and present events and conditions. We often hear the term, 
 "a man with foresight," applied to leading financiers and states- 
 men. This does not mean that such men are gifted with a vision 
 of the future, but simply that they, with a detailed and thorough 
 knowledge of past and present events, associated with the par- 
 ticular undertaking in which they are interested, have drawn 
 conclusions in regard to a future state of affairs. For example, 
 when the Canadian Pacific officials, in the early eighties, chose 
 Vancouver as the western terminal for the transcontinental 
 railroad, at a time when practically the whole site of the present 
 metropolis of western Canada was only a vast timber tract, they 
realized that the conditions then prevailing on this particular
 spot — the excellent shipping facilities, the favorable location in 
 regard to the Oriental trade, and the natural wealth of the sur- 
 rounding country — would bring forth a great city, and their 
 predictions came true.
 
 Predictions with regard to the future must be taken seriously 
 only when they are based upon a thorough knowledge of past 
 and present events and conditions. Prophecies, taken in a 
 purely biblical sense of the term and viewed from the law of 
 causality, are mere guesses which may come true and may not. 
 A prophet can hardly be called more than a successful guesser. 
Whether there have been persons gifted with a purely prophetic
 vision is a question which must be left to the theologians to 
 wrangle over. 
 
 3. Hypothetical Judgments. — Any person with ordinary in- 
 tellectual faculties may, however, predict certain future events 
 with absolute certainty by a simple application of the principle 
 of hypothetical judgment. The typical form of the hypothetical 
 judgment is as follows: If a certain condition exists, or if a certain 
 event takes place then another definite event will surely follow. 
 Or if A exists B will invariably follow. 
 
Mathematical theorems are examples of hypothetical judgments.
Thus in the geometry of the plane we start with certain
 ideas (axioms) about the line and plane. From these axioms 
 we then deduce the theorems by mere hypothetical judgments. 
 Thus in the Euclidian geometry we find the axiom of parallel 
 lines, which assumes that through a point only one line can be 
 drawn parallel to another given line, and from this assumption 
 we then deduce the theorem that the sum of the angles in a 
 triangle is 180°. But it must be borne in mind that this proof is 
 valid only on the assumption of the actual existence of such lines. 
 If we could prove directly by logical reasoning or by actual 
 measurement, that the sum of the angles in any triangle is equal 
 to 180°, then we would be able to prove the above theorem, the 
 so-called "hole in geometry," independently of the axiom of 
 parallel lines. 
 
 A Russian mathematician, Lobatschewsky, on the other hand, 
 assumed that through a single point an infinite number of parallels 
 might be drawn to a previously given line, and from this as- 
 sumption he built up a complete and valid geometry of his own. 
 Still another mathematician, Riemann, assumed that no lines 
 were parallel to each other, and from this produced a perfectly 
 valid surface geometry of the sphere.
 
As examples of hypothetical judgment we have the two following
well-known theorems from elementary geometry and algebra.
If one of the angles of a triangle is divided into two parts, then
the line of division intersects the opposite side. If a decadian
number is divided by 5 there is no remainder from the division.
 
In natural science, hypothetical judgments are founded on
certain occurrences (phenomena) which, without exception, have
taken place in the same manner, as shown by repeated observations.
The statement that a suspended body will fall when its
support is removed is a hypothetical judgment derived from
actual experience and observation.
 
 4. Hypothetical Disjunctive Judgments. — In hypothetical 
judgments we are always able to associate cause and effect. It
happens frequently, however, that our knowledge of a certain
complex of present conditions and actions is such that we are
not able to tell beforehand the resulting consequences or effects
of such conditions and actions, but are able to state only
that either an event E₁ or an event E₂, etc., or an event Eₙ will
happen. This represents a hypothetical disjunctive judgment
whose typical form is: If A exists either E₁, E₂, E₃, ... or Eₙ
 will happen. 
 
If we take a die, i. e., a homogeneous cube whose faces are
 marked with the numbers from one to six, and make an ordinary 
 throw, we are not able to tell beforehand which side will turn 
 up. True, we have here again a previous state of things, but the 
 conditions do not allow such a simple analysis as the cases we 
have hitherto considered under the purely hypothetical judgment.
 Here a multitude of causes influence the final result — the weight 
 and centre of gravity of the die, the infinite number of possible 
 movements of the hand which throws the die, the force of contact 
 with which the die strikes the table, the friction, etc. All these 
causes are so complex that our minds are not afforded an opportunity
to grasp and distinguish the impulses that determine
the fall of the die. In other words we are not able to say, a
priori, which face will appear. We only know for certain that
either 1, 2, 3, 4, 5, or 6 will appear. If a line is drawn through
the vertex of a triangle, it either intersects the opposite side or
it does not. If a number is divided by 5 the division either gives
 only an integral number or leaves a remainder. If an opening 
 is made in the wall of a vessel partly filled with water, then either 
 the water escapes or remains in the vessel. All the above cases 
are examples of hypothetical disjunctive judgments.
 
 The four cases show, however, a common characteristic. They 
 all have a certain partial domain, where one of the mutually 
 exclusive events is certain to happen, while the other partial 
 domain will bring forth the other event, and the total area of 
 action embraces both events. Taking the triangle, we notice 
that the lines may pass through all the points inside of an angle
of 360°, but only the lines falling inside the internal vertical
angle, φ, of the triangle will produce the event in question,
 namely the line intersecting the opposite side. There will be 
 an outflow from the vessel only if the hole is made in that part 
 of the wall which is touched by the fluid. 
 
 All problems do not allow of such simple analysis, however, 
 as will be seen from the following example. Suppose we have 
 an urn containing 1 white and 2 black balls and let a person 
 draw one from the urn. The hypothetical disjunctive judgment 
 immediately tells us that the ball will be either black or white, 
 but the particular domain of each event cannot be limited to the 
 fixed border lines of the former examples. Any one of the balls 
may occupy an infinite number of positions, and furthermore we
 may imagine an infinite number of movements of the hand which 
 draws the ball, each movement being associated with a particular 
 point of position of the ball in the urn. If we now assume each 
 of the three balls to have occupied all possible positions in the 
 urn, each point of position being associated with its proper 
 movement of the hand, it is readily seen that a black ball will 
 be encountered twice as often as a white ball in a particular 
 point of position in the urn, and for this reason any particular 
 movement of the hand which leads to this point of position 
 grasps a black ball twice as often as a white ball. 
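
The counting argument of the urn example can be checked by direct enumeration. What follows is only a minimal sketch in Python, under the assumption (which the text itself makes) that each of the three balls is equally likely to be the one grasped. The function name `draw_probabilities` and the colour labels are illustrative and do not come from the original.

```python
from collections import Counter
from fractions import Fraction

def draw_probabilities(urn):
    """Return the exact probability of each colour for a single draw,
    assuming every ball in the urn is equally likely to be grasped."""
    counts = Counter(urn)
    total = len(urn)
    return {colour: Fraction(n, total) for colour, n in counts.items()}

urn = ["white", "black", "black"]  # 1 white and 2 black balls, as in the example
probs = draw_probabilities(urn)
print(probs["black"], probs["white"])   # 2/3 1/3
print(probs["black"] / probs["white"])  # 2
```

The ratio of 2 corresponds to the statement that a black ball is grasped twice as often as a white one.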
 
 5. General Definition of the Probability of an Event. — All the 
 above examples have shown the following characteristics: 
 
 (1) A total general region or area of action in which all actions 
 may take place, this total area being associated with all possible 
 events.
 
 (2) A limited special domain in which the associated actions 
 produce a special event only. 
 
 If these areas and domains, as in the above cases, are of such a 
 nature that they allow a purely quantitative determination, 
 they may be treated by mathematical analysis. We define 
 now, without entering further into its particular logical signifi- 
 cance, the ratio of the second special and limited domain to the 
 first total region or area as the probability of the happening of 
the event, E, associated with domain No. 2.
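
The definition just given, the ratio of the special domain to the total region, may be illustrated on two of the earlier examples. The sketch below is in Python and rests on assumptions that are not in the original: the "decadian number" is read as a whole number in the decimal system whose final digit is equally likely to be any of the ten digits, and the vertex angle of 60 degrees is an arbitrary illustrative value. The helper name `probability` is likewise hypothetical.

```python
from fractions import Fraction

def probability(special_domain, total_region):
    """Ratio of the special domain to the total region of action."""
    return Fraction(special_domain) / Fraction(total_region)

# Division of a number by 5: of the ten possible final digits,
# only 0 and 5 give a division without remainder.
last_digits = range(10)
favourable = [d for d in last_digits if d % 5 == 0]
p_divisible = probability(len(favourable), len(last_digits))
print(p_divisible)  # 1/5

# Line through the vertex of a triangle: following the text, the line may
# fall anywhere within the full angle of 360 degrees, and only directions
# inside the vertex angle phi intersect the opposite side.
phi = 60  # illustrative value in degrees, not taken from the text
p_intersect = probability(phi, 360)
print(p_intersect)  # 1/6
```

In both cases the domains admit a purely quantitative determination a priori, which is exactly the situation the definition presupposes.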
 
 We must, however, hasten to remark that it is only in a com- 
 paratively few cases that we are able, a priori, to make such a 
 segregation of domains of actions. This may be possible in 
 purely abstract examples, as for instance in the example of the 
 division of the decadian number by 5. But in all cases where 
 organic life enters as a dominant factor we are unable to make such 
sharp distinctions. If we were asked to determine the probability
of an x-year-old person being alive one year from now, we
should be able to form the hypothetical disjunctive judgment:
An x-year-old person will be either alive or dead one year from
 now. But a further segregation into special domains as was 
 the case with the balls in the urn is not possible. Many ex- 
 tremely complex causes enter into such a determination; the 
 health of the particular person, the surroundings, the daily life, 
 the climate, the social conditions, etc. Our only recourse in 
 such cases is to actual observation. By observing a large 
number of persons of the same age, x, we may, in a purely empirical
way, determine the rate of death or survival. Such a determination
of an unknown probability is called an empirical probability.
An empirical probability is thus a probability, into the
 determination of which actual experience has entered as a domi- 
 nant factor. 
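
The idea of an empirical probability can be sketched with a small simulation. The Python fragment below is a hypothetical illustration only: the survival chance of 0.99 is an invented value standing in for the unknown quantity, the seed is arbitrary, and no actual mortality data are used. It merely shows the relative frequency of survivors being computed from larger and larger groups of observed persons.

```python
import random

def empirical_probability(outcomes):
    """Relative frequency of the event (True) among the observed cases."""
    return sum(outcomes) / len(outcomes)

# Hypothetical illustration: observe persons of the same age for one year,
# with a survival chance that is unknown to the observer.
hidden_survival_chance = 0.99  # assumed value, for illustration only
rng = random.Random(1915)      # fixed seed so that the run is repeatable

for n in (100, 10_000, 100_000):
    survived = [rng.random() < hidden_survival_chance for _ in range(n)]
    print(n, empirical_probability(survived))
```

The observer never sees `hidden_survival_chance`; only the observed frequency is available, which is the sense in which experience enters as the dominant factor.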
 
6. Equally Likely Cases. The main difficulty, in the application
of the above definition of probability, lies in the determination
of the question whether all the events or cases taking
place in the general area of action may be regarded as equally
likely or not. Two diametrically opposite views have here been
brought forward by writers on probabilities. One view is based
upon the principle which in logic is known as the principle of
"insufficient reason," while the other view is based upon the
principle of "cogent reason." The classical writers on the theory
 of probability, such as Jacob Bernoulli and Laplace, base the 
 theory on the principle of insufficient reason exclusively. Thus 
 Bernoulli declares the six possible cases by the throw of a die to 
 be equally likely, since "on account of the equal form of all the 
 faces and on account of the homogeneous structure and equally 
 arranged weight of the die, there is no reason to assume that any 
 face should turn up in preference to any other." In one place 
 Laplace says that the possible cases are "cases of which we are 
 equally ignorant," and in another place, "we have no reason to 
believe any particular case should happen in preference to any
 other." 
 
The opposite view, based on the principle of cogent reason,
has been strongly endorsed in an admirable little treatise by the
German scholar, Johannes von Kries.¹ Von Kries requires, first
 of all, as the main essential in a logical theory of probability, 
 that "the arrangement of the equally likely cases must have a 
cogent reason and not be subject to arbitrary conditions."

In several illustrative examples, von Kries shows how the
principle of insufficient reason may lead to different and paradoxical
results. The following example will illustrate the main
points in von Kries's criticism. Suppose we be given the following
problem: Determine the probability of the existence of human
beings on the planet Mars. By applying the first mentioned
principle our reasoning would be as follows: We have no more
reason to assume the actual existence of man on the planet than
the complete absence. Hence the probability for the non-existence
of a human being is equal to ½. Next we ask for the
probability of the presence or non-presence of another earthly
mammal, say the elephant. The answer is the same, ½. Now
the probability for the absence of both man and elephant on the
planet is ½ × ½ = ¼.² The probability for the absence of a third
mammal, the horse, is also ½, or the probability for the absence
of man, elephant, and horse is equal to (½)³ = ⅛. Proceeding in
the same manner for all mammals we obtain a very small probability
for the complete absence of all mammals on Mars, or a
very large probability, almost equal to certainty, that the planet
harbors at least one mammal known on our planet, an answer
which certainly does not seem plausible. But we might as well
have put the question from the start: what is the probability
of the existence or absence of any one earthly mammal on Mars?
The principle of insufficient reason when applied directly would
here give the answer ½, while when applied in an indirect manner
the same method gave an answer very near to certainty.

1 "Die Principien der Wahrscheinlichkeitsrechnung," Berlin, 1886.

2 See the chapter on multiplication of probabilities.
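
The arithmetic behind the Mars paradox may be made concrete as follows. This is a minimal Python sketch under the very assumption being criticised, namely that each species is assigned the probability ½ of being absent and that these values are multiplied together. The function name `absence_of_all` and the chosen numbers of species are illustrative only.

```python
from fractions import Fraction

def absence_of_all(n_species, p_absent=Fraction(1, 2)):
    """Probability that all n species are absent, if each absence is
    assigned probability p_absent and the values are multiplied."""
    return p_absent ** n_species

for n in (1, 2, 3, 10, 50):
    p_none = absence_of_all(n)
    # Columns: number of species, probability that none is present,
    # probability that at least one is present (as a decimal).
    print(n, p_none, float(1 - p_none))
```

For three species the sketch gives 1/8, as in the passage on man, elephant, and horse; as the number of species grows, the probability of at least one being present approaches certainty, whereas the same principle applied directly to the single question gives ½.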
 
 An urn is known to contain white and black balls, but the 
 number of the balls of the two different colors is unknown. What 
 is the probability of drawing a white ball? The principle of 
 insufficient reason gives us readily the answer: ^, while the prin- 
 ciple of cogent reason would give the same answer only if it were 
 known a priori that there were equal numbers of balls of each 
 color in tlie urn before the drawing took place. Since this 
 knowledge is not present a priori, we are not able to give any 
 answer, and the problem is considered outside the domain of 
 probabilities. There is no doubt that the principle advocated 
 by von Kries is the only logical one to apply, and a recent 
 treatise on the theory of probability by Professor Bruhns of 
Leipzig¹ also gives the principle of cogent reason the most prominent
place. On the other hand it must be admitted that if the
principle were to be followed consistently to its very extreme it
 would of course exclude many problems now found in treatises 
 on probability and limit the application of our theory consider- 
 ably in scope. Still, however, we must agree with von Kries 
 that it seems very foolhardy to assign cases of which we are 
 absolutely in the dark, as being equally likely to occur. This 
very principle of insufficient reason is in very high degree responsible
for the somewhat absurd answers to questions on the
 so-called "inverse probabilities," a name which in itself is a great 
 misnomer. We shall later in the chapter on "a posteriori" 
 probabilities discuss this question in detail. At present we shall 
 only warn the student not to judge cases of which he has no 
 knowledge whatsoever to be equally likely to occur. The old rule 
 "experience is the best teacher" holds here, as everywhere else. 
 
¹ "Kollektivmasslehre und Wahrscheinlichkeitsrechnung," Leipzig, 1903.
 
 
7. Objective and Subjective Probabilities. — In this connection
 it is interesting to note the lucid remarks by the Danish statis- 
 tician, Westergaard. "By every well arranged game of chance, 
 by lotteries, dice, etc.," Westergaard says, "everything is ar- 
 ranged in such a way that the causes influencing each draw or 
throw remain constant as far as possible. The balls are of the
 same size, of the same wood, and have the same density; they are 
 carefully mixed and each ball is thus apparently subject to the 
 influences of the same causes. However, this is not so. Despite 
 all our efforts the balls are different. It is impossible that they 
 are of exactly mathematically spherical form. Each ball has its 
 special deviation from the mathematical sphere, its special size 
 and weight. No ball is absolutely similar to any one of the 
 others. It is also impossible that they may be situated in the 
same manner in the bag. In short there is a multitude of apparently
insignificant differences which determine that a certain
definite ball and none of the other balls may be drawn from the
bag. If such inequalities did not exist one of two things would
happen. Either all balls would turn up simultaneously or else
 they would all remain in the bag. Many of these numerous 
 causes are so small that they perhaps are invisible to the naked 
 eye and completely escape all calculations, but by mutual 
 action they may nevertheless produce a visible result." 
 
 It thus appears that a rigorous application of the principle of 
 cogent reason seems impossible. However, a compromise 
between this principle and the principle of insufficient
 reason may be effected by the following definition of equally 
 possible cases, viz. : Equally possible cases are such cases in which 
 we, after an exhaustive analysis of the physical laws underlying the 
 structure of the complex of causes influencing the special event, are 
led to assume that no particular case will occur in preference to any
 other. True, this definition introduces a certain subjective 
 element and may therefore be criticized by those readers who 
 wish to make the whole theory of probabilities purely objective. 
 Yet it seems to me preferable to the strict application of the 
 principle of equal distribution of ignorance. Take again the 
 question of the probability of the existence of human beings on 
the planet Mars. The principle of equal distribution of ignorance
readily gives us without further ado the answer $\frac{1}{2}$. Modern astrophysical
researches have, however, verified physical conditions on
the planet which make the presence of organic life quite possible,
and according to such an eminent authority as Mr. Lowell, perhaps
absolutely certain. Yet these physical investigations are as
yet not sufficiently complete, and not in such a form that they
may be subjected to a purely quantitative analysis as far as the
theory of probabilities is concerned. Viewed from the standpoint
of the principle of cogent reason any attempt to determine the
numerical value of the above probability must therefore be put
aside as futile. This result, negative as it is, seems, however,
preferable to the absolute guess of $\frac{1}{2}$ as the probability.
 
CHAPTER II.

HISTORICAL AND BIBLIOGRAPHICAL NOTES.
 
8. Pioneer Writers. — The first attempt to define the measure
of a probability of a future event is credited to the Greek philosopher,
Aristotle. Aristotle calls an event probable when the
 majority, or at least the majority of the most intellectual persons, 
 deem it likely to happen. This definition, although not allowing 
 a purely quantitative measurement, makes use of a subjective 
 judgment. 
 
 The first really mathematical treatment of chance, however, is 
 given by the two Italian mathematicians, Cardano and Galileo, 
 who both solved several problems relating to the game of dice. 
 Cardano, aside from his mathematical occupation, was also a 
 professional gambler and had evidently noticed that in all kinds 
 of gambling houses cheating was often resorted to. In order 
that the gamester might be fortified against such cheating practices,
Cardano wrote a little treatise on gambling wherein he
discussed several mathematical questions connected with the
different games of dice as played in the Italian gambling houses
 at that time. Galileo, although not a professional gambler, was 
 often consulted by a certain Italian nobleman on several problems 
 relating to the game of dice, and fortunately the great scholar 
 has left some of his investigations in a short memoir. In the 
 same manner the two great French mathematicians, Pascal and 
Fermat, were often asked by a professional gamester, the chevalier
de Méré, to apply their mathematical skill to the solution of
 different gambling problems. It was this kind of investigation 
 which probably led Pascal to the discovery of the arithmetical 
 triangle, and the first rudiments of the combinatorial analysis, 
which had its origin in probability problems, and which later
 evolved into an independent branch of mathematical analysis. 
 
One of the earliest works from the illustrious Dutch physicist,
Huyghens, is a small pamphlet entitled "De Ratiociniis in Ludo
Aleae," printed in Leyden in the year 1657. Huyghens' tract is
the first attempt at a systematic treatment of the subject. The
famous Leibnitz also wrote on chance. His first reference to a
mathematical probability is perhaps in a letter to the philosopher,
Wolff, wherein he discusses the summation of the infinite
series $1 - 1 + 1 - 1 + \cdots$. Besides he solved several problems.
 
 9. Bernoulli, de Moivre and Bayes. — The first extensive 
 treatise on the theory as a whole is from the hand of the famous 
 Jacob Bernoulli. Bernoulli's book, "Ars Conjectandi," marks a 
 revolution in the whole theory of chance. The author treats 
 the subject from the mathematical as well as from a philo- 
 sophical point of view, and shows the manifold applications of 
 the new science to practical problems. Among other important 
 theorems we here find the famous proposition which has become 
 known as the Bernoulli Theorem in the mathematical theory of 
 probabilities. Bernoulli's work has recently been translated 
from the Latin into German,¹ and a student who is interested in
 the whole theory of probability should not fail to read this 
 masterly work. 
 
 The English mathematicians were the next to carry on the 
 investigations. Abraham de Moivre, a French Huguenot, and 
 one of the most remarkable mathematicians of his time, wrote 
the first English treatise on probabilities.² This book was certainly
a worthy product of the masterful mind of its author, and
may, even today, be read with useful results, although the
 method of demonstration often appears lengthy to the student 
 who is accustomed to the powerful tools of modern analysis. 
The high esteem in which the work by de Moivre is held by
 modern writers, is proven by the fact that E. Czuber, the eminent 
 Austrian mathematician and actuary, so recently as two years 
ago translated the book into German. A certain problem (see
Chap. IV) still goes under the name of "The Problem of de
 Moivre" in the modern literature on probability. A contem- 
 porary of de Moivre, Stirling, contributed also to the new branch 
 of mathematics, and his name also is immortalized in the theory 
 of probability by the formula which bears his name, and by which 
 we are able to express large factorials to a very accurate degree 
of approximation. The third important English contributor is
the Oxford clergyman, T. Bayes. Bayes' treatise, which was
published after his death by Price, in Philosophical Transactions
for 1764, deals with the determination of the a posteriori probabilities,
and marks a very important stepping stone in our whole
theory. Unfortunately the rule known as Bayes' Rule has been
applied very carelessly, and that mostly by some of Bayes' own
countrymen; so the whole theory of Bayes has been repudiated
by certain modern writers. A recent contribution by the
Danish philosophical writer, Dr. Kroman, seems, however, to have
cleared up all doubts on the subject, and to have given Bayes his
proper credit.

¹ Ars Conjectandi, Ostwald's Klassiker No. 108, Leipzig, 1901.
² de Moivre: "The Doctrine of Chances," London, 1718.
 
 10. Application to Statistical Data. — In the eighteenth century 
 some of the most celebrated mathematicians investigated 
 problems in the theory of probability. The birth of life as- 
 surance gave the whole theory an important application to 
 social problems and the increasing desire for the collection of all 
 kinds of statistical data by governmental bodies all over Europe 
 gave the mathematicians some highly interesting material to 
 which to apply their theories. No wonder, therefore, that we 
 in this period find the names of some of the most illustrious mathe- 
 maticians of that time, such as Daniel Bernoulli, Euler, Nicolas 
and John Bernoulli, Simpson, D'Alembert and Buffon, closely
connected with the solution of problems in the theory of mathematical
probabilities. We shall not attempt to give an account
of the different works of these scientists, but shall only dwell
 briefly on the labors of Bernoulli and D'Alembert. In a memoir 
 in the St. Petersburg Academy, Daniel Bernoulli is the first to 
 discuss the so called St. Petersburg Problem, one of the most 
 hotly debated in the whole realm of our science. We may here 
 mention that this problem is today one of the main pillars in the 
economic treatment of value. Bernoulli introduced in the discussion
of the above mentioned problem the idea of the "moral
expectation," which under slightly different names appears in
 nearly all standard writings on economics. 
 
 D'Alembert is especially remembered for the critical attitude 
 he took towards the whole theory. Although one of the most 
 brilliant thinkers of his age, the versatile Frenchman made some 
 great blunders in his attempt to criticize the theories of chance.
 
 
Buffon's name is remembered because of the needle problem,
and he may properly be called the father of the so-called
"geometrical" or "local" probabilities.
 
 11. Laplace and Modern Writers. — We now come to that 
 resplendent genius in the investigation of the mathematical 
theory of chance, the immortal Laplace, who in his great work,
"Théorie Analytique des Probabilités," gave the final mathematical
treatment of the subject. This massive volume leaves
 nothing to be desired and is still today — more than one hundred 
 years after its first publication — a most valuable mine of in- 
 formation and compares favorably with much more modern 
 treatises. But like all mines, it requires to be mined and is by 
no means easy reading for a beginner. An elementary extract,
"Essai Philosophique des Probabilités," containing the more
 elementary parts of Laplace's greater work and stripped of all 
 mathematical formulas has recently appeared in an English 
 translation. 
 
Among later French works, Cournot's "Exposition de la
Théorie des Chances et des Probabilités" (1843), treated the
principal questions in the application of the theory to practical
problems in sociology. In 1837 Poisson published his "Recherches
sur les Probabilités" in which he for the first time proved
 the famous theorem which bears his name. Poisson and his 
 Belgian contemporary, Quetelet, made extensive use of the 
 theory in the treatment of statistical data. 
 
 Among the most recent French works, we mention especially 
Bertrand's "Calcul des Probabilités" (Paris, 1888), Poincaré's
"Calcul des Probabilités" (Paris, 1896), and Borel's "Calcul des
Probabilités" (Paris, 1901). We especially recommend
Poincaré's brilliant little treatise to every student who masters the
 French language, as this book makes no departure from the 
 lively and elucidating manner in which this able mathematical 
 writer treated the numerous subjects on which he wrote during 
 his long and brilliant career as a mathematician. 
 
 Of Russian writers, the mathematician, Tchebycheff, has given 
 some extensive general theorems relating to the law of large 
numbers. Unfortunately Tchebycheff's writings are for the
most part scattered in French, German, Scandinavian and
Russian journals, and thus are not easily accessible to the ordinary
 reader. A Russian artillery officer, Sabudski, has recently pub- 
 lished a treatise on ballistics in German, wherein he extends the 
 views formulated by Tchebycheff. 
 
Of Scandinavian writers we mention T. N. Thiele, who probably
was the first to publish a systematic treatise on skew curves.¹
An abridged edition of this very original work has recently been
translated into English.² The Dane, Westergaard, is the author
of the most extensive and thorough treatise on vital statistics
which we possess at the present time. Westergaard's work has
recently been translated into German,³ and is strongly recommended
to the student of vital statistics on account of his clear
 and attractive style of presenting this important subject. 
 
 The Swedish mathematicians Charlier and Gylden have 
 published a series of memoirs in different Scandinavian journals 
 and scientific transactions. We may also, in this category, 
 mention the numerous small articles by the eminent Danish 
actuary, Dr. Gram.
 
While the German mathematicians in general are the most
 fertile writers on almost every branch of pure and applied mathe- 
 matics, they have not shown much activity in the theory of 
 mathematical probability except in the past ten years. But 
during that time there have appeared at least a dozen standard
 works in German. Among these, the lucid and terse treatise 
 by E. Czuber, the Austrian actuary and mathematician, is 
 especially attractive to the beginner on account of the systematic 
treatment of the whole subject.⁴ A very original treatment is
offered by H. Bruhns in his "Kollektivmasslehre und Wahrscheinlichkeitsrechnung"
(Leipzig, 1903). Among the German works,
we may also mention the book by Dr. Norman Herz in "Sammlung
Schubert," and an excellent little work by Hack in the small
pocket edition of "Sammlung Göschen." The theory of skew
 curves and correlation is presented by Lipps and Bruhns in 
 extensive treatises. 
 
¹ "Almindelig Iagttagelseslære," Copenhagen, 1884.

² "Theory of Observations," London, 1903.

³ "Mortalität und Morbilität," Jena, 1902.

⁴ E. Czuber, "Wahrscheinlichkeitsrechnung," Leipzig, 1908 and 1910, 2 volumes.
 
 
We finally come to modern English writers on the subject.
After the appearance of de Moivre's "Doctrine of Chances"
the first work of importance was the book by de Morgan, "An
Essay on the Theory of Probabilities." The latest text-book is
Whitworth's "Choice and Chance" (Oxford Press, 1904); but
none of these works, although very excellent in their manner of
 treatment of the subject, comes up to the French, Scandinavian, 
 and German text-books. Nevertheless, some of the most im- 
 portant contributions to the whole theory have been made by 
 the English statisticians and mathematicians, Crofton, Pearson, 
 and Edgeworth. Especially have frequency curves and cor- 
 relation methods introduced by Professor Karl Pearson been 
 very extensively used in direct applications to statistical and 
 biological problems. Of purely statistical writers, we may 
 mention G. Udny Yule, who has published a short treatise en- 
 titled "Theory of Statistics" (London, 1911). Numerous ex- 
 cellent memoirs have also appeared in the different English and 
 American mathematical journals and statistical periodicals, 
especially in the quarterly publication, Biometrika, edited by
 Professor Karl Pearson. 
 
 In the above brief sketch, we have only mentioned the most 
important contributors to the theory of probabilities proper.
Numerous able writers have written on the related subjects of
least squares, the mathematical theory of statistics and insurance
mathematics. We shall not discuss the works of these investigators
at the present stage. Each of the most important works
in the above mentioned branches will receive a short review in
the corresponding chapters on statistics and assurance mathematics.
The readers interested in the historical development of
the theory of probabilities are advised to consult the special
treatises on this subject by Todhunter and Czuber.¹
 
¹ After this chapter had gone to press I notice that a treatise by the eminent
English scholar, Mr. Keynes, is being prepared by The Macmillan Co.
In this connection I wish also to call attention to the recent publication by
Bachelier (Calcul des probabilités, 1912), a work planned on a broad and
extensive scale. — A. F.
 
CHAPTER III.

THE MATHEMATICAL THEORY OF PROBABILITIES.
 
12. Definition of Mathematical Probability. — "If our positive
knowledge of the effect of a complex of causes is such that we
may assume, a priori, $t$ cases as being equally likely to occur, but
of which only $f$, $(f < t)$, cases are favorable in causing the event,
$E$, in which we are interested, then we define the proper fraction
$f/t = p$ as the mathematical probability of the happening of
the event, $E$" (Czuber). We might also have defined an a
priori probability as the ratio of the equally favorable cases to
the co-ordinated possible cases.

As is readily seen, this definition assumes a certain a priori
knowledge of the possible and favorable conditions of the event
in question, and the probability thus defined is therefore called
"a priori probability." Denoting the event by the symbol, $E$,
we express the probability of its occurrence by the symbol $P(E)$,
and the probability of its non-occurrence by $P(\bar{E})$. Thus if $t$ is
the total number of equally possible cases and $f$ the number of
favorable cases for the event, we have:

$$P(E) = \frac{f}{t} = p,$$

and

$$P(\bar{E}) = \frac{t - f}{t} = 1 - \frac{f}{t} = 1 - p = 1 - P(E).$$

This relation evidently gives us: $P(E) + P(\bar{E}) = 1$, which is the
symbolic expression for the hypothetical disjunctive judgment
that the event $E$ will either happen or not happen. If $f = t$, we
have:

$$P(E) = \frac{t}{t} = 1,$$

which is the symbol for the hypothetical judgment that if A
exists, $E$ will surely happen. Similarly if $f = 0$, we get

$$P(E) = \frac{0}{t} = 0,$$

or the symbol for the hypothetical judgment: If A exists, $E$ will
not happen, or what is the same, $\bar{E}$ will happen.
 
 As we have already mentioned, in an a priori determination of 
 a probability, special stress must be laid upon the requirement 
 that all possible cases must be equally likely to occur. The 
 enumeration of these cases is by no means so easy as may appear 
 at first sight. Even in the most simple problems where there 
can be no doubt about the possible cases being equally likely to
 occur, it is very easy to make a mistake, and some of the most 
 eminent mathematicians and most acute thinkers have drawn 
 erroneous conclusions in this respect. We shall give a few ex- 
 amples of such errors from the literature on the subject of the 
 theory of probabilities, not on account of their historical interest 
 alone, but also for the benefit of the novice who naturally is ex- 
 posed to such errors. 
 
 13. Example 1. — An Italian nobleman, a professional gambler 
 and an amateur mathematician, had, by continued observation 
 of a game with three dice, noticed that the sum of 10 appeared 
 more often than the sum of 9. He expressed his surprise at this 
to Galileo and asked for an explanation. The nobleman regarded
the following combinations as favorable for the throw of 9:

    1  2  6
    1  3  5
    1  4  4
    2  2  5
    2  3  4
    3  3  3

and for the throw of 10 the six combinations of

    1  3  6
    1  4  5
    2  2  6
    2  3  5
    2  4  4
    3  3  4
 
Galileo shows in a treatise entitled "Considerazione sopra il
giuoco dei dadi" that these combinations cannot be regarded as
being equally likely. By painting each of the three dice with
a different color it is easy to see that an arrangement such as
1 2 6 can be produced in 6 different ways. Let the colors be
white, black and red respectively. We may then make the
following arrangements:

    White  Black  Red
      1      2     6
      1      6     2
      2      1     6
      2      6     1
      6      1     2
      6      2     1

which gives $3! = 6$ different arrangements. The arrangements
of 1 4 4 can be made as follows:

    White  Black  Red
      1      4     4
      4      1     4
      4      4     1

which gives 3 different arrangements. The arrangements of
3 3 3 can be made in one way only. By complete enumeration
of equally favorable cases we obtain the following scheme:

    Sum 9     Cases     Sum 10    Cases
    1, 2, 6     6       1, 3, 6     6
    1, 3, 5     6       1, 4, 5     6
    1, 4, 4     3       2, 2, 6     3
    2, 2, 5     3       2, 3, 5     6
    2, 3, 4     6       2, 4, 4     3
    3, 3, 3     1       3, 3, 4     3
    Total      25       Total      27

The total number of equally possible cases by the different arrangements
of the 18 faces on the dice is $6^3 = 216$. The probability
of throwing 9 with three dice is therefore $\frac{25}{216}$, and of throwing
10, $\frac{27}{216} = \frac{1}{8}$.
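Galileo's enumeration is small enough to check by brute force. The following Python sketch is an illustration added here, not part of the original treatise; the variable names are arbitrary. It lists every ordered throw of three distinguishable dice and counts those summing to 9 and to 10, and then repeats the nobleman's count of unordered combinations.

```python
from collections import Counter
from itertools import product

# Enumerate all 6**3 ordered throws of three distinguishable dice
# (the "white, black and red" dice of the text).
throws = list(product(range(1, 7), repeat=3))
sums = Counter(sum(t) for t in throws)

print(len(throws))        # total number of equally possible cases
print(sums[9], sums[10])  # favorable cases for the sums 9 and 10

# The nobleman's count: unordered combinations, ignoring arrangement.
combos_9 = {tuple(sorted(t)) for t in throws if sum(t) == 9}
combos_10 = {tuple(sorted(t)) for t in throws if sum(t) == 10}
print(len(combos_9), len(combos_10))
```

Counting unordered combinations gives six for each sum, which is why the two sums looked equally likely to the nobleman; counting ordered arrangements separates them.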
 
 
 14. Example 2. — D'Alembert, the great French mathematician 
 and natural philosopher and one of the ablest thinkers of his 
time, assigned $\frac{2}{3}$ as the probability of throwing head at least
 once in two successive throws with a homogeneous coin. D'Alem- 
 bert reasons as follows : If head appears first the game is 
 finished and a second throw is not necessary. He therefore gives 
as equally possible cases (we denote head by H and tail by T):
H, TH, TT, and determines thus the probability as $\frac{2}{3}$. Where
 then is the error of D'Alembert? At first glance the chain of 
 reasoning seems perfect. There are altogether three possible 
 cases of which two are in favor of the event. But are the three 
 cases equally likely? To throw head in a single throw is evi- 
 dently not the same as to throw head in two successive throws. 
 D'Alembert has left out of consideration the fact that a double 
 throw is allowed. The following analysis shows all the equally 
possible cases which may occur:

HH, HT, TH, TT.

Three of these cases favor the event. Hence we have:

$$P(E) = p = \frac{3}{4}.$$
 
 We shall return to this problem at a later stage under the dis- 
 cussion of the law of large numbers. 
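D'Alembert's slip can be made explicit with a short Python sketch (added for illustration; it is not from the original text, and the dictionary of weights is an assumption stating what each of his three cases is worth for a homogeneous coin). The case H stands for both HH and HT and therefore carries probability 1/2, while TH and TT carry 1/4 each.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two throws: HH, HT, TH, TT.
outcomes = ["".join(o) for o in product("HT", repeat=2)]
favorable = [o for o in outcomes if "H" in o]
p_correct = Fraction(len(favorable), len(outcomes))

# D'Alembert's three cases are not equally likely: stopping after a first
# head merges HH and HT into the single case "H" of probability 1/2.
dalembert_cases = {
    "H": Fraction(1, 2),
    "TH": Fraction(1, 4),
    "TT": Fraction(1, 4),
}
p_weighted = dalembert_cases["H"] + dalembert_cases["TH"]

print(p_correct, p_weighted)
```

Once his three cases are given their proper weights, the stopped game and the full two-throw game agree.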
 
 The examples quoted have already shown that the enumer- 
 ation of the equally likely cases requires a sharp distinction 
between the different combinations and arrangements of elements.
In other words, the solution of the problems requires
 a knowledge of permutations and combinations. We assume 
 here that the reader is already acquainted with the elements and 
 formulas from the combinatorial analysis and shall therefore 
 proceed with some more illustrations. In the following, when 
employing the binomial coefficients, we shall use the notation
$\binom{n}{k}$ instead of ${}^{n}C_{k}$.
 
15. Example 3. — An urn contains $a$ white and $b$ black balls. A
person draws $k$ balls. What is the probability of drawing
$\alpha$ white and $\beta$ black balls?

$$(\alpha + \beta = k,\ \alpha \le a,\ \beta \le b)$$

$k$ balls may be drawn from the urn in as many ways as it is possible
to select $k$ elements from $a + b$ elements, which may be done in

$$\binom{a+b}{k} = \binom{a+b}{\alpha+\beta}$$

ways. Furthermore there are $\binom{a}{\alpha}$ groups of $\alpha$ white and $\binom{b}{\beta}$
groups of $\beta$ black balls. Since each combination of any one
group of the first groups with any one group of the second groups
is favorable for the event, we have as favorable cases:

$$\binom{a}{\alpha}\binom{b}{\beta},$$

and hence the probability

$$p = \frac{\binom{a}{\alpha}\binom{b}{\beta}}{\binom{a+b}{k}}.$$
 
Example 4. A special case of the above problem is the following
question which often appears in the well known game of
whist. What are the respective chances that 0, 1, 2, 3, 4 aces
are held by a specified player? There are altogether 52 cards
in the game equally distributed among 4 players. Of these
cards 4 are aces and 48 are non-aces. Hence we have the following
values for $a$, $b$, $k$, $\alpha$ and $\beta$:

$$a = 4,\ b = 48,\ k = 13,\ \alpha = 0, 1, 2, 3, 4,\ \beta = 13, 12, 11, 10, 9.$$

Substituting in the above formula we get:

$$p_0 = \binom{4}{0}\binom{48}{13} \div \binom{52}{13} = \frac{82251}{270725}$$

$$p_1 = \binom{4}{1}\binom{48}{12} \div \binom{52}{13} = \frac{118807}{270725}$$

$$p_2 = \binom{4}{2}\binom{48}{11} \div \binom{52}{13} = \frac{57798}{270725}$$

$$p_3 = \binom{4}{3}\binom{48}{10} \div \binom{52}{13} = \frac{11154}{270725}$$

$$p_4 = \binom{4}{4}\binom{48}{9} \div \binom{52}{13} = \frac{715}{270725}$$

A hypothetical disjunctive judgment immediately tells us that in
a game of whist a specified player must either hold 0, 1, 2, 3 or
4 aces. Any such judgment is certain to come true. Hence by
adding the 5 above computed probabilities we obtain a check
for the accuracy of our calculations. The actual addition of the
numerical values of $p_0$, $p_1$, $p_2$, $p_3$, and $p_4$ gives us unity which is
the mathematical symbol for certainty. Gauss, the renowned
German mathematician and astronomer, was an eager whist
player. During his forty-eight years of residence in the university
town of Göttingen almost every evening he played a rubber of
whist with some friends among the university professors. He
kept a careful record of the distribution of the aces in each
game. After his death these records were found among his
papers, headed "Aces in Whist." The actual records agree
with the results computed above.
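The urn formula of Example 3 and the whist figures of Example 4 can be reproduced with exact rational arithmetic. The Python sketch below is an added illustration, not part of the original text; the function name `prob_draw` and its argument names are hypothetical, chosen to mirror the symbols a, b, α and β used above.

```python
from fractions import Fraction
from math import comb

def prob_draw(a: int, b: int, alpha: int, beta: int) -> Fraction:
    """Probability of drawing alpha white and beta black balls when
    alpha + beta balls are taken from an urn of a white and b black."""
    k = alpha + beta
    return Fraction(comb(a, alpha) * comb(b, beta), comb(a + b, k))

# Whist: 4 aces, 48 non-aces, a hand of 13 cards.
aces = [prob_draw(4, 48, alpha, 13 - alpha) for alpha in range(5)]
for alpha, p in enumerate(aces):
    # Fraction prints in lowest terms, so the denominators shown may be
    # smaller than the common denominator 270725 used in the text.
    print(alpha, p, float(p))

print(sum(aces))  # the check by addition described in the text
```

Using `Fraction` rather than floating point keeps the sum exactly equal to unity, which is the check by addition the text recommends.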
 
16. Example 5. — An urn contains $n$ similar balls. A part of
or all the balls are drawn. What is the probability of drawing
an even number of balls?

One ball may be drawn in as many ways as there are balls,
two balls in as many ways as we may select two elements out of
$n$ elements, and so on. Hence we have for the total number of
equally possible cases:

$$t = \binom{n}{1} + \binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n}.$$

We have now:

$$(1 + 1)^n = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}$$

and

$$(1 - 1)^n = 1 - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n}.$$

The number of favorable cases is given by the expansion:

$$f = \binom{n}{2} + \binom{n}{4} + \cdots$$

The expression for $t$ is the sum of the binomial coefficients less unity.
Hence we have:

$$t = (1 + 1)^n - 1 = 2^n - 1.$$

If we add the two expansions of $(1 + 1)^n$ and $(1 - 1)^n$ and then
subtract 2 we get the expansion for $2f$. Hence we have:

$$2f = (1 + 1)^n + (1 - 1)^n - 2 \quad \therefore \quad f = 2^{n-1} - 1.$$

Thus we shall have as the probability of drawing an even number
of balls:

$$p = \frac{2^{n-1} - 1}{2^n - 1},$$

while for an uneven number:

$$q = 1 - p = \frac{2^{n-1}}{2^n - 1}.$$

We notice that the probability of drawing an uneven number of
balls is larger than the probability of drawing an even number.
This apparently strange result is easily explained without the
aid of algebra from the fact that when the urn contains one ball
only, we cannot draw an even number. Hence we have $p = 0$,
$q = 1$. With two balls we may draw an uneven number in two
ways and an even number in one way, thus $p = \frac{1}{3}$, and $q = \frac{2}{3}$.
The greater weight of $q$ remains when $n$ is finite; only when
$n = \infty$, $p = q = \frac{1}{2}$.
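The closed forms for t and f can be compared with a direct summation of binomial coefficients. The Python sketch below is an added illustration under the text's own assumption that every non-empty selection of balls counts as one equally possible case; the function name `prob_even_draw` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_even_draw(n: int) -> Fraction:
    """Probability that a non-empty draw from n balls has even size,
    every non-empty selection being counted as equally possible."""
    total = sum(comb(n, k) for k in range(1, n + 1))         # t = 2**n - 1
    favorable = sum(comb(n, k) for k in range(2, n + 1, 2))  # f = 2**(n-1) - 1
    return Fraction(favorable, total)

for n in (1, 2, 3, 10):
    p = prob_even_draw(n)
    # Agreement with the closed form derived in the text.
    assert p == Fraction(2 ** (n - 1) - 1, 2 ** n - 1)
    print(n, p, 1 - p)
```

For every finite n the value stays below 1/2, approaching it from below as n grows.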
 
17. Example 6. — A box contains $n$ balls marked 1, 2, 3, ..., $n$.
A person draws $n$ balls in succession and none of the balls thus
drawn is put back in the box. Each drawing is consecutively
marked 1, 2, 3, ..., $n$ on $n$ cards. What is the probability that
no ball marked $\alpha$ ($\alpha$ = 1, 2, 3, ..., $n$) appears simultaneously
with a drawing card marked $\alpha$?

The number of equally possible cases is simply the number of
permutations of $n$ elements which is equal to $n!$.
 
 The number of favorable cases is given by the total number 
 of derangements or relative permutations of n elements, i. e., 
 such permutations wherein the numbers from 1 to n do not appear 
 in their natural places. The formula for such relative permuta- 
 tions was first given by Euler in a memoir of the St. Petersburg 
 Academy entitled "Quaestio Curiosa ex Doctrina Combina- 
 tionis." Euler makes use of a recursion formula. A German 
 mathematician, Lampe, has, however, derived the formula in a 
 simpler manner in "Grunert's Archives" for 1884.
 
 24 THE MATHEMATICAL THEORY OF PROBABILITIES. [17 
 
 Lampe denotes by the symbol ^(1) the number of permuta- 
 tions wherein 1 does not appear in its natural place. By letting 1 
 remain fixed in the first place we obtain (n — 1) ! permutations of 
 the other remaining elements, or: 
 
 <^(1)„= nl- {u- 1)1 
 
 permutations where 1 is out of place. Of these permutations 
 there are, however, a number wherein 2 appears in its natural 
 place. If we let 2 remain fixed in this place we shall have: 
 
 <^(l)„_i= {n- l)\- {n-2)\ 
 
 permutations wherein 2 is in its place but 1 out of place, there 
 remains thus: 
 
 <pi2)„ = <^(1)„ - <^(l)„_i = u\- 2(n -1)1+ (n- 2)1 
 
 permutations in which neither 1 nor 2 is in its natural place. 
 Letting 3 remain fixed in its place, the remaining n — 1 elements 
 
 give: 
 
 (n- 1)!- 2{n- 2)!+ (n - 3)! 
 
 cases where 3 is in its place but 1 and 2 are not. Accordingly 
 there will be: 
 
 ,^(.3)„ = <p{2)n - <pi2)n-x =nl- 3(« - 1)! + 3(n - 2)! - (/7-3)! 
 
 permutations in which none of the three elements 1, 2, and 3 is 
 in its place. The complete deduction gives us now for the 
 number r: 
 
 <p{r)n = n\ - ([) {n - 1)! + [[) {n -2)1 
 
 + (- 1)' (I) in - r)l 
 
 arrangements in which none of the numbers 1, 2, 3, • • • r is in 
 its place. Hence the reciuired probability is: 
 
 '^^=i-cy- + r]^^ 
 
 nl \1/ // \2/ „(n - 1) 
 
 "^ ^ ^^\r)n(n- 1) • • • (n - 
 
 (n - 1) • • • (n - r + 1) '
 
When $n = r$ the above expression becomes:

$$p = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!},$$

or the probability that none of the balls appear in its numerical
order.

When $n = \infty$ the above expression converges towards $e^{-1}$ as
a limit. Since the series is rapidly convergent, we may therefore
as an approximate value let

$$p = e^{-1} = 0.36788 \cdots.$$

The probability that at least one ball appears in numerical order
is

$$q = 1 - p = 0.63212 \cdots.$$
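The count derived above lends itself to a quick numerical check. The following is a minimal sketch, not part of the original text: the function names `phi`, `phi_brute` and `prob_no_match` are illustrative assumptions, and the brute-force enumeration is only practical for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, exp, factorial


def phi(r, n):
    """Permutations of n elements in which none of the first r is in its
    natural place, by the alternating sum given in the text."""
    return sum((-1) ** k * comb(r, k) * factorial(n - k) for k in range(r + 1))


def phi_brute(r, n):
    """The same count by direct enumeration (small n only)."""
    return sum(
        all(perm[i] != i for i in range(r))
        for perm in permutations(range(n))
    )


def prob_no_match(n):
    """Probability that no element is in its natural place (the case r = n)."""
    return Fraction(phi(n, n), factorial(n))


if __name__ == "__main__":
    print(phi(3, 5), phi_brute(3, 5))
    print(float(prob_no_match(10)), exp(-1))
```

Already for ten elements the exact fraction and the limiting value $e^{-1}$ agree to several decimal places, which illustrates the rapid convergence mentioned above.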
 
CHAPTER IV.
 
 THE ADDITION AND MULTIPLICATION THEOREMS IN 
 PROBABILITIES. 
 
 18. Systematic Treatment by Laplace. — The reader will readily 
 have noticed that the problems hitherto considered have been 
 solved by a direct application of the fundamental definition of a 
 mathematical probability. Almost every branch of pure and 
 applied mathematics has originated in this manner. A few 
 isolated problems, apparently having no mutual connection what- 
 soever, have presented themselves to different mathematicians. 
 As the number of problems increased, there was found to exist 
 a certain inner relation between them, and from the mere isolated 
 cases there grew a systematic treatment of an entirely new 
 subject. 
 
 The theory of probabilities had its origin in games; and the 
 different problems that arose, were treated individually. From 
 the time of Galileo and Cardano to the appearance of Laplace's 
 great treatise, a number of celebrated mathematicians such as 
 Pascal, Fermat, Huyghens, De Moivre, Stirling, Bernoulli and 
 others had solved numerous problems, some of these, as we already 
 have seen in the preceding chapter, of a quite complex nature. 
But none of these mathematicians had hitherto succeeded in
giving a systematic treatment of the subject as a whole. All
their treatises were, as any one taking the trouble to look over
the works of De Moivre and Bernoulli will readily notice, mere
collections of examples solved by direct application of our
fundamental definition. It remained for Laplace first to give the
definite rules to the science by which the solution of a great
number of problems, often very complicated, was reduced to
the application of a few stable principles, first given in his
"Théorie Analytique des Probabilités" (Paris, 1812).
 
 19. Definition of Technical Terms. — Before entering into a 
demonstration of Laplace's theorems it will, however, be necessary
to explain a few technical terms which seem commonplace
and simple enough but which, nevertheless, must be defined
clearly in order to avoid any ambiguity.
 
 In all works on probabilities when speaking of happenings of 
various events we encounter often the terms, independent events,
dependent events and mutually exclusive events. An event E is
 said to be independent of another event F when the actual 
 happening of F does not influence in any degree whatsoever the 
 probability of the happening of E. On the other hand, if the 
 probability of E is dependent on or influenced by the previous 
 happening of F, then E is said to be dependent on F. Finally the 
 two events E and F are said to be mutually exclusive when 
 through the occurrence of one of them, say F, the other event 
 E cannot take place, or vice versa. We might also in this case 
consider the two events E and F as members of a complete
disjunction. In a complete hypothetical disjunctive judgment as
"When a die is thrown either 1, 2, 3, 4, 5 or 6 will turn up"
 each member represents a possible event. Any one of these 
 events is mutually exclusive in respect to the other events of the 
 disjunction. 
 
 20. The Theorem of the Complete or Total Probability, or the 
 Probability of " Either Or." — When an event, E, may happen in 
any one of the $n$ different and mutually exclusive ways $E_1, E_2,
E_3, \ldots, E_n$ with the respective probabilities $p_1, p_2, p_3, \ldots, p_n$,
then the probability for the happening of the event, E, is equal
to the sum of the individual probabilities $p_1, p_2, p_3, \ldots, p_n$.
 
Proof: The main event, E, falls in $n$ groups of subsidiary events
of which only one can happen in a single trial but of which any
one will bring forth the event E. Let us by $t$ denote the total
number of equally possible cases. Of these possible cases $f$ are
in favor of the event. This favorable group of cases may now
be divided into $n$ sub-groups of which $f_1$ are favorable for the
happening of $E_1$, $f_2$ in favor of $E_2$, $f_3$ in favor of $E_3$, ..., $f_n$ in
favor of $E_n$. Whence we write:

$$P(E) = p = \frac{f}{t} = \frac{f_1 + f_2 + f_3 + \cdots + f_n}{t} = \frac{f_1}{t} + \frac{f_2}{t} + \frac{f_3}{t} + \cdots + \frac{f_n}{t}.$$

Each of the fractions $f_\alpha / t$ ($\alpha = 1, 2, 3, \ldots, n$) represents the
respective probabilities for the actual occurrence of the subsidiary
events, $E_1, E_2, E_3, \ldots, E_n$. Hence we shall have

$$P(E) = p = p_1 + p_2 + p_3 + \cdots + p_n.$$
 
 This theorem is also known as the Addition Theorem of proba- 
 bilities. Instead of "total probability" the German scholar, 
 Reuschle, has suggested the expressive name of the "either or" 
 probability. The term is well selected when we remember that 
the event, E, will happen when either $E_1$, or $E_2$, or $E_3$, ..., or $E_n$
 happens. 
 
 Example 7. — What is the probability to throw 8 with two dice 
 in a single throw? 
 
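The total number of ways is $t = 6^2 = 36$. The event in question
E is composed of the three subsidiary events favoring the
combination of 8:

$E_1$: 6, 2
$E_2$: 5, 3
$E_3$: 4, 4.

Now

$$P(E_1) = \frac{2}{36} = \frac{1}{18}, \quad P(E_2) = \frac{2}{36} = \frac{1}{18}, \quad P(E_3) = \frac{1}{36}.$$

Hence

$$P(E) = \frac{1}{18} + \frac{1}{18} + \frac{1}{36} = \frac{5}{36}.$$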
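The addition theorem in Example 7 can be checked by listing the 36 equally possible throws. The sketch below is an illustration only; the helper names `subsidiary_events` and `prob_sum` are assumptions introduced here, and each subsidiary event is keyed by its pair of faces written larger face first.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def subsidiary_events(target, faces=6):
    """Group the throws of two dice that give `target` into mutually
    exclusive subsidiary events, one per unordered pair of faces."""
    outcomes = list(product(range(1, faces + 1), repeat=2))
    groups = defaultdict(Fraction)
    for first, second in outcomes:
        if first + second == target:
            key = (max(first, second), min(first, second))
            groups[key] += Fraction(1, len(outcomes))
    return dict(groups)


def prob_sum(target, faces=6):
    """Total probability as the sum over the subsidiary events."""
    return sum(subsidiary_events(target, faces).values(), Fraction(0))


if __name__ == "__main__":
    for pair, prob in subsidiary_events(8).items():
        print(pair, prob)
    print("total:", prob_sum(8))
```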
 
21. Theorem of the Compound Probability or the Probability
of "As Well As." — An event E may happen when every one of
the mutually exclusive events $E_1, E_2, E_3, \ldots, E_n$ has occurred
previously. It is immaterial if the $n$ subsidiary events have
happened simultaneously or in succession. But it makes a
difference if the events $E_1, E_2, E_3, \ldots, E_n$ are independent, or
dependent on each other.
 
1. Independent Events. — The probability, $P(E) = p$, for the
simultaneous or consecutive appearance of several mutually
exclusive events $E_1, E_2, \ldots, E_n$ is equal to the product
$p_1 \cdot p_2 \cdot p_3 \cdots p_n$ of the individual probabilities of the $n$ events.

Proof: Let the number of possible cases entering into the
complex that brings forth the event E be $t$. Each of the $t_1$
possible cases corresponding to the event $E_1$ may occur
simultaneously with each one of the $t_2$ cases corresponding to the event
$E_2$. Thus we have altogether $t_1 \times t_2$ cases falling on $E_1$ and $E_2$
at the same time. Continuing in the same way of reasoning it
is readily seen that the total number of equally possible cases
resulting from the simultaneous occurrence of the events $E_1, E_2,
E_3, \ldots, E_n$ is equal to $t_1 \times t_2 \times t_3 \times \cdots \times t_n$. By applying the same
reasoning to the favorable cases we get as their total number:

$$f = f_1 \times f_2 \times f_3 \times \cdots \times f_n.$$

Hence the final probability for the happening of the simultaneous
or consecutive appearance of the $n$ minor events is:

$$P(E) = \frac{f}{t} = \frac{f_1}{t_1} \times \frac{f_2}{t_2} \times \frac{f_3}{t_3} \times \cdots \times \frac{f_n}{t_n} = p_1 \times p_2 \times p_3 \times \cdots \times p_n.$$
 
 Example 8. — A card is drawn from a whist deck, another card 
 is drawn from a pinochle deck. What is the probability that 
 they both are aces? 
 
A whist deck contains 52 cards of which four are aces, a
pinochle deck 48 cards with 8 aces. Denoting the probabilities
of getting an ace from the whist and pinochle decks by $P(E_1)$
and $P(E_2)$ respectively we have:

$$P(E) = P(E_1) P(E_2) = \frac{4}{52} \times \frac{8}{48} = \frac{1}{78}.$$
 
2. Dependent Events. — The $n$ events $E_1, E_2, E_3, \ldots, E_n$ are
not independent of each other, but are related in such a way that
the appearance of $E_1$ influences $E_2$, that event influences in turn
$E_3$, $E_3$ influences $E_4$, and so on.

The same reasoning holds as above, and

$$P(E) = p = p_1 \times p_2 \times p_3 \times \cdots \times p_n.$$

But $p_2$ means here the probability for the happening of $E_2$ after
the actual occurrence of $E_1$, $p_3$ the probability for the happening
of $E_3$ after $E_1$ and $E_2$ have previously happened, and so on for
all $n$ events.
 
 Example 9. — A card is drawn from a whist deck and replaced 
by a joker, and then a second card is drawn. What is the
probability that both cards are aces?

Denoting the two subsidiary events by $E_1$ and $E_2$ we have:

$$P(E) = P(E_1) P(E_2) = \frac{4}{52} \times \frac{3}{52} = \frac{3}{13 \times 52} = \frac{3}{676}.$$
 
The two above theorems are known as the multiplication theorems
in probabilities. Reuschle has also suggested the name "the
as well as probability."
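The two card problems of Examples 8 and 9 may be verified in exact arithmetic. This is only a sketch under the deck compositions stated in the text; the function names are hypothetical, and the second function enumerates every ordered pair of draws rather than multiplying probabilities, so that the dependence of the second draw on the first is visible.

```python
from fractions import Fraction


def both_aces_two_decks():
    """Example 8: independent draws from a whist deck (4 aces in 52)
    and a pinochle deck (8 aces in 48)."""
    return Fraction(4, 52) * Fraction(8, 48)


def both_aces_joker_enumerated():
    """Example 9: the first card is replaced by a joker before the
    second draw; count favorable ordered pairs of draws directly."""
    deck = ["ace"] * 4 + ["other"] * 48
    favorable = total = 0
    for i, first in enumerate(deck):
        second_deck = deck[:i] + ["joker"] + deck[i + 1:]
        for second in second_deck:
            total += 1
            favorable += (first == "ace" and second == "ace")
    return Fraction(favorable, total)


if __name__ == "__main__":
    print(both_aces_two_decks())
    print(both_aces_joker_enumerated())
```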
 
22. Poincaré's Proof of the Addition and Multiplication
Theorem. — The French mathematician and physicist, H.
Poincaré, has derived the above theorems in a new and elegant
manner in his excellent little treatise: "Leçons sur le Calcul des
Probabilités," Paris, 1896.
 
Poincaré's proof is briefly as follows:

Let $E_1$ and $E_2$ be two arbitrary events.
$E_1$ and $E_2$ may happen in $\alpha$ different ways.
$E_1$ may happen but not $E_2$ in $\beta$ different ways.
$E_2$ may happen but not $E_1$ in $\gamma$ different ways.
Neither $E_1$ nor $E_2$ will happen in $\delta$ different ways.
We assume the total $\alpha + \beta + \gamma + \delta$ cases to be equally likely to
occur.

The probability for the occurrence of $E_1$ is

$$p_1 = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of $E_2$ is

$$p_2 = \frac{\alpha + \gamma}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of at least one of the events $E_1$
and $E_2$ is

$$p_3 = \frac{\alpha + \beta + \gamma}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of both $E_1$ and $E_2$ is

$$p_4 = \frac{\alpha}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of $E_1$ when $E_2$ has already
occurred is

$$p_5 = \frac{\alpha}{\alpha + \gamma}.$$

The probability for the occurrence of $E_2$ when $E_1$ has already
occurred is

$$p_6 = \frac{\alpha}{\alpha + \beta}.$$

The probability for the occurrence of $E_1$ when $E_2$ has not already
occurred is

$$p_7 = \frac{\beta}{\beta + \delta}.$$

The probability for the occurrence of $E_2$ when $E_1$ has not already
occurred is

$$p_8 = \frac{\gamma}{\gamma + \delta}.$$
We have now the following identical relations:

$$p_1 + p_2 = p_3 + p_4, \qquad p_3 = p_1 + p_2 - p_4,$$

i. e., the probability that of two arbitrary events at least one
will happen is equal to the probability that the first will happen
plus the probability that the second will happen less the
probability that both will happen. The particular problem which
we may happen to investigate may possibly be of such a nature
that the two events $E_1$ and $E_2$ cannot happen at the same time;
in that case $p_4 = 0$, and we get:

$$p_3 = p_1 + p_2.$$

In this equation we immediately recognize the addition theorem
for two mutually exclusive events. By substitution of the
proper values we have furthermore:

$$p_4 = p_2 \cdot p_5 \quad \text{or} \quad p_4 = p_1 \cdot p_6.$$

These equations contain the theorems proved under § 21, of
the probability for two mutually dependent events.
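Poincaré's relations are identities in the four case counts, so they may be tested with arbitrary numbers. The sketch below assumes the numbering $p_1, \ldots, p_8$ in the order in which the eight probabilities are introduced in this section; the function name `poincare_probabilities` is an illustration, and the counts passed to it are arbitrary. It assumes every denominator is positive.

```python
from fractions import Fraction


def poincare_probabilities(alpha, beta, gamma, delta):
    """Return (p1, ..., p8) from the four case counts:
    alpha = both events, beta = E1 only, gamma = E2 only, delta = neither.
    Assumes alpha+beta, alpha+gamma, beta+delta, gamma+delta are all > 0."""
    total = alpha + beta + gamma + delta
    p1 = Fraction(alpha + beta, total)          # E1 happens
    p2 = Fraction(alpha + gamma, total)         # E2 happens
    p3 = Fraction(alpha + beta + gamma, total)  # at least one happens
    p4 = Fraction(alpha, total)                 # both happen
    p5 = Fraction(alpha, alpha + gamma)         # E1 when E2 has occurred
    p6 = Fraction(alpha, alpha + beta)          # E2 when E1 has occurred
    p7 = Fraction(beta, beta + delta)           # E1 when E2 has not occurred
    p8 = Fraction(gamma, gamma + delta)         # E2 when E1 has not occurred
    return p1, p2, p3, p4, p5, p6, p7, p8


if __name__ == "__main__":
    p1, p2, p3, p4, p5, p6, p7, p8 = poincare_probabilities(3, 5, 2, 10)
    print(p1 + p2 == p3 + p4)        # addition relation
    print(p4 == p2 * p5 == p1 * p6)  # multiplication relations
```

Setting the first count to zero makes the two events mutually exclusive, and the addition relation then reduces to $p_3 = p_1 + p_2$.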
 
 23. Relative Probabilities. — We shall now finally give an alter- 
 native demonstration of the same two theorems. It will, of 
 course, be of benefit to the student to see the subject from as 
many view points as possible; moreover, the following remarks
will contain some very useful hints for the solution of more
complicated problems by the application of so-called "relative
probabilities" and a few elementary theorems from the calculus of
logic. The following paragraphs are mainly based upon a
treatise in the Proceedings of the Royal Academy of Saxony,
by the German mathematician and actuary, F. Hausdorff.
 
In our fundamental definition of a mathematical probability
for the happening of an event E, expressed in symbols by $P(E)$,
as the ratio of the equally favorable and equally possible cases
resulting from a general complex of causes, we were able to
compute the so-called ordinary or absolute probabilities. But
if we, from among the favorable cases and possible cases, select
only such as bring forward a certain different event, say F, then
we obtain the "relative probability" for the happening of E
under the assumption that the subsidiary event, F, has occurred
previously. For this relative probability we shall employ the
symbol $P_F(E)$, which reads "the relative probability of E,
positi F." The following problem illustrates the meaning of
relative probabilities. If an honor card is drawn from an
ordinary deck of cards, what is the probability that it is a king?
Denoting the subsidiary event of drawing an honor card by F,
and the main event of drawing a king by E, we may write the
above mentioned probability in the symbolic form: $P_F(E)$. If
on the other hand we knew a priori that a king was drawn, we
may also ask for the probability of having drawn an honor card.
Since any king also is an honor card, we may write in symbols:
$P_E(F) = 1$.
 
 Before entering upon the immediate determination of relative 
 probabilities we shall first define a few symbols from the calculus 
 of logic. We denote first of all the occurrence of an event E 
by $E$, the non-occurrence of the same event by $\bar{E}$. Similarly
we have for the occurrence and non-occurrence of other events,
$F, G, H, \ldots$ and $\bar{F}, \bar{G}, \bar{H}, \ldots$. $E + F$ means that at least one
of the two events E and F will happen. $E \times F$ or simply $E \cdot F$
means the occurrence of both E and F. From the above
definition it follows immediately that $\overline{E + F} = \bar{E} \cdot \bar{F}$ and

$$E = E \cdot F + E \cdot \bar{F}.$$
 
This last relation simply states that E will happen when either
E and F happen simultaneously or when E and the non-appearance
of F happen at the same time. If furthermore $F_1, F_2,
F_3, \ldots, F_n$ constitute the members of a complete disjunction,
i. e., mutually exclusive events of which one must happen, we
have in general:

$$E = E \cdot F_1 + E \cdot F_2 + E \cdot F_3 + \cdots + E \cdot F_n.$$

From the original definition of a probability, it follows now:

$$P(E) = P(E \cdot F) + P(E \cdot \bar{F}),$$

and

$$P(E) = P(E \cdot F_1) + P(E \cdot F_2) + P(E \cdot F_3) + \cdots + P(E \cdot F_n),$$

i. e., the probability that of several mutually exclusive events
one at least will happen is the sum of the probabilities of the
happening of the separate events. This is the symbolic form for
the addition theorem.
 
 24. Multiplication Theorem. — We next take two arbitrary 
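events. From these events we may form the following
combinations:

$$E \cdot F, \quad E \cdot \bar{F}, \quad \bar{E} \cdot F, \quad \bar{E} \cdot \bar{F},$$

i. e.,

Both E and F happen,
E happens but not F,
F happens but not E,
Neither E nor F happens.

Furthermore let $\alpha, \beta, \gamma, \delta$ be the respective numbers of the
favorable cases for the above four combinations of the events
E and F. Following the previous method of Poincaré, we shall
have:

$$P(E) = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta}, \quad P(F) = \frac{\alpha + \gamma}{\alpha + \beta + \gamma + \delta}, \quad P(E \cdot F) = \frac{\alpha}{\alpha + \beta + \gamma + \delta},$$

$$P_F(E) = \frac{\alpha}{\alpha + \gamma}, \quad P_E(F) = \frac{\alpha}{\alpha + \beta}.$$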
 
 25. Probability of Repetitions. — From the above equations it 
 immediately follows: 
 
$$P(E \cdot F) = P(E) \times P_E(F) = P(F) \times P_F(E),$$

which is the symbolic form for the multiplication theorems of
compound probabilities.
 
In special cases it may happen that the different subsidiary
events $E_1, E_2, E_3, \ldots, E_n$ are all similar. We shall then have,
following the symbolic method:

$$E = E_1 \cdot E_2 \cdot E_3 \cdots E_n = E_1 \cdot E_1 \cdot E_1 \cdots E_1 = E_1^n,$$

and

$$P(E) = P(E_1^n) = P(E_1)^n.$$

This gives us the following theorem:

The probability for the repetition $n$ times of a certain event,
E, is equal to the $n$th power of its absolute probability.

Thus if $P(E) = p$ we have immediately $P(\bar{E}) = 1 - p$, and

$$P(E^n) = P(E)^n = p^n,$$
$$P(\bar{E}^n) = P(\bar{E})^n = (1 - p)^n.$$

Thus the probability for the occurrence of E at least once in
$n$ trials is

$$P(E + E + \cdots\ n \text{ times}) = 1 - P(\bar{E}^n) = 1 - (1 - p)^n.$$

Denoting the numerical quantity of this probability by $Q$ we
have:

$$1 - Q = (1 - p)^n.$$

Solving this equation for $n$ we shall have:

$$n = \frac{\log (1 - Q)}{\log (1 - p)}.$$
 
 Whenever n equals, or is greater than, the above logarithmic 
 value for given values of Q and p we are sure that Q will exceed 
 a previously given proper fraction. To illustrate: 
 
 Example 10. — How often must a die be thrown so that the 
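probability that a six appears at least once is greater than $\frac{1}{2}$?

Here $p = \frac{1}{6}$, $Q = \frac{1}{2}$. Hence we must select for $n$ the smallest
positive integer satisfying the relation:

$$n > \frac{\log (1 - \frac{1}{2})}{\log (1 - \frac{1}{6})} = \frac{\log 2}{\log \frac{6}{5}} = \frac{0.30103}{0.07918} = 3.80 \cdots, \quad \text{or } n = 4.$$

For this particular value of $n$ we have in reality:

$$Q = 1 - \left(\tfrac{5}{6}\right)^4 = 0.518.$$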
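The logarithmic rule of § 25 is easily put into a small routine. What follows is a sketch in floating-point arithmetic, assuming $0 < p < 1$ and $0 \le Q < 1$; the names `prob_at_least_once` and `min_trials` are introduced here for illustration only. The loop after the logarithm guards against rounding exactly at the boundary.

```python
from math import ceil, log


def prob_at_least_once(p, n):
    """Probability that an event of probability p occurs at least once
    in n independent trials: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def min_trials(p, Q):
    """Smallest n for which the probability of at least one occurrence
    exceeds Q (assumes 0 < p < 1 and 0 <= Q < 1)."""
    n = max(1, ceil(log(1 - Q) / log(1 - p)))
    while prob_at_least_once(p, n) <= Q:
        n += 1
    return n


if __name__ == "__main__":
    n = min_trials(1 / 6, 0.5)
    print(n, round(prob_at_least_once(1 / 6, n), 3))
```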
 
 
 26. Application of the Addition and Multiplication Theorems 
 in Problems in Probabilities. — We shall next proceed to illustrate 
 the theorems of the preceding paragraphs by a few examples. 
First, we shall apply the demonstrated theorems to some of the
examples we have already solved by a direct application of the
fundamental definition of a mathematical probability.
 
 Example 11. — We take first of all our old friend, the problem 
of D'Alembert. What is the probability of throwing head at
least once in two successive throws with a uniform coin?

This problem is most easily solved by finding the probability
first for not getting head in two successive throws. By the
multiplication theorem this probability is: $p = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
Then the probability to get head at least once is $1 - \frac{1}{4} = \frac{3}{4}$
from a simple application of the rule in § 25. A more lengthy
analysis is as follows. Denoting the event by E, the following
cases may appear which may bring forth the desired event:
Head in first throw which we shall denote by $H_1$ and head in
second throw which we denote by $H_2$, or head in first throw ($H_1$)
and tail in second ($T_2$), or finally tail in first ($T_1$) and head in
second ($H_2$). Then we have:

$$E = H_1 \cdot H_2 + H_1 \cdot T_2 + T_1 \cdot H_2,$$

or:

$$P(E) = P(H_1) \cdot P(H_2) + P(H_1) \cdot P(T_2) + P(T_1) \cdot P(H_2) = \tfrac{1}{2} \times \tfrac{1}{2} + \tfrac{1}{2} \times \tfrac{1}{2} + \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{3}{4}.$$
 
27. Example 12. — What is the probability of throwing at
least twelve in a single throw with three dice? The expected
event occurs when either 12, 13, 14, ... or 18 is thrown. Of
these events only one may happen at a time. We may,
therefore, apply the addition theorem and obtain as the total
probability:

$$p = p_{12} + p_{13} + p_{14} + \cdots + p_{18},$$

where $p_{12}, p_{13}, \ldots, p_{18}$ are the respective probabilities for throwing
the sums of 12, 13, ... or 18. These subsidiary probabilities
were determined in § 13 under the problem of Galileo, and:

$$p = \frac{25}{216} + \frac{21}{216} + \frac{15}{216} + \frac{10}{216} + \frac{6}{216} + \frac{3}{216} + \frac{1}{216} = \frac{81}{216} = \frac{27}{72} = \frac{3}{8}.$$
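The subsidiary probabilities of Example 12 can be recovered by enumerating all 216 throws of three dice. The following is an illustrative sketch; `sum_distribution` and `prob_at_least` are hypothetical names chosen for this check.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_distribution(dice=3, faces=6):
    """Exact probability of every possible sum thrown with `dice` dice."""
    counts = Counter(sum(t) for t in product(range(1, faces + 1), repeat=dice))
    total = faces ** dice
    return {s: Fraction(c, total) for s, c in counts.items()}


def prob_at_least(threshold, dice=3, faces=6):
    """Addition theorem: sum the probabilities of the mutually
    exclusive sums that reach the threshold."""
    dist = sum_distribution(dice, faces)
    return sum((p for s, p in dist.items() if s >= threshold), Fraction(0))


if __name__ == "__main__":
    dist = sum_distribution()
    for s in range(12, 19):
        print(s, dist[s])
    print("at least 12:", prob_at_least(12))
```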
 
 
28. Example 13. — An urn contains $a$ white, $b$ black and $c$ red
balls. A single ball is drawn $\alpha + \beta + \gamma$ times in succession,
and the ball thus drawn is replaced before the next drawing takes
place. To determine the probability that (1) there are first
$\alpha$ white, then $\beta$ black and finally $\gamma$ red balls, (2) the drawn balls
appear in three closed groups of $\alpha$ white, $\beta$ black and $\gamma$ red balls,
but the order of these groups is arbitrary, (3) that white, black
and red balls appear in the same number as above, but in any
order whatsoever.

1. Denoting the three subsidiary events for drawing $\alpha$ white,
$\beta$ black and $\gamma$ red balls by $F_1$, $F_2$ and $F_3$, and the main event for
drawing the balls in the prescribed order by E, we may write the
probability for the occurrence of the main event in the following
symbolic form involving symbolic probabilities:

$$P(E) = P(F_1) P_{F_1}(F_2) P_{F_1 F_2}(F_3).$$

Substituting the algebraic values for $P(F_1)$, $P(F_2)$ and $P(F_3)$
in the expression for $P(E)$, and then applying Hausdorff's rule
(§ 24) we get:

$$P(E) = p_1 = \frac{a^\alpha}{(a + b + c)^\alpha} \times \frac{b^\beta}{(a + b + c)^\beta} \times \frac{c^\gamma}{(a + b + c)^\gamma} = \frac{a^\alpha b^\beta c^\gamma}{(a + b + c)^{\alpha + \beta + \gamma}}.$$
 
2. In the second part of the problem the order of the three
different groups is immaterial. The three subsidiary events
$F_1$, $F_2$ and $F_3$ may therefore be arranged in any order whatsoever.
The total number of arrangements is $3! = 6$. The probability
of the happening of any one of these arrangements separately
is the same as the probability computed under (1). By applying
the addition theorem we get therefore as the probability of the
occurrence of this event:

$$p_2 = \frac{6\, a^\alpha b^\beta c^\gamma}{(a + b + c)^{\alpha + \beta + \gamma}}.$$
 
3. The third part is more easily solved by a direct application
of the definition of a mathematical probability. The order of
the balls drawn is here immaterial. Of each individual
combination of $\alpha$ white, $\beta$ black and $\gamma$ red balls it is possible
to form $(\alpha + \beta + \gamma)! / (\alpha!\, \beta!\, \gamma!)$ different permutations as the total
number of favorable cases. The above number of equally possible
cases is here $(a + b + c)^{\alpha + \beta + \gamma}$. Hence we have:

$$p_3 = \frac{(\alpha + \beta + \gamma)!}{\alpha!\, \beta!\, \gamma!} \times \frac{a^\alpha b^\beta c^\gamma}{(a + b + c)^{\alpha + \beta + \gamma}}.$$
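The three answers of Example 13 may be compared with a direct count for a small urn. This is a sketch under the assumption that each ball is an individually distinguishable, equally likely case; the function names are illustrative, and the brute-force count grows as $(a+b+c)^{\alpha+\beta+\gamma}$, so it suits small numbers only.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def p_prescribed_order(a, b, c, alpha, beta, gamma):
    """(1): first alpha white, then beta black, then gamma red."""
    n = a + b + c
    return Fraction(a ** alpha * b ** beta * c ** gamma, n ** (alpha + beta + gamma))


def p_closed_groups(a, b, c, alpha, beta, gamma):
    """(2): the three closed groups in any of the 3! = 6 orders."""
    return 6 * p_prescribed_order(a, b, c, alpha, beta, gamma)


def p_any_order(a, b, c, alpha, beta, gamma):
    """(3): the same numbers of each colour in any order whatsoever."""
    arrangements = factorial(alpha + beta + gamma) // (
        factorial(alpha) * factorial(beta) * factorial(gamma)
    )
    return arrangements * p_prescribed_order(a, b, c, alpha, beta, gamma)


def p_any_order_brute(a, b, c, alpha, beta, gamma):
    """(3) again, by listing every sequence of drawings with replacement."""
    balls = "W" * a + "B" * b + "R" * c
    draws = alpha + beta + gamma
    favorable = sum(
        1
        for seq in product(balls, repeat=draws)
        if (seq.count("W"), seq.count("B"), seq.count("R")) == (alpha, beta, gamma)
    )
    return Fraction(favorable, len(balls) ** draws)


if __name__ == "__main__":
    args = (2, 1, 1, 2, 1, 1)
    print(p_prescribed_order(*args), p_closed_groups(*args), p_any_order(*args))
    print(p_any_order_brute(*args))
```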
 
29. Example 14. — In an urn are $n$ balls among which are $\alpha$
white and $\beta$ black. What is the probability in three successive
drawings, the balls drawn not being replaced, to draw (1) first
two white and then one black ball, (2) two white and one black
ball in any order whatsoever? ($\alpha + \beta = n$).
The probability to draw first one white, then another white and
finally a black ball is:

$$p_1 = \frac{\alpha}{n} \times \frac{\alpha - 1}{n - 1} \times \frac{\beta}{n - 2}.$$

The probability for any of the other arrangements is the same,
or we have for (2)

$$p_2 = 3 p_1 = \frac{3\alpha}{n} \times \frac{\alpha - 1}{n - 1} \times \frac{\beta}{n - 2}.$$
 
 30. Example 15.— What is the chance to throw a doublet of 
 6 at least once in n consecutive throws with two dice? (Pascal's 
 Problem.) 
 
Chevalier de Méré, a French nobleman and a great friend of all
games of chance, went more deeply into the complex of causes in
different games than most of the ordinary gamblers of his time.
Although not a proficient mathematician he understood
sufficient, nevertheless, to give some very interesting problems for
which he got the ideas from the gambling resorts he frequented.
De Méré was a friend of the great French mathematician and
philosopher, Blaise Pascal, and went to him whenever he wanted
information on some apparently obscure point in the different
games in which he participated. The chevalier had from patient
observation noticed that he could profitably bet to throw a six
at least once in four throws with a single die. He reasoned now
that the number of throws to throw a doublet at least once with
two dice ought to be proportional to the corresponding equal
number of possible cases with a single die. For one die there are
6 possible cases, for two 36. Thus de Méré thought he could
safely bet to throw a doublet of 6 in 24 throws with two dice.
An actual trial by several games of dice proved extremely
disastrous to the finances of the nobleman, who then went to
Pascal for an explanation. Pascal solved the problem by a direct
application of the definition of a mathematical probability. We
shall, however, solve it by an application of the multiplication
theorem.
 
The probability to get a doublet of 6 in a single throw is $\frac{1}{36}$.
The probability of not getting a double six is therefore $1 - \frac{1}{36}
= \frac{35}{36}$. The probability of the happening of this event $n$
consecutive times is $(\frac{35}{36})^n$. Thus the probability of getting a double
six at least once in $n$ throws with two dice becomes: $p = 1 -
(\frac{35}{36})^n$. Solving this equation for $n$ we shall have:

$$n = \frac{\log (1 - p)}{\log 35 - \log 36};$$

for $p = \frac{1}{2}$ we shall have:

$$n = \frac{\log 2}{\log 36 - \log 35} = 24.6 \cdots.$$

First for 25 throws we may bet safely one to one while for 24
throws such a bet was unfavorable. This shows the fallacy of
de Méré's reasoning.
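De Méré's two wagers can be compared in exact fractions. The sketch below is illustrative; the names `prob_at_least_once` and `break_even_throws` are assumptions made for this check and do not come from the text.

```python
from fractions import Fraction
from math import log


def prob_at_least_once(p, n):
    """1 - (1 - p)^n, kept exact when p is a Fraction."""
    return 1 - (1 - p) ** n


def break_even_throws():
    """Real-valued n at which the double-six bet becomes even."""
    return log(2) / (log(36) - log(35))


if __name__ == "__main__":
    single_six = Fraction(1, 6)
    double_six = Fraction(1, 36)
    print(float(prob_at_least_once(single_six, 4)))   # one die, 4 throws
    print(float(prob_at_least_once(double_six, 24)))  # two dice, 24 throws
    print(float(prob_at_least_once(double_six, 25)))  # two dice, 25 throws
    print(break_even_throws())
```

The first and third probabilities lie above one half and the second below it, in agreement with the conclusion of the example.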
 
31. Example 16. — An urn, A, contains $a$ balls of which $\alpha$ are
white, another similar urn, B, contains $b$ balls of which $\beta$ are
white. A single ball is drawn from one of the two urns. What
is the probability that the ball is white? The beginner may easily
make the following error in the solution of this problem. The
probability to get a white ball from A is $\alpha/a$, from B, $\beta/b$. Thus
the total probability to get a white ball is: $\alpha/a + \beta/b$. This
result is, however, wrong, for we may, by selecting proper values
for $a$, $b$, $\alpha$ and $\beta$, obtain a total probability which in numerical
value is greater than unity. Thus if $a = 7$, $b = 7$, $\alpha = 5$,
$\beta = 4$, we get as the total probability:

$$P = \frac{5}{7} + \frac{4}{7} = \frac{9}{7}.$$
 
This result is evidently wrong, since a mathematical probability
is never an improper fraction. The error lies in the fact that we
have regarded the two events of drawing a ball from either urn
as independent and mutually exclusive. A simple application
of the symbolic rule for relative probabilities will give us the
result immediately. The main event, E, is composed of the two
following subsidiary events: (1) to get a white ball from A, or
(2) to get a white ball from B. We shall symbolically denote
these two events by $A \cdot W$ and $B \cdot W$ respectively. Thus we
have:

$$P(E) = P(A \cdot W) + P(B \cdot W) = P(A) P_A(W) + P(B) P_B(W).$$

Now the probability to obtain urn A is $P(A) = p_1 = \frac{1}{2}$, also to
get B: $P(B) = p_2 = \frac{1}{2}$. The probability to get a white ball
from A when this particular urn is previously selected is expressed
by the relative probability:

$$P_A(W) = p_3 = \frac{\alpha}{a}.$$

Similarly for B:

$$P_B(W) = p_4 = \frac{\beta}{b}.$$

Substituting these different values in the expression for $P(E)$
we get finally:

$$P(E) = \frac{1}{2} \left( \frac{\alpha}{a} + \frac{\beta}{b} \right).$$

For the particular numerical example we have:

$$P = \frac{1}{2} \left( \frac{5}{7} + \frac{4}{7} \right) = \frac{9}{14}.$$
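The contrast between the beginner's sum and the correct result of Example 16 is shown in the following sketch. It assumes, as the example does, that each urn is equally likely to be selected; the function names and the representation of an urn as a pair (white balls, total balls) are choices made here for illustration.

```python
from fractions import Fraction


def naive_sum(urns):
    """The beginner's erroneous answer: add the relative probabilities."""
    return sum((Fraction(white, total) for white, total in urns), Fraction(0))


def prob_white(urns):
    """Weight each relative probability by the chance of selecting its urn
    (assumed equal for all urns), then add."""
    k = len(urns)
    return sum(
        (Fraction(1, k) * Fraction(white, total) for white, total in urns),
        Fraction(0),
    )


if __name__ == "__main__":
    urns = [(5, 7), (4, 7)]  # (white balls, total balls) for urns A and B
    print(naive_sum(urns))   # exceeds unity, hence impossible
    print(prob_white(urns))
```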
 
 32. Example 17. — The probability of the happening of a 
 certain event, E, is p, while the probability for the non-occurrence 
of the same event is $q = 1 - p$. The trial is now to be repeated
$n$ times. The probability that there will be first $\alpha$ successes
and then $\beta$ failures is:

$$P(E^\alpha) P_{E^\alpha}(\bar{E}^\beta) = p^\alpha \cdot q^\beta \quad (\alpha + \beta = n).$$

This is the probability that the two complementary events $E$ and
$\bar{E}$ happen in the order prescribed above. When the order, in
which the successes and failures happen, plays no role during
the $n$ trials, that is to say it is only required to obtain $\alpha$ successes
and $\beta$ failures in any order whatsoever in $n$ total trials, then the
arrangement of the $\alpha$ factors $p$ and $\beta$ factors $q$ is immaterial. The
total number of arrangements of $n$ elements of which $\alpha$ are equal
to $p$ and $\beta$ equal to $q$ is simply $n! / (\alpha! \times \beta!)$. For any one
particular arrangement of $\alpha$ factors $p$ and $\beta$ factors $q$ the probability of
the happening of the two complementary events in this particular
arrangement is equal to $p^\alpha \cdot q^\beta$. The Addition Theorem
immediately gives the answer for $\alpha$ successes and $\beta$ failures in any
order whatsoever as:

$$P(E^\alpha \cdot \bar{E}^\beta) = p_\alpha = \binom{n}{\alpha} p^\alpha q^\beta.$$
 
Let us, for the present, regard this probability as being a function
of the variable quantity, $\alpha$ ($n$ being a constant quantity). We
may then write:

$$p_\alpha = \varphi(\alpha).$$

Letting $\alpha$ assume all positive integral values from 0 to $n$ the
above expression for $p_\alpha$ becomes:

$$p_0 = q^n, \quad p_1 = \binom{n}{1} p\, q^{n-1}, \quad p_2 = \binom{n}{2} p^2 q^{n-2}, \quad \ldots, \quad p_n = p^n.$$

These are the respective probabilities for no successes, one success,
two successes, ... and finally $n$ successes in $n$ trials. The
above quantities are, however, merely the different members of
the binomial expansion $(p + q)^n$. Since $p + q = 1$ from the
nature of the problem, we also have $(p + q)^n = 1$, or $p_0 + p_1
+ p_2 + \cdots + p_n = 1$. This last equation is the symbolic
form for the simple hypothetical disjunctive judgment: E must
happen either 0, 1, 2, ... or $n$ times in $n$ total trials. We shall
return to this problem later under the discussion of the Bernoullian
Theorem. In fact, the above example constitutes an essential
part of this famous theorem which has proven one of the most
important and far reaching in the whole theory of probability.
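The members of the binomial expansion in Example 17 may be tabulated exactly. This is a brief sketch; `binomial_terms` is a hypothetical name, and the numerical case of a single die thrown four times is chosen here only as an illustration.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p):
    """Probabilities of 0, 1, ..., n successes in n trials, each term
    being C(n, alpha) * p^alpha * q^(n - alpha) with q = 1 - p."""
    q = 1 - p
    return [comb(n, alpha) * p ** alpha * q ** (n - alpha) for alpha in range(n + 1)]


if __name__ == "__main__":
    terms = binomial_terms(4, Fraction(1, 6))
    for alpha, term in enumerate(terms):
        print(alpha, term)
    print("sum:", sum(terms))
```

The terms add to unity, and one minus the first term is the probability of at least one success in the $n$ trials.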
33. Example 18. De Moivre's Problem. — The following problem
was first given by the eminent French-English mathematician,
Abraham de Moivre, in a treatise, entitled "De Mensura
Sortis," which was published in London about 1711.

An urn contains $n + 1$ balls marked $0, 1, 2, \ldots, n$. A person
makes $i$ drawings in succession, and each ball is put back in the
urn before the next drawing takes place. What is the probability
that the sum of the numbers on the $i$ balls thus drawn equals $s$?

The first ball may be drawn in $n + 1$ ways, the second ball
may also be drawn in $n + 1$ ways. Hence two balls may be
drawn in $(n + 1)^2$ ways or $i$ balls in $(n + 1)^i$ ways. This is the
total number of equally possible cases.
 
If we expand the expression:

$$(x^0 + x^1 + x^2 + x^3 + x^4 + \cdots + x^n)^i \qquad (1)$$

after the multinomial theorem, we notice that the coefficient
to $x^s$ arises out of the different ways in which $0, 1, 2, 3, \ldots, n$
can be grouped together so as to form $s$ by addition, which also
is the total number of favorable cases. The expression (1)
inside the bracket represents a geometrical progression, which
may be written as:

$$(1 - x^{n+1})^i (1 - x)^{-i} = \left\{ 1 - \binom{i}{1} x^{n+1} + \binom{i}{2} x^{2n+2} - \binom{i}{3} x^{3n+3} + \cdots \right\} \times \left\{ 1 + \binom{i}{1} x + \binom{i+1}{2} x^2 + \binom{i+2}{3} x^3 + \cdots \right\}.$$

By actual multiplication we get a power series in $x$. The terms
containing $x^s$ are obtained in the following manner: the first term
of the first factor being multiplied with the term

$$\binom{i + s - 1}{s} x^s \text{ of the second factor,}$$

the second term of the first factor multiplied with the term:

$$\binom{i + s - n - 2}{s - n - 1} x^{s - n - 1} \text{ of the second factor,}$$

the third term of the first factor multiplied with the term:

$$\binom{i + s - 2n - 3}{s - 2n - 2} x^{s - 2n - 2} \text{ of the second factor.}$$
 
Thus the coefficient of $x^s$ is equal to

$$\binom{i + s - 1}{s} - \binom{i}{1} \binom{i + s - n - 2}{s - n - 1} + \binom{i}{2} \binom{i + s - 2n - 3}{s - 2n - 2} - \cdots.$$

The above expression may by further reductions be brought to
the form:

$$\frac{(s+1)(s+2) \cdots (s+i-1)}{1 \cdot 2 \cdots (i-1)} - \binom{i}{1} \frac{(s-n)(s-n+1) \cdots (s-n+i-2)}{1 \cdot 2 \cdots (i-1)} + \binom{i}{2} \frac{(s-2n-1)(s-2n) \cdots (s-2n+i-3)}{1 \cdot 2 \cdots (i-1)} - \cdots.$$

The series breaks of course as soon as negative factors appear
in the numerator. The required probability is therefore

$$\frac{1}{(n+1)^i} \left\{ \frac{(s+1)(s+2) \cdots (s+i-1)}{1 \cdot 2 \cdots (i-1)} - \binom{i}{1} \frac{(s-n)(s-n+1) \cdots (s-n+i-2)}{1 \cdot 2 \cdots (i-1)} + \cdots \right\}.$$
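De Moivre's alternating series may be checked against a direct count of the drawings. The sketch below assumes $i \ge 1$ and $s \ge 0$; the names `de_moivre_count`, `brute_count` and `de_moivre_prob` are introduced here for illustration. The series is written with binomial coefficients, and the loop stops as soon as $s - k(n+1)$ becomes negative, which is where the series in the text breaks off.

```python
from fractions import Fraction
from itertools import product
from math import comb


def de_moivre_count(n, i, s):
    """Number of ways i drawings from balls marked 0..n can sum to s,
    by the alternating series (assumes i >= 1 and s >= 0)."""
    total = 0
    k = 0
    while k <= i and s - k * (n + 1) >= 0:
        m = s - k * (n + 1)
        total += (-1) ** k * comb(i, k) * comb(i + m - 1, i - 1)
        k += 1
    return total


def brute_count(n, i, s):
    """The same number by listing all (n + 1)^i sequences of drawings."""
    return sum(1 for draw in product(range(n + 1), repeat=i) if sum(draw) == s)


def de_moivre_prob(n, i, s):
    """Favorable cases divided by the (n + 1)^i equally possible cases."""
    return Fraction(de_moivre_count(n, i, s), (n + 1) ** i)


if __name__ == "__main__":
    # Balls marked 0..5 drawn three times: a sum of 9 corresponds to
    # throwing 12 with three ordinary dice.
    print(de_moivre_count(5, 3, 9), brute_count(5, 3, 9))
    print(de_moivre_prob(5, 3, 9))
```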
 
 34. Example 19. — If a single experiment or observation is 
made on $n$ pairs of opposite (complementary) events, $E_\alpha$ and $\bar{E}_\alpha$
with the respective probabilities of happening $p_\alpha$ and $q_\alpha$ ($\alpha = 1,
2, 3, \ldots, n$), to determine the probability that: (1) exactly $r$,
(2) at least $r$ of the events $E_\alpha$ will happen.

This problem is of great importance, especially in life assurance
mathematics. It happens frequently that an actuary is called
upon to determine the probability that exactly $r$ persons will be
alive $m$ years from now out of a group of $n$ persons of any age
whatsoever, each person's age and his individual coefficient of
survival through the period being known beforehand.

Various demonstrations have been given of this problem. The
first elementary proof was probably due to Mr. George King,
the English actuary, in his well-known text-book. The Austrian
mathematician and actuary, E. Czuber, has simplified King's
method in his "Wahrscheinlichkeitsrechnung" (1903). Later
the Italian actuary, Toja, has given an elegant proof in Bollettino
degli Attuari, Vol. 12. Finally another Italian mathematician,
P. Medolaghi, has investigated the problem from the standpoint
of symbolic logic. In the following we shall adhere to the
demonstration of Czuber and also give a short outline of the symbolic
method.
 
In order to answer the first part of the problem we must form
all possible combinations of $r$ factors of $p$ and $n - r$ factors of $q$
and then sum all such combinations of $n$ factors. Denoting
the event by $E_{[r]}$ we have:

$$P(E_{[r]}) = \sum p_\alpha p_\beta \cdots (1 - p_\lambda)(1 - p_\mu) \cdots (1 - p_\nu). \qquad (1)$$

We shall now denote the sum of all products in (1) containing $\varphi$
factors $p$ by the symbol $S_\varphi$. It is readily seen that $\varphi$ will have
all positive integral values from $r$ to $n$ inclusive. We may
therefore write the total compound probability in the following
form:

$$P(E_{[r]}) = A_0 S_r + A_1 S_{r+1} + A_2 S_{r+2} + \cdots + A_{n-r} S_n. \qquad (2)$$
 
The student must bear in mind that the different S are merely
symbols for different sums of all the products of r, r + 1, r + 2,
..., n factors p respectively. Our problem is now to determine
the unknown coefficients A. It is easily seen that the coefficient
$A_0 = 1$, since all different products containing r factors p appear
only once. The other coefficients of the form A do not depend
on the values of p, however. They remain therefore unaltered
if we equate all of the various p's and let them equal p.
Expression (1) then simply becomes $\binom{n}{r} p^r (1 - p)^{n-r}$. We must
form all possible rth powers of n similar factors, which can
be done in $\binom{n}{r}$ ways. The expression (2) on the other hand
becomes:

$$\binom{n}{r} p^r + A_1 \binom{n}{r+1} p^{r+1} + A_2 \binom{n}{r+2} p^{r+2} + \cdots + A_{n-r}\, p^n.$$

Any $S_\varphi$ is by definition the sum of all products containing
$\varphi$ factors p and we may form $\binom{n}{\varphi}$ such products from n
elements p. But we saw above that $\varphi$ might only have all positive
values from r to n inclusive, hence expression (2) will naturally
take the above form. We have therefore

$$\binom{n}{r} p^r (1 - p)^{n-r} = \binom{n}{r} p^r + A_1 \binom{n}{r+1} p^{r+1} + A_2 \binom{n}{r+2} p^{r+2} + \cdots + A_{n-r}\, p^n.$$

Expanding the expression on the left hand side by means of the
binomial theorem and equating the coefficients of equal powers
of p, we get:

$$A_1 \binom{n}{r+1} = -\binom{n}{r}\binom{n-r}{1}, \quad A_2 \binom{n}{r+2} = \binom{n}{r}\binom{n-r}{2}, \quad \ldots$$

or:

$$A_1 = -\binom{r+1}{1}, \quad A_2 = \binom{r+2}{2}, \quad \ldots, \quad A_k = (-1)^k \binom{r+k}{k}.$$

Substituting these values in (2) for the unknown coefficients A,
we shall have:

$$P(E_{[r]}) = S_r - \binom{r+1}{1} S_{r+1} + \binom{r+2}{2} S_{r+2} - \cdots + (-1)^{n-r} \binom{n}{n-r} S_n.$$
 
If we expand the algebraic expression:

$$\frac{S^r}{(1 + S)^{r+1}} = S^r (1 + S)^{-(r+1)}$$

we have:

$$S^r - \binom{r+1}{1} S^{r+1} + \binom{r+2}{2} S^{r+2} - \cdots$$

We may therefore write $P(E_{[r]}) = \dfrac{S^r}{(1 + S)^{r+1}}$, when every
exponent is replaced by an index number (i.e., $S^\varphi$ replaced by
$S_\varphi$) and the expansion broken off at the term $S^n$. The student
must of course constantly bear in mind the symbolic meaning of
$S_\varphi$.
The second part of the problem is easily solved by the symbolic
method. Denoting this particular event by $E_r$, we have
the following identity:

$$P(E_r) - P(E_{r+1}) = P(E_{[r]})$$

or

$$P(E_r) - P(E_{[r]}) = P(E_{r+1}).$$

The following relations are self-evident:

$$P(E_0) = 1;$$

$$P(E_1) = P(E_0) - P(E_{[0]}) = 1 - \frac{1}{1 + S} = \frac{S}{1 + S}, \text{ also;}$$

$$P(E_2) = P(E_1) - P(E_{[1]}) = \frac{S}{1 + S} - \frac{S}{(1 + S)^2} = \frac{S^2}{(1 + S)^2}.$$

The complete induction gives us finally:

$$P(E_r) = \frac{S^r}{(1 + S)^r}.$$

Assuming the rule is true for r, we may easily prove it is true
for r + 1 also. We have in fact:

$$P(E_{r+1}) = \frac{S^r}{(1 + S)^r} - \frac{S^r}{(1 + S)^{r+1}} = \frac{S^{r+1}}{(1 + S)^{r+1}}.$$
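
The symbolic formulas of this example may be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names and the four survival probabilities are hypothetical and do not come from the text. It forms the sums $S_\varphi$, expands $S^r/(1+S)^{r+1}$ and $S^r/(1+S)^r$ with exponents read as indices, and compares the first result with a direct enumeration of all combinations.

```python
from itertools import combinations
from math import comb, prod


def symmetric_sums(p):
    """Return [S_0, S_1, ..., S_n]; S_k is the sum of all products of k distinct p's."""
    s = [1.0] + [0.0] * len(p)
    for x in p:
        # Multiply the generating polynomial by (1 + x*z), highest index first.
        for k in range(len(p), 0, -1):
            s[k] += x * s[k - 1]
    return s


def prob_exactly(p, r):
    """P(E_[r]) = sum over k of (-1)^k * C(r+k, k) * S_{r+k}."""
    s, n = symmetric_sums(p), len(p)
    return sum((-1) ** k * comb(r + k, k) * s[r + k] for k in range(n - r + 1))


def prob_at_least(p, r):
    """P(E_r): expansion of S^r / (1+S)^r, broken off at S_n."""
    if r == 0:
        return 1.0
    s, n = symmetric_sums(p), len(p)
    return sum((-1) ** k * comb(r + k - 1, k) * s[r + k] for k in range(n - r + 1))


def brute_force_exactly(p, r):
    """Direct sum over every choice of the r events that happen."""
    n = len(p)
    return sum(
        prod(p[i] if i in chosen else 1 - p[i] for i in range(n))
        for chosen in combinations(range(n), r)
    )


# Hypothetical survival probabilities of four lives (assumed values).
survival = [0.9, 0.8, 0.75, 0.6]
for r in range(len(survival) + 1):
    print(r, prob_exactly(survival, r), prob_at_least(survival, r))
```

The enumeration grows as $\binom{n}{r}$, whereas the symmetric sums need only about $n^2$ multiplications, which is presumably why the symbolic form is convenient for larger groups of lives.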
 
 
 35. Example 20. Tchebycheff's Problem. — The following solu- 
 tion of a very interesting problem is due to the eminent Russian 
 mathematician, Tchebycheff, one of the foremost of modern 
 analysts. 
 
 A proper fraction is chosen at random. What is the proba- 
 bility it is in its lowest terms? 
 
 Stated in a slightly different wording the same question may 
 also be put as follows: If A/B is a proper fraction, what is the 
 probability that A and B are prime to each other?
 
If $p_2, p_3, p_5, \ldots, p_m$ denote respectively the probabilities that
each of the primes 2, 3, 5, ..., m is not a common factor of
numerator and denominator of A/B, then the probability that
no prime number is a common factor is:

$$P = p_2 \cdot p_3 \cdot p_5 \cdots p_m \cdots \text{ad inf.} \tag{I}$$
 
 This follows from the multiplication theorem and from the fact 
 that the sequence of prime numbers is infinite. 
 
Tchebycheff now first finds the probability $q_m = 1 - p_m$ that
the fraction A/B does contain the prime m as factor of both A
and B. By dividing any integral number by the prime m we
obtain besides the quotient a certain remainder that must be
one of the following numbers, viz.:

$$0, 1, 2, 3, 4, \ldots, (m - 1).$$

Each of the above remainders may be regarded as a possible
event. The probability to obtain 0 as a remainder is accordingly
1/m. The probability that m is contained as a factor of A is
therefore 1/m. This same quantity is also the probability that
m is a factor of B. The probability that both A and B are
divisible by m is therefore:

$$q_m = 1 - p_m = \frac{1}{m} \cdot \frac{1}{m} = \frac{1}{m^2}, \quad \text{or} \quad p_m = 1 - \frac{1}{m^2}.$$

Hence we have for the various primes:

$$p_2 = 1 - \frac{1}{2^2}, \quad p_3 = 1 - \frac{1}{3^2}, \quad p_5 = 1 - \frac{1}{5^2}, \quad \ldots$$
 
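Formula (I) then takes the form:

$$P = \left(1 - \frac{1}{2^2}\right)\left(1 - \frac{1}{3^2}\right)\left(1 - \frac{1}{5^2}\right) \cdots \text{ad inf.}$$

Forming the reciprocal 1/P we get:

$$\frac{1}{P} = \frac{1}{1 - \frac{1}{2^2}} \cdot \frac{1}{1 - \frac{1}{3^2}} \cdot \frac{1}{1 - \frac{1}{5^2}} \cdots \text{ad inf.}$$

Now each factor on the right hand side is the sum of a geometrical
progression, as:

$$\frac{1}{P} = \left(1 + \frac{1}{2^2} + \frac{1}{(2^2)^2} + \cdots\right)\left(1 + \frac{1}{3^2} + \frac{1}{(3^2)^2} + \cdots\right)\left(1 + \frac{1}{5^2} + \frac{1}{(5^2)^2} + \cdots\right) \cdots \text{ad inf.}$$

Multiplying out we shall have:

$$\frac{1}{P} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \cdots \text{ad inf.}$$

The above infinite series is, however, merely the well known
Eulerian expression for $\pi^2/6$, hence:

$$P = \frac{6}{\pi^2}.$$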
 
 Suppose furthermore we were assured that none of the three 
 primes 2, 3, 5 was a common factor of both A and B. What 
 would then be the probability that the fraction might be reduced 
 by division by one or more of the other primes? 
 
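Denoting by the symbol $P_{(7)}$ the probability that none of the
primes from 7 and upwards is a common factor, we get:

$$P_{(7)} = p_7 \cdot p_{11} \cdot p_{13} \cdots \text{ad inf.},$$

also:

$$P = \frac{6}{\pi^2} = \left(1 - \frac{1}{2^2}\right)\left(1 - \frac{1}{3^2}\right)\left(1 - \frac{1}{5^2}\right) P_{(7)},$$

or:

$$P_{(7)} = \frac{6}{\pi^2} \div \left[\left(1 - \frac{1}{4}\right)\left(1 - \frac{1}{9}\right)\left(1 - \frac{1}{25}\right)\right] = \frac{75}{8\pi^2}.$$

The probability of the divisibility of both numerator and
denominator of a fraction chosen at random by a prime larger
than 5 is thus:

$$1 - P_{(7)} = 1 - \frac{75}{8\pi^2} \approx \frac{1}{20}.$$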
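
Tchebycheff's value may be compared with a direct count. The sketch below is an illustration only: it assumes that "chosen at random" means every proper fraction with denominator up to some bound is equally likely, and the bound and the function name are hypothetical. It counts the fractions already in lowest terms and also evaluates $P_{(7)}$ from the product form.

```python
from math import gcd, pi


def share_in_lowest_terms(max_denominator):
    """Share of proper fractions A/B, 1 <= A < B <= max_denominator, with gcd(A, B) = 1."""
    total = reduced = 0
    for b in range(2, max_denominator + 1):
        for a in range(1, b):
            total += 1
            if gcd(a, b) == 1:
                reduced += 1
    return reduced / total


limit = 6 / pi ** 2                                   # Tchebycheff's value of P
p7 = limit / ((1 - 1 / 4) * (1 - 1 / 9) * (1 - 1 / 25))  # none of 7, 11, 13, ... divides both

print(share_in_lowest_terms(300), limit)
print(p7, 1 - p7)
```

The counted share approaches $6/\pi^2$ only as the bound on the denominator grows, so a finite count agrees with the limit to a few decimal places at best.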
 
 The summation of the infinite series of the reciprocals of the 
 squares of the natural numbers baffled for a long time the skill 
 of some of the most eminent mathematicians. Jacob Bernoulli, 
 the renowned classical writer on probabilities, proved its conver- 
 gency but failed to find its sum. The final summation was first 
 performed by Euler.
 
 CHAPTER V. 
 
 MATHEMATICAL EXPECTATION. 
 
 36. Definition, Mean Values. — It is common belief among 
 many people that gambling and all kinds of betting have their 
 source in reckless desire. This is often argued by moral reform- 
 ers, but cannot be said to be the true cause. Whenever by ordi- 
 nary gambling or by a bet, actual value is exposed to a complete or 
 partial loss, this exposure is not due to the fact that the gamester 
 is reckless, but because there is hope of an actual gain. "Hope," 
 says Spinoza in his treatise on ethics, "is the indeterminate joy 
 caused by the conception of a future state of affairs of whose
 outcome we are in doubt." Actual mathematical calculation 
 cannot be attempted on the basis of this definition any more 
 than it could be attempted to determine a mathematical prob- 
 ability from the definition of Aristotle. "We disregard there- 
 fore the psychological element of desire, which is associated with 
 hope or expectation as well as the anxiousness or dread associated 
 with the related psychological element of non-desire" (Cantor). 
 
The so-called mathematical expectation is the product of an
expected gain in actual value and the mathematical probability
of obtaining such a gain. The danger of loss may in this case
be regarded as a negative gain. Thus if a person, A, may expect
the gain, G, from the event, E, whose probability of happening
is equal to p, then $e = p \cdot G$ is his mathematical expectation.
The quantity expressed by the symbol, e, is here the amount it
is safe to hazard for the expected gain, G. We may also regard
the quantity, e, as a mean value or average value. Among a
large number of n cases only np will bring the gain, G, the others
not. Thus the total gain is pnG, and the average gain per case is:

$$pnG \div n = pG.$$

Suppose we have n mutually exclusive events, $E_1, E_2, \ldots, E_n$,
forming a complete disjunction. For their respective
probabilities we have then the following equation:

$$p_1 + p_2 + p_3 + \cdots + p_n = 1.$$

If the actual occurrence of a certain one of these events, say,
$E_\alpha$, brings a gain of $G_\alpha$, then the total value of the mathematical
expectation of the n events is:

$$e = p_1 G_1 + p_2 G_2 + \cdots + p_n G_n = \sum p_\alpha G_\alpha.$$

Since $\sum p_\alpha = 1$ this result may be written:

$$e \times (p_1 + p_2 + \cdots + p_n) = G_1 p_1 + G_2 p_2 + G_3 p_3 + \cdots + G_n p_n,$$

hence e may be regarded as the mean value of the different
quantities $G_\alpha$ with the weights $p_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$).
 
 Although we shall discuss the theory of mean values in a 
 following chapter a few preliminary remarks might not be out 
 of place here. 
 
A variable quantity X is related to a series of events $E_1, E_2,
E_3, \ldots, E_n$ (it being assumed that these events form a complete
disjunction) in such a manner that when $E_\alpha$ happens X takes on
the value $x_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$). If furthermore $p_1, p_2, p_3, \ldots$
denote the respective probabilities of the occurrence of $E_1, E_2,
E_3, \ldots$, then

$$M(X) = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n$$

is called the mean value or simply the mean of X.
 
The above definition may be illustrated by the following
concrete urn-scheme. An urn contains N balls of which $a_1$ balls
are marked $x_1$, $a_2$ balls marked $x_2$, ... and finally $a_n$ balls marked
$x_n$, where $a_1 + a_2 + a_3 + \cdots + a_n = N$. Each drawing from the
urn produces a certain number X, which may assume n different
values $x_1, x_2, x_3, \ldots, x_n$, each with the respective probabilities:

$$p_1 = \frac{a_1}{N}, \quad p_2 = \frac{a_2}{N}, \quad \ldots, \quad p_n = \frac{a_n}{N}.$$

The arithmetic mean of all the numbers written on the balls is:

$$\frac{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n}{N}$$

which agrees with the mean as defined above.
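
The agreement between the two means can be verified on a small urn. The following Python sketch is illustrative only; the marks, the ball counts and the function name are hypothetical choices, not data from the text.

```python
from fractions import Fraction


def mean_value(values, probabilities):
    """M(X) = p_1*x_1 + ... + p_n*x_n for a complete disjunction."""
    if sum(probabilities) != 1:
        raise ValueError("the probabilities must sum to 1")
    return sum(p * x for p, x in zip(probabilities, values))


# Hypothetical urn: counts[k] balls carry the mark marks[k].
marks = [1, 2, 5, 10]
counts = [4, 3, 2, 1]
N = sum(counts)

probabilities = [Fraction(a, N) for a in counts]
urn_mean = mean_value(marks, probabilities)
arithmetic_mean = Fraction(sum(a * x for a, x in zip(counts, marks)), N)

print(urn_mean, arithmetic_mean)  # both equal 3 for this urn
```

Exact fractions are used so that the check on the sum of the probabilities is not disturbed by rounding.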
 
 
 37. The Petrograd (St. Petersburg) Problem.— In this con- 
 nection it is worthy to note a celebrated problem, which on 
account of its paradoxical nature has become a veritable stumbling
block, and has been discussed by some of the most eminent
writers on probabilities. The problem was first suggested by
Daniel Bernoulli in a communication to the Petrograd (or, as
it was then called, St. Petersburg) Academy in 1738.
 
The Petrograd problem may shortly be stated as follows: Two
persons A and B are interested in a game of tossing a coin under
the following conditions. An ordinary coin is tossed until head
turns up, which is the deciding event. If head turns up the first
time A pays one dollar to B, if head appears first at the second
toss B is to receive two dollars, if first at the third time four
dollars and so on. What is the mathematical expectation of B?
Or in other words, how much must B pay to A before the game
starts in order that the game may be considered fair?
 
The mathematical expectation of B in the first trial is
$\frac{1}{2} \times 1 = \frac{1}{2}$. The mathematical expectation for head in second
throw is $(\frac{1}{2})^2 \times 2 = \frac{1}{2}$. Or in general the mathematical
probability that head appears for the first time in the nth toss is
$(\frac{1}{2})^n$, and the co-ordinated expectation is $2^{n-1} \div 2^n = \frac{1}{2}$. Thus the
total expectation is expressed by the following series:

$$\frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots$$

With $n = \infty$ as its limiting value it thus appears that B
could afford to pay an infinite amount of money for his expected
gain.
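
The growth of the series can be made concrete by stopping the game after a fixed number of tosses. The sketch below is an illustration only; the cut-off and the function name are assumptions introduced here and are not part of the problem as stated.

```python
from fractions import Fraction


def truncated_expectation(max_tosses):
    """Expectation of B when only the first max_tosses tosses are counted."""
    return sum(
        Fraction(1, 2 ** n) * 2 ** (n - 1)  # probability (1/2)^n times payment 2^(n-1)
        for n in range(1, max_tosses + 1)
    )


for limit in (1, 2, 10, 40):
    print(limit, truncated_expectation(limit))  # each toss adds exactly 1/2
```

Every admitted toss contributes one half, so the truncated expectation is simply half the number of tosses allowed and increases without bound.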
 
38. Various Explanations of the Paradox. The Moral Expectation. —
This evidently paradoxical result has called forth a number
of explanations of various forms by some eminent mathematicians.
One of the commentators was D'Alembert. It was
to be expected that the famous encyclopaedist, who, as we have
seen, did not view the theory of probabilities in too kindly a
manner, would not hesitate to attack. He returns repeatedly
to this problem in the "Opuscules" (1761) and in "Doutes et
questions" (Amsterdam, 1770).
 
D'Alembert distinguishes between two forms of possibilities,
viz., metaphysical and physical possibilities. An event is by
him called a metaphysical possibility, when it is not absurd.
When the event is not too "uncommon" in the ordinary course
of happenings it is a physical possibility. That head would
appear for the first time after 1,000 throws is metaphysically
possible but quite impossible physically. This contention is
rather bold. "What would," as Czuber remarks, "D'Alembert
have said to an actual reported case in 'Grunert's Archiv' where
in a game of whist each of the four players held 13 cards of one
suit." The numerical probability of such an event as expressed
by mathematical probabilities is $(635013559600)^{-1}$.
 
D'Alembert's definitions including the half metaphorical term
"ordinary course" are rather vague. And what numerical
value of the mathematical probability constitutes the physical
impossibility? D'Alembert gives three arbitrary solutions for
the probability of getting head in the nth throw, namely:

$$\frac{1}{2^n (1 + \beta n^2)}, \quad \frac{1}{2^{n + \alpha n}}, \quad \frac{1}{2^n + \dfrac{2^n B}{(K - n)^{q/2}}},$$

where $\alpha$, $\beta$, B, K are constants and q an uneven number.
 
Daniel Bernoulli himself gives a solution wherein he introduces
the term "moral expectation." If a person possesses a sum of
money equal to x then according to Bernoulli

$$dy = \frac{k\,dx}{x}$$

is the moral expectation of x, k being a constant quantity.
Integrating we get:

$$\int_a^b dy = k \int_a^b \frac{dx}{x} = k(\log b - \log a) = k \log \frac{b}{a},$$

which is the moral expectation of an increase $b - a$ of an original
value a. If now x denotes the sum owned by B we may replace
the mathematical expectations by their corresponding moral
expectations, that is to say replace $2^{n-1}/2^n$ by $(1/2^n) \log((x + 2^{n-1})/x)$
and we then have:

$$k \left( \frac{1}{2} \log \frac{x + 1}{x} + \frac{1}{4} \log \frac{x + 2}{x} + \cdots + \frac{1}{2^n} \log \frac{x + 2^{n-1}}{x} + \cdots \right),$$
 
which is a convergent series. In this connection, it may be
mentioned that the Bernoullian hypothesis has found quite an
extensive use in the modern theory of utility.
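
The convergence of Bernoulli's series is easily observed numerically. The following sketch is an illustration under assumptions: natural logarithms, k = 1, the fortunes tried and both function names are choices made here, not values given in the text. The second function turns the moral expectation back into a sum of money by solving $k \log((x + c)/x) = M$ for c.

```python
from math import exp, log


def moral_expectation(x, k=1.0, terms=200):
    """k * sum over n of (1/2^n) * log((x + 2^(n-1)) / x), cut off after `terms` terms."""
    return k * sum(
        log((x + 2.0 ** (n - 1)) / x) / 2.0 ** n for n in range(1, terms + 1)
    )


def equivalent_sure_gain(x, k=1.0, terms=200):
    """The certain gain c whose moral expectation k*log((x + c)/x) equals the series."""
    return x * (exp(moral_expectation(x, k, terms) / k) - 1.0)


for fortune in (10.0, 100.0, 1000.0):  # hypothetical fortunes of B
    print(fortune, moral_expectation(fortune), equivalent_sure_gain(fortune))
```

Unlike the mathematical expectation, the value obtained in this way stays finite and depends on the fortune already owned by B.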
 
De Morgan in his splendid little treatise "On Probabilities"
takes the view that the solution as first given is by no means an
anomaly. He quotes an actual experiment in coin tossing by
Buffon. Out of 2,048 trials 1,061 gave head at the first toss,
494 at the second, 232 at the third, 137 at the fourth, 56 at the
fifth, 29 at the sixth, 25 at the seventh, 8 at the eighth and 6 at
the ninth. Computing the various mathematical expectations,
we find that the maximum value is found in the 25 sets with head
in the seventh toss, which gives a gain of 25 × 64 = 1,600. The
most rare occurrence, the 6 sets of head in the ninth throw, gives
a gain of 6 × 256 = 1,536, which is the next highest gain in all
the nine sets. De Morgan furthermore contends that if Buffon
had tried a thousand times as many games, the results would
not only have given more, but more per game, arguing "that a
larger net would have caught not only more fish but more varieties
of fish; and in two millions of sets, we might have seen cases in
which head did not appear till the twentieth throw." Furthermore,
"the player might continue until he had realized not only
any given sum, but any given sum per game." Therefore
according to De Morgan the mathematical expectation of a
player in a single game must be infinite.
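
Buffon's figures can be tabulated in a few lines. The sketch below is illustrative; it reads the quoted counts as 1,061, 494, 232, 137, 56, 29, 25, 8 and 6 (which sum to the 2,048 trials) and applies the payment of $2^{n-1}$ dollars for head at the nth toss. The variable names are assumptions.

```python
# Number of games in which head first appeared at toss 1, 2, ..., 9.
counts = [1061, 494, 232, 137, 56, 29, 25, 8, 6]

# Gain collected from each class of games: count times 2^(toss - 1).
gains = [c * 2 ** i for i, c in enumerate(counts)]

total_games = sum(counts)   # 2048
total_gain = sum(gains)     # 10057
average_gain = total_gain / total_games

for toss, (c, g) in enumerate(zip(counts, gains), start=1):
    print(toss, c, g)
print(total_games, total_gain, average_gain)
```

The largest class gains are 1,600 (seventh toss) and 1,536 (ninth toss), as stated above, and the average works out to a little under 5 dollars per game for this particular series of 2,048 games.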
 
 CHAPTER VI. 
 
 PROBABILITY A POSTERIORI 
 
 39. Bayes's Rule. A Posteriori Probabilities. — The problems 
hitherto considered have all had certain points in common.
Before entering upon the calculations of the mathematical
probability of the happening of the event in question, we knew
beforehand a certain complex of causes which operated in the
general domain of action. We also were able to separate this
general complex of productive causes into two distinctive minor
domains of complexes, of which one would bring forth the event,
E, while the other domain would act towards the production of
the opposite event, $\bar{E}$. Furthermore we also were able to
measure the respective quantitative magnitudes of the two
domains, and then, by a simple algebraic operation, determine
the probability as a proper fraction. The addition and
multiplication theorems did not introduce any new principles, but
only gave us a set of systematic rules which facilitated and
shortened the calculations of the relations between the different
absolute probabilities. The above method of determination
of a mathematical probability is known as an a priori
determination, and such probabilities are termed a priori probabilities.
 
The problems treated in the preceding chapters have, nearly
all, been related to different games of chance, or purely abstract
mathematical quantities. The inorganic nature of this kind of
problems has made it possible for us to treat them in a relatively
simple manner. In many of the problems, which we shall
consider hereafter, organic elements enter as a dominant factor and
make the analysis much more complicated and difficult.
 
All social and biological investigations, which are of a much
larger benefit and practical value than the problems in games of
chance, lead often to a completely different category of probability
problems, which are known as "a posteriori probabilities."
In problems where organic life enters into the calculations, the
complex of productive causes is so varied and manifold, that
our minds are not able to pigeonhole the different productive
causes, placing them in their proper domains of action. But we
know that such causes do exist and are the origin of the event.
If now, by a series of observations, we have noticed the actual
occurrence of the event, E (or the occurrence of the opposite
event $\bar{E}$), the problem of the determination of an a posteriori
probability is to find the probability that the event E originated
from a certain complex, say F. We must then, first of all,
form a complete hypothetical judgment of the form: E either
happens from the complexes $F_1$, or $F_2$, or $F_3$, ... or $F_n$. But we
must not forget that, in general, the different complexes $F_\alpha$
($\alpha = 1, 2, \ldots, n$) of the disjunction are not known a priori.
We must, therefore, determine the respective probabilities for
the actual existence of such disjunctive complexes $F_\alpha$. These
probabilities of existence for the complexes of causes are in
general different for each member, a fact which often has been
overlooked by many investigators and writers on a posteriori
probabilities, and which has given rise to meaningless and
paradoxical results.
 
 40. Discovery and History of the Rule. — The first discoverer 
 of the rule for the computation of a posteriori probabilities by 
 a purely deductive process was the English clergyman, T. Bayes. 
 Bayes's treatise was first published after the death of the author 
 by his friend, Dr. Price, in Philosophical Transactions for 1763. 
 The treatise by the English clergyman was, for a long time, 
 almost forgotten, even by the author's own countrymen; and 
 later English writers have lost sight of the true " Bayes's Rule " 
 and substituted a false, or to be more accurate, a special case of 
 the exact rule, in the different algebraic texts, under the discussion
 of the so-called "inverse probabilities," a name which is due
 probably to de Morgan, and which in itself is a great misnomer. 
 This point, presently, we shall discuss in detail. 
 
The careless application of the exact rule has recently led to
a certain distrust of the whole theory of "a posteriori
probabilities." Scandinavian mathematicians were probably the first
to criticize the theory. In 1879, Mr. J. Bing, a Danish actuary,
took a very critical attitude towards the mathematical principles
underlying Bayes's Rule, in a scholarly article in the mathematical
journal Tidsskrift for Matematik. Bing's article caused
a sharp, and often heated, discussion among the older and younger
Danish mathematicians at that time; but his views seem to have
gained the upper hand, and even so great an authority on the
whole subject as the late Dr. T. Thiele, in his well-known work,
"Theory of Observations" (London, 1903), refers to Bing's
article as "a crushing proof of the fallacies underlying the
determination of a posteriori probabilities by a purely deductive
method." As recently as 1908, the Danish writer on philosophy,
Dr. Kroman, has taken up cudgels in defense of Bayes in a
contribution in the Transactions of the Royal Danish Academy
of Science, which has done much towards the removal of many
obscure and erroneous views of the older authors. Among
English writers, Professor Chrystal, in a lecture delivered before
the Actuarial Society of Edinburgh, has also given a sharp
criticism of the rule, although he does not go so deeply into the
real nature of the problem as either Bing or Kroman.
 
Despite Chrystal's advice to "bury the laws of inverse
probabilities decently out of sight, and not embalm them in text books
and examination papers" the old view still holds sway in recent
professional examination papers. It is therefore absolutely
necessary for the student preparing for professional examinations
to be acquainted with the theory. In the following paragraphs
we shall, therefore, give the mathematical theory of Bayes's
Rule with several examples illustrating its application to actual
problems, together with a criticism of the rule.
 
41. Bayes's Rule (Case I). — (The different complexes of causes
producing the observed event, E, possess different a priori
probabilities of existence.) Let E denote a certain state or condition,
which can appear under only one of the mutually exclusive
complexes of causes: $F_1, F_2, \ldots$ and not otherwise. Let the
probability for the actual existence of $F_1$ be $\kappa_1$ and if $F_1$ really
exists then let $\omega_1$ be the "productive probability" for bringing
forth the observed event, E (E being of a different nature from
F), which can only occur after the previous existence of one of
the mutually exclusive complexes, F. Let, in the same manner,
$F_2$ have an "existence probability" of $\kappa_2$ and a "productive
probability" of $\omega_2$, $F_3$ an existence probability of $\kappa_3$ and a
productive probability of $\omega_3$, etc. If now, by actual observation,
we have noted that the event E has occurred exactly m
times in n trials, then the probability that the complex $F_1$ was
the origin of E is:

$$Q_1 = \frac{\kappa_1 \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}.$$

Similarly that complex $F_2$ was the origin:

$$Q_2 = \frac{\kappa_2 \omega_2^m (1 - \omega_2)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}},$$

and so on for the other complexes.
 
Proof. — Let the number of equally possible cases in the general
domain of action, which leads to one of the complexes $F_\alpha$, be t.
Furthermore, of these t cases let $f_1$ be favorable for the existence
of complex $F_1$, $f_2$ for $F_2$, $f_3$ for $F_3$, etc. Then the probabilities
for the existence of the different complexes $F_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$)
are:

$$\kappa_1 = \frac{f_1}{t}, \quad \kappa_2 = \frac{f_2}{t}, \quad \kappa_3 = \frac{f_3}{t}, \quad \ldots \text{ respectively.}$$

Of the $f_1$ favorable cases for complex $F_1$, $\lambda_1$ are also favorable for
the occurrence of E.
Of the $f_2$ favorable cases for complex $F_2$, $\lambda_2$ are also favorable for
the occurrence of E.
Of the $f_3$ favorable cases for complex $F_3$, $\lambda_3$ are also favorable for
the occurrence of E.

The probability of the happening of E under the assumption that
$F_1$ exists, i.e., the relative probability $P_{F_1}(E)$, is:

$$\omega_1 = \frac{\lambda_1}{f_1},$$

or in general:

$$\omega_\alpha = \frac{\lambda_\alpha}{f_\alpha} \quad (\alpha = 1, 2, 3, \ldots).$$
 
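The total number of equally likely cases for the simultaneous
occurrence of the event E with either one of the favorable cases
for $F_1, F_2, F_3, \ldots$ is:

$$\lambda_1 + \lambda_2 + \lambda_3 + \cdots = \sum \lambda_\alpha.$$

The number of favorable cases for the simultaneous occurrence
of $F_1$ and E is $\lambda_1$, for the simultaneous occurrence of $F_2$ and E,
$\lambda_2$, etc. Hence we have as measures for their corresponding
probabilities:

$$Q_1 = \frac{\lambda_1}{\sum \lambda_\alpha}, \quad Q_2 = \frac{\lambda_2}{\sum \lambda_\alpha}, \quad \ldots$$

But

$$\lambda_1 = \omega_1 \cdot f_1, \quad \lambda_2 = \omega_2 \cdot f_2, \quad \ldots, \text{ etc.,}$$

and

$$f_1 = \kappa_1 \cdot t, \quad f_2 = \kappa_2 \cdot t, \quad \ldots, \text{ etc.}$$

Hence

$$\lambda_1 = \omega_1 \cdot \kappa_1 \cdot t, \quad \lambda_2 = \omega_2 \cdot \kappa_2 \cdot t, \quad \ldots, \text{ etc.}$$

Substituting these values in the above expression for $Q_1, Q_2, \ldots$
we get:

$$Q_1 = \frac{\kappa_1 \omega_1}{\sum \kappa_\alpha \omega_\alpha}, \quad Q_2 = \frac{\kappa_2 \omega_2}{\sum \kappa_\alpha \omega_\alpha}, \quad \ldots$$

as the respective probabilities that the observed event originated
from the complexes $F_1, F_2, F_3, \ldots$. Such probabilities are called
a posteriori probabilities.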
 
Let us now for a moment investigate the above expression for
$Q_1, Q_2, \ldots$. The numerator in the expression for $Q_1$ is $\kappa_1 \cdot \omega_1$.
But $\kappa_1$ is simply the a priori probability for the existence of $F_1$
while $\omega_1$ is the a priori productive probability of bringing forth
the event observed from complex $F_1$. The product $\kappa_1 \cdot \omega_1$ is
simply the relative probability $P_{F_1}(E)$, or the probability that
the event E originated from $F_1$. In the denominator we have
the expression $\sum \kappa_\alpha \omega_\alpha$ ($\alpha = 1, 2, \ldots, n$) which is the total
probability to get E from any of the complexes $F_\alpha$. From example 17
(Chapter IV) we know that the probability to get E exactly m
times from $F_1$ in n total trials is:

$$p_1 = \binom{n}{m} \kappa_1 \cdot \omega_1^m (1 - \omega_1)^{n-m}$$

and the probability to get E from any one of the complexes, F,
m times out of n is:

$$\sum p_\alpha = \binom{n}{m} \sum \kappa_\alpha \cdot \omega_\alpha^m (1 - \omega_\alpha)^{n-m} \quad (\alpha = 1, 2, 3, \ldots).$$

If, by actual observation, we know the event E to have happened
exactly m times out of n, then the a posteriori probability that
$F_1$ was the origin is:

$$Q_1 = \frac{\kappa_1 \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}} \quad (\alpha = 1, 2, 3, \ldots). \tag{I}$$

The factorials $\binom{n}{m}$ in numerator and denominator cancel each
other of course. It will be noticed that, in the above proof, it
is not assumed that the a posteriori probability is proportional
to the a priori probability, an assumption usually made in the
ordinary texts on algebra.
 
42. Bayes's Rule (Case II). — (Special Case. The a priori
probabilities of existence of the different complexes are equal.)
Sometimes the different complexes F may be of such special
characters that their a priori probabilities of existence are equal,
i.e.,

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \cdots = \kappa_n.$$

In this case the equation (I) simply reduces to:

$$Q_1 = \frac{\omega_1^m (1 - \omega_1)^{n-m}}{\sum \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}. \tag{II}$$

Equation (I) gives, however, the most general expression for
Bayes's Rule which may be stated as follows:

If a definite observed event, E, can originate from a certain series
of mutually exclusive complexes, F, and if the actual occurrence of
the event has been observed, then the probability that it originated
from a specified complex or a specified group of complexes is also
the "a posteriori" probability or probability of existence of the
specified complex or group of complexes.
 
 43. Determination of the Probabilities of Future Events 
 Based Upon Actual Observations. — It happens frequently that
 
our knowledge of the general domain of action is so incomplete,
that we are not able to determine, a priori, the probability of the
occurrence of a certain expected event. As we already have
stated in the introduction to a posteriori probabilities, this is
nearly always the case with problems wherein organic life enters
as a determining factor or momentum. But the same state of
affairs may also occur in the category of problems relating to games
of chance, which we have hitherto considered. Suppose we had
an urn which was known to contain white and black balls only,
but the actual ratio in which the balls of the two different colors
were mixed, was unknown. With this knowledge beforehand,
we should not be able to determine the probability for the drawing
of a white ball. If, on the other hand, we knew, from actual
experience by repeated observations, the results of former drawings
from the same urn when the conditions in the general domain
of action remained unchanged during each separate drawing, then
these results might be used in the determination of the probability
of a specified event by future drawings.
 
Our problem may be stated in its most general form as follows:
Let $F_\alpha$ denote a certain state or condition in the general domain
of action, which state or condition can appear only in one or the
other of the mutually exclusive forms: $F_1, F_2, F_3, \ldots$, and not
otherwise. Let the probability of existence of $F_1, F_2, F_3, \ldots$ be
$\kappa_1, \kappa_2, \kappa_3, \ldots$ respectively, and when one of the complexes $F_1, F_2,
F_3, \ldots$ exists (occurs) let $\omega_1, \omega_2, \omega_3, \ldots$ be the respective
productive probabilities of bringing forth a specified event, E.
If now, by actual observation, we know the event, E, to have
happened exactly m times out of n total trials (the conditions in
the general domain of action being the same at each individual
trial), what is then the probability that the event, E, will happen
in the (n + 1)th trial also?
 
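By Bayes's Rule we determined the "a posteriori" probabilities
or the probabilities of existence of the complexes $F_1, F_2, \ldots$
as:

$$Q_1 = \frac{\kappa_1 \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}, \quad Q_2 = \frac{\kappa_2 \omega_2^m (1 - \omega_2)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}, \quad \ldots \quad (\alpha = 1, 2, 3, \ldots).$$

In the (n + 1)th trial E may happen from any one of the mutually
exclusive complexes: $F_1, F_2, F_3, \ldots$ whose respective probabilities
in producing the event, E, are $\omega_1, \omega_2, \omega_3, \ldots$. The addition
theorem then gives us as the total probability of the occurrence
of E in the (n + 1)th trial:

$$R = \sum P_{F_\alpha}(E) = Q_1 \cdot \omega_1 + Q_2 \cdot \omega_2 + Q_3 \cdot \omega_3 + \cdots = \frac{\sum \kappa_\alpha \omega_\alpha^{m+1} (1 - \omega_\alpha)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}} \quad (\alpha = 1, 2, 3, \ldots). \tag{III}$$

If the a priori probabilities of existence are of equal magnitude
(Case II) the factors $\kappa$ in the above expression cancel each other
in numerator and denominator and we have

$$R = \frac{\sum \omega_\alpha^{m+1} (1 - \omega_\alpha)^{n-m}}{\sum \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}. \tag{IV}$$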
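
The two ways of writing the probability for the (n + 1)th trial may be compared in code. The sketch below is illustrative; both function names and the numerical values are assumptions. The first function follows the addition theorem, $\sum Q_\alpha \omega_\alpha$, and the second evaluates the closed ratio of sums.

```python
def next_trial_by_addition(kappa, omega, m, n):
    """R as the sum of Q_alpha * omega_alpha over all complexes."""
    weights = [k * w ** m * (1 - w) ** (n - m) for k, w in zip(kappa, omega)]
    total = sum(weights)
    q = [x / total for x in weights]          # a posteriori probabilities
    return sum(qa * w for qa, w in zip(q, omega))


def next_trial_by_ratio(kappa, omega, m, n):
    """R as the ratio of the two sums, with the existence probabilities as weights."""
    numerator = sum(k * w ** (m + 1) * (1 - w) ** (n - m) for k, w in zip(kappa, omega))
    denominator = sum(k * w ** m * (1 - w) ** (n - m) for k, w in zip(kappa, omega))
    return numerator / denominator


# Hypothetical complexes (assumed values); E observed 2 times in 3 trials.
kappa = [0.6, 0.3, 0.1]
omega = [0.25, 0.5, 0.75]
print(next_trial_by_addition(kappa, omega, 2, 3))
print(next_trial_by_ratio(kappa, omega, 2, 3))
```

When all the existence probabilities are equal, any common value may be passed for `kappa`, since it cancels in the ratio.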
 
44. Examples on the Application of Bayes's Rule. — Example
21. — An urn contains two balls, white or black or both kinds.
What is the probability of getting a white ball in the first drawing,
and if this event has happened and the ball replaced, what
is then the probability to get white in the following drawing?

Three conditions are here possible in the urn. There may be 0,
1, or 2 white balls. Each hypothetical condition has a probability
of existence equal to 1/3, and the productive probabilities
for white are 0, 1/2 and 1 respectively. The total probability to
get white is therefore:

$$P_1 = \tfrac{1}{3} \cdot 0 + \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot 1 = \tfrac{1}{2}.$$

If we now draw a white ball then the probabilities that it came
from the complexes $F_1, F_2, F_3$, respectively, are:

$$0 \div \tfrac{1}{2}, \quad \tfrac{1}{6} \div \tfrac{1}{2}, \quad \tfrac{1}{3} \div \tfrac{1}{2}.$$

These are also the new existence probabilities of the three
complexes. The probability for white in second drawing is therefore

$$\left(0 \div \tfrac{1}{2}\right) \cdot 0 + \left(\tfrac{1}{6} \div \tfrac{1}{2}\right) \cdot \tfrac{1}{2} + \left(\tfrac{1}{3} \div \tfrac{1}{2}\right) \cdot 1 = \tfrac{5}{6}.$$

This solution of the problem is, however, not a unique solution,
because it is an arbitrary solution. It is arbitrary in this respect,
that we have without further consideration given all three
complexes the same probability of existence, 1/3. We shall discuss
this part of the question under the chapter on the criticism of
Bayes's Rule.
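
The dependence of Example 21 on the assumed existence probabilities can be seen by repeating the calculation with a different set. The sketch below is illustrative only. The function name is hypothetical, and the second set of existence probabilities (1/4, 1/2, 1/4), which would arise if each ball were independently white or black with equal chance, is an assumption introduced here for comparison and is not taken from the text.

```python
from fractions import Fraction


def two_draws(kappa, omega):
    """Return (P of white first, a posteriori probabilities, P of white second)."""
    p_first = sum(k * w for k, w in zip(kappa, omega))
    posterior = [k * w / p_first for k, w in zip(kappa, omega)]
    p_second = sum(q * w for q, w in zip(posterior, omega))
    return p_first, posterior, p_second


# Productive probabilities for 0, 1 or 2 white balls in the urn.
omega = [Fraction(0), Fraction(1, 2), Fraction(1)]

equal_prior = [Fraction(1, 3)] * 3                              # as in Example 21
coin_prior = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # hypothetical alternative

print(two_draws(equal_prior, omega))  # second-draw probability 5/6
print(two_draws(coin_prior, omega))   # second-draw probability 3/4
```

Both sets give 1/2 for white at the first drawing, yet they disagree on the second drawing, which illustrates why the solution is called arbitrary.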
 
 Example 22. — iVn urn contains fi^'e balls of which a part is 
 known to be white and the rest black. A ball is drawn four 
 times in succession and replaced after each drawing. By three 
 of sucli (h-awiiiii's a white ball was obtained and l)y one drawing 
 a black l)all. What is the probability that we will get a white 
 ball in the fifth drawing? 
 
 In regard to the contents of the urn the following four hypoth- 
 eses are possible: 
 
 ¥\: 4 Avhite, 1 black balls, 
 
 ¥.: 3 " 2 " 
 
 l\\ 2 " 3 " " 
 
 ¥,: 1 " 4 " 
 
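Since we do not know anything about the ratio of distribution
of the different colored balls, we may by a direct application of
the principle of insufficient reason regard the four complexes as
equally probable, or:

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac14.$$

If either $F_1$, $F_2$, $F_3$ or $F_4$ exists, the respective
productive probabilities are:

$$\omega_1 = \tfrac45, \quad \omega_2 = \tfrac35, \quad \omega_3 = \tfrac25, \quad \omega_4 = \tfrac15.$$

By a direct substitution in the formula:

$$R = \frac{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}\,\omega_\alpha}{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}}$$

($\alpha = 1, 2, 3, 4$) for $n = 4$ and $m = 3$ we get:

$$R = \frac{(\tfrac45)^4(\tfrac15) + (\tfrac35)^4(\tfrac25) + (\tfrac25)^4(\tfrac35) + (\tfrac15)^4(\tfrac45)}{(\tfrac45)^3(\tfrac15) + (\tfrac35)^3(\tfrac25) + (\tfrac25)^3(\tfrac35) + (\tfrac15)^3(\tfrac45)} = \frac{47}{73}.$$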
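The substitution in Example 22 may be verified with a few lines of Python. This is a sketch added for illustration; the function name `predictive` is an assumption, and the four hypotheses are taken as equally probable, exactly as in the text.

```python
from fractions import Fraction


def predictive(productive, n, m):
    """Formula (IV) with equal existence probabilities.

    Probability that the event occurs at trial n + 1, given that it
    occurred m times in the first n trials.
    """
    num = sum(w**m * (1 - w) ** (n - m) * w for w in productive)
    den = sum(w**m * (1 - w) ** (n - m) for w in productive)
    return num / den


# Example 22: hypotheses of 4, 3, 2 or 1 white balls among five.
productive = [Fraction(k, 5) for k in (4, 3, 2, 1)]
r = predictive(productive, n=4, m=3)
print(r)  # 47/73
```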
 
45. Criticism of Bayes's Rule. — In most English treatises on
the theory of chance the "a posteriori" determination of a
mathematical probability is discussed under the so-called
"inverse probabilities." This somewhat misleading name was
probably first introduced by the eminent English mathematician and
actuary, Augustus de Morgan. In the opening of the discussion
of a posteriori probabilities in the third chapter of his treatise,
"An Essay on Probabilities," de Morgan says: "In the preceding
chapter, we have calculated the chances of an event, knowing the
circumstances under which it is to happen or fail. We are now
to place ourselves in an inverted position, we know the event,
and ask what is the probability which results from the event
in favor of any set of circumstances under which the same might
have happened." Is this now an inverse process? By the a
priori or, as de Morgan prefers to call them, the direct
probabilities, we started from a definitely known condition and
determined the probability for a future event, E, or what is the
same, the probability of a specified future state of affairs. Here we
start knowing the present condition and try to determine a past
condition. The process appears to be the inverse of
the former, although they both are the same. We possess a
definite knowledge of a certain condition and try to determine
the probability of the existence of a specified state of affairs, in
general different from the first condition, but whether this state
of affairs occurred in the past or is to occur in the future has no
bearing on our problem. In other words, time does not enter
as a determining factor. And even if we were willing to admit
the two processes of the determination of the different
probabilities to be inverse, the probabilities themselves can not be
said to be inverse. Nevertheless, this misleading name appears over
and over again in examination papers in England and in America
as a thoroughly embalmed corpse which ought to have been
buried long ago. What is really needed is a change of customary
nomenclature in the whole theory of probability. Instead of
direct and inverse, a priori and a posteriori probabilities, it would
be more proper to speak about "prospective" and
"retrospective" probabilities in the application of Bayes's Rule. All
probabilities are in reality determined by an empirical process.
That there is a certain probability to throw a six with a die we
only know after we have formed a definite conception of a die.
The only probabilities which we perhaps rightly may name a
priori are the arbitrary probabilities in purely mathematical
problems where we assume an ideal state of affairs. "There
is," to quote the Danish writer on logic, Dr. Kroman, "really
more reason to doubt the a priori than the a posteriori
probabilities, and it would be more natural and also more exact in the
application of Bayes's Rule to speak about the actual or original
and the new or gained probability."
 
The discussion above has really no direct bearing on Bayes's
Rule but was introduced in order to give the student a clearer
understanding of the main principles underlying the whole
determination of a posteriori probabilities by means of actual
experimental observations, and also to remove some obscure points.
From his ordinary mathematical training every student of
mathematics has an almost intuitive understanding of an inverse
process. Naturally when he encounters again and again the customary
heading "inverse probabilities" in text-books he obtains from
the very start, almost before he starts to read this particular
chapter, an inverse idea of the subject instead of the idea he really
ought to have. Nowhere in continental texts on the theory of
probabilities will the reader be able to find the words direct and
inverse applied in the same sense as in English texts since the
introduction of these terms by de Morgan. We shall advise
readers who have become accustomed to the old terms to pay
no serious attention to them.
 
46. Theory Versus Practice. — In § 41 we reduced Bayes's
Rule to its most general form:

$$Q_\alpha = \frac{\kappa_\alpha\,\omega_\alpha^{m}(1-\omega_\alpha)^{n-m}}{\sum \kappa_\alpha\,\omega_\alpha^{m}(1-\omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \ldots).$$

This is an exact expression for the rule, but it is at the same
time almost impossible to employ it in practice. Only in a few
exceptional cases do we know, a priori, the different values of the
often numerous probabilities of existence, $\kappa_\alpha$, of the
complexes $F_\alpha$, and in order to apply the rule with exact
results we require here sufficient facts and information about the
different complexes of causes from which the observed event, E,
originated. Bayes deduced the rule from special examples resulting
from drawings of balls of different colors from an urn where the
different complexes of causes were materially existent. The
probability of a cause or a certain complex of causes did not here
mean the probability of existence of such a complex but the
probability that the observed event originated from this particular
complex. In order to elucidate this statement we give the following
simple example:
 
Example 23. — A bag contains 4 coins, of which one is coined
with two heads, the other three having both head and tail. A
coin is drawn at random and tossed four times in succession and
each time head turns up. What is the probability that it was
the coin with two heads?

The two complexes $F_1$ and $F_2$, which may produce the event,
E, are: $F_1$, the coin with two heads, and $F_2$, an ordinary coin.
The probability of existence of $F_1$ is the probability of drawing
the single coin with two heads, which is equal to 1/4; the
probability of existence for the other complex, $F_2$, is equal to
3/4. The respective productive probabilities are 1 and 1/2. Thus
$\kappa_1 = \tfrac14$, $\kappa_2 = \tfrac34$, $\omega_1 = 1$ and
$\omega_2 = \tfrac12$. Substituting these values in formula
(I) ($n = 4$, $m = 4$), we get:

$$Q = (\tfrac14 \times 1^4) \div (\tfrac14 \times 1^4 + \tfrac34 \times (\tfrac12)^4) = \tfrac14 \div \tfrac{19}{64} = \tfrac{16}{19}.$$
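Example 23 is a case in which every quantity entering the general rule is actually known, so the computation is a direct substitution. The short Python sketch below is an added illustration; the names `kappa`, `omega` and `weights` are assumptions of this example, mirroring the symbols of the text.

```python
from fractions import Fraction

# Example 23: one two-headed coin among four; four heads in four tosses.
kappa = [Fraction(1, 4), Fraction(3, 4)]  # existence probabilities
omega = [Fraction(1), Fraction(1, 2)]     # productive probabilities of "head"
n, m = 4, 4

# Numerators of the general rule: kappa * omega^m * (1 - omega)^(n - m)
weights = [k * w**m * (1 - w) ** (n - m) for k, w in zip(kappa, omega)]
q = [x / sum(weights) for x in weights]
print(q[0])  # 16/19
```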
 
 But in most cases we do not know anything about the material 
 existence of the complexes of causes from which the event, E, 
 originated. On the contrary, we are forced to form a hypothesis 
 about their actual existence. To start with a simple case we take 
 example 21 of § 44. 
 
We assumed here three equally possible conditions in the urn
before the drawings, namely the presence of 0, 1, or 2 white balls.
From this assumption we found the probability to get a white
ball in the second drawing, after we had previously drawn a white
ball and then put it back in the urn before the second drawing,
to be equal to 5/6. As we already remarked, this solution is not
unique because it is an arbitrary solution. It is arbitrary to
assign, without any consideration whatsoever, 1/3 as the probability
of existence to each of the three conditions. Let us suppose
that the two balls bore the numbers 1 and 2 respectively.
We may then form the following equally likely conditions:

$$b_1 b_2, \quad b_1 w_2, \quad b_2 w_1, \quad w_1 w_2,$$

each condition having an a priori probability of existence equal
to 1/4 and a productive probability for the drawing of a white
ball equal to: 0, 1/2, 1/2 and 1 respectively. Thus:
 
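$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac14$$

and

$$\omega_1 = 0, \quad \omega_2 = \tfrac12, \quad \omega_3 = \tfrac12, \quad \omega_4 = 1.$$

The respective a posteriori probabilities, that is the new or
gained probabilities of the four hypothetical conditions, become
now by the application of Bayes's Rule (Formula II):

$$Q_1 = 0, \quad Q_2 = \tfrac12 \div 2, \quad Q_3 = \tfrac12 \div 2, \quad Q_4 = \tfrac12.$$

Hence the probability for white in the second drawing is

$$\left(\text{Formula IV: } R = \frac{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}\,\omega_\alpha}{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}}\right)$$

$$R = 0 \div 2 + (\tfrac14 \div 2) + (\tfrac14 \div 2) + (1 \div 2) = \tfrac34.$$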
 
In the first solution we got 5/6 for the same probability. Which
answer is now the true one? Neither one! The true answer to
the problem is that it is not given in such a form that the last
question, the probability of getting a white ball in the second
drawing, may be settled without any doubt. The answer must
be conditional. Following the first hypothesis we got 5/6, while
the second hypothesis gives 3/4 as the answer.
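The two conflicting answers to Example 21 can be reproduced side by side. The Python sketch below is an added illustration; the function name `white_again` is an assumption, and each list simply enumerates the productive probabilities of the conditions that the respective hypothesis treats as equally likely.

```python
from fractions import Fraction


def white_again(productive):
    """Probability of white in a second drawing after one white drawing,
    all listed conditions being taken as equally likely (n = m = 1)."""
    return sum(w * w for w in productive) / sum(productive)


half = Fraction(1, 2)
by_count = white_again([Fraction(0), half, Fraction(1)])        # 0, 1, 2 white
by_label = white_again([Fraction(0), half, half, Fraction(1)])  # numbered balls
print(by_count, by_label)  # 5/6 3/4
```

The only difference between the two calls is how the state of ignorance was enumerated, which is the point of the criticism.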
 
We next proceed to example 22 which is almost identical in
form to the first one, the only difference being a greater variety
of hypothetical conditions. We started here with the following
four hypotheses:

$F_1$: 4 white, 1 black ball, $F_2$: 3 white, 2 black, $F_3$: 2 white, 3
black and $F_4$: 1 white and 4 black balls, assigning 1/4 as the
hypothetical existence probability.
 
By marking the 5 balls similarly as in the last example, with
the numbers from 1 to 5 we may form the complexes:

$F_1$: 4 white and 1 black ball in $\binom{5}{1}$ ways,
$F_2$: 3 white and 2 black balls in $\binom{5}{2}$ ways,
$F_3$: 2 white and 3 black balls in $\binom{5}{3}$ ways,
$F_4$: 1 white and 4 black balls in $\binom{5}{4}$ ways.
 
This gives us a total of 5 + 10 + 10 + 5 = 30 different
complexes. Assuming all of these complexes equally likely
to occur, we get the following probabilities of existence and
productive probabilities:

$$\kappa_1 = \kappa_2 = \kappa_3 = \cdots = \kappa_{30} = \tfrac{1}{30},$$

$\omega_1 = \omega_2 = \cdots = \omega_5 = \tfrac45$ (Productive prob. for $F_1$)
$\omega_6 = \omega_7 = \cdots = \omega_{15} = \tfrac35$ (Productive prob. for $F_2$)
$\omega_{16} = \omega_{17} = \cdots = \omega_{25} = \tfrac25$ (Productive prob. for $F_3$)
$\omega_{26} = \omega_{27} = \cdots = \omega_{30} = \tfrac15$ (Productive prob. for $F_4$).
 
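The total probability of getting a white ball in the fifth
drawing is now

$$R = \frac{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}\,\omega_\alpha}{\sum \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \ldots, 30).$$

Actual substitution of the above values of $\omega$ in this formula
gives us the final result as: $R = \tfrac{17}{28}$.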
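The thirty-complex version of Example 22 is tedious by hand but short in code. The following Python sketch is an added illustration with assumed variable names; it builds one productive probability per labelled complex and applies the same ratio of sums as before, with n = 4 drawings of which m = 3 were white.

```python
from fractions import Fraction
from math import comb

n, m = 4, 3  # four drawings, three of them white

# One entry per labelled complex: C(5, k) arrangements with k white balls.
productive = []
for white in (4, 3, 2, 1):
    productive += [Fraction(white, 5)] * comb(5, white)

num = sum(w**m * (1 - w) ** (n - m) * w for w in productive)
den = sum(w**m * (1 - w) ** (n - m) for w in productive)
print(len(productive), num / den)  # 30 17/28
```

Compared with the value 47/73 obtained from four equally probable hypotheses, the change comes solely from weighting the middle compositions ten times instead of once.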
 
47. Probabilities Expressed by Integrals. — By making an
extended use of the infinitesimal calculus Mr. Bing and Dr. Kroman
in their memoirs arrived at much more ambiguous results through
an application of the rule of Bayes. Starting with the
fundamental rule as given in equation (I) in § 41, we may at times
encounter somewhat simpler conditions inside the domain of
causes. The total complex of actions may embrace a large
number of smaller sub-complexes construed in such a way that
the change from one complex to another may be regarded as a
continuous process, so that the productive probabilities are
increased by an infinitely small quantity from a certain lower
limit, $a$, to an upper limit, $b$. Denoting such continuously
increasing probabilities by $v$ and the corresponding small
probabilities of existence by $u\,dv$, we have as the total probability
of obtaining E from any one of the minor complexes with a
productive probability between $\alpha$ and $\beta$
($a \le \alpha$, $\beta \le b$)

$$p = \int_\alpha^\beta u v\,dv.$$

The probability that when E has happened it originated from
one of those minor complexes, or the probability of existence of
some one of those complexes is:

$$P = \frac{\int_\alpha^\beta u v\,dv}{\int_a^b u v\,dv}.$$
 
The situation may be still more simplified by the following
considerations. In the continuous total complex between the limits
$a$ and $b$ we have altogether situated $(b - a)/dv$ individual minor
complexes. If we assume all of these complexes to possess the
same probability of existence, we must have:

$$u\,dv = \frac{dv}{b - a}.$$

The two formulas then take on the form:

$$p = \frac{1}{b - a}\int_\alpha^\beta v\,dv \qquad \text{and} \qquad P = \frac{\int_\alpha^\beta v\,dv}{\int_a^b v\,dv}.$$

A still more specialized form is obtained by letting $a = 0$ and
$b = 1$ which gives:

$$p = \int_\alpha^\beta v\,dv \qquad \text{and} \qquad P = \frac{\int_\alpha^\beta v\,dv}{\int_0^1 v\,dv}.$$
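When the existence probabilities are taken as constant, the integrals of this section reduce to differences of squares and may be evaluated exactly. The Python sketch below is an added illustration; the function names and the sample limits 1/2 and 1 are assumptions of the example, not values from the text.

```python
from fractions import Fraction


def total_probability(alpha, beta, a=Fraction(0), b=Fraction(1)):
    """p = 1/(b - a) * integral of v dv from alpha to beta."""
    return (beta**2 - alpha**2) / (2 * (b - a))


def origin_probability(alpha, beta, a=Fraction(0), b=Fraction(1)):
    """P = integral of v dv over [alpha, beta] divided by that over [a, b]."""
    return (beta**2 - alpha**2) / (b**2 - a**2)


alpha, beta = Fraction(1, 2), Fraction(1)
print(total_probability(alpha, beta))   # 3/8
print(origin_probability(alpha, beta))  # 3/4
```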
 
The above formulas may perhaps be made more intelligible
to the reader by a geometrical illustration.

Let the various productive probabilities, $v$, be plotted along the
X axis in a Cartesian coordinate system in the interval from $a$
to $b$ ($a < b$). To any one of these probabilities, say $v_r$, there
corresponds a certain probability of existence, $u_r$, represented
by a Y ordinate. In the same manner the next following
productive probability, $v_{r+1}$, will have a probability of existence
represented by an ordinate $u_{r+1}$. It is now possible to represent
the various $u$'s by means of areas instead of line ordinates. Thus
the probability of existence, $u_r$, is in the figure represented by
the small shaded rectangle, with a base equal to

$$\Delta v_r = v_{r+1} - v_r$$

and an altitude of $u_r$, the total area being equal to $\Delta v_r u_r$.
That this is so follows from the well-known elementary theorem from
geometry that areas of rectangles with equal bases are directly
proportional to their altitudes. The sum of the different $u$'s is
thus in the figure represented as the sum of the areas of the various
small rectangles in the staircase shaped histograph. Now
according to our assumption $v$ is a continuous function in the
interval from $a$ to $b$. We may, therefore, divide this interval,
$b - a$, into $n$ smaller equal intervals. Let

$$v_{r+1} - v_r = \Delta v_r = \frac{b - a}{n}$$

be one of these smaller divisions. By choosing $n$ sufficiently
large, $(b - a)/n$ or $\Delta v$ becomes a very small quantity and by
letting $n$ approach infinity as a limiting value we have

$$\lim u\,\frac{b - a}{n} = \lim u\,\Delta v = u\,dv.$$

In this case the histograph is replaced by a continuous curve and
$u\,dv$ is the probability of existence that the productive probability
is enclosed between $v$ and $v + dv$.¹

The probability to get E from any one of the complexes is
evidently given by the total area of the small rectangles, or in
the continuous case by means of the integral:

$$\int_a^b u v\,dv.$$
 
¹ A more rigorous analysis would be as follows: We plot along the abscissa
axis intervals of the length $\epsilon$ so that the middle of the interval has a
distance from the origin equal to an integral multiple of $\epsilon$. If now
$\epsilon$ is chosen sufficiently small, we may regard the probability of
existence, $u$, for values of the variable $v$ between
$r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ as a constant, and
the probability that $v$ falls between the limits
$r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ may hence be
expressed as $\epsilon u_r$. When $\epsilon$ approaches 0 as a limiting value
this expression becomes $u\,dv$.
See the similar discussion under frequency curves.
 
In the same way the probability that E originated from any
of the complexes between $\alpha$ and $\beta$ is:

$$\frac{\int_\alpha^\beta u v\,dv}{\int_a^b u v\,dv}.$$

The special case $a = 0$ and $b = 1$ needs no further commentary.
We are now in a position to consider the examples of Bing and
Kroman. Any student familiar with multiple integration will
find no difficulty in the following analysis. For the benefit of
readers to whom the evaluation of the various integrals may seem
somewhat difficult, we may refer to the addenda at the close of
this treatise or to any standard treatise on the calculus as, for
instance, Williamson's "Integral Calculus."
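The passage from the staircase histograph to the integral can be imitated numerically. The Python sketch below is an added illustration only: the existence curve `u` is a purely hypothetical choice made for the example (the text gives no such curve), and `midpoint_sum` is an assumed helper that adds up rectangles of equal base.

```python
def midpoint_sum(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] by a staircase of
    `steps` rectangles of equal base (midpoint rule)."""
    dv = (hi - lo) / steps
    return sum(f(lo + (r + 0.5) * dv) for r in range(steps)) * dv


def u(v):
    # Hypothetical existence curve on [0, 1]; NOT taken from the text.
    return 6.0 * v * (1.0 - v)


# Total probability of E, and the probability that E originated from a
# complex whose productive probability lies between 0.5 and 1.
total = midpoint_sum(lambda v: u(v) * v, 0.0, 1.0)
share = midpoint_sum(lambda v: u(v) * v, 0.5, 1.0) / total
print(round(total, 6), round(share, 6))
```

Replacing `u` by a constant reproduces the simplified formulas of this section, while any other curve gives a different answer.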
 
48. Example 24. — An urn contains a very large number of
similarly shaped balls. In 10 successive drawings (with
replacements) we have obtained 7 with the number 1, 2 with the number
2, and one having the number 3. What is the probability to
obtain a ball with another number in the following drawing?
 
We must here distinguish between 4 kinds of balls, namely
balls marked 1, 2, 3, or "other balls." A general scheme of
distribution of the balls in the urn may be given as follows:

$nx$ balls marked with the number 1,
$ny$ balls marked with the number 2,
$nz$ balls marked with the number 3 and
$nt = n(1 - x - y - z)$ other balls.
 
Here $x$, $y$, $z$ and $t$ represent the respective productive
probabilities. If we now let all such probabilities assume all possible
values between 0 and 1 with intervals of $1/n$, we obtain the
possible conditions in the total complex of actions. Each of these
conditions has a probability of existence, $s$, and the productive
probabilities $x$, $y$, $z$, and $1 - x - y - z$. The original probability
for 7 ones, 2 twos and 1 three in 10 drawings is:

$$\frac{10!}{7!\,2!\,1!} \sum s \cdot x^7 \cdot y^2 \cdot z.$$

Now when $n$ is a very large number the interval $1/n$ becomes a
very small quantity, and we may approximately write:

$$s = u\,dx\,dy\,dz,$$

and also write the above sum as a triple integral:

$$\frac{10!}{7!\,2!\,1!} \int_0^1\!\int_0^p\!\int_0^q u \cdot x^7 \cdot y^2 \cdot z \cdot dx \cdot dy \cdot dz,$$

where

$$p = 1 - x \quad \text{and} \quad q = 1 - x - y.$$
 
If now the above event has happened, then the probability to get
a differently marked ball in the 11th drawing is:

$$R = \frac{\int_0^1\!\int_0^p\!\int_0^q u \cdot x^7 \cdot y^2 \cdot z\,(1 - x - y - z) \cdot dx \cdot dy \cdot dz}{\int_0^1\!\int_0^p\!\int_0^q u \cdot x^7 \cdot y^2 \cdot z \cdot dx \cdot dy \cdot dz}.$$

It is, however, quite impossible to evaluate the above integral
without knowing the form of the function $u$; but unfortunately
our information at hand tells us absolutely nothing in regard to
this. Perhaps the balls bear the numbers 1, 2 and 3 only, or
perhaps there is an equal distribution up to 10,000 or any other
number. Our information is really so insufficient that it is quite
hopeless to attempt a calculation of the a posteriori probability.
 
Many adherents of the inverse probability method venture,
however, boldly forth with the following solution based upon the
perfectly arbitrary hypothesis that all the $u$'s are of equal
magnitude. This gives the special integral:

$$R = \frac{\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-y} x^7 \cdot y^2 \cdot z\,(1 - x - y - z) \cdot dx \cdot dy \cdot dz}{\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-y} x^7 \cdot y^2 \cdot z \cdot dx \cdot dy \cdot dz}$$

where once more it must be remembered that

$$x + y + z \le 1.$$

In this case the limits of $x$ are 0 and 1, those of $y$ are 0 and $1 - x$
and those of $z$ are 0 and $1 - x - y$.

This is a well-known form of the triple integral which may be
evaluated by means of Dirichlet's Theorem:

$$\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-y} x^{l-1} y^{m-1} z^{n-1}\,dx\,dy\,dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)}{\Gamma(1 + l + m + n)}.$$

(See Williamson's Calculus.)

Remembering the well-known relation between gamma
functions and factorials, viz. $\Gamma(n + 1) = n!$, we find by a mere
substitution in the integral, the value of the probability in
question to be 1 : 14. Another and equally plausible result is
obtained by a slightly different wording of the problem.
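The value 1 : 14 can be checked without carrying out the integrations. The Python sketch below is an added illustration; it assumes the standard extension of Dirichlet's integral in which a fourth factor, a power of 1 - x - y - z, is admitted, and writes the gamma functions as factorials. The function name `simplex_integral` is an assumption of this example.

```python
from fractions import Fraction
from math import factorial


def simplex_integral(exponents):
    """Integral of x^a * y^b * z^c * (1 - x - y - z)^d over x, y, z >= 0,
    x + y + z <= 1, for non-negative integers (a, b, c, d); this is the
    Dirichlet integral with the gamma functions written as factorials."""
    numerator = 1
    for e in exponents:
        numerator *= factorial(e)
    return Fraction(numerator, factorial(sum(exponents) + len(exponents) - 1))


# Numerator carries the extra factor (1 - x - y - z); denominator does not.
r = simplex_integral((7, 2, 1, 1)) / simplex_integral((7, 2, 1, 0))
print(r)  # 1/14
```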
 
Ten successive drawings have resulted in balls marked 1, 2,
or 3. What is the probability to obtain a ball not bearing such
a number in the 11th drawing? This probability is given by
the formula:

$$R = \frac{\int_0^1 v^{10}(1 - v)\,dv}{\int_0^1 v^{10}\,dv} = 1 : 12.$$

Quite a different result from the one given above.
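The second wording leads to single integrals of powers of v, which are evaluated at once. The Python sketch below is an added illustration with an assumed helper name; it only uses the fact that the integral of v^k from 0 to 1 equals 1/(k + 1).

```python
from fractions import Fraction


def power_integral(k):
    """Integral of v**k over [0, 1]."""
    return Fraction(1, k + 1)


# v**10 * (1 - v) = v**10 - v**11
r = (power_integral(10) - power_integral(11)) / power_integral(10)
print(r)  # 1/12
```

The same ten drawings thus yield 1 : 14 or 1 : 12 according to how the unknown contents of the urn are described.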
 
49. Example 25 — Bing's Paradox. — A still more astonishing
paradox is produced by Bing when he gives an example of Bayes's
Rule applied to a problem from mortality statistics. A mortality
table gives the ratio of the number of persons living during a certain
period, to the number living at the beginning of this period,
all persons being of the same age. By recording the deaths
during the specified period (say one year) it has been ascertained
that of $s$ persons, say forty years of age at the beginning of the
period, $m$ have died during the period. The observed ratio is
then $(s - m)/s$. If $s$ is a very large number this ratio may (as
we shall have occasion to prove at a later stage) be taken as an
approximation of the true ratio of probability of survival during
the period. If $s$ is not sufficiently large the believers in the inverse
theory ought to be able to evaluate this ratio by an application
of Bayes's Rule, by means of an analysis similar to the one that
follows:
 
Let $y$ be the general symbol for the probability of a
forty-year-old person being alive one year from hence. Each of such
persons will in general be subject to different conditions, and the
general symbol, $y$, will therefore have to be understood as the
symbol for all the possible productive probability values changing
from 0 to 1 by a continuous process.

Assuming $s$ a very large number each condition will have a
probability of existence equal to $u\,dy$. We may now ask: What
is the probability that the rate of survival of a group of $s$ persons
aged 40 is situated between the limits $\alpha$ and $\beta$?

The answer according to Bayes's Rule is:

$$\frac{\int_\alpha^\beta y^{s-m}(1 - y)^{m}\,u\,dy}{\int_0^1 y^{s-m}(1 - y)^{m}\,u\,dy}. \qquad \text{(I)}$$
 
Let us furthermore divide the whole year into two equal parts
and let $y_1$ be the probability of surviving the first half year,
$y_2$ the probability of surviving the second half, and $u_1\,dy_1$,
$u_2\,dy_2$ the corresponding probabilities of existence. Then the
respective a posteriori probabilities for $y_1$ and $y_2$ are:

$$\frac{y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1}$$

and

$$\frac{y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2}{\int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2} \qquad (m_1 + m_2 = m).$$
 
($m_1$ and $m_2$ represent the number of deaths in the respective half
years.) The probability that both $y_1$ and $y_2$ are true is then
according to the multiplication theorem:

$$\frac{y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \cdot y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2}$$

where $y = y_1 \cdot y_2$.

The probability that the probability of survival for a full
year, $y$, is situated between the limits $\alpha$ and $\beta$ is therefore:

$$\frac{\iint y_1^{s-m_1}(1 - y_1)^{m_1}\,y_2^{s-m}(1 - y_2)^{m_2}\,u_1 \cdot u_2 \cdot dy_1 \cdot dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2} \qquad \text{(II)}$$

where the limits in the double integral in the numerator are
determined by the relation: $\alpha < y_1 y_2 < \beta$.
 
Choosing the principle of insufficient reason as the basis of
our calculations, merely assuming that all possible events are, in
the absence of any grounds for inference, equally likely, the
various quantities expressed by the general symbol, $u$, become
equal and constant and cancel each other in numerator and
denominator, which brings the a posteriori probabilities
expressed by (I) and (II) to the forms:

$$\frac{\int_\alpha^\beta y^{s-m}(1 - y)^{m}\,dy}{\int_0^1 y^{s-m}(1 - y)^{m}\,dy} \qquad \text{(III)}$$

and

$$\frac{\iint y_1^{s-m_1}(1 - y_1)^{m_1}\,y_2^{s-m_1-m_2}(1 - y_2)^{m_2}\,dy_1 \cdot dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,dy_1 \int_0^1 y_2^{s-m_1-m_2}(1 - y_2)^{m_2}\,dy_2} \qquad \text{(IV)}$$

where the limits in the numerator in the latter expression are
determined by the relation: $\alpha < y_1 y_2 < \beta$.
 Letting 
 
 y 
 
 yi = ~~ 
 
 and then 
 
 1 — 2/1 = 2(1 - y) 
 
 this latter expression may after a simple substitution be brought 
 to the form: 
 
 fyr-'Hl - yirdyrj y.'-^il - y.rdy^ 
 
 (V) 
 
 (See appendix.) 
 
Mr. Bing now puts the further question: What is the probability
that a new person forty years of age, entering the original large
group of $s$ persons, will survive one year, when we assume
$m_1 = m_2 = 0$? (III) gives the answer:

$$\frac{\int_0^1 y^{s+1}\,dy}{\int_0^1 y^{s}\,dy} = \frac{s + 1}{s + 2}.$$

Formula (V), on the other hand, gives us

$$\frac{\int_0^1 y_1^{s+1}\,dy_1 \int_0^1 y_2^{s+1}\,dy_2}{\int_0^1 y_1^{s}\,dy_1 \int_0^1 y_2^{s}\,dy_2} = \left(\frac{s + 1}{s + 2}\right)^2.$$
 
As the above analysis is perfectly general, we might equally
well have applied it to each of the semi-annual periods, which
would give us an a posteriori probability of survival equal to
$\left(\frac{s+1}{s+2}\right)^2$ for each half year, or a compound
probability of $\left(\frac{s+1}{s+2}\right)^4$ for the whole year.
Extending this process it is easily seen that by dividing the year
into $n$ parts, we shall have $\left(\frac{s+1}{s+2}\right)^n$ as the
final probability a posteriori that a forty-year-old person will reach
the age of forty-one. By letting $n$ increase indefinitely the above
quantity approaches 0 as its limiting value and we obtain thus the
paradox of Bing:

If, among a large group of s equally old persons, we have observed
no deaths during a full calendar year then another person of the
same age outside the group is sure to die inside the calendar year.

This is evidently a very strange result, and yet, working on
the basis of the principle of insufficient reason, the mathematical
deductions and formulas exhibit no errors.
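How quickly the a posteriori probability of survival sinks as the year is subdivided may be seen numerically. The Python sketch below is an added illustration; the function name and the group size s = 1000 are assumptions chosen for the example, and the expression used is the n-th power of (s + 1)/(s + 2) stated above.

```python
from fractions import Fraction


def survival_probability(s, parts):
    """A posteriori survival probability for a full year when the year is
    divided into `parts` periods and no deaths were seen among s persons."""
    return Fraction(s + 1, s + 2) ** parts


s = 1000  # hypothetical size of the observed group
for parts in (1, 2, 4, 1000, 100_000):
    print(parts, float(survival_probability(s, parts)))
```

With a single period the value is close to unity, while with a sufficiently fine subdivision it becomes as small as one pleases, although the observations are the same in every case.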
 
Mr. Bing disposes of the whole matter by simply denying the
validity and existence of a posteriori probabilities. Dr. Kroman
on the other hand defends Bayes's Rule. "Mathematics,"
Kroman says, "is, as Huxley has justly remarked, an
exceedingly fine mill stone, but one must not expect to get wheat
flour after having put oats in the quern." According to the
Danish scholar the paradox is due to the use of a wrong formula.
We ought to have used the general formula (II) instead of formula
(V) which is a special case. In the general formula we encounter
the functions $u$, denoting the probability of existence of the various
productive probabilities $y$. As we do not know anything about
this function $u$ it is hopeless to attempt a calculation. This
brings the criticism down to the fundamental question whether
we shall build the theory of probabilities on the principle of
"cogent reason" or the principle of "insufficient reason."
 
50. Conclusion. — Contradictory results of a similar kind to
the ones given above have led several eminent mathematicians
to a complete denunciation of the laws underlying a posteriori
probabilities. Professor Chrystal, especially, becomes extremely
severe in his criticism in the previously mentioned address before
the Actuarial Society of Edinburgh. He advises "practical
people like the actuaries, much though they may justly respect
Laplace, not to air his weaknesses in their annual examinations.
The indiscretions of a great man should be quietly allowed to be
forgotten." Although one may heartily agree with Professor
Chrystal's candid attack on the belief in authority, too often
prevailing among mathematical students, I think (aside from
the fact that the rule was originally given by Bayes) that the
great French savant has been accused unjustly, as the following
remarks perhaps may tend to show.
 
In our statement of Bayes's Rule, we followed an exact
mathematical method, and the final formula (I) is theoretically as
correct as any previously demonstrated in this work. The
customary definition of a mathematical probability as the
ratio of equally favorable to coordinated possible cases, is not
done away with in this new kind of probabilities; the former are
found in the numerator and the latter in the denominator; and
if we take care that each of the particular formulas, with its
definite requirements, is applied to its particular case, we do not
go beyond pure mathematics or logic. But are we able to get
complete and exact information about these requirements? In
the example of the tossing of a coin with two heads, this
information was at hand. Here we were able to enumerate exactly the
different mutually exclusive causes from which the observed
event originated. We were also able to determine the exact
quantitative measures for the probabilities, $\kappa$, that these
complexes existed as well as the different productive probabilities,
$\omega$. Here the most rigid requirements could be satisfied, and the
rule gave therefore a true answer.
 
In the other examples we encountered a different state of
affairs. Here we were not able to enumerate directly the
different complexes of causes from which the event originated, but
were forced to form different and arbitrary hypotheses about the
complexes of origin, $F$, and each hypothesis gave, in general, a
different result. Furthermore, we assumed a priori that the
different probabilities of the actual existence of the complexes
were all equal in magnitude, and it was, therefore, the special
formula (II) we employed in the determination of the a posteriori
probabilities. In this formula, the different $\kappa$'s do not enter at
all as a determining factor; only the productive probabilities,
$\omega$, are considered. The assumption that all the $\kappa$'s are equal
in magnitude is based upon the principle of insufficient reason, or
as Boole calls it, "the equal distribution of ignorance."
 
The principle of equal distribution of ignorance makes in the
case of continuously varying productive probabilities, $v$, the
function, $u$, of the probabilities of existence of the various
complexes equal to a constant quantity. In other words, the
curve in Fig. 1 is replaced by a straight line of the form $u = k$.
Now, as a matter of fact, we possess in most cases some partial
knowledge of the complexes of action producing the event in
question. This partial knowledge, although far from complete
enough to make a rigorous use of formula (I), is nevertheless
sufficient to justify us in discarding completely any general
hypothesis assuming such simple conditions as above. Such
partial knowledge is, for instance, found in the Paradox of Bing.
Here the rather absurd hypothesis was made that the possible
values of the probability of surviving a certain period were
equally probable. In other words, it is equally probable that
there will die 0, 1, 2, ..., or $s$ persons in the particular period.
"Common sense, however, tells us that it is far more probable
that, for instance, 90 per cent. of a large number of forty-year-old
persons will survive the period than no one or every one will die
in the same period" (Kroman). The indiscreet use of formula
(II) therefore naturally leads to paradoxical results. On the
other hand, the fallacy of the happy-go-lucky computers,
employing the special case (II) of Bayes's Rule, as well as the critics
of Laplace, lies in their failure to make a proper distinction
between "equal distribution of ignorance" and "partial cogent
reason," which latter expression properly may be termed "an
unequal distribution of ignorance." If, despite the actual
presence of such unequal distribution of ignorance, we still insist
on using the special formula (II), which is only to be used in the
case of an equal distribution of ignorance, it is no wonder we
encounter ambiguous answers. Not the rule itself, its discoverer,
or Laplace, but the indiscreet computer is the one to blame.
Messrs. Bing, Venn and Chrystal, in their various criticisms, have
filled the quern with some rather "wild oats" and expected to
get wheat flour; and that one of those critics in his
disappointment in not getting the expected flour should blame Laplace, is
hardly just.
 
 So much for the principle of " equal distribution of igno- 
 rance." It may be of interest to see how matters tm-n out when 
 we like von Kries insist upon the principle of " cogent reason " 
 as the true basis of our computations. The reader will quite 
 readily see that a rigorous application of the Rule of Bayes in its 
 most general form as given by fornuila (I) really tacitly assumes 
 this very principle. In formula (I), we require not alone an 
 exact enumeration of the various complexes from which the 
 observed event may originate, but also an exact and complete 
 information about the structure of such complexes in order to 
 evaluate their various probabilities of existence. If such informa- 
 tion is present, we can meet even the most stringent requirements 
 of the general formula, and we will get a correct answer. But 
 in the vast majority of cases, not to say all cases, such information 
is not at hand, and any attempt to make a computation by means
of Bayes's Rule must be regarded as hopeless. We may, however,
again remark that we are very seldom in complete ignorance
of the conditions of the complexes, which is the same thing as
saying that we are not in a position to employ the principle of
equal distribution of ignorance in a rigorous manner. From
other experiments on the same kind of event, or from other
sources, we may have attained some partial information, even if
insufficient to employ the principle of cogent reason. Is such
information now to be completely ignored in an attempt to give
 a reasonable, although approximate answer? It is but natural 
 that the mathematician should attempt to obtain as much of 
 such information as possible and use it in the evaluation of the 
 various probabilities of existence. Thus for instance, if, in the 
 Paradox of Bing, we had observed that the probability of survival 
 for a forty-year-old person never had been below .75 and never 
 above .95, it would be but reasonable to substitute those limits 
 in their proper integrals in order to attain an approximate answer. 
To illustrate this somewhat subjective determination of an a
posteriori probability, we take another example from the memoirs
of Bing and Kroman.
 
Example 24. — A merchant receives a cargo of 100,000 pieces
of fruit. If every single fruit is untainted, the value of the cargo
may be put at 10,000 Kroner. On the other hand, any part of
the cargo more or less tainted is considered worthless. The
merchant has never before received a similar cargo and does not
know how the fruit has been affected by travel. As samples, he
has selected 30 pieces picked at random from the cargo and all
samples proved to be fresh. He asks a mathematician what
value he can put on the cargo.
 
 If the mathematician uses the special formula (II), assum- 
 ing an equal distribution of ignorance, therefore assuming that 
 it is equally probable that for example none, 5,000 or all the 
 individual pieces of fruit were untainted, the answer is: 
 
    10,000 \cdot \frac{\int_0^1 v^{31}\,dv}{\int_0^1 v^{30}\,dv} = 9687.5 Kroner.

If we use the true rule, the a posteriori probability of the
wholesomeness of the cargo is given by the integral:

    \frac{\int_0^1 u\,v^{31}\,dv}{\int_0^1 u\,v^{30}\,dv},

where v is the general expression for a possible probability of
wholesomeness between 0 and 1 and u dv the corresponding probability
of existence. Now if the mathematician has no complete
information as to this particular function, u, it would be foolish
of him to attempt a calculation, since the hypothesis of an equal
probability of existence for all possible values of v evidently
gives an arbitrary and perhaps a very erroneous result. On
the other hand, the computer may possibly have access to some
partial information. Perhaps the merchant has received fruit
of a similar kind or heard about cargoes of this particular kind
of fruit received by other dealers. If now the merchant were
able to inform the computer that in a great number of similar
cases the probability of wholesomeness had been between 0.9
and 1 with an approximately even distribution, while it never
had been below 0.9, then nothing would hinder the mathematician
from presenting the following computation:
 
    \frac{\int_{0.9}^{1} v^{31}\,dv}{\int_{0.9}^{1} v^{30}\,dv} = 0.9726

 
 and tell the merchant that on the basis of the information given 
 9,726 Kroner would be a fair price for the cargo. 
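
The two computations of Example 24 may be checked numerically. The following is a minimal sketch in Python and is not part of the original text; the function name posterior_wholesomeness and the use of exact fractions are assumptions made for illustration. It evaluates the ratio of the integrals of v^31 and v^30 between given limits, which corresponds to an even distribution of the probabilities of existence between those limits.

```python
from fractions import Fraction

def posterior_wholesomeness(lo, hi, samples=30):
    """Ratio of the integrals of v^(samples+1) and v^samples over [lo, hi],
    i.e. the a posteriori probability under an even prior on [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    num = (hi ** (samples + 2) - lo ** (samples + 2)) / (samples + 2)
    den = (hi ** (samples + 1) - lo ** (samples + 1)) / (samples + 1)
    return num / den

CARGO_VALUE = 10_000  # Kroner, as stated in Example 24

# Formula (II): equal distribution of ignorance over the whole interval 0..1.
uniform = posterior_wholesomeness(0, 1)
# Partial information: even distribution between 0.9 and 1 only.
informed = posterior_wholesomeness(Fraction(9, 10), 1)

print(float(uniform * CARGO_VALUE))
print(round(float(informed * CARGO_VALUE)))
```

Under the first assumption the ratio is exactly 31/32; under the second it depends on the limits chosen, which is the sense in which the determination is called somewhat subjective.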
 
This is really the point of view taken by the English mathematician,
Professor Karl Pearson, one of the ablest writers on
mathematical statistics of the present time, when he says: "I
start, as most writers on mathematics have done, with 'the
equal distribution of ignorance' or I assume the truth of Bayes's
Theorem. I hold this theorem not as rigidly demonstrated, but
I think with Edgeworth that the hypothesis of the equal distribution
of ignorance is, within the limits of practical life,
justified by our experience of statistical ratios, which are unknown,
i.e., such ratios do not tend to cluster markedly round any
particular point."
 
 To sum up the above remarks: Theoretically Bayes's Rule is 
 true. If we are able to enumerate and determine the probabilities 
 of existence of the complexes of origin it will also give true 
results in practice. If we are justified in assuming the principle
of "insufficient reason" or "equal distribution of ignorance"
as the basis for our calculations, formula (II) may be employed
with exact results after a rigid enumeration of the complexes.
If the principle of "cogent reason" is required as the basis, an
exact computation is in general hopeless, and we can only after
having obtained partial subjective information give an approximate
answer.
 
With these remarks we shall conclude the elementary discussion
of the merely theoretical part of the subject. The following
chapters require in most cases a knowledge of the infinitesimal
calculus, and many of the questions discussed above will appear
in a new and instructive light by this treatment.
 
 CHAPTER VII. 
 
 THE LAW OF LARGE NUMBERS. 
 
51. A Priori and Empirical Probabilities. — In the previous
chapters we limited ourselves to the discussion of such mathematical
probabilities, where we, a priori, on account of our
knowledge of the various domains or complexes of actions, were
able to enumerate the respective favorable and unfavorable
possibilities associated with the occurrence or non-occurrence of
the event in question. "The real importance of the theory of
probability in regard to mass phenomena consists, however,
in determining the mathematical relations of the various probabilities
not in a deductive, but in an empirical manner — without an
a priori exhaustive knowledge of the mutual relations and actions
between cause and effect — by means of statistical enumeration
of the frequency of the observed event. The conception of a
probability finds its justification in the close relation between the
mathematical probabilities and relative frequencies as determined in
a purely empirical way. This relation is established by means
of the famous Law of Large Numbers" (A. A. Tschuprow).
 
 To return to our original definition of a mathematical proba- 
 bility as the ratio of the favorable to the coordinated equally 
 possible cases, we first notice that this definition is wholly 
 arbitrary like many mathematical definitions. The contention 
 of Stuart Mill that every definition contains an axiom is rather 
 far stretched. In mathematics a definition does not necessarily 
 need to be metaphysical. A striking example is offered in 
 mechanics by the definitions of force as given by Lagrange and 
 Kirchhoff. What is force? " Force," Lagrange says, " is a 
 cause which tends to produce motion." Kirchhoff on the other 
 hand tells us that force is the product of mass and acceleration. 
 Lagrange's definition is wholly metaphysical. Whenever a 
 definition is to be of use in a purely exact science such as mathe- 
 matics, it must teach us how to measure the particular phe- 
nomena which we are investigating. Thus, to quote Poincare,
 " it is not necessary that the definition tells us what force really 
 is, whether it is a cause or the effect of motion." 
 
 An analogous case is offered in the criticism of a mathematical 
 probability as defined by Laplace, and the attempts to place 
 the whole theory of probabilities on a purely empirical basis by 
 Stuart Mill, Venn and Chrystal. These writers contend " that 
 probability is not an attribute of any particular event happening 
 on any particular occasion. Unless an event can happen, or 
 be conceived to happen a great many times, there is no sense in 
 speaking of its probability." The whole attack is directed against 
the definition of a mathematical probability in a single trial
which definition, evidently by the empiricists, is regarded as
having no sense. The word "sense" must evidently be considered
as having a purely metaphysical meaning. In the same
manner Kirchhoff's definition might be dismissed as having no
sense, since it would seem as difficult to conceive force as a purely
mathematical product of two factors, mass and acceleration, as
 it is to conceive the definition of a mathematical probability 
 as a ratio. 
 
 The metaphysical trend of thought of the above writers is 
 shown in their various definitions of the probability of an event. 
Mill defines it merely as the relative frequency of happenings
inside a large number of trials, and Venn gives a similar definition,
while Chrystal gives the following:

"If, on taking any very large number N out of a series of cases
in which an event, E, is in question, E happens on pN occasions,
the probability of the event, E, is said to be p."
 
 Let us, for a moment, look more closely into these statements. 
 Any definition, if it bears its name rightly, must mean the same 
 to all persons. Now, as a matter of fact, the vagueness in a 
 half metaphorical term like " any very large number " illustrates 
 its weakness. The question immediately confronts us " what is 
 a very large number? " Is it 100, 1,000 or perhaps 1,000,000? 
 
 A fixed universal standard for the value of N seems out of the 
 question and the definition — although perhaps readily grasped 
 in a " general way " — can hardly be said to be happily chosen. 
 
 Another, and perfectly rigorous definition, is the following one 
 given by the Danish astronomer and actuary, T. N. Thiele.
 
Thiele tells us that "common usage" has assigned the word
probability as the name "for the limiting value of the relative
frequency of an event, when the number of observations (trials),
under which the event happens, approach infinity as a limit."
A similar definition is later on given by the American actuary
R. Henderson, who says: "The numerical measure which has been
universally adopted for the probability of an event under given
circumstances is the ultimate value, as the number of cases is
indefinitely increased, of the ratio of the number of times the
event happens under those circumstances to the total possible
number of times." There is nothing ambiguous or vague in these
 definitions. Infinity, taken in a purely quantitative sense, has a 
 perfectly uniform meaning in mathematics. The new definition 
 differs, however, radically from our customary definition of a 
 mathematical a priori probability. We cannot, therefore, agree 
 with Mr. Henderson when he continues " the measure there given 
 has been universally adopted and this holds true in spite of the 
 fact that the rule has been stated in ways which on their face differ 
 widely from that above given. The one most commonly given 
 is that if an event can happen in a ways and fail in b ways all of 
which are equally likely, the probability of the event is the ratio
 of a to the sum of a and b. It is readily seen that if we read 
 into this statement the meaning of the words " equally likely," this 
 measure, so far as it goes, reduces to a particular case of that given 
 above." 
 
 In order to investigate this statement somewhat more closely, 
 let us try to measure the probability of throwing head with an 
 ordinary coin by both our old definition of a mathematical 
 probability and the definition by Mr. Henderson of what we 
 shall term an empirical probability. Denoting the first kind of 
probability by P(E) and the second by P'(E) we have in ordinary
symbols

    P(E) = \frac{1}{2},

    P'(E) = \lim_{\nu=\infty} F(E, \nu),

where the symbol F(E, ν) denotes the relative frequency of the
event, E, in ν total trials. No a priori knowledge will tell us
offhand if P'(E) will approach 1/2 as its ultimate value. The
two methods are radically different. By the first method the
determination of the numerical measure of a probability depends
simply on our ability to judge and segregate the equally possible
cases into cases favorable and unfavorable to the event E. By
the second method the determination of the probability depends,
not alone on the segregation and consequent enumeration of the
favorable from the total cases, but chiefly on the extent of our
observations or trials on the event in question.
 
52. Extent and Usage of Both Methods. — Before entering into
a more detailed discussion of the actual quantitative comparison
of the two methods, it might be of use to compare their various
extent of usage. In this respect the empirical method is vastly
superior to the a priori. A rigorous application of the a priori
method, as far as concrete problems go, is limited to simple
games of chance. As soon as we begin to tackle sociological or
economical practical problems it leaves us in a helpless state.
If we were to ask about the probability that a certain person
forty years of age would die inside a year, it would be of little use
to try to determine this in an a priori manner. Even a purely
deductive process, as illustrated by Bayes's Rule in the earlier
chapters, leads to paradoxical results. Our a priori knowledge
of the complexes of causes governing death or survival is so
incomplete that even a qualitative — not to speak of a quantitative —
judgment is out of the question. The empirical method
shows us at least a way to obtain a measure for the probability
of the event in question. By observing during a period of a year
an infinite number of forty-year-old persons of whom, after an
exhaustive qualitative investigation, we are led to believe that
their present conditions as far as health, social occupation,
environments, etc., are concerned are equally similar, we may by
an enumeration of those who died during the year obtain the
desired ratio as defined by P'(E). Of course, observation of
an infinite number is practically impossible. An approximate
ratio may be formed by taking a finite, but a large, number
of cases under observation. But how large a number? This
very question leads straightforward to another problem, namely
the quantitative determination of the range of variance between
the approximate ratio and the ideal ultimate ratio as defined by
the relation

    P'(E) = \lim_{\nu=\infty} F(E, \nu).
 
Since it is impossible to make an infinite number of observations
we cannot find the exact value of the range of such variations.
But we may, however, determine the probability that this range
does not exceed a certain fixed quantity, say λ, in absolute magnitude.
Stated in compact form our problem reduces to the
following form: To determine the probability of the existence
of the following inequality:

    \left| \lim_{\nu=\infty} F(E, \nu) - \frac{a}{s} \right| \leq \lambda,

where both a and s are finite numbers. This, to a certain extent,
contains in a nutshell some of the most important problems in
probabilities.
 
The above problem may be solved in two distinct ways. The
first, and perhaps the most logical way, is by a direct process.
This is the method followed by T. N. Thiele in his "Almindelig
Iagttagelseslære,"¹ published in Copenhagen, 1889, a most
original work, which moves along wholly novel lines. Thiele
distinguishes between (1) Actual observation series as recorded
from observation, in other words statistical data, (2) Theoretical
observation series giving the conclusions as to the outcome of
future observations and (3) Methodical laws of series where the
number of observations is increased indefinitely. By such a
process, purely a theory of observations, the whole theory of
probability becomes of secondary importance and rests wholly
upon the theory of observed series, a fact thoroughly emphasized
by Thiele himself. When the author first, in the closing chapters
of his book, makes use of the word probability it is only because
"common usage" has assigned this word as the name for the
ultimate frequency ratio designated by our symbol
\lim_{\nu=\infty} F(E, \nu).
 
¹ English edition, "Theory of Observations," London, 1905.

The problem may, however, be solved in an indirect way,
which is the one I shall adopt. This method, as first consistently
deduced by Laplace, has for its basis our original definition of a
mathematical a priori probability and may be briefly sketched as
follows: We first of all postulate the existence of an a priori
probability as defined, although its actual determination, by a
priori knowledge, is impossible except in a few cases, as, for
instance, simple games of chance, drawing balls from urns, etc.
Denoting such a probability by P(E), or p, we next ask, What will
be the expected number, say a, of actual happenings of the event,
E, expressed in terms of s and p, when we make s consecutive
trials instead of a single trial, and what will be the number of
happenings of E when s approaches infinity as its ultimate value?
If such a relation is found between p, a and s, where p is the
unknown quantity, we have also found a means of determining
the value of p in known quantities. Our next question is —
What is the probability that the absolute value of the difference
between p and the relative frequency of the event as expressed
by the ratio of a to s does not exceed a previously assigned
quantity? Or the probability that

    \left| \frac{a}{s} - p \right| \leq \lambda ?
 
Now, as the reader will see later, we shall prove that

    \lim_{\nu=\infty} F(E, \nu) = P(E) = p.
 
 It must, however, be remembered that this result is reached by a 
 mathematical deduction, based upon the postulate of mathe- 
 matical probabilities, and not in the manner as suggested in the 
 above statement by Mr. Henderson. 
 
 It is only after having established such purely quantitative 
 relations that we are entitled to extend the laws of mathematical 
 probabilities as deduced in the earlier chapters to other problems 
 than the simple problems of games of chance. 
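
The second question raised above, namely the probability that the relative frequency a/s differs from p by no more than an assigned quantity, can be computed directly for moderate values of s once p is postulated. The following Python sketch is an illustration only and not part of the original text; the function name prob_within, the coin value p = 0.5 and the tolerance 0.05 are assumptions. It sums the terms of the binomial expansion, which the text treats formally in a later chapter.

```python
from math import comb

def prob_within(p, s, lam):
    """Exact probability that |a/s - p| <= lam, where a counts the
    happenings of E in s independent trials, each with probability p."""
    total = 0.0
    for a in range(s + 1):
        # A tiny tolerance guards against floating-point rounding at the edge.
        if abs(a / s - p) <= lam + 1e-12:
            total += comb(s, a) * p**a * (1 - p)**(s - a)
    return total

# The probability of staying within 0.05 of p grows as s increases.
for s in (10, 100, 1000):
    print(s, prob_within(0.5, s, 0.05))
```

The direct summation is adequate for values of s of this size; for very large s the individual terms would require logarithms or the approximations developed later.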
 
53. Average a Priori Probabilities. — In the previous paragraphs
of this chapter, another important matter is to be noted,
namely the assumption that the complex of causes producing
the event in question remains constant during the repeated
trials (observations), or, stated in other words, the mathematical
a priori probability remains constant. Under this limitation
the extension of the laws of mathematical probabilities would
have but a very limited practical application. In all statistical
mass phenomena such an ideal state of affairs is rather a very
rare exception. If we consider an ordinary mortality investigation
we know with absolute certainty that no two persons are
identically alike as far as health, occupation, environment and
numerous other things are concerned. Thus the postulated
mathematical probability for death or survival during a whole
calendar year will in general be different for each person. We
may, however, conceive an average probability of survival for a
full year defined by the relation

    p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s} = \frac{\sum p}{s},

where p_1, p_2, p_3, ... are the postulated probabilities of each
individual under observation. Our task is now to find:

1. An algebraic relation between the average probability as
defined above, the absolute frequency a and the total number of
observations (trials) s,

2. The same relation when s approaches ∞ as its ultimate value,

3. The probability of the existence of the inequality,

    \left| \frac{a}{s} - p_0 \right| \leq \lambda,

where a denotes the absolute frequency of the occurrence of the
event, s the total number of observations (trials) and λ an arbitrary
constant.
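
The three tasks just listed can be illustrated numerically when the individual probabilities are given. The sketch below is in Python and is not from the original text; the list probs of 200 survival probabilities is purely hypothetical, and the names count_distribution, p0 and lam are assumptions. It forms the average probability, builds the exact distribution of the absolute frequency a by adding one trial at a time, and sums the probability that a/s lies within lam of the average.

```python
def count_distribution(probs):
    """dist[a] = probability that exactly a of the independent trials
    succeed, trial k succeeding with probability probs[k]."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for a, w in enumerate(dist):
            nxt[a] += w * (1 - p)      # this trial fails
            nxt[a + 1] += w * p        # this trial succeeds
        dist = nxt
    return dist

# Hypothetical survival probabilities for s = 200 persons (illustration only).
probs = [0.90 + 0.0005 * k for k in range(200)]
s = len(probs)
p0 = sum(probs) / s                    # average a priori probability
dist = count_distribution(probs)

lam = 0.03
within = sum(w for a, w in enumerate(dist) if abs(a / s - p0) <= lam)
print(p0, within)
```

The expected value of a/s computed from dist agrees with the average probability, which is the algebraic relation asked for in the first task.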
 
54. The Theory of Dispersion. — As we mentioned before the
empirical ratio a/s represents only an approximation of the ideal
ultimate value of \lim_{\nu=\infty} F(E, \nu). If we now make a series of
observations (trials) on the occurrence of a certain event E, such
that instead of a single set of observations of s individual observations
we take N such sets, we shall have N relative frequency
ratios:

    \frac{a_1}{s}, \frac{a_2}{s}, \frac{a_3}{s}, \ldots, \frac{a_N}{s}.

Since the ratios are approximations only of the ultimate ratio
they will in general exhibit discrepancies as to their numerical
values and may be regarded as N different empirical approximations.
The question now arises how these various empirical
ratios group themselves around the value of \lim_{\nu=\infty} F(E, \nu). The
distribution of the empirical ratios around the ultimate ratio is by
Lexis called "dispersion."
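
The grouping of N empirical ratios around the ultimate value may be visualized by a small simulation. The following Python sketch is an illustration and not part of the original text; the function name frequency_ratios, the seed, and the values p = 0.5, s = 500 and N = 200 are assumptions. The comparison quantity sqrt(pq/s) is the standard deviation of a/s for a constant probability p, a standard result that is not derived in this section.

```python
import random
from statistics import mean, pstdev

def frequency_ratios(p, s, n_sets, seed=0):
    """Relative frequencies a/s from n_sets independent sets of s trials,
    each trial producing the event with constant probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(s)) / s for _ in range(n_sets)]

p, s, n_sets = 0.5, 500, 200
ratios = frequency_ratios(p, s, n_sets)

# Centre and spread of the empirical ratios, and the comparison value.
print(mean(ratios), pstdev(ratios), (p * (1 - p) / s) ** 0.5)
```

The observed spread will vary with the seed; only its general agreement with the comparison value is to be expected.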
 
 55. Historical Development of the Law of Large Numbers. — 
 
 The first mathematician to investigate the problems we have 
 roughly outlined in the previous paragraphs was the renowned 
 Jacob Bernoulli in the classic, " Ars Conjectandi," which rightly 
 may be classified as one of the most important contributions on 
 the subject. Bernoulli's researches culminate in the theorem 
 which bears his name and forms the corner-stone of modern 
 mathematical statistics. That Bernoulli fully realized the great 
 practical importance of these investigations is proven by the 
 heading of the fourth part of his book which runs as follows: 
 " Artis Conjectandi Pars Quarta, tradens usum et applicationem 
 praecedentis doctrinae in civilibus et oeconomicis." It is also 
 here that we first encounter the terms " a priori " and " a pos- 
 teriori " probabilities. Bernoulli's researches were limited to 
 such cases where the a priori probabilities remained constant 
 during the series or the whole sets of series of observations. 
Poisson, a French mathematician, treated later in a series of
memoirs the more general case where the a priori probabilities
varied with each individual trial. He also introduced the technical
term, "Law of Large Numbers" ("Loi des Grands Nombres").
Finally Lexis through the publication in 1877 of his brochure,
"Zur Theorie der Massenerscheinungen der menschlichen Gesellschaft,"
treated the dispersion theory and forged the closing
link of the chain connecting the theory of a priori probabilities
and empirical frequency ratios. Of late years the Russian mathematician,
Tchebycheff, the Scandinavian statisticians, Westergaard
and Charlier, and the Italian scholar, Pizzetti, have contributed
several important papers. It is on the basis of these
papers that the following mathematical treatment is founded.
 In certain cases, however, we shall not attempt to enter too 
 deeply into the theory of certain definite integrals, which is 
 essential for a rigorous mathematical analysis, but which also 
 requires an extensive mathematical knowledge which many of 
 my readers, perhaps, do not possess. To readers interested in 
 the analysis of the various integrals we may refer to the original 
 works of Czuber and Charlier.
 
 CHAPTER VIII. 
 
 INTRODUCTORY FORMULAS FROM THE INFINITESIMAL 
 
 CALCULUS. 
 
56. Special Integrals. — In the following chapters we shall
attempt to investigate the theory of probabilities from the standpoint
of the calculus. Although a knowledge of the elements
of this branch of mathematics is presupposed to be possessed
by the student, we shall for the sake of convenience briefly
review and demonstrate a few formulas from the higher analysis
of which we shall make frequent use in the following paragraphs.
All such formulas have been given in the elementary instruction
of the calculus, and only such readers who do not have this
particular branch of mathematics fresh in memory from their
school days need pay any serious attention to the first few
paragraphs.
 
57. Wallis's Expression for π as an Infinite Product. — We wish
first of all to determine the value of the definite integral:

    J_n = \int_0^{\pi/2} \sin^n x \, dx,    (1)

under the assumption that n is a positive integral number. This
integral is geometrically equal to the area between the x axis,
the axis of y, the ordinate corresponding to the abscissa π/2 and
the graph of the function y = sin^n x. Letting u' = D_x u = sin x,
v = sin^{n-1} x, we get by partial integration:

    J_n = \left[ -\cos x \, \sin^{n-1} x \right]_0^{\pi/2} + \int_0^{\pi/2} \cos x \, (n-1) \sin^{n-2} x \cos x \, dx.    (2)

If we substitute the upper and lower limits in the first term on
the right hand side of the above expression for J_n this term
reduces to 0, assuming n > 1. Thus we have:

    J_n = (n-1) \int_0^{\pi/2} \sin^{n-2} x \cos^2 x \, dx.

Putting cos^2 x = 1 - sin^2 x, we get:

    J_n = (n-1) \int_0^{\pi/2} \sin^{n-2} x \, dx - (n-1) \int_0^{\pi/2} \sin^n x \, dx.    (3)

The last integral is, however, equal to J_n and the first integral
is, following the notation from (1), equal to J_{n-2}. We shall
therefore have:

    J_n + (n-1) J_n = (n-1) J_{n-2},

or

    n J_n = (n-1) J_{n-2}.    (4)
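
The recursion (4) may be checked against a direct numerical evaluation of the integral (1). The sketch below is in Python and is offered only as an illustration, not as part of the original text; the function names j_numeric and j_recursive and the midpoint rule with 20,000 steps are assumptions. The starting values J_0 = π/2 and J_1 = 1 follow directly from (1).

```python
from math import sin, pi

def j_numeric(n, steps=20_000):
    """Midpoint-rule estimate of the integral of sin^n x over [0, pi/2]."""
    h = (pi / 2) / steps
    return h * sum(sin((k + 0.5) * h) ** n for k in range(steps))

def j_recursive(n):
    """J_n from n*J_n = (n-1)*J_(n-2), with J_0 = pi/2 and J_1 = 1."""
    if n == 0:
        return pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * j_recursive(n - 2)

# The two columns should agree to many decimal places.
for n in range(2, 8):
    print(n, j_recursive(n), j_numeric(n))
```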
 
Replacing n by n - 1, n - 2, n - 3, ... successively we get:

    n J_n = (n-1) J_{n-2},
    (n-1) J_{n-1} = (n-2) J_{n-3},
    (n-2) J_{n-2} = (n-3) J_{n-4},
    \ldots

According as n is even or uneven we shall have one of the
following equations at the bottom of the recursion formula:

    J_0 = \int_0^{\pi/2} \sin^0 x \, dx = \int_0^{\pi/2} dx = \frac{\pi}{2},

or

    J_1 = \int_0^{\pi/2} \sin x \, dx = \left[ -\cos x \right]_0^{\pi/2} = 1.    (5)

If, for even values of n, we let n = 2m, and, for uneven values,
n = 2m - 1, we get finally the following recursion formulas:

    2m J_{2m} = (2m-1) J_{2m-2},          (2m-1) J_{2m-1} = (2m-2) J_{2m-3},
    (2m-2) J_{2m-2} = (2m-3) J_{2m-4},    (2m-3) J_{2m-3} = (2m-4) J_{2m-5},
    \ldots                                 \ldots
    2 J_2 = \frac{\pi}{2},                 3 J_3 = 2 \times 1.

Successive multiplication of the above equations gives us
finally:

    J_{2m} = \frac{(2m-1)(2m-3) \cdots 1}{2m(2m-2) \cdots 2} \cdot \frac{\pi}{2},

    J_{2m-1} = \frac{(2m-2)(2m-4) \cdots 2}{(2m-1)(2m-3) \cdots 3}.

 
We may now draw some very interesting conclusions from the
above equations. Both integrals represent geometrically areas
bounded by the graphs of the functions:

    y = \sin^{2m} x   and   y = \sin^{2m-1} x   respectively.

The difference of the ordinates of these graphs, namely:

    (\sin x - 1) \sin^{2m-1} x

is evidently decreasing with increasing values of the positive
integer m, since sin x lies between 0 and +1 and sin^{2m-1} x approaches
the value 0 except for certain values of x. The larger
we select m the less is the difference of the two areas and the
ratio will therefore approach 1, or the expression

    \frac{(2m-2)(2m-4) \cdots 2}{(2m-1)(2m-3) \cdots 3} : \left[ \frac{(2m-1)(2m-3) \cdots 3}{2m(2m-2) \cdots 2} \cdot \frac{\pi}{2} \right]

has the limit 1. Hence:

    \frac{\pi}{2} = \lim_{m=\infty} \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2m-2)^2 \cdot 2m}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2m-3)^2 (2m-1)^2}.

Multiplying numerator and denominator by 2^2 · 4^2 · 6^2 ··· (2m-2)^2 we get:

    \frac{\pi}{2} = \lim_{m=\infty} \frac{2^{4m-4} \cdot 2m \, [(m-1)!]^4}{[(2m-1)!]^2},   or   \lim_{m=\infty} \frac{2^{2m} (m!)^2}{(2m)! \sqrt{2m}} = \sqrt{\pi/2}.

This is the formula originally discovered by the English
mathematician, John Wallis (1616-1703), and by means of which
π may be expressed as an infinite product.
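
The slow but steady convergence of the product of Wallis may be observed numerically. The following Python sketch is an illustration and not part of the original text; the function name wallis_partial is an assumption. It evaluates the partial product 2^2 · 4^2 ··· (2m-2)^2 · 2m divided by 1^2 · 3^2 ··· (2m-1)^2 for increasing m and prints π/2 beside it.

```python
from math import pi

def wallis_partial(m):
    """Partial product 2^2*4^2*...*(2m-2)^2*2m / (1^2*3^2*...*(2m-1)^2)."""
    value = 1.0
    for k in range(1, m):
        value *= (2 * k) ** 2 / (2 * k - 1) ** 2
    return value * 2 * m / (2 * m - 1) ** 2

# The partial products decrease towards pi/2 as m grows.
for m in (10, 100, 1000, 10000):
    print(m, wallis_partial(m), pi / 2)
```

The partial products lie above π/2 because they equal π/2 multiplied by the ratio of the two integrals, and the integral with the smaller exponent is the larger.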
 
58. De Moivre — Stirling's Formula. — We are now in a position
to give a demonstration of Stirling's formula for the approximate
value of n! for large values of n. A. de Moivre seems to have
been the first to attempt this approximation. In the first edition
of his "Doctrine of Chances" (1718) he reaches a result, which
must be regarded as final, except for the determination of an
unknown constant factor. Stirling succeeded in completing this
last step in his remarkable "Methodus Differentialis" (1738).
In the second edition of "Doctrine of Chances" (1738) de Moivre
gives the complete formula with full credit to Stirling. He
mentions as his belief that Stirling in his final calculation possibly
has made use of the formula of Wallis. The demonstration by
the older English authors is rather lengthy and much shorter
methods have been devised by later writers. Most authors
make use of the Eulerian integral of the second order by which
any factorial may be expressed by a gamma function:

    \Gamma(n+1) = \int_0^{\infty} x^n e^{-x} \, dx = n!.

Another method makes use of the well-known Euler's Summation
Formula from the calculus of finite differences. This method is
of special interest to actuarial students, who frequently use the
Eulerian formula in the computation of various life contingencies.
For the benefit of those interested in this particular method we
may refer to the treatises of Seliwanoff and Markhoff, two
Russian mathematicians.¹

The Italian mathematician, Cesaro, has, however, derived
the formula in a much simpler manner.²
 
¹ Seliwanoff, "Lehrbuch der Differenzenrechnung," Leipzig, 1905, pages
59-60; Markhoff, "Differenzenrechnung," Leipzig, 1898.

² Cesaro, "Corso di analisi algebrica," Torino, 1884, pages 270 and 480.

Cesaro starts with the inequalities:

    e < \left( 1 + \frac{1}{n} \right)^{n + 1/2} < e^{1 + \frac{1}{12n(n+1)}}.

From a well-known theorem from logarithms we have:

    \frac{1}{2} \log_e \frac{n+1}{n} = \frac{1}{2n+1} + \frac{1}{3(2n+1)^3} + \frac{1}{5(2n+1)^5} + \cdots,

which also may be written as follows:

    N = \left( n + \frac{1}{2} \right) \log_e \left( 1 + \frac{1}{n} \right) = 1 + \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots.

If all the coefficients 3, 5, ... are replaced by the number 3,
we obtain a geometrical series. The summation of this infinite
series shows that

    1 < N < 1 + \frac{1}{12n(n+1)},

or

    e < \left( 1 + \frac{1}{n} \right)^{n + 1/2} < e^{1 + \frac{1}{12n(n+1)}}.    (I)

If we let

    u_n = \frac{n! \, e^n}{n^{n + 1/2}},   u_{n+1} = \frac{(n+1)! \, e^{n+1}}{(n+1)^{n + 3/2}},

then

    \frac{u_n}{u_{n+1}} = \frac{1}{e} \left( 1 + \frac{1}{n} \right)^{n + 1/2}.

Dividing the quantities in (I) by e we have:

    1 < \frac{u_n}{u_{n+1}} < e^{\frac{1}{12n(n+1)}}.    (II)

The exponent of e may be written as follows:

    \frac{1}{12n(n+1)} = \frac{1}{12n} - \frac{1}{12(n+1)}.

Making use of this relation (II) may be written in the following
form:

    u_{n+1} < u_n,   u_n e^{-1/(12n)} < u_{n+1} e^{-1/(12(n+1))}.

Denoting the quantity u_n e^{-1/(12n)} by u_n', we shall have two
monotone number sequences:

    u_1, u_2, u_3, \ldots, u_n, u_{n+1}, \ldots,
    u_1', u_2', u_3', \ldots, u_n', u_{n+1}', \ldots.
 
These two sequences show some very remarkable features.
With increasing values of n the values of u_n decrease, or the
sequence is a monotone decreasing number sequence. The
values of u_n' become larger when n is increased and form therefore
a monotone increasing number sequence. But any member
of this latter series satisfies, however, the inequality

    u_n' < u_n.

Since both number sequences are situated in a finite interval
it follows from the well-known theorem of Weierstrass that they
both have a clustering point, i.e., a point in whose immediate
region an infinite number of points of the sequence are located.
Denoting this point of cluster by a, we have here an increasing
and a decreasing monotone sequence which both converge
towards a, or:

    \lim_{n=\infty} u_n' = \lim_{n=\infty} u_n = a.
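
The behaviour of the two sequences can be seen from a few computed members. The sketch below is in Python and is an illustration only, not part of the original text; the function names u and u_prime are assumptions, and the logarithm of the factorial is taken from the library function lgamma to avoid very large intermediate numbers. The last line prints the square root of 2π, the value which the text goes on to establish for a.

```python
from math import exp, lgamma, log, pi, sqrt

def u(n):
    """u_n = n! e^n / n^(n + 1/2), computed through logarithms."""
    return exp(lgamma(n + 1) + n - (n + 0.5) * log(n))

def u_prime(n):
    """u_n' = u_n * e^(-1/(12n))."""
    return u(n) * exp(-1 / (12 * n))

# u_n' rises and u_n falls, closing in on a common limit from both sides.
for n in (1, 2, 5, 10, 100):
    print(n, u_prime(n), u(n))
print(sqrt(2 * pi))
```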
 
 This relation may be illustrated by the accompanying diagram: 
 If we now let lim //„ = lim ?/„-e~^''^^" = o, then we shall have
 
 58 J DE MOiVRE — Stirling's formula. 
 
 for every finite value of 71 : 
 
 where a = i/„-c-^'i2" (() < ^ < i) 
 
 95 
 
 ^ ^i^ z 3 4 
 
 This gives us finally the following expression for /;!: 
 
 ail) 
 
 In this expression we need only determine the unknown 
 coefficient a. The formula of Wallis gives immediately: 
 
 ,. (2-4-6---2/?02 ,. 22"(n!)2 ,^ 
 
 hm ,— -" = lim — — — =^ = "V7r/2. 
 
 (2w) ! V2n n=oo (2w) ! V2? 
 
 Substituting in this latter expression the value for factorials 
 as found in (III) and neglecting the quantity: djV2n, we have 
 after a few reductions: 
 
 lim 
 
 an 
 
 = V^/o 
 
 =00 ^2'n{2n) 
 
 7r/2, or a = Vi^Tr, 
 
 from which we easily obtain De ]\Ioivre-Stirling's Formula in its 
 final form: 
 
 n\ = V27r-n"+^/2-e-". 
 
This remarkable approximation formula gives even for
comparatively small values of $n$ surprisingly accurate results. Thus
for instance we have:

10! = 3,628,800;  $\sqrt{2\pi}\cdot 10^{10.5}\, e^{-10}$ = 3,598,696.
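
The comparison above can be repeated for other values of $n$. The following is a minimal sketch using only the Python standard library; the function names `stirling` and `ratio` are chosen here for illustration and do not come from the text. It also checks that the quotient of the exact factorial and the approximation stays between 1 and $e^{1/(12n)}$, as the bound $0 < \theta < 1$ requires.

```python
import math

def stirling(n: int) -> float:
    """De Moivre-Stirling approximation: sqrt(2*pi) * n**(n + 1/2) * e**(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

def ratio(n: int) -> float:
    """Exact factorial divided by the approximation, i.e. e**(theta/(12n))."""
    return math.factorial(n) / stirling(n)

for n in (5, 10, 20):
    # The last column is the upper bound e**(1/(12n)) for the ratio.
    print(n, math.factorial(n), round(stirling(n)), ratio(n), math.exp(1 / (12 * n)))
```

For $n = 10$ the approximation comes out near 3,598,696, which is less than one per cent below the exact factorial.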
 
CHAPTER IX.
 
 LAW OF LARGE NUMBERS. MATHEMATICAL DEDUCTION. 
 
59. Repeated Trials. — Let us consider a general domain of
action wherein the determining causes remain constant and
produce either one or the other of the opposite and mutually
exclusive events, $E$ and $\bar{E}$, with the respective a priori
probabilities $p$ and $q$ $(q = 1 - p)$ in a single trial. The trial
(observation) will, however, be repeated $s$ times with the explicit
assumption that the outward conditions influencing the different
trials remain unaltered during each observation. The simplest
example of observations of this kind is offered by repeated drawings
of balls from an urn containing white and black balls only, and
where the ball is put back in the urn and mixed thoroughly with
the rest before the next drawing takes place. We now keep a
record of the repetitions of the opposite events, $E$ and $\bar{E}$, during
the $s$ trials, irrespective of the order in which these two events
may happen. This record must necessarily be of one of the
following forms:

$E$ happens $s$ times,      $\bar{E}$ $0$ times,
$E$ happens $s - 1$ times,  $\bar{E}$ $1$ time,
$E$ happens $s - 2$ times,  $\bar{E}$ $2$ times,
. . . . . . . . . . . .
$E$ happens $0$ times,      $\bar{E}$ $s$ times.
 
In Chapter IV, Example 17, we showed that the probabilities
of the above combinations of the two events, $E$ and $\bar{E}$, were
determined by the expansion of the binomial

$$(p + q)^s.$$

The general term

$$\frac{s!}{\alpha!\,\beta!}\, p^{\alpha} q^{\beta}$$

is the probability $P(E^{\alpha}\bar{E}^{\beta})$ that $E$ will happen $\alpha$ and $\bar{E}$ $\beta$ times
in the $s$ total trials. Each separate term of the binomial
expansion of $(p + q)^s$ represents the probability of the happening of
the two events in the order given in the above scheme.
 
60. Most Probable Value. — In dealing with these various
terms, it has usually been the custom of the English and French
mathematicians as well as many German scholars to pay
particular attention to a special term, the maximum term, which
generally is known as the "most probable value" or the "mode."
Russian and Scandinavian writers and the followers of the Lexis
statistical school of Germany have preferred to make another
quantity known as the "probable" or "expected value," the
nucleus of their investigations. Although it is our intention to
follow the latter method, we shall discuss first, briefly, the most
probable value. Two questions are then of special interest
to us:

(1) What particular event is most probable to happen?

(2) What is the probability that an event will occur whose
probability does not differ from that of the most probable event
by more than a previously fixed quantity?

Neither of the two questions offers any particular difficulty
of principle from a theoretical point of view. When regarding
the probability $P(E^{\alpha}\bar{E}^{\beta})$, which we shall denote by $T$, as a
function of the variable quantity, $\alpha$, $T$ evidently will reach a maximum
value for a certain value of $\alpha$ $(\beta = s - \alpha)$, and we need only
determine the greatest term in the above binomial expansion.
 
 In order to answer the second question we have only to pick 
 out all the terms which are situated between the two fixed limits. 
 Their sum is then the probability that those two limits are not 
 exceeded. 
 
61. Simple Numerical Examples. — When $s$ is a comparatively
small number the actual expansion may be performed by simple
arithmetic. We shall, for the benefit of the student, give a
simple example of this kind.
 
A pair of dice is thrown 4 times in succession, to investigate
the chance of throwing doublets.

In a single throw the probability of getting a doublet is
$p = \frac{1}{6}$ $(q = \frac{5}{6})$. Expanding $(\frac{1}{6} + \frac{5}{6})^4$ by means of the
binomial theorem we get

$$\left(\tfrac{1}{6}\right)^4 + 4\left(\tfrac{1}{6}\right)^3\left(\tfrac{5}{6}\right) + 6\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right)^2 + 4\left(\tfrac{1}{6}\right)\left(\tfrac{5}{6}\right)^3 + \left(\tfrac{5}{6}\right)^4.$$

Each of the above terms represents the probability of the
occurrence of the various combinations of doublets ($E$) and
non-doublets ($\bar{E}$), and it is readily seen that the event of getting
no doublets at all, represented by the last term
$(\frac{5}{6})^4 = 0.4823$, has the greatest probability. In other
words it is the most probable event.
 
Let us next repeat the trial 12 times instead of 4. The 13
possible probabilities for the various combinations of doublets
and non-doublets will then be expressed by the respective terms
in the expression

$$\left(\tfrac{1}{6} + \tfrac{5}{6}\right)^{12}.$$

The 13 members have as their common denominator the quantity
2,176,782,336 and as numerators the following quantities: 1, 60,
1,650, 27,500, 309,375, 2,475,000, 14,437,500, 61,875,000,
193,359,375, 429,687,500, 644,531,250, 585,937,500, 244,140,625,
which now shows that the most probable combination is the one
of 2 doublets and 10 non-doublets, having a numerical value
equal to .2961.
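
The two expansions can be reproduced with exact fractions. This is a small sketch in Python (standard library only); the helper name `binomial_terms` is an assumption made for illustration, and the list index is taken to be the number of doublets.

```python
from fractions import Fraction
from math import comb

def binomial_terms(s, p):
    """Terms of (p + q)**s; entry a is the probability of exactly a successes."""
    q = 1 - p
    return [comb(s, a) * p**a * q**(s - a) for a in range(s + 1)]

p = Fraction(1, 6)                      # chance of a doublet in one throw
terms4 = binomial_terms(4, p)
terms12 = binomial_terms(12, p)

# Index (number of doublets) of the largest term in each expansion.
mode4 = max(range(len(terms4)), key=terms4.__getitem__)
mode12 = max(range(len(terms12)), key=terms12.__getitem__)

print(mode4, float(terms4[mode4]))      # most probable number of doublets in 4 throws
print(mode12, float(terms12[mode12]))   # most probable number of doublets in 12 throws
```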
 
A further comparison will show that the most probable
event in the second series had the probability .2961, whereas
.4823 was its value in the first series. In other words, the
probability decreases when the trials (observations) are increased.
This is due to the fact that the total number of possible cases
becomes large with the increase of experiments.
 
Another question which presents itself, in this connection,
is the following: What is the probability that an event will occur
whose probability does not differ from the most probable value by
more than a previously fixed quantity? Let us suppose we were
asked to determine the probability that a doublet does not occur
oftener than 6 times and not less than 1 time in 12 trials. This
probability is found by adding the numerical values of the
probabilities as given in the binomial expansion from the term
containing $p = \frac{1}{6}$ to the power 6, to $p$ to the first power, or

$$\frac{14,437,500 + 61,875,000 + 193,359,375 + 429,687,500 + 644,531,250 + 585,937,500}{2,176,782,336}.$$
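
The printed sum runs from the term in $p^6$ down to the term in $p$, that is, over 1 to 6 doublets in 12 throws. The following sketch (Python standard library; `prob_between` is a name chosen here, not the author's) adds those terms exactly and evaluates the quotient, which the text leaves unreduced.

```python
from fractions import Fraction
from math import comb

def prob_between(s, p, lo, hi):
    """Probability that the number of successes in s trials lies in lo..hi inclusive."""
    q = 1 - p
    return sum(comb(s, a) * p**a * q**(s - a) for a in range(lo, hi + 1))

prob = prob_between(12, Fraction(1, 6), 1, 6)
print(prob, float(prob))   # exact fraction and its decimal value, about 0.8866
```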
 
62. The Most Probable Value in a Series of Repeated Trials.
— In the examples just given we determined the probability for
the happening of the most probable event in a series of $s$
observations by a direct expansion of the binomial $(p + q)^s$. This
may be done whenever $s$ is a comparatively small number. But,
when $s$ takes on large values, this method becomes impracticable,
not to say impossible. Suppose that $s = 1,400$, then the actual
straightforward expansion of $(p + q)^{1400}$ would require a
tremendous work of calculation which no practical computer would be
willing to undertake. We must therefore in some way or other
seek a method of approximation by which this labor of
calculation may be avoided and try to find an approximate formula by
which we are able to express the maximum term in a simple
manner, involving little computation and at the same time
yielding results close enough for practical as well as theoretical
purposes. Jacob Bernoulli in his famous treatise "Ars
Conjectandi" was the first mathematician to solve this problem.
Bernoulli also gave an expression for the probability that the
departure from the most probable value should not exceed
previously fixed limits. The method, however, was very laborious
and the final form was first reached by Laplace in "Théorie des
Probabilités."
 
We saw before in Chapter IV that the general term

$$T = \frac{s!}{\alpha!\,\beta!}\, p^{\alpha} q^{\beta}$$

in the binomial expansion $(p + q)^s$ represented the probability
that an event, $E$, will happen $\alpha$ times and fail $\beta$ times in $s$ trials,
where $p$ and $q$ were the respective probabilities for success and
failure in a single trial. The exponent $\alpha$ may here take all
positive integral values in the interval $(0, s)$, including both limits.
The question now arises, which particular value of $\alpha$, say $\alpha_n$,
will make the above quantity a maximum term in the expansion
of the binomial? If $\alpha_n$ really is this particular value, then it
must satisfy the following inequalities:

$$\underbrace{\frac{s!}{(\alpha_n+1)!\,(\beta_n-1)!}\, p^{\alpha_n+1} q^{\beta_n-1}}_{\text{(I)}} \;\le\; \underbrace{\frac{s!}{\alpha_n!\,\beta_n!}\, p^{\alpha_n} q^{\beta_n}}_{\text{(II)}} \;\ge\; \underbrace{\frac{s!}{(\alpha_n-1)!\,(\beta_n+1)!}\, p^{\alpha_n-1} q^{\beta_n+1}}_{\text{(III)}}.$$
 
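Dividing (II) by (III) and (II) by (I) we obtain the following
inequalities:

$$\frac{\beta_n + 1}{\alpha_n}\cdot\frac{p}{q} \ge 1, \qquad \frac{\alpha_n + 1}{\beta_n}\cdot\frac{q}{p} \ge 1,$$

which also may be written

$$(\beta_n + 1)p \ge q\alpha_n \quad\text{and}\quad (\alpha_n + 1)q \ge \beta_n p.$$

The following reductions are self evident:

$$(s - \alpha_n + 1)p \ge \alpha_n(1 - p) \quad\text{or}\quad sp + p \ge \alpha_n,$$
and
$$(\alpha_n + 1)q \ge (s - \alpha_n)p \quad\text{or}\quad \alpha_n q + \alpha_n p \ge sp - q \quad\text{or}\quad \alpha_n \ge sp - q.$$

From which we see that $\alpha_n$ satisfies the following relation:

$$ps - q \le \alpha_n \le ps + p.$$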
 
Since $p + q = 1$, we notice that $\alpha_n$ is enclosed between two
limits whose difference in absolute magnitude equals unity.
The whole interval in which $\alpha_n$ is situated being equal to unity,
and since $\alpha_n$ must be an integral number, this particular $\alpha_n$ is
determined uniquely as an integral positive number when both
$ps - q$ and $ps + p$ are fractional quantities. If $ps - q$ is an
integral number, $ps + p$ will also be integral, and $\alpha_n$ would have
to be a fractional number in order to lie strictly between the two
limits. Since by the nature of the problem $\alpha_n$ can take positive
integral values only, the binomial expansion of $(p + q)^s$ must in
this case have two equal terms, those for $\alpha_n = ps - q$ and
$\alpha_n = ps + p$, which are greater than any of the rest. Dividing both
sides of the inequality by $s$, we shall have

$$p - \frac{q}{s} \le \frac{\alpha_n}{s} \le p + \frac{p}{s}, \quad\text{or}\quad q + \frac{q}{s} \ge \frac{\beta_n}{s} \quad\text{and}\quad p + \frac{p}{s} \ge \frac{\alpha_n}{s}.$$
 
Since both $p$ and $q$ are proper fractions, both $p/s$ and $q/s$ are less
than $1/s$. We may therefore safely assume that the highest
possible difference between the two quotients $\alpha_n/s$ and $\beta_n/s$ and the
probabilities $p$ and $q$ will never exceed $1/s$. Now if $s$ is a very
large number this quantity may be neglected, and we may
therefore write $ps = \alpha_n$ and $qs = \beta_n$.
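
The position of the maximum term between $ps - q$ and $ps + p$ can be checked by direct enumeration. The sketch below (Python standard library; the function name `modes` and the two test cases are assumptions chosen for illustration) returns every value of $\alpha$ at which the term of $(p+q)^s$ is greatest. With $s = 12$, $p = \frac{1}{6}$ the limits are fractional and the maximum is unique; with $s = 11$, $p = \frac{1}{6}$ the limits are the integers 1 and 2 and two equal maximum terms appear.

```python
from fractions import Fraction
from math import comb

def modes(s, p):
    """All values of alpha whose binomial term equals the largest term of (p + q)**s."""
    q = 1 - p
    terms = [comb(s, a) * p**a * q**(s - a) for a in range(s + 1)]
    top = max(terms)
    return [a for a, t in enumerate(terms) if t == top]

p = Fraction(1, 6)
for s in (12, 11):
    lower, upper = s * p - (1 - p), s * p + p   # the limits ps - q and ps + p
    print(s, lower, upper, modes(s, p))
```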
 
Substituting these values in our original expression for the
general term of the binomial expansion we get as the maximum
term:

$$T_m = \frac{s!}{(sp)!\,(sq)!}\, p^{sp} q^{sq}.$$
 
63. Approximate Calculation of the Maximum Term, $T_m$. —
When the trials are repeated a large number of times the
straightforward calculation of the maximum term becomes very laborious.
The only table facilitating an exact computation is in a work
"Tabularum ad Faciliorem et Breviorem Probabilitatis
Computationem Utilium, Enneas," by the Danish mathematician,
C. F. Degen. This table, which was published in 1824, gives the
logarithms to twelve places for all values of $n!$ from $n = 1$ to
$n = 1,200$. Degen's table is, however, not easily obtained, and
even if it were, it would be of little or no value for factorials
above $1,200!$. Our only resort is therefore to find an approximate
expression for the above value of $n!$. This is most conveniently
done by making use of Stirling's formula for factorials of high
orders. We have

$$s! = s^{s+1/2}\, e^{-s} \sqrt{2\pi},$$
$$(sp)! = (sp)^{sp+1/2}\, e^{-sp} \sqrt{2\pi},$$
$$(sq)! = (sq)^{sq+1/2}\, e^{-sq} \sqrt{2\pi}.$$
 
Substituting the above values in the expression $s!/((sp)!\,(sq)!)$
we get

$$\frac{s!}{(sp)!\,(sq)!} = \frac{1}{p^{sp+1/2}\, q^{sq+1/2}\, \sqrt{2\pi s}}.$$

Hence we have

$$T_m = \frac{p^{sp}\, q^{sq}}{p^{sp+1/2}\, q^{sq+1/2}\, \sqrt{2\pi s}},$$

which reduces to

$$T_m = \frac{1}{\sqrt{2\pi spq}}$$

as an approximate value for the maximum term.
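
How close the approximation $1/\sqrt{2\pi spq}$ comes to the exact maximum term can be seen numerically. The sketch below uses only the Python standard library; the function names and the case $s = 1400$, $p = \frac{1}{2}$ are assumptions chosen for illustration (the text mentions $s = 1,400$ but fixes no $p$). The exact term is evaluated at $\alpha = sp$, which is assumed to be an integer.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def exact_max_term(s, p):
    """Exact term of (p + q)**s at alpha = s*p (s*p must be an integer)."""
    a = s * p
    if a.denominator != 1:
        raise ValueError("s*p must be an integer for this sketch")
    a = int(a)
    return float(comb(s, a) * p**a * (1 - p)**(s - a))

def approx_max_term(s, p):
    """Approximation 1/sqrt(2*pi*s*p*q) obtained from Stirling's formula."""
    return 1 / sqrt(2 * pi * s * float(p * (1 - p)))

for s, p in ((12, Fraction(1, 6)), (1400, Fraction(1, 2))):
    print(s, p, exact_max_term(s, p), approx_max_term(s, p))
```

For the 12 throws of the dice the approximation is still several per cent too large; for $s = 1400$ the two values agree to about three significant figures.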
 
Tchebycheff's Theorems. — Despite all that has been said about
the most probable value, its use is somewhat limited, and it
might well, without harm, be left out of the whole theory of
probabilities. Just because an event is the most probable it
does by no means follow that it is a very probable event. In fact the
expression $(\sqrt{2\pi spq})^{-1}$, which for large values of $s$ converges
towards zero, shows that the most probable event in reality is a
very improbable event. This statement may seem a little
paradoxical; but it is easily understood by realizing that the
most probable event is only a probability for a possible
combination among a large number of equally possible combinations of a
different order.

Instead of finding the most probable event it is more important
in practical calculations to determine the average number or
mean value of the absolute frequencies of successes. In Chapter
V we pointed out the close relation between a mathematical
expectation and the mean value of a variable. This relation is
used by the Russian mathematician, Tchebycheff, as the basis
of some very general and far-reaching theorems in probabilities,
by means of which the Law of Large Numbers may be established
in an elegant and elementary manner.
 
64. Expected or Probable Value. — In Chapter V we defined
the product of a certain sum, $s$, and the probability of winning
such a sum as the mathematical expectation of $s$. It is, however,
not necessary to associate the happening of the event with a
monetary gain or loss; in fact it serves often to confuse the
reader, and we may generalize the definition as follows. If a
variable $\alpha_i$ may assume any of the values $\alpha_1, \alpha_2, \alpha_3, \cdots \alpha_s$ each with
a respective probability of existence $\varphi(\alpha_i)$ $(i = 1, 2, \cdots s)$ and such
that $\sum\varphi(\alpha_i) = 1$, then we define:

$$\sum \alpha_i\,\varphi(\alpha_i) = e(\alpha_i)$$

as the expected value of $\alpha_i$.

Some writers use also the term probable value instead of
expected value. In other words the expected value of a variable
quantity, $\alpha$, which may assume any one of the values $\alpha_1, \alpha_2, \cdots \alpha_s$
is the sum of the products of each individual value of the variable
and the corresponding probability of existence of such value.
 
Suppose we now have two opposite and complementary events
$E$ and $\bar{E}$ for which the probabilities of happening in a single
trial are equal to $p$ and $q = 1 - p$ respectively. When the
trials are repeated $s$ times the probabilities of $E$ happening $s$
times, $\bar{E}$ no times, of $E$ happening $s - 1$ and $\bar{E}$ once, of $E$ $s - 2$
and $\bar{E}$ 2 times and so on, may be expressed by the individual
terms of the expansion:

$$(p + q)^s,$$

where the general term expressing the occurrence of $E$ $\alpha$ times
and of $\bar{E}$ $(s - \alpha)$ times is:

$$\varphi(\alpha) = \binom{s}{\alpha}\, p^{\alpha} q^{s-\alpha},$$

which is also the probability of the existence of the frequency
number $\alpha$. The variable in the binomial expansion is $\alpha$, which
may assume all values from $0$ to $s$ inclusive.
 
We now first of all proceed to find the expected value, or the
mathematical expectation, of the following quantities:

$$\alpha, \quad [\alpha - e(\alpha)] \quad\text{and}\quad [\alpha - e(\alpha)]^2.$$

We shall presently show the reason for the selection of the
above expressions, which perhaps may appear at present
somewhat puzzling to the student.

In mathematical symbols the expected values of the above
quantities are expressed as follows:

$$e(\alpha) = \sum \alpha\,\varphi(\alpha), \qquad e[\alpha - e(\alpha)] = \sum [\alpha - e(\alpha)]\,\varphi(\alpha)$$
and
$$e[\alpha - e(\alpha)]^2 = \sum [\alpha - e(\alpha)]^2\,\varphi(\alpha),$$

and the summation is to take place from $\alpha = 0$ to $\alpha = s$.
 
65. Summation Method of Laplace. The Mean Error. —
The analytical difficulty lies in the summation of the expressions
as given above. Laplace was the first to give a compact
expression for the different sums in a simple and elegant manner.
By the introduction of the parameter $t$ Laplace writes:

$$\sum\varphi(\alpha) = (p + q)^s = \sum \binom{s}{\alpha}\, p^{\alpha} q^{s-\alpha}$$
as
$$\sum\varphi(t\alpha) = (tp + q)^s = \sum \binom{s}{\alpha}\, (tp)^{\alpha} q^{s-\alpha}.$$

Differentiating with respect to $t$, which it must be remembered is
introduced as an auxiliary parameter only, we have:

$$\sum\varphi'(t\alpha) = sp(tp + q)^{s-1} = \sum \alpha\, p \binom{s}{\alpha}\, (tp)^{\alpha-1} q^{s-\alpha}.$$

Letting $t$ assume the special value 1 the above sum becomes $e(\alpha)$
or

$$e(\alpha) = \sum \alpha \binom{s}{\alpha}\, p^{\alpha} q^{s-\alpha} = sp(p + q)^{s-1} = sp. \qquad \text{(I.)}$$
 
 We might, however, have obtained the same result in a much 
 shorter manner by the following consideration. The expecta- 
 tion for a single event among the s events is equal to p. Since 
 all the events are independent of each other, it follows from the 
 addition theorem that the complete expectation of the total s 
 cases is equal to sp. 
 
We next proceed to determine the expression $e[\alpha - e(\alpha)]$, or
the expected value of the differences between the constant,
$e(\alpha) = sp$, and the individual values $0, 1, 2, 3, \cdots, s$ which $\alpha$ may
assume in the binomial expansion.
 
The difference $\alpha - e(\alpha)$ is known as the departure or
deviation from the expected value. Some of these deviations will be
positive, namely those of all the values of $\alpha$ situated to the right
of the maximum term, which also is the most probable term in the
expansion $(p + q)^s$, while for the $\alpha$'s situated to the left of the
maximum term $\alpha$ is less than $sp$ and the deviation will therefore
be negative. On account of the symmetrical form of the binomial
expansion we may expect an equal number of positive and negative
deviations which, taken two and two at a time, are equal in
absolute magnitude. The algebraic sum of all the deviations may
therefore be expected to be equal to zero. We shall, however, in
a rigidly analytical manner prove that this is actually so. We have

$$e[\alpha - e(\alpha)] = \sum[\alpha - e(\alpha)]\,\varphi(\alpha) = \sum\alpha\,\varphi(\alpha) - \sum e(\alpha)\,\varphi(\alpha) = \sum\alpha\,\varphi(\alpha) - sp\sum\varphi(\alpha).$$

The first term in this last expression we found, however, to be
equal to $e(\alpha) = sp$, and since $\sum\varphi(\alpha) = 1$ we have finally

$$e[\alpha - e(\alpha)] = sp - sp = 0.$$
 
By squaring the quantity, $\alpha - e(\alpha)$, we get $\alpha^2 - 2\alpha e(\alpha) +
[e(\alpha)]^2$, which is never negative, even when the above difference
is negative.

As a preliminary step we shall find

$$e(\alpha^2) = \sum \alpha^2\,\varphi(\alpha).$$

Introducing the auxiliary parameter, $t$, we get:

$$\sum \binom{s}{\alpha}\, (tp)^{\alpha} q^{s-\alpha} = (tp + q)^s.$$

The first derivative with respect to $t$ is:

$$\sum p\,\alpha \binom{s}{\alpha}\, (tp)^{\alpha-1} q^{s-\alpha} = sp(tp + q)^{s-1}.$$

Multiplying both sides of the equation by $tp$, we have:

$$\sum p\,\alpha\, (tp)^{\alpha} q^{s-\alpha} \binom{s}{\alpha} = s t p^2 (tp + q)^{s-1}.$$

Differentiating we get:

$$\sum p^2 \alpha^2 \binom{s}{\alpha}\, (tp)^{\alpha-1} q^{s-\alpha} = sp^2(tp + q)^{s-1} + s(s - 1)p^3 t\,(tp + q)^{s-2}.$$

Dividing through with the constant factor $p$ and letting $t = 1$
we have:

$$\sum \alpha^2 \binom{s}{\alpha}\, p^{\alpha} q^{s-\alpha} = s^2p^2 + sp(1 - p) = s^2p^2 + spq.$$

The expression on the left side is, however, nothing else than the
sum $\sum\alpha^2\varphi(\alpha)$ or simply $e(\alpha^2)$. This leaves the final
result:

$$e(\alpha^2) = s^2p^2 + spq.$$
We have now:

$$[\alpha - e(\alpha)]^2 = \alpha^2 - 2\alpha e(\alpha) + [e(\alpha)]^2,$$

from which it follows:

$$e[\alpha - e(\alpha)]^2 = s^2p^2 + spq - 2s^2p^2 + s^2p^2 = spq.$$

Denoting this latter quantity by the symbol $[\epsilon(\alpha)]^2$ we have:

$$[\epsilon(\alpha)]^2 = e[\alpha - e(\alpha)]^2 = spq, \quad\text{or}\quad \epsilon(\alpha) = \sqrt{spq}. \qquad \text{(II.)}$$

The quantity $\epsilon(\alpha)$ or simply $\epsilon$ is commonly known as the mean
error of the frequency number $\alpha$ in the Bernoullian expansion.
The mean error is one of the most useful functions in the theory
of probabilities and furnishes one of the most powerful tools of
the statistician.
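
The three expected values derived above can be verified by direct summation over the binomial terms, without the auxiliary parameter. A minimal sketch in Python with exact fractions follows; the function name `binomial_moments` and the case $s = 12$, $p = \frac{1}{6}$ are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_moments(s, p):
    """Return e(alpha), e[alpha - e(alpha)] and e[alpha - e(alpha)]**2 by summation."""
    q = 1 - p
    phi = [comb(s, a) * p**a * q**(s - a) for a in range(s + 1)]
    e = sum(a * phi[a] for a in range(s + 1))
    first = sum((a - e) * phi[a] for a in range(s + 1))
    second = sum((a - e) ** 2 * phi[a] for a in range(s + 1))
    return e, first, second

s, p = 12, Fraction(1, 6)
e, first, second = binomial_moments(s, p)
print(e, first, second)   # to be compared with sp, 0 and spq
```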
 
66. Mean Error of Various Algebraic Expressions. — We next
proceed to prove some general theorems connected with the
mean error. The mean error of the sum of two observed
variables, $\alpha$ and $\beta$, is given by the formula:

$$\epsilon(\alpha + \beta) = \sqrt{\epsilon^2(\alpha) + \epsilon^2(\beta)}.$$

Proof: Let

$$e(\alpha) = \sum\alpha\,\varphi(\alpha) \quad\text{and}\quad e(\beta) = \sum\beta\,\psi(\beta),$$
$$\epsilon^2(\alpha) = \sum[\alpha - e(\alpha)]^2\varphi(\alpha) \quad\text{and}\quad \epsilon^2(\beta) = \sum[\beta - e(\beta)]^2\psi(\beta)$$

be the respective expressions for the probable values and the
mean errors of $\alpha$ and $\beta$ where of course $\sum\varphi(\alpha) = 1$ and $\sum\psi(\beta) = 1$.
Now $\varphi(\alpha_\nu)$ is the probability for the occurrence of the special
value $\alpha_\nu$ of the variable values; in the same way $\psi(\beta_\mu)$ is the
probability for the occurrence of $\beta_\mu$. If $\alpha$ and $\beta$ are independent
of each other, then according to the multiplication theorem,
$\varphi(\alpha_\nu)\psi(\beta_\mu)$ represents the probability for the simultaneous
occurrence of $\alpha_\nu$ and $\beta_\mu$ as well as the probability of the occurrence
of the difference $\alpha_\nu + \beta_\mu - e(\alpha) - e(\beta)$, since the probable
values $e(\alpha)$ and $e(\beta)$ are constant quantities independent of
either $\alpha$ or $\beta$.

If $\epsilon$ denotes the mean error of $\alpha + \beta$ then it follows from the
definition of $\epsilon$ that

$$\epsilon^2 = \sum\sum[\alpha + \beta - e(\alpha) - e(\beta)]^2\,\varphi(\alpha)\psi(\beta),$$

where the double summation is to take place for all possible values of
the variables $\alpha$ and $\beta$.
 
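The above expression may be written as:

$$\epsilon^2 = \sum\sum\{[\alpha - e(\alpha)] + [\beta - e(\beta)]\}^2\,\varphi(\alpha)\psi(\beta),$$

or

$$\epsilon^2 = \sum\sum[\alpha - e(\alpha)]^2\varphi(\alpha)\psi(\beta) + 2\sum\sum[\alpha - e(\alpha)][\beta - e(\beta)]\,\varphi(\alpha)\psi(\beta) + \sum\sum[\beta - e(\beta)]^2\varphi(\alpha)\psi(\beta).$$

A mere inspection will satisfy us that the first and the last terms of
this expression equal $\epsilon^2(\alpha)$ and $\epsilon^2(\beta)$ respectively. The first
term may be written as follows:

$$\sum[\alpha - e(\alpha)]^2\varphi(\alpha)\cdot\sum\psi(\beta) = \epsilon^2(\alpha),$$

since $\sum\psi(\beta) = 1$. The same also holds true for the last term.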
With regard to the middle term we found before that

$$e[\alpha - e(\alpha)] = 0.$$

Hence it follows by mere inspection that this term becomes 0.
Thus we finally have:

$$\epsilon^2(\alpha + \beta) = \epsilon^2(\alpha) + \epsilon^2(\beta) \quad\text{or}\quad \epsilon(\alpha + \beta) = \sqrt{\epsilon^2(\alpha) + \epsilon^2(\beta)}.$$

Since the middle term is always 0, it follows a fortiori

$$\epsilon(\alpha - \beta) = \sqrt{\epsilon^2(\alpha) + \epsilon^2(\beta)},$$

also that

$$\epsilon(k\alpha) = k\,\epsilon(\alpha),$$

where $k$ is a constant. This gives us the following theorems:
The mean error of the sum or of the difference of two quantities
is equal to the square root of the sum of the squares of each
separate mean error. The mean error of any quantity multiplied
by a constant is equal to this same constant multiplied by the
mean error of the quantity. (See Appendix.)
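
These theorems can be tried on any two independent variables by forming every pair of values together with the product of their probabilities, exactly as in the double summation of the proof. The sketch below is written in Python with exact fractions; the two small distributions `alpha` and `beta` and the helper names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import product

def mean_and_sq_error(dist):
    """dist maps value -> probability; returns (expected value, squared mean error)."""
    e = sum(v * ph for v, ph in dist.items())
    return e, sum((v - e) ** 2 * ph for v, ph in dist.items())

def combine(d1, d2, op):
    """Distribution of op(a, b) for independent a and b (multiplication theorem)."""
    out = {}
    for (a, pa), (b, pb) in product(d1.items(), d2.items()):
        out[op(a, b)] = out.get(op(a, b), 0) + pa * pb
    return out

alpha = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}   # hypothetical
beta = {1: Fraction(2, 3), 4: Fraction(1, 3)}                       # hypothetical

_, sq_a = mean_and_sq_error(alpha)
_, sq_b = mean_and_sq_error(beta)
_, sq_sum = mean_and_sq_error(combine(alpha, beta, lambda a, b: a + b))
_, sq_diff = mean_and_sq_error(combine(alpha, beta, lambda a, b: a - b))
_, sq_scaled = mean_and_sq_error({3 * v: ph for v, ph in alpha.items()})

print(sq_a, sq_b, sq_sum, sq_diff, sq_scaled)
```

The squared mean errors of the sum and of the difference both equal the sum of the separate squared mean errors, and multiplying the variable by 3 multiplies the squared mean error by 9.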
 
The above theorems may easily be extended to any number of
variables $\alpha, \beta, \gamma, \cdots$ so that in general we have

$$\epsilon(\alpha + \beta + \gamma + \cdots) = \sqrt{\epsilon^2(\alpha) + \epsilon^2(\beta) + \epsilon^2(\gamma) + \cdots}.$$

We shall later make use of this formula by a comparison of
the different rates of mortality among different population groups.
 
So far we have computed the mean error for the absolute
frequencies of $\alpha$, and the quantity $\sqrt{spq}$ was compared with the
most probable number of successes $sp$. But it may also be useful
to know the mean error of the relative frequencies. This
calculation is performed by reducing the mean error of the absolute
frequencies to the same degree as these absolute frequencies are
reduced to relative frequencies. We saw before that $e(\alpha) = sp$.
The relative frequency of the probable value is $e(\alpha)/s = sp/s = p$.
The mean error of $p$ therefore is

$$\epsilon[e(\alpha) \div s] = \frac{\sqrt{spq}}{s} = \sqrt{\frac{p(1 - p)}{s}}.$$
 
The following remarks of Westergaard are worthy of note:
"When a length is measured in meters and this measure may be
effected with an uncertainty of say 2 meters, the length in
centimetres is then simply found by multiplication by 100 and
the uncertainty is 200 cm. When we wish to find the mean
error of $p$ instead of $sp$ we only need to divide the mean error
$\sqrt{spq}$ by $s$, which gives $\sqrt{pq/s}$."
The same result is also easily obtained from the formula

$$\epsilon(k\alpha) = k\,\epsilon(\alpha)$$

when we let $k = 1/s$.
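
The reduction from absolute to relative frequencies can be checked by summing the squared deviations of $\alpha/s$ from $p$ directly. This is a sketch in Python (standard library only); the function name is an assumption made here, and the dice example $p = \frac{1}{6}$ is reused for the numbers.

```python
from fractions import Fraction
from math import comb, sqrt

def relative_frequency_sq_error(s, p):
    """Squared mean error of alpha/s, summed directly over the binomial terms."""
    q = 1 - p
    return sum((Fraction(a, s) - p) ** 2 * comb(s, a) * p**a * q**(s - a)
               for a in range(s + 1))

p = Fraction(1, 6)
for s in (12, 48, 192):
    sq = relative_frequency_sq_error(s, p)
    # Each fourfold increase in s halves the mean error sqrt(pq/s).
    print(s, sq, sqrt(sq))
```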
 
67. Tchebycheff's Theorem. — Tchebycheff's brochure appeared
first in Liouville's Journal for 1866 under the title "Des
 valeurs Moyennes." A later demonstration was given by the 
Italian mathematician, Pizzetti, in the annals of the University
of Genoa for 1892. The nucleus in both Tchebycheff's and
Pizzetti's investigations is the expression for the mean error:
 
$$\epsilon^2(\xi) = \sum[\xi - e(\xi)]^2\,\varphi(\xi). \qquad (1)$$

The variable $\xi$ may be of any form whatsoever; it may thus for
instance be the sum of several variables $\alpha, \beta, \gamma, \cdots$, while $\varphi(\xi)$
is the ordinary probability function for the occurrence of $\xi$. Let
us denote the difference $\xi_r - e(\xi)$ by $v_r$ $(r = 1, 2, 3, \cdots s)$. We
may then write the above expression for $\epsilon(\xi)$ as:

$$\varphi(\xi_1)\frac{v_1^2}{a^2} + \varphi(\xi_2)\frac{v_2^2}{a^2} + \varphi(\xi_3)\frac{v_3^2}{a^2} + \cdots + \varphi(\xi_s)\frac{v_s^2}{a^2} = \frac{\epsilon^2(\xi)}{a^2}, \qquad (2)$$

where $a$ is an arbitrarily chosen constant, but always larger than
$\epsilon(\xi)$ in absolute magnitude. If we, in the above equation, select
all the $v$'s which are larger than $a$ in absolute magnitude together
with their corresponding probabilities, $\varphi(\xi)$, and denote all
such quantities by $v', v'', v''', \cdots$ and $\varphi(\xi)', \varphi(\xi)'', \varphi(\xi)''', \cdots$
respectively, we have evidently:

$$\varphi(\xi)'\frac{v'^2}{a^2} + \varphi(\xi)''\frac{v''^2}{a^2} + \varphi(\xi)'''\frac{v'''^2}{a^2} + \cdots \le \frac{\epsilon^2(\xi)}{a^2}. \qquad (3)$$

For any one of these different $v$'s which is larger in absolute
magnitude than $a$

$$\frac{v^2}{a^2} > 1,$$

from which it follows a fortiori:

$$\varphi(\xi)' + \varphi(\xi)'' + \cdots = \sum\varphi(\xi) < \frac{\epsilon^2(\xi)}{a^2}. \qquad (3a)$$
 
In this latter inequality, $\sum\varphi(\xi)$ is the total probability for the
occurrence of a deviation from $e(\xi)$ larger than $a$ in absolute
magnitude.

Let now $P_T$ be the probability that the absolute value of
the deviation is not larger than $a$; then $1 - P_T$ is the total
probability that the deviation is larger than $a$. We have thus
from the inequality (3a)

$$1 - P_T < \frac{\epsilon^2(\xi)}{a^2} \quad\text{or}\quad P_T > 1 - \frac{\epsilon^2(\xi)}{a^2}. \qquad (4)$$

Let also $a = \lambda\,\epsilon(\xi)$. We then have by a mere substitution in the
above inequality:

$$P_T > 1 - \frac{1}{\lambda^2}. \qquad (5)$$
 
This constitutes the first of Tchebycheff's criterions which says:
The probability that the absolute value of the difference $|\alpha - e(\alpha)|$
does not exceed a certain multiple, $\lambda$ $(\lambda > 1)$, of the mean error is
greater than $1 - (1/\lambda^2)$.

Now we made no restrictions as to the variable, $\xi$, which may
be composed of the sum of several independent variables, $\alpha, \beta,
\gamma, \cdots$. We saw before that

$$\epsilon^2(\alpha + \beta + \gamma + \cdots) = \epsilon^2(\alpha) + \epsilon^2(\beta) + \epsilon^2(\gamma) + \cdots$$

Tchebycheff's criterion may therefore be extended as follows:

The Tchebycheffian probability, $P_T$, that the difference
$|\alpha + \beta + \gamma + \cdots - e(\alpha) - e(\beta) - e(\gamma) - \cdots|$ will never exceed
a certain multiple, $\lambda > 1$, of the mean error $\epsilon$ is greater than
$1 - (1/\lambda^2)$.
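
The criterion gives only a lower limit, and for a binomial frequency the actual probability is usually much higher. The sketch below (Python standard library; the function name `prob_within` and the case $s = 60$, $p = \frac{1}{6}$ are assumptions chosen for illustration) computes the exact probability that $|\alpha - sp|$ does not exceed $\lambda\sqrt{spq}$ and prints it beside $1 - 1/\lambda^2$. The comparison is made on squares so that no square root of a fraction is needed.

```python
from fractions import Fraction
from math import comb

def prob_within(s, p, lam):
    """Exact probability that |alpha - s*p| <= lam * sqrt(s*p*q) in s Bernoullian trials."""
    q = 1 - p
    bound_sq = lam * lam * s * p * q            # (lam * mean error)**2
    return sum(comb(s, a) * p**a * q**(s - a)
               for a in range(s + 1) if (a - s * p) ** 2 <= bound_sq)

s, p = 60, Fraction(1, 6)
for lam in (Fraction(3, 2), 2, 3):
    exact = prob_within(s, p, lam)
    lower_limit = 1 - 1 / Fraction(lam) ** 2    # Tchebycheff's lower limit
    print(float(lam), float(exact), float(lower_limit))
```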
 
68. The Theorems of Poisson and Bernoulli proved by the
application of the Tchebycheffian Criterion. — Bernoulli in his
researches limited himself to the solution of the problem in which
the probabilities for the observed event remained constant during
the total number of observations or trials. Poisson has treated
the more general case, wherein the individual probability for the
happening of the event in a single trial varies during the total $s$
trials. This may probably best be illustrated by an urn schema.
Suppose we have $s$ urns $U_1, U_2, \cdots U_s$ with white and black
balls in various numbers. Let the probability for drawing a
white ball from the urns $U_1, U_2, \cdots U_s$ in a single trial be
$p_1, p_2, \cdots p_s$ respectively, and $q_1, q_2, \cdots q_s$ the chances for drawing
a black ball in a single trial. If a ball is drawn from each urn,
what is the probability of drawing $\alpha$ white and $s - \alpha$ black
balls in $s$ trials? It is easily seen that the Bernoullian Theorem
is a special case when the contents of the $s$ urns and the respective
probabilities for drawing a white ball in a single trial are the
same for all urns.
 
69. Bernoullian Scheme. — We shall now show how the
Tchebycheffian criterions may be used in answering the question
given above. First of all we shall start with the simpler case
of the Bernoullian urn-schema. Here the probability for drawing
a white or a black ball from each of the $s$ urns in a single trial is
$p$ and $q$ respectively. The square of the mean error in a single
trial is $pq$. From the formulas in § 66 it then follows:

$$\epsilon^2 = \epsilon_1^2 + \epsilon_2^2 + \cdots = pq + pq + pq + \cdots \ (s \text{ times}) = spq$$
or
$$\epsilon = \sqrt{spq}.$$

While the above expression gives us the mean error of the absolute
frequency of the variable $\alpha$, the mean error of the relative
frequency of $\alpha$ to the total number of trials, $s$, is given as

$$\frac{\sqrt{spq}}{s} = \sqrt{\frac{pq}{s}}.$$
 
We now ask: What is the total probability that the absolute
deviation of the relative frequency $\alpha/s$ from its expected value
$sp/s = p$ never becomes larger than $\lambda$ times the mean error,
$\epsilon = \sqrt{spq}/s$? Letting $\lambda = \sqrt{s}/t$ and using the symbol $P_T$ for
this particular probability, we have according to Tchebycheff's
criterion:

$$P_T > 1 - 1/\lambda^2, \quad\text{or}\quad P_T > 1 - t^2/s.$$

Since the mean error is equal to $\sqrt{pq/s}$ we have:

$$\lambda\epsilon = \frac{\sqrt{s}}{t}\sqrt{\frac{pq}{s}} = \frac{\sqrt{pq}}{t}.$$

The answer to our question above follows now a fortiori as
follows:

The total probability that the absolute deviation of the relative
frequency from the postulated a priori probability, $p$, never
exceeds the quantity, $\sqrt{pq}/t$, is greater than $1 - (t^2/s)$.
 
By taking $t$ large enough we may reduce $\sqrt{pq}/t$ (where $pq$ is
a fraction whose maximum value never can exceed $1 \div 4$) below
any previously assigned quantity, $\delta$, however small. If, for
instance, we choose the value .0001 for $\delta$, we may rest assured
that $\sqrt{pq}/t$ will be less than $\delta$ when we take $t$ larger than 5000.
But no matter how large $t$ is, so long as it remains a finite number,
by letting $s = \infty$ as a limiting value, $t^2/s$ will simultaneously
approach 0 as a limiting value. From the deductions thus
derived we are now able to draw the following conclusions:
 
1) By letting $s = \infty$ as a limiting value, the probability, $P_T$,
that the absolute difference between the relative frequency $\alpha/s$ and the
postulated a priori probability, $p$, never becomes greater than $\sqrt{pq}/t$
approaches 1 or certainty as a limit.

2) By choosing the quantity, $t$, which is less than $\lim \sqrt{s}$,
sufficiently great, we may bring $\sqrt{pq}/t$ below any previously assigned
quantity, $\delta$, or make the difference between $p$ and $\alpha/s$ as small as we
please.

From these conclusions we obtain a fortiori the following:

$$\lim_{s=\infty} \frac{\alpha}{s} = p.$$

This constitutes the essential features of the Bernoullian Theorem.
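
The way the probability $P_T$ is driven towards certainty can be followed numerically. In the sketch below (Python standard library; the function name `prob_close` and the choice $t = 5$, $p = \frac{1}{6}$ are assumptions made for illustration) the tolerance $\sqrt{pq}/t$ is held fixed while $s$ grows, and the exact probability is printed beside the lower limit $1 - t^2/s$.

```python
from fractions import Fraction
from math import comb

def prob_close(s, p, t):
    """Exact probability that |alpha/s - p| <= sqrt(p*q)/t in s Bernoullian trials."""
    q = 1 - p
    # |alpha/s - p| <= sqrt(pq)/t  is the same as  (alpha - s*p)**2 <= s**2 * p*q / t**2
    limit_sq = Fraction(s * s, t * t) * p * q
    return sum(comb(s, a) * p**a * q**(s - a)
               for a in range(s + 1) if (a - s * p) ** 2 <= limit_sq)

p, t = Fraction(1, 6), 5
for s in (100, 400, 900):
    print(s, float(prob_close(s, p, t)), float(1 - Fraction(t * t, s)))
```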

70. Poisson's Scheme. — Let $p_1$ denote the postulated
probability for success in the first trial, $p_2$ in the second, $p_3$ in the
third, etc., and let furthermore $q_1, q_2, q_3, \cdots$ be the respective
probabilities for the corresponding failures. If the trial
(observation) is repeated $s$ times we obtain the following values for the
probable or expected value of the frequency for successes $e(\alpha)$
and the mean error $\epsilon$:

$$e(\alpha) = p_1 + p_2 + p_3 + \cdots + p_s = \sum p_i, \qquad (1)$$

$$\epsilon = \sqrt{p_1q_1 + p_2q_2 + p_3q_3 + \cdots + p_sq_s} = \sqrt{\sum p_iq_i} \quad (i = 1, 2, 3, \cdots s). \qquad (2)$$
 
If by $p_0$ and $q_0$ we denote the arithmetic mean or the average
value of the $s$ $p$'s and $s$ $q$'s, such that

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s}, \qquad (3)$$

$$q_0 = \frac{q_1 + q_2 + q_3 + \cdots + q_s}{s}, \qquad (4)$$

and assume that $p_0$ and $q_0$ denote the constant probabilities
during each of the $s$ trials (observations), we should according
to the Bernoullian Theorem have:

$$e(\alpha_B) = sp_0, \qquad (5)$$

$$\epsilon(\alpha_B) = \sqrt{sp_0q_0}, \qquad (6)$$

where $\alpha_B$ stands for the absolute frequency in a Bernoullian
series.

An actual comparison of (1) and (5) and (3) shows that:

$$e(\alpha_P) = e(\alpha_B), \qquad (7)$$

where $\alpha_P$ is the symbol for the absolute frequency in a Poisson
series. In other words: If the $s$ trials had been performed with
constant probability for success equal to $p_0$ instead of with
varying probabilities $p_1, p_2, \cdots p_s$, the expected or probable
value would be the same for the Bernoullian and Poisson scheme.
With regard to the mean error we find, however, after a little
calculation,

$$\epsilon_P^2(\alpha) = \epsilon_B^2(\alpha) - \sum(p_i - p_0)^2. \qquad (8)$$

The expression for the mean error in Poisson's Theorem is of
the following form

$$\epsilon_P = \sqrt{p_1q_1 + p_2q_2 + p_3q_3 + \cdots + p_sq_s} = \sqrt{\sum p_iq_i} \quad (i = 1, 2, 3, \cdots s).$$
 
Now $p_iq_i$ may be transformed as follows:
Writing

$$p_i = p_0 + (p_i - p_0),$$
$$q_i = q_0 - (p_i - p_0),$$

and multiplying we obtain:

$$p_iq_i = p_0q_0 - (p_i - p_0)(p_0 - q_0) - (p_i - p_0)^2,$$

and summing up for all values of $i$ from $i = 1$ to $i = s$ we have:

$$\epsilon_P^2 = sp_0q_0 - \sum(p_i - p_0)^2 = \epsilon_B^2 - \sum(p_i - p_0)^2.$$

As $(p_i - p_0)^2$ always is a positive quantity, it is readily seen that
the mean error in a Poisson scheme is always less than the mean
error in the corresponding Bernoullian series.
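
The relation between the two squared mean errors is easily verified for any set of urns. A minimal sketch in Python with exact fractions follows; the four probabilities in `ps` and the function name are hypothetical values chosen only to illustrate the identity. The middle term of the product $p_iq_i$ disappears in the summation because the deviations $p_i - p_0$ add up to zero.

```python
from fractions import Fraction

def poisson_scheme_sq_errors(ps):
    """Return (Poisson squared mean error, Bernoullian squared mean error, sum of (p_i - p_0)**2)."""
    s = len(ps)
    p0 = sum(ps) / s                      # average probability
    q0 = 1 - p0
    eps_p_sq = sum(p * (1 - p) for p in ps)
    eps_b_sq = s * p0 * q0
    spread = sum((p - p0) ** 2 for p in ps)
    return eps_p_sq, eps_b_sq, spread

ps = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # hypothetical urns
eps_p_sq, eps_b_sq, spread = poisson_scheme_sq_errors(ps)
print(eps_p_sq, eps_b_sq, spread)
```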
Writing $\epsilon$ as follows:

$$\epsilon = \sqrt{p_1q_1 + p_2q_2 + \cdots + p_sq_s} = \sqrt{s}\,\sqrt{\frac{p_1 + \cdots + p_s}{s} - \frac{p_1^2 + \cdots + p_s^2}{s}}$$

and letting $\lambda = \sqrt{s}/t$, we have according to Tchebycheff's
Theorem the following rule: The probability $P_T$ that the relative
frequency remains inside the limits:

$$\frac{p_1 + p_2 + \cdots + p_s}{s} \pm \lambda\,\frac{\epsilon}{s} = \frac{p_1 + p_2 + \cdots + p_s}{s} \pm \frac{1}{t}\sqrt{\frac{p_1 + p_2 + \cdots + p_s}{s} - \frac{p_1^2 + p_2^2 + \cdots + p_s^2}{s}}$$

is greater than $1 - (1/\lambda^2)$ or $1 - (t^2/s)$.

By taking $t$ sufficiently large and by letting $s$ approach infinity
as a limiting value the last term in the above limits, namely
$\lambda$ times the mean error, which bounds the difference between the
relative frequency and the average probability, $p_0$, becomes
smaller than any previously assigned quantity, $\delta$, however small,
while $P_T$ at the same time will approach 1 as a limit.
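
The rule can be tested on a concrete set of unequal urns by building the exact distribution of the number of white balls. The sketch below is written in Python with exact fractions; the 120 probabilities in `ps`, the value $t = 5$ and the function names are hypothetical choices made for illustration. The distribution is obtained by adding one urn at a time, and the probability of staying inside the limits is compared with $1 - t^2/s$.

```python
from fractions import Fraction

def poisson_scheme_distribution(ps):
    """Exact distribution of the number of successes in independent trials
    with success probabilities ps, built up one trial at a time."""
    dist = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(dist) + 1)
        for a, ph in enumerate(dist):
            nxt[a] += ph * (1 - p)      # failure in this trial
            nxt[a + 1] += ph * p        # success in this trial
        dist = nxt
    return dist

def prob_inside(ps, t):
    """Probability that alpha/s lies within the limits p0 +/- (1/t)*sqrt(inner)."""
    s = len(ps)
    total = sum(ps)                                    # s * p0
    inner = total / s - sum(p * p for p in ps) / s     # quantity under the root
    limit_sq = Fraction(s * s) * inner / (t * t)       # squared limit for alpha - s*p0
    dist = poisson_scheme_distribution(ps)
    return sum(ph for a, ph in enumerate(dist) if (a - total) ** 2 <= limit_sq)

ps = [Fraction(k, 41) for k in range(1, 41)] * 3       # 120 hypothetical urns
t = 5
print(float(prob_inside(ps, t)), float(1 - Fraction(t * t, len(ps))))
```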
 
From this it now follows:

When an infinite number of trials is made on an event, following
the scheme of Poisson, then the expression:

$$\lim_{s = \infty} \frac{\alpha}{s} = \frac{p_1 + p_2 + \cdots + p_s}{s} = p_0.$$
 
The essential part of Poisson's Theorem is contained in this
equation. When $p = p_1 = p_2 = \cdots = p_s$ we have a Bernoullian
series and obtain:

$$\lim_{s = \infty} \frac{\alpha}{s} = p,$$

which result we already derived above in a direct way.
 
71. Relation between Empirical Frequency Ratios and Mathematical
Probabilities. In the above limit, $\alpha$ indicates the total
number of lucky events while $s$ is the total number of trials; the
quotient $\alpha/s$ then is nothing more than the empirical
probability as defined in the preceding paragraphs. Both the
Bernoullian and Poisson Theorems show that this empirical
probability approaches the postulated a priori probability, $p$
(or the average probability $p_0$), as a limiting value.

In this way we have succeeded in extending the theory of
probability to other problems than the conventional kind involved
in the games of chance or drawings of balls from urns. We do
not need to limit our investigations to problems where we are
able to determine a priori the probability for the happening of
an event in a single trial, but limit ourselves to postulating the
existence of such an a priori probability.
 
A large number of trials or observations is made on a certain
event $E$. This event is now observed to have occurred $\alpha$ times
during the $s$ total trials. To illustrate: An urn contains red
and white balls, the total number of balls being unknown. A
single ball is drawn and its color noted. This ball is replaced
and the contents of the urn are mixed. A second drawing is
made and the color of the drawn ball noted before the ball is put
back in the urn. Let this process be repeated $s$ times, where $s$
is a large number, and furthermore let $\alpha$ be the number of red
balls which appeared during the $s$ trials.
 
The quotient $\alpha/s$ we now call the empirical or a posteriori
probability for the observed event, in this particular case the
a posteriori probability for the drawing of a red ball. When
$s = \infty$ the Bernoullian Theorem tells us that the empirical
probability found in this manner and the postulated a priori
probability, whose numerical value, however, was unknown
before the drawings took place, are identical as far as numerical
magnitude is concerned. As we already observed in the introductory
remarks to this chapter, it is impossible to perform a
certain experiment an infinite number of times, and it is therefore
out of the question to determine the limiting and ideal value of
the a posteriori probability. We must satisfy ourselves with an
approximation by performing a finite number of trials, that is, by
letting $s$ be a finite number. The quotient $\alpha/s$ is then the
empirical approximate a posteriori probability. We know also that,
although this quotient is only an approximation of the postulated
a priori probability, the difference between the approximate
empirical probability ratio, $\alpha/s$, and the a priori probability,
$p$, becomes smaller as the number of trials is increased. But how
small is the difference?
Or how many times shall we repeat the trials (observations) so
that, for practical purposes, we may disregard this difference?
It does not suffice to be satisfied with the fact that the difference
becomes proportionately smaller the greater we make the number
of trials and merely insist that in order to avoid large errors it is
only necessary to operate with very large numbers. Immediately
the question arises: What constitutes a large number? Is 100
a large number, or is 1,000, 10,000, 100,000 or even a million an
answer to this question? As long as this question remains
unanswered, it helps but little to insist upon the "law of large
numbers," a tendency which unfortunately is too manifest in
many statistical researches by amateur statisticians. As long
as a definition, let alone a numerical determination, of the
range of "small numbers" is lacking, little stress ought to be
laid on such remarks based on the metaphorical terms of "small"
and "large" numbers.
 
72. Application of the Tchebycheffian Criterion. It is readily
seen that even a rough quantitative determination of the difference
between the approximate a posteriori probability and the
postulated a priori probability based upon the mere vague statement
of "large numbers" is utterly impossible, and it remains
to be seen, therefore, if the theory of probability offers us a
criterion that might serve as a preliminary test for the above
difference. To restate our problem: If $p$ is the postulated a priori
probability and $\alpha/s$ is the empirical probability (a posteriori) or
relative frequency of the event, $E$, what is the probability that the
difference $|(\alpha/s) - p|$ does not exceed a previously assigned quantity?
In the mean error and the associated theorem of Tchebycheff
we have a simple and easily applied criterion to test this probability.

Tchebycheff's rule states that the probability, $P_T$, of a deviation
of a variable from its probable value, not larger than $\lambda$
times its mean error, is greater than $1 - (1/\lambda^2)$.
For

$$\begin{aligned}
\lambda = 3 &\qquad P_T > 1 - \tfrac{1}{9} = 0.888 \\
\lambda = 4 &\qquad P_T > 1 - \tfrac{1}{16} = 0.937 \\
\lambda = 5 &\qquad P_T > 1 - \tfrac{1}{25} = 0.96.
\end{aligned}$$
 
 This shows that a deviation from the expected or probable 
 value of the variable equal to 4 or 5 times the mean error possesses 
 a very small probability and such deviations are extremely rare. 
 
Let us for example assume that the observed rate of mortality
in a certain population group is equal to .0200. Let furthermore
the number exposed to risk equal 10,000. The mean error is

$$\left(\frac{.02 \times .98}{10{,}000}\right)^{1/2} = .0014.$$

If the number of lives exposed to risk was one million instead of
10,000, the mean error would be

$$\left(\frac{.02 \times .98}{1{,}000{,}000}\right)^{1/2} = .00014.$$

A deviation four times this latter quantity is equal to .00056, and
according to Tchebycheff's criterion the probability for the
non-occurrence of a deviation above .00056 is greater than .937; that
is, with this probability the rate of mortality inside a year will
not be higher than .0206 or less than .0194. For an observation
series of 4,000,000 homogeneous elements we might by a similar
procedure expect to find a rate of mortality between 0.02 + 0.00028
and 0.02 - 0.00028. Thus we notice that the mean error
of the relative frequency numbers decreases as the number of
observations increases.
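
The arithmetic of the mortality example may be retraced with a few lines of code. This is only a sketch under the assumptions of the example itself (rate .02, multiplier 4); the function names are ours and purely illustrative.

```python
import math


def mean_error_of_rate(p, s):
    """Mean error of a relative frequency observed in s trials with probability p."""
    return math.sqrt(p * (1 - p) / s)


def tchebycheff_bound(lam):
    """Lower bound 1 - 1/lam**2 for the probability of staying within lam mean errors."""
    return 1 - 1 / lam ** 2


def tchebycheff_interval(p, s, lam):
    """Limits p -/+ lam times the mean error, together with the probability bound."""
    half_width = lam * mean_error_of_rate(p, s)
    return p - half_width, p + half_width, tchebycheff_bound(lam)


# The three numbers exposed to risk that are discussed in the text.
for s in (10_000, 1_000_000, 4_000_000):
    low, high, bound = tchebycheff_interval(0.02, s, 4)
    print(f"s = {s:>9,}: {low:.5f} to {high:.5f}, probability > {bound:.4f}")
```

Quadrupling the number of observations halves the width of the limits, since the mean error varies inversely as the square root of $s$.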
 
 CHAPTER X. 
 
 THE THEORY OF DISPERSION AND THE CRITERIA OF LEXIS 
 AND CHARLIER. 
 
73. Bernoullian, Poisson and Lexis Series. In the previous
chapter we limited our discussion to single sets consisting of
$s$ individual trials and found in the mean error and the criterion
of Tchebycheff a measure for the uncertainty with which the
relative frequency ratio $\alpha/s$ as well as the absolute frequency
$\alpha$ were affected. How will matters now turn out if, instead of a
single set, we make $N$ sets of trials? As already mentioned in
paragraph 54, in general in $N$ such sets we shall obtain $N$ different
values of $\alpha$, denoting the absolute frequency of the event
represented by the sequence

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N.$$
 
 Our object is now to investigate whether the distribution of 
 the above values of a around a certain norm is subject to some 
 simple mathematical law and if possible to find a measure for 
 such distributions. 
 
 In this connection it is of great importance whether the pos- 
 tulated a priori probabilities remain constant or not during the 
 N sample sets. Three cases are of special importance to us.¹
 
1. The probability of the happening of the event remains
constant during all the $N$ sets. The series as given by the absolute
frequencies in each set is known as a Bernoullian Series.

2. The same probability varies from trial to trial inside each
of the $N$ sample sets, the variations being the same from set to set.
The series as given by the absolute frequencies is in this case
known as a Poisson Series.

3. The probability remains constant in any one particular set
but varies from set to set. The absolute frequency series as
produced in this way is called a Lexis Series.
 
 The above definition of these three series may, perhaps, be 
 made clearer by a concrete urn scheme. 
 
¹ The terminology is due to Charlier.
 
A. Bernoullian Series. $s$ balls are drawn one at a time from an
urn, containing black and white balls in constant proportion during
all drawings. Such drawings constitute a sample set. Let us in
this particular set have obtained say $\alpha_1$ white and $\beta_1$ black balls,
where $\alpha_1 + \beta_1 = s$. We make $N$ sets of drawings under the
same conditions, keeping a record of white balls drawn in each
set. The number sequence thus obtained,

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N,$$

is a Bernoullian Series.

B. Poisson Series. $s$ individual urns contain white and
black balls, the proportion of white to black varying from urn
to urn. A single ball is drawn from each urn and its color noted.
In this way we get $\alpha_1$ white and $\beta_1$ black balls constituting a set.
The balls thus drawn are replaced in their respective urns and a
second set of $s$ drawings is performed as before, resulting in $\alpha_2$
white and $\beta_2$ black balls. The number sequence,

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N,$$

of white balls in $N$ sets represents a Poisson Series.

C. Lexis Series. $s$ balls are drawn one at a time under the
same conditions as set No. 1 in the Bernoullian series. The $\alpha_1$
white and $\beta_1$ black thus drawn constitute the first set. In the
second and following sets the composition of the urn is changed
from set to set. The number sequence representing the number
of white balls in the $N$ respective sets:

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N$$

is a Lexian Series. The scheme of drawings is the same as in
the Bernoullian Series except that the proportion of white to
black balls varies from set to set.
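
The three urn schemes may be imitated by a small simulation. What follows is a sketch under assumptions of our own: the probabilities, the seed, the numbers $s = 100$ and $N = 2000$, and all function names are hypothetical and serve only to illustrate the definitions A, B and C.

```python
import random
import statistics


def draw_set(probs, rng):
    """Count the white balls in one set; probs[i] is the chance of white on trial i."""
    return sum(rng.random() < p for p in probs)


def bernoulli_series(p, s, n_sets, rng):
    """Scheme A: constant probability p on every trial of every set."""
    return [draw_set([p] * s, rng) for _ in range(n_sets)]


def poisson_series(trial_probs, n_sets, rng):
    """Scheme B: probability varies from trial to trial, identically in every set."""
    return [draw_set(trial_probs, rng) for _ in range(n_sets)]


def lexis_series(set_probs, s, rng):
    """Scheme C: probability constant within a set but changing from set to set."""
    return [draw_set([p] * s, rng) for p in set_probs]


rng = random.Random(1922)  # fixed seed so that the sketch is repeatable
s, n_sets = 100, 2000
bern = bernoulli_series(0.5, s, n_sets, rng)
pois = poisson_series([0.1, 0.9] * (s // 2), n_sets, rng)
lex = lexis_series([0.3, 0.7] * (n_sets // 2), s, rng)
for name, series in (("Bernoulli", bern), ("Poisson", pois), ("Lexis", lex)):
    print(name, statistics.mean(series), statistics.pstdev(series))
```

All three assumed schemes have the same average probability 0.5, so the means agree closely, while the scatter of the Poisson sequence comes out narrower, and that of the Lexis sequence wider, than the Bernoullian.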
 
74. The Mean and Dispersion. Since we have no a priori
reasons for choosing any one particular value of the various $\alpha$'s
of the above sequences in preference to any other, we might give
equal weight to each set and take the arithmetic mean as defined
by the formula:

$$M = \frac{\alpha_1 + \alpha_2 + \alpha_3 + \cdots + \alpha_N}{N} \tag{I}$$

of the $N$ values of $\alpha$.

It will be unnecessary to enter into a detailed discussion of
the mean, which is a quantity used on numerous occasions in
everyday life. We shall, however, define another important
function known as the dispersion (standard deviation). The
dispersion is denoted by the Greek letter, $\sigma$, and is defined by
the formula

$$\sigma^2 = \frac{\sum (\alpha_\nu - M)^2}{N}. \tag{II}$$
 
We shall now attempt to find the expected value of the mean
and the dispersion in the three series. First of all take the
Bernoullian Series. Let the constant probability for success in
a single trial be $p_0$. We have then for the various expected values
or mathematical expectations of $\alpha$:

Set No. 1: $e(\alpha_1) = s p_0$

Set No. 2: $e(\alpha_2) = s p_0$

Set No. N: $e(\alpha_N) = s p_0$

or:

$$\frac{e(\alpha_1) + e(\alpha_2) + \cdots + e(\alpha_N)}{N} = \frac{\sum e(\alpha_\nu)}{N} = \frac{N s p_0}{N} = s p_0,$$
 
 which shows that the mean in a Bernoullian Series of N sample 
 sets is equal to the expected value of the absolute frequency in 
 a single set. 
 
In regard to the dispersion we have for the various sets:

Set No. 1: $e(\alpha_1 - M)^2 = \epsilon^2(\alpha_1) = s p_0 q_0$

Set No. 2: $e(\alpha_2 - M)^2 = \epsilon^2(\alpha_2) = s p_0 q_0$

Set No. N: $e(\alpha_N - M)^2 = \epsilon^2(\alpha_N) = s p_0 q_0$

Summing up and forming the mean we obtain for the expected
value of the dispersion in a Bernoullian Series, which we shall
denote by the symbol $\sigma_B$:

$$\sigma_B^2 = \frac{\sum \epsilon^2(\alpha_\nu)}{N} = \frac{N s p_0 q_0}{N} = s p_0 q_0.$$
 
This result shows that the dispersion in a Bernoullian Series is
equal to the mean error, $\epsilon$, in a single set.

We now proceed to the Poisson Series. Let $p_1$ be the mathematical
probability of the happening of the event in the first
trial, $p_2$ be the probability in the second trial and so on for all
trials, and let us furthermore denote the means of the $p$'s and
$q$'s by:

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s}, \qquad q_0 = \frac{q_1 + q_2 + q_3 + \cdots + q_s}{s}.$$

Applying a similar analysis as above we have:

Set No. 1: $e(\alpha_1) = p_1 + p_2 + \cdots + p_s = s p_0$

Set No. 2: $e(\alpha_2) = p_1 + p_2 + \cdots + p_s = s p_0$

Set No. N: $e(\alpha_N) = p_1 + p_2 + \cdots + p_s = s p_0$

The actual summation of the above values of $e(\alpha)$ gives us the
following value of the mean in a Poisson Series:

$$M_P = s p_0.$$
 
Let us for a moment assume that all the drawings had been
performed with a constant probability, $p_0$. According to the
Bernoullian scheme we should then have:

$$M_B = s p_0.$$

An actual comparison shows that $M_B = M_P$. This shows that
the same mean result is obtained if we draw $s$ balls from the urns
$U_1, U_2, \ldots, U_s$ with their corresponding probabilities $p_1, p_2, \ldots, p_s$
for drawing a white ball, as would be obtained if we drew all the $s$
balls from a single urn where the composition is such that the
ratio of the number of white to that of black balls is as $p_0 : q_0$,
where $p_0$ and $q_0$ are defined as above.

Let us now see how matters turn out in regard to the dispersion.
We have for the $N$ sets:

Set No. 1: $e(\alpha_1 - M)^2 = p_1 q_1 + p_2 q_2 + \cdots = \sum p_\nu q_\nu = \epsilon^2(\alpha_1)$

Set No. 2: $e(\alpha_2 - M)^2 = p_1 q_1 + p_2 q_2 + \cdots = \sum p_\nu q_\nu = \epsilon^2(\alpha_2)$

Set No. N: $e(\alpha_N - M)^2 = p_1 q_1 + p_2 q_2 + \cdots = \sum p_\nu q_\nu = \epsilon^2(\alpha_N)$
 
In § 70 we showed, however, that $\sum p_\nu q_\nu$ could be expressed
as follows:

$$\epsilon_P^2(\alpha) = s p_0 q_0 - \sum (p_\nu - p_0)^2 = \epsilon_B^2(\alpha) - \sum (p_\nu - p_0)^2.$$

A simple straightforward calculation gives us now for the
dispersion, $\sigma_P$,

$$\sigma_P^2 = \sigma_B^2 - \sum (p_\nu - p_0)^2.$$

In the corresponding Bernoullian Series with constant probability,
$p_0$, the square of the dispersion is equal to $s p_0 q_0$, which shows
that the dispersion in a Poisson Series is less than the corresponding
dispersion of the Bernoullian Series.
 
We finally come to the mean and the dispersion in the Lexian
Series which we shall denote by $M_L$ and $\sigma_L$ respectively. Let us
furthermore define the two quantities $p_0$ and $q_0$ as follows:

$$p_0 = \frac{p_1 + p_2 + \cdots + p_N}{N}, \qquad q_0 = \frac{q_1 + q_2 + \cdots + q_N}{N}.$$

A computation along similar lines as above gives us first for
the mean, $M_L$:

Set No. 1: $e(\alpha_1) = s p_1$

Set No. 2: $e(\alpha_2) = s p_2$

Set No. N: $e(\alpha_N) = s p_N$

Thus we have:

$$M_L = \frac{\sum e(\alpha_\nu)}{N} = \frac{\sum s p_\nu}{N} = \frac{s[p_1 + p_2 + \cdots + p_N]}{N} = s p_0.$$

For the dispersion we have the following expectations:

Set No. 1: $e(s p_0 - \alpha_1)^2$

Set No. 2: $e(s p_0 - \alpha_2)^2$

Set No. N: $e(s p_0 - \alpha_N)^2$

The expected value in the $\nu$th set is

$$e(s p_0 - \alpha_\nu)^2 = \sum (s p_0 - \alpha)^2 \varphi_\nu(\alpha),$$
 
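where $\varphi_\nu(\alpha)$ is the general term in the probability binomial
$(p_\nu + q_\nu)^s = 1$. An analysis along similar lines as in § 65
gives us now:

$$\begin{aligned}
e(s p_0 - \alpha_\nu)^2 &= s^2 p_0^2 - 2 s^2 p_0 p_\nu + s^2 p_\nu^2 + s p_\nu q_\nu \\
&= s p_\nu q_\nu + s^2 (p_\nu - p_0)^2
\end{aligned}$$

as the expected value of the square of the difference between the
mean and the absolute frequency in the $\nu$th set. For all $N$
sets we then have

$$\sigma_L^2 = \frac{\sum s p_\nu q_\nu}{N} + \frac{s^2}{N} \sum (p_\nu - p_0)^2.$$

We have, however, the following identity:

$$\sum p_\nu q_\nu = N p_0 q_0 - \sum (p_\nu - p_0)^2$$

and hence

$$\sigma_L^2 = \sigma_B^2 + \frac{s^2 - s}{N} \sum (p_\nu - p_0)^2.$$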
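
The closed form for the Lexian dispersion can be compared with a direct evaluation of the expectations, set by set, from the binomial terms. The sketch below uses exact fractions; the three set probabilities, the value $s = 10$ and the function names are assumptions of ours made only for illustration.

```python
from fractions import Fraction
from math import comb


def expected_sq_dev(p, s, center):
    """Exact expectation of (alpha - center)**2 for a binomial count with s trials, chance p."""
    q = 1 - p
    return sum((a - center) ** 2 * comb(s, a) * p ** a * q ** (s - a) for a in range(s + 1))


def lexis_dispersion_sq(set_probs, s):
    """Average over the N sets of the expected squared deviation from M_L = s * p0."""
    n = len(set_probs)
    p0 = sum(set_probs) / n
    return sum(expected_sq_dev(p, s, s * p0) for p in set_probs) / n


def lexis_formula(set_probs, s):
    """Closed form: sigma_B**2 + (s**2 - s) * sigma_p**2."""
    n = len(set_probs)
    p0 = sum(set_probs) / n
    sigma_p_sq = sum((p - p0) ** 2 for p in set_probs) / n
    return s * p0 * (1 - p0) + (s * s - s) * sigma_p_sq


# Hypothetical probabilities for N = 3 sets of s = 10 trials each.
set_probs = [Fraction(1, 5), Fraction(2, 5), Fraction(3, 5)]
s = 10
print(lexis_dispersion_sq(set_probs, s), lexis_formula(set_probs, s))  # prints: 24/5 24/5
```

Here the Bernoullian part contributes 12/5 and the variation of the chances from set to set contributes another 12/5, so that the squared Lexian dispersion is double the Bernoullian one in this assumed case.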
 
74a. Mean or Average Deviation. Of quite another character
than the standard deviation or dispersion is the so-called mean
or average deviation, $\vartheta$, defined by means of the following
relation:

$$\vartheta = \frac{|\alpha_1 - M| + |\alpha_2 - M| + |\alpha_3 - M| + \cdots + |\alpha_N - M|}{N},$$

where $|\alpha_\nu - M|$ means the absolute difference between $\alpha_\nu$ and
$M$. We shall now proceed to determine the expected value of
$\vartheta$ on the assumption that the observed data follow the Bernoullian
Law. The mean in a Bernoullian series with constant probability
$p_0$ we found before to be equal to $s p_0$, which was the
expected value of $\alpha$ in a single sample set of $s$ trials. The
expected value of the absolute difference in the $\nu$th set is therefore:

$$e|\alpha_\nu - s p_0| = \sum |\alpha - s p_0|\, \varphi_\nu(\alpha),$$

where as usual $\varphi_\nu(\alpha)$ is the binomial probability function.

The deviations from $s p_0$ are partly positive and partly negative.
We proved, however, before that

$$e(\alpha_\nu - s p_0) = \sum (\alpha - s p_0)\, \varphi_\nu(\alpha) = 0.$$
 
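Hence it is readily seen that the algebraic sum of the positive
deviations cancels the algebraic sum of the corresponding negative
deviations, so that $e|\alpha_\nu - s p_0|$ equals twice the sum of
the positive deviations. Positive deviations occur for values
of $\alpha$ greater than $s p_0$, i. e., for all values which $\alpha$ may assume
from $s$ to $s p_0$ in the binomial expansion $(p_0 + q_0)^s$.¹ Hence we
have (omitting subscripts):

$$\begin{aligned}
e|\alpha - sp| &= 2 \sum_{\alpha = sp}^{s} (\alpha - sp) \binom{s}{\alpha} p^{\alpha} q^{s - \alpha} \\
&= 2 \left\{ \sum_{\alpha = sp}^{s} \alpha \binom{s}{\alpha} p^{\alpha} q^{s - \alpha} - sp \sum_{\alpha = sp}^{s} \binom{s}{\alpha} p^{\alpha} q^{s - \alpha} \right\}.
\end{aligned}$$

The second of these sums represents the following function of
$p$ and $q$:

$$f(p, q) = p^s + \binom{s}{1} p^{s-1} q + \binom{s}{2} p^{s-2} q^2 + \cdots + \binom{s}{sq} p^{sp} q^{sq}.$$

By partial differentiation with respect to $p$ and by subsequent
multiplication by $p$ we have:

$$p \frac{\partial f}{\partial p} = s p^s + (s - 1) \binom{s}{1} p^{s-1} q + \cdots + sp \binom{s}{sq} p^{sp} q^{sq} = \sum_{\alpha = sp}^{s} \alpha \binom{s}{\alpha} p^{\alpha} q^{s - \alpha}.$$

Hence we may write:

$$e|\alpha - sp| = 2 \left\{ p \frac{\partial f}{\partial p} - sp f \right\}.$$

Furthermore $f(p, q)$ is a homogeneous function in respect to $p$
and $q$ of the $s$th order. We may then apply the following well
known Eulerian Theorem from the differential calculus: If
$f(p, q)$ is homogeneous and has continuous first partial derivatives
then

$$p \frac{\partial f}{\partial p} + q \frac{\partial f}{\partial q} = s f.$$

Using this relation, and remembering that $p + q = 1$, we may write:

$$e|\alpha - sp| = 2 \left\{ p \frac{\partial f}{\partial p} - p \left( p \frac{\partial f}{\partial p} + q \frac{\partial f}{\partial q} \right) \right\} = 2pq \left\{ \frac{\partial f}{\partial p} - \frac{\partial f}{\partial q} \right\}.$$

¹ $s p_0$ is taken to the nearest integer.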
 
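The partial derivatives of $f(p, q)$ with respect to $p$ and $q$ are
of the form:

$$\begin{aligned}
\frac{\partial f}{\partial p} &= s p^{s-1} + (s - 1) \binom{s}{1} p^{s-2} q + \cdots + \frac{s(s-1) \cdots (sp + 1)}{1 \cdot 2 \cdot 3 \cdots (sq - 1)} p^{sp} q^{sq - 1} + \frac{s(s-1) \cdots sp}{1 \cdot 2 \cdot 3 \cdots sq} p^{sp - 1} q^{sq}, \\
\frac{\partial f}{\partial q} &= \binom{s}{1} p^{s-1} + 2 \binom{s}{2} p^{s-2} q + \cdots + \frac{s(s-1) \cdots (sp + 1)}{1 \cdot 2 \cdot 3 \cdots (sq - 1)} p^{sp} q^{sq - 1}.
\end{aligned}$$

Hence we have:

$$\frac{\partial f}{\partial p} - \frac{\partial f}{\partial q} = \frac{s(s-1) \cdots sp}{1 \cdot 2 \cdot 3 \cdots sq} p^{sp - 1} q^{sq} = s \left[ \frac{s!}{(sp)!\,(sq)!} p^{sp} q^{sq} \right].$$

We proved, however, in § 63 that the expression inside the
bracket may be written approximately as follows:

$$T = \frac{1}{\sqrt{2 \pi s p q}}.$$

This gives us finally (again using the subscripts):

$$e|\alpha_\nu - s p_0| = 2 s p_0 q_0 \frac{1}{\sqrt{2 \pi s p_0 q_0}} = \sqrt{\frac{2 s p_0 q_0}{\pi}}$$

as the expected value of the absolute deviation in the $\nu$th sample
set. This same relation evidently holds true for any other of
the $N$ sample sets, which finally gives us the following result for $\vartheta$:

$$\vartheta = \sqrt{\frac{2}{\pi}} \cdot \sqrt{s p_0 q_0}.$$

The dispersion in a Bernoullian series we found before to be
of the form:

$$\sigma_B = \sqrt{s p_0 q_0}.$$

Hence we have the following relation between the dispersion
and the mean deviation:

$$\sigma_B = \sqrt{\frac{\pi}{2}}\, \vartheta = 1.2533\, \vartheta.$$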
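
How close the approximation comes may be judged by summing the binomial terms directly. This is a sketch only; the values $s = 400$ and $p = 0.25$ are hypothetical, chosen so that $sp$ is a whole number, and the function names are ours.

```python
from math import comb, pi, sqrt


def exact_mean_deviation(s, p):
    """Exact expectation of |alpha - s*p| for a binomial count alpha."""
    q = 1 - p
    return sum(abs(a - s * p) * comb(s, a) * p ** a * q ** (s - a) for a in range(s + 1))


def approx_mean_deviation(s, p):
    """The approximation sqrt(2*s*p*q/pi) obtained in the text."""
    return sqrt(2 * s * p * (1 - p) / pi)


s, p = 400, 0.25  # hypothetical values, chosen so that s*p is an integer
exact = exact_mean_deviation(s, p)
approx = approx_mean_deviation(s, p)
sigma = sqrt(s * p * (1 - p))
print(exact, approx, sigma / exact)
```

For a set of this assumed size the exact and approximate values differ by well under one per cent, and the ratio of the dispersion to the mean deviation lies close to 1.2533.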
 
 75. The Lexian Ratio and the Charlier Coefficient of Dis- 
 turbancy. — The results given in the last few paragraphs may be 
 embodied under the following captions.
 
1. The mean in a Poisson and Lexis Series is the same as the
mean in a Bernoullian Series with constant probability of $p_0$
in a single trial, where $p_0$ is defined as above.
 
 2. The dispersion in a Poisson Series is less than the correspond- 
 ing dispersion in a Bernoullian Series. 
 
 3. The dispersion in a Lexis Series is greater than the dispersion 
 in a Bernoullian Series. 
 
The mean and the dispersion of the Bernoullian Series occupy
in this connection a central position and may be used as a standard
of comparison with other series. This is the method adopted by
Lexis in investigating certain statistical series, and we shall return
to it in the following chapter. Lexis determines first
in a direct manner the dispersion as defined by formula (II)
from the statistical data as given by the number sequence $\alpha$.
This process is known as the direct process (by Lexis called a
physical process) and gives a certain dispersion, $\sigma$. After this
the dispersion is computed by an indirect (combinatorial)
process under the assumption that the series follows the Bernoullian
distribution. The ratio, $\sigma : \sigma_B$, which Charlier calls
the Lexian Ratio and denotes by the symbol, $L$, may now give
us an idea about the real nature of the statistical series as
represented by the number sequence.

When $L = 1$, the series is by Lexis called a normal series.

When $L > 1$, the series is called hypernormal.

When $L < 1$, the series is a subnormal series.
 
 It is easily seen from the respective formulas that the Poisson 
 Series are subnormal series whereas the Lexian Series are hyper- 
 normal. The great majority of statistical series are — as we 
 shall have occasion to see in the following chapter — of a hyper- 
 normal kind and correspond thus to the Lexian Series. 
 
In § 74 we found the dispersion in the Lexis series as

$$\sigma_L^2 = \sigma_B^2 + (s^2 - s)\, \sigma_p^2,$$

where

$$\sigma_p^2 = \frac{\sum (p_\nu - p_0)^2}{N}.$$

The quantity, $\sigma_p$, is the natural measure of the variations in
the chances from the mean or normal probability, $p_0$. It is,
however, dependent on the absolute values of these chances, so
that if all chances are changed in the same proportion, $\sigma_p$ is
also changed in the same proportion. Another drawback which
influences the Lexian Ratio is the variation of the number $s$
in each sample set. In order to overcome this difficulty Charlier
divides the above quantity $\sigma_p$ by $p_0$. Assuming that the variations
in the individual probabilities within each set are of no
perceptible influence on the dispersion, we have from the Lexian
dispersion:

$$\sigma_p^2 = \frac{\sigma_L^2 - \sigma_B^2}{s^2 - s}.$$

Neglecting $s$ in comparison with $s^2$ and remembering that
$M_B = s p_0$, we have as an approximation:

$$\rho = \frac{\sigma_p}{p_0} = \frac{\sqrt{\sigma_L^2 - \sigma_B^2}}{M_B}.$$

Charlier calls the quantity $100\rho$ the coefficient of disturbancy of
the statistical series. It is readily seen that the Charlier coefficient
is zero in normal series. For hypernormal series it is a
positive real quantity whereas for subnormal series $\rho$ is imaginary.
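
The Lexian Ratio and the Charlier coefficient may be computed from an observed number sequence along the following lines. This is a minimal sketch, assuming that every set contains the same number $s$ of trials and that $p_0$ is estimated by $M/s$; the six counts and the function name are hypothetical and are not taken from the text.

```python
import math


def lexian_ratio_and_disturbancy(alphas, s):
    """Return (L, 100*rho) for absolute frequencies alphas, each observed in s trials."""
    n = len(alphas)
    mean = sum(alphas) / n
    # Direct ("physical") dispersion according to formula (II).
    sigma_sq = sum((a - mean) ** 2 for a in alphas) / n
    # Indirect ("combinatorial") Bernoullian dispersion, with p0 estimated as mean / s.
    p0 = mean / s
    sigma_b_sq = s * p0 * (1 - p0)
    lexian_ratio = math.sqrt(sigma_sq / sigma_b_sq)
    excess = sigma_sq - sigma_b_sq
    # For subnormal series the excess is negative and rho is imaginary; None marks that case.
    disturbancy = 100 * math.sqrt(excess) / mean if excess >= 0 else None
    return lexian_ratio, disturbancy


# Hypothetical counts of an event in six sets of 1,000 trials each.
alphas = [180, 215, 190, 230, 205, 180]
L, coefficient = lexian_ratio_and_disturbancy(alphas, 1000)
print(L, coefficient)
```

For these assumed counts $L$ exceeds unity, so that the series would be classed as hypernormal, and the coefficient of disturbancy is a positive real quantity.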
 
CHAPTER XI.
 
 APPLICATION TO GAMES OF CHANCE AND STATISTICAL 
 PROBLEMS. 
 
 76. Correlate between Theory and Practice. — In the theo- 
 retical analysis just completed we treated the fundamental ele- 
 mentary functions in the theory of probabilities, the probability 
 function, the expected or probable value of a variable quantity, 
 the mean error, the dispersion and the coefficient of disturbancy. 
 The formulas thus derived were founded upon certain hypo- 
 thetical axioms, which formed the basis of a mathematical a 
 priori probability as defined by Laplace. As far as the purely 
 abstract mathematical analysis is concerned it matters but little 
 if the hypotheses are physically true or not, that is to say, if 
 they agree with physical facts in the universe as it is known to 
 us. A mathematical analysis may be made on the basis of 
 widely divergent hypotheses, a fact which is clearly shown in 
 the Euclidean and Non-Euclidean geometries. It is, however, 
 quite a different matter when we wish to apply our theory to 
 actual phenomena (physical observed events) as it is evident 
 that a correlation between hypothesis and actual facts follows 
 by no means a priori. It is, of course, true that the different 
hypotheses in the theory of probabilities are derived to greater
or less extent from outside sense data. Such sense data, however,
give us only the effect and no clue whatsoever to the relation
between cause and effect. In the application of our theory every
hypothesis, or rather the results derived from such hypothesis,
must be verified by actual experience. Before such a verification
is made, we advise the reader to be sceptical and not
trust too much in the authority of others but follow the sound
advice of Chrystal: "In mathematics let no man over-persuade
you. Another man's authority is not your reason." We can so
 much more encourage an attitude of scepticism in view of the 
 fact that even among the leading mathematicians of the present 
 time there exists no uniform opinion as to the truth of the 
 axioms underlying the theory of probabilities. 
 
77. Homograde and Heterograde Series. Technical Terms.
Whenever a common characteristic or attribute of several
groups of observed individual objects or events allows a purely
quantitative determination, it may be made the subject of a
mathematical analysis, and in such cases we are often able to
make excellent use of the theory of probabilities. Such quantitative
measurements may be divided into various domains
of classification. Traces of such classification are found in almost
every treatise on mathematical statistics, but a uniform system
of nomenclature is unfortunately lacking among the various
statisticians, and any one reading the modern literature on
mathematical statistics often notices various inconsistencies among
the different authors. Mr. G. Udny Yule in his excellent treatise
"Theory of Statistics" classifies the statistical series into "statistics
of attributes" and "statistics of variables." Apart from
the fact that Mr. Yule's statistics of variables also is a statistics
of attributes, although of different grades, the author apparently
ignores the criterion of Lexis and the associated criterion
of Charlier. The German writers use the terms "stetige und
unstetige Kollektivgegenstand" (continuous and discontinuous
collective objects), which were originally introduced by Fechner.
Other writers, such as Johannsen of Denmark and Davenport
of America, use still other terms. After having made a comparison
of the various systems of classification I have in the
following decided to adhere to the system of Charlier wherein
the observed statistical series are classified as homograde and
heterograde.
 
If the individuals all possess the same character or attribute
in the same grade (intensity), or if we disregard the different
grades of the attributes, such individuals are called homograde,
and the statistical series thus formed is a homograde series. If
on the other hand we take into consideration the different
varying grades of the attributes observed or measured and form
the series accordingly we obtain a heterograde series. As examples
of homograde series we may mention the observed recorded
series of coin tossing, card drawings in reference to a specified
event, number of births or deaths in a population group, etc.
A coin when tossed will either show head or tail, a person will
either be dead or alive. There are no intermediate degrees as
for instance that of a half dead person. In all such series the
dividing line between the occurrence of the event (attribute) $E$
and the occurrence of the opposite event $\bar{E}$ is distinct and suggests
itself a priori and there is no doubt as to the classification of the
observed event.

The original record of observation of a homograde series, also
known as the primary list, is simply a record of the presence or
non-presence of a specified attribute of the individuals belonging
to the group under observation and is of the following form:
 
Primary List of Homograde Individuals.

                                        Attribute.
Symbol for the Individual    Present ($E$)    Non-present ($\bar{E}$)
$I_1$                              1
$I_2$                                                  1
$I_3$                                                  1
$I_4$                              1
$I_5$                              1

In this scheme the individuals $I_1$, $I_4$ and $I_5$ possess the attribute
$E$ while the individuals $I_2$ and $I_3$ do not have this attribute.
 
In observing the presence of a specified attribute in a group of
individual objects we meet, however, frequently series of quite
another nature than the simple homograde series. When investigating
the different measures of heights of persons inside a
certain population group no simple dichotomous (i. e., cutting
in two) division in two opposite and mutually exclusive groups
suggests itself a priori. It is of course true that we might divide
the total population under observation into two subsidiary groups
of tall individuals and short individuals. But the question then
immediately arises, What constitutes a short or a tall person?
The answer must necessarily be arbitrary. Persons above the
height of 170 cm. may be classed as tall while persons falling
short of such measure may be classed as short persons, and we
might in this way form a primary homograde table of the form
as given above. There is no logical reason, however, to choose
the quantity 170 cm. as the dividing line and comparatively
little value would result from such a classification. It is evident
that all persons belonging to the group of talls or of shorts are not
identical as to the particular attribute in question. The height is
merely a characteristic which varies with each individual and no
two individuals have, mathematically speaking, the same height.
If we take into consideration the different grades of height among
the individuals and arrange the primary table accordingly we
obtain a heterograde series of observations. The general form
of the primary table of such series is:
 
 
Primary List of Heterograde Individuals.

Symbol for the Individual    Grade of Attribute.
$I_1$                        $x_1$
$I_2$                        $x_2$
$I_3$                        $x_3$
$I_4$                        $x_4$
$I_5$                        $x_5$
. . .                        . . .
$I_n$                        $x_n$

Here the quantities $x_1, x_2, \ldots, x_n$ give the measures (in kilogram,
liter, meter, etc.) of the characteristic in question.¹
 
As examples of heterograde series we may mention the lengths,
volumes or weights of animals, plants or inorganic objects;
astronomical observations as to the brightness of celestial objects;
meteorological records of rainfall, temperature or barometer
heights; the frequency of deaths among policyholders as to
attained age in an assurance company; duration of sickness or
disablement, etc.

The investigation of heterograde series is a problem of which
we shall treat later under the theory of errors or frequency curves.
The homograde series may, however, be explained fully by means
of the Bernoullian, Poisson and Lexian Series as founded on the
mathematical theory of probabilities in the previous chapters.
 
 78. Computation of the Mean and the Dispersion in Practice. 
 — It would be sui>erfluous to enter into a detailed denionstratioii 
 of the practical calculation of either the mean or the dispersion 
 
 ' It is to be noted that in the homograde series the primary list is given by 
 abstract numbers while the heterograde series consists of concrete numbers.
 
 78] COMPUTATION OF THE MEAN. 131 
 
 were it not for the fact that this calculation is performed with a 
 lot of unnecessary and useless labor by the untrained student and 
 even by many professional statisticians. By the ordinary school 
 method the number zero is chosen as the starting point and all 
 the variables are expressed in their absolute magnitudes, i. e., 
 their distance from 0. In this way one often encounters mul- 
 tiplication and addition of large numbers. The Danish biologist 
 and statistician, W. Johannson, has illustrated the futility of this 
 method in the following example taken from his treatise " Forela^s- 
 ninger over Lieren om Arvelighed" (Copenhagen, 1905).^ Dr. 
 Petersen, the director of the Danish Biological Station, counted 
 the tail fin rays of 703 flounders (Pleuronedes) caught around the 
 neighborhood of the Skaw. The observations follow: 
 
 Number of rays: 47 48 49 50 51 52 53 54 55 56 57 5S 59 60. 61 
 
 No. of flounders: 5 2 13 23 5S 90 134 127 HI 74 37 10 4 2 1 
 
The ordinary way of computing the mean would be as follows:

$$[5 \times 47 + 2 \times 48 + 13 \times 49 + \cdots + 1 \times 61] \div 703,$$

where 703 is the total number of individuals under observation.
In Chapter X we gave the following formula for the mean:

$$M = \frac{m_1 + m_2 + m_3 + \cdots + m_N}{N}. \tag{1}$$

This formula may evidently be written as follows:

$$M = \frac{(m_1 - M_0) + (m_2 - M_0) + (m_3 - M_0) + \cdots + (m_N - M_0)}{N} + M_0 = \frac{\Sigma(m_\nu - M_0)}{N} + M_0 = b + M_0. \tag{2}$$
 
In this expression $M_0$, which Charlier calls the provisional mean,
is an arbitrarily chosen number. To show how the introduction
of this quantity actually shortens the calculation of the mean
we return to the above quoted series of observations of tail fin
rays of flounders.

¹ German edition "Elemente der exakten Erblichkeitslehre" (Jena, 1913),
page 11.
 
 
Number of Rays (x) in 703 Flounders According to Observations of
Dr. Petersen.

N = ΣF(x) = 703, M_0 = 53.

   x     F(x)    x - M_0    (x - M_0)F(x)
  47       5       -6           - 30
  48       2       -5           - 10
  49      13       -4           - 52
  50      23       -3           - 69
  51      58       -2           -116
  52      96       -1           - 96
  53     134        0              0
  54     127       +1           +127
  55     111       +2           +222
  56      74       +3           +222
  57      37       +4           +148
  58      16       +5           + 80
  59       4       +6           + 24
  60       2       +7           + 14
  61       1       +8           +  8
 Sum Σ   703                 -373  +845
 
We have now:

$$b = (845 - 373) \div 703 = 0.67, \qquad M = M_0 + b = 53.67.$$

The method is quite simple and hardly needs any explanation.
From a cursory examination of the material we notice that the
mean is situated in the neighborhood of the series consisting of
53 rays. We choose therefore the provisional mean, $M_0$, as 53.
We next form the algebraic differences $x - M_0$. These differences
are then multiplied by $F(x)$. The algebraic sum of
these products divided by $N = \Sigma F(x)$ gives us the value of $b$,
which quantity added to $M_0$ gives the value of the mean, $M$.
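
The flounder computation can be retraced mechanically. The following is a minimal sketch in Python, assuming the frequencies of the table above; the function name `mean_with_provisional` is chosen here for illustration only. Exact fractions are used so that the result may be compared with the direct "school method" without rounding.

```python
from fractions import Fraction

# Tail-fin ray counts x and observed frequencies F(x) for the 703 flounders.
rays = list(range(47, 62))
freq = [5, 2, 13, 23, 58, 96, 134, 127, 111, 74, 37, 16, 4, 2, 1]

def mean_with_provisional(values, weights, m0):
    """Return (b, M) with b = sum((x - m0) * F(x)) / N and M = m0 + b."""
    n = sum(weights)
    b = Fraction(sum((x - m0) * f for x, f in zip(values, weights)), n)
    return b, m0 + b

b, m = mean_with_provisional(rays, freq, 53)

# Direct computation from zero, for comparison.
direct = Fraction(sum(x * f for x, f in zip(rays, freq)), sum(freq))

print(float(b), float(m))
print(m == direct)  # the choice of provisional mean does not alter the mean
```

Any other provisional mean, for instance 50 or 55, leads to the same value of M; the choice only affects the size of the numbers handled on the way.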
 
To show a slightly modified form of the method we take the
following observations of coal-mine accidents in Belgium, covering
the period 1901-1910, from "Annales des Mines de Belgique."
These data I have reduced to a stationary population group of
140,000 mine workers. In other words the quantity $s$ as defined
in § 83 is equal to 140,000.
 
 
Number (m) of Persons Killed in Coal Mine Accidents in Belgium,
1901-1910.

s = 140,000, N = 10, M_0 = 140.

 Year       m      m - M_0    (m - M_0)^2
 1901      164       +24          576
 1902      150       +10          100
 1903      160       +20          400
 1904      130       -10          100
 1905      127       -13          169
 1906      133       - 7           49
 1907      144       + 4           16
 1908      150       +10          100
 1909      133       - 7           49
 1910      133       - 7           49
 Sum Σ             -44  +68      1608

Hence

$$b = (68 - 44) \div 10 = 2.4, \qquad M = 140 + 2.4 = 142.4.$$
 
In this example it would probably have been easier to form
the sum $\Sigma m_\nu$ directly and then obtain the mean by division
by 10. The actual formation of the algebraic sums of $m_\nu - M_0$,
however, greatly facilitates the calculation of the dispersion, $\sigma$,
to which we now shall turn our attention.
The formula for the dispersion

$$\sigma^2 = \frac{\Sigma(m_\nu - M)^2}{N} \qquad (\nu = 1, 2, 3, \cdots N) \tag{3}$$

may evidently be written as follows:

$$\sigma^2 = \frac{(m_1 - M_0)^2 + (m_2 - M_0)^2 + \cdots + (m_N - M_0)^2}{N} - b^2 = \frac{\Sigma(m_\nu - M_0)^2}{N} - b^2, \tag{4}$$

where $b$ as usual means $M - M_0$, $M_0$ being the provisional mean.
For Belgian coal mine accidents we thus obtain from the above
data:

$$\sigma^2 = (1608 \div 10) - 5.76 = 155.04.$$
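
As an illustration of formulas (2) and (4) taken together, the Belgian figures may be run through a short Python sketch. The function name `mean_and_dispersion` is an assumption made for this example; the data are those of the table above.

```python
import math

# Persons killed per year, 1901-1910, reduced to s = 140,000 workers.
killed = [164, 150, 160, 130, 127, 133, 144, 150, 133, 133]

def mean_and_dispersion(values, m0):
    """Mean, squared dispersion and dispersion via a provisional mean m0."""
    n = len(values)
    dev = [m - m0 for m in values]           # m - M_0
    b = sum(dev) / n                         # b = M - M_0
    var = sum(d * d for d in dev) / n - b * b
    return m0 + b, var, math.sqrt(var)

mean, var, sigma = mean_and_dispersion(killed, 140)
print(mean, round(var, 2), round(sigma, 2))
```

The correction term b² is what makes the result independent of the provisional mean: starting from zero instead of 140 gives the same dispersion, only with much larger intermediate numbers.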
 
Where the number of observed individuals is very large an
arrangement such as that given above for the Belgian statistics becomes
too bulky and it is therefore customary to group the observations
in classes as for instance in the example of Dr. Johannsen. The
dispersion is then computed according to the following elegant
method due to Charlier, from whose brochure "Grunddragen af
den matematiska Statistiken" ("Rudiments of Mathematical
Statistics") I take the following example:
 
Number of Boys (m) per 500 Children Born in 24 Provinces of Sweden
during Each Month in 1883 and 1890.

s = 500, N = 576, M_0 = 257, w = 5.

 Class       Class      Frequency
 limits m    number x     F(x)       xF(x)     x^2 F(x)   (x+1)^2 F(x)
 200-204      -11           1        - 11        121         100
 205-209      -10           0           0          0           0
 210-214      - 9           0           0          0           0
 215-219      - 8           1        -  8         64          49
 220-224      - 7           2        - 14         98          72
 225-229      - 6           5        - 30        180         125
 230-234      - 5          13        - 65        325         208
 235-239      - 4          18        - 72        288         162
 240-244      - 3          47        -141        423         188
 245-249      - 2          60        -120        240          60
 250-254      - 1          81        - 81         81           0
 255-259        0         108           0          0         108
 260-264      + 1          91        + 91         91         364
 265-269      + 2          60        +120        240         540
 270-274      + 3          44        +132        396         704
 275-279      + 4          22        + 88        352         550
 280-284      + 5          16        + 80        400         576
 285-289      + 6           6        + 36        216         294
 290-294      + 7           0           0          0           0
 295-299      + 8           0           0          0           0
 300-304      + 9           1        +  9         81         100
 Sum Σ                    576        + 14      +3596       +4200
 
The class interval in the above scheme was chosen as 5.
The observed frequencies are given in column 3. We thus find
that the greatest frequency, 108, falls in the class interval
255-259. Choosing this class interval as the origin we designate
the other class intervals with their proper positive and negative
numbers as shown in column 2. The provisional mean, $M_0$,
is taken as the center of class 0, or $M_0 = 257$. In this way the
class interval $w = 5$ is taken as the unit.

The whole calculation is very simple. We first of all form the
products $x \times F(x)$. The sum of these products divided by
$576 = N$ gives the distance $b$ from the provisional mean to
the arithmetic mean, expressed in units of the class interval, $w$.
 
 
We have thus:

$$b = w \times 14 \div 576 = +0.0243w = +0.122,$$

or

$$M = 257 + b = 257.12.$$

The formula for the dispersion takes the form

$$\sigma^2 = \frac{\Sigma x^2 F(x)}{N} - b^2,$$

where $b$ is expressed in units of the class interval. The table gives
us $\Sigma F(x)x^2 = 3596$, or

$$\sigma^2 = w^2[3596 \div 576 - (0.024)^2] = w^2 \times 6.242,$$

$$\sigma = w \times 2.498 = 12.49.$$
 
Charlier now checks the results by means of the following relation:

$$\Sigma(x + 1)^2F(x) = \Sigma x^2F(x) + 2\Sigma xF(x) + \Sigma F(x).$$

For the above example we have:

   Σ x^2 F(x)  = + 3,596
  2Σ xF(x)     = +    28
   Σ F(x)      = +   576
   Sum         = + 4,200 = Σ(x + 1)^2 F(x),

which proves the accuracy of the calculation.
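
The grouped computation and its control check can be sketched in Python as follows. The function names are hypothetical, and the frequencies are those of the Swedish table above. It should be noted that when all columns are recomputed by machine from the same frequencies the relation holds automatically; the check earns its keep against totals formed by hand, which is why the sketch tests the printed column totals.

```python
import math

def charlier_sums(freq, x0):
    """Column sums N, sum xF, sum x^2 F and sum (x+1)^2 F for the class
    numbers x0, x0 + 1, ... with frequencies freq."""
    xs = range(x0, x0 + len(freq))
    n = sum(freq)
    s1 = sum(x * f for x, f in zip(xs, freq))
    s2 = sum(x * x * f for x, f in zip(xs, freq))
    s3 = sum((x + 1) ** 2 * f for x, f in zip(xs, freq))
    return n, s1, s2, s3

def control_check(n, s1, s2, s3):
    """Charlier's relation: sum (x+1)^2 F = sum x^2 F + 2 sum xF + sum F."""
    return s3 == s2 + 2 * s1 + n

def grouped_mean_dispersion(n, s1, s2, m0, w):
    """Mean and dispersion from the column sums, class centre m0, width w."""
    b = s1 / n                       # in units of the class interval
    return m0 + w * b, w * math.sqrt(s2 / n - b * b)

# Frequencies for classes 200-204 (x = -11) up to 300-304 (x = +9).
boys = [1, 0, 0, 1, 2, 5, 13, 18, 47, 60, 81, 108,
        91, 60, 44, 22, 16, 6, 0, 0, 1]

printed = (576, 14, 3596, 4200)      # column totals as printed in the table
print(control_check(*printed))
print(charlier_sums(boys, -11) == printed)
mean, sigma = grouped_mean_dispersion(576, 14, 3596, m0=257, w=5)
print(round(mean, 2), round(sigma, 2))
```

A slip of a single unit in any one of the hand-formed totals makes `control_check` return `False`, which is the whole purpose of carrying the extra column.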
 
 The full elegance of the Charlier self checking scheme is shown 
 at a later stage under the calculation of the parameters of fre- 
 quency curves. In the meantime the student may test the ad- 
 vantage of the provisional mean by trying to compute the mean 
 and the dispersion by the conventional school method. A 
 direct computation by this method would in the last example 
 take about a whole day's labor. 
 
 Before we proceed to apply the formulas previously demon- 
 strated, we wish to call the attention of the reader to the following 
 important properties of the mean and the dispersion: 
 
1. The algebraic sum of the deviations from the mean, i.e.,
$\Sigma(m_\nu - M)$, is zero. This follows immediately from formula
(2) of § 78. We have:

$$M = \frac{\Sigma(m_\nu - M_0)}{N} + M_0 = b + M_0,$$
 
 
where $M_0$, the provisional mean, is an arbitrarily chosen number
and $b = \Sigma(m_\nu - M_0) \div N$. If $M_0 = M$ we have evidently
$b = 0$, which proves the statement.

2. The dispersion (standard deviation) is the least possible
root-mean-square deviation, i.e., the root-mean-square deviation
is a minimum when the deviations are measured from the mean.

We have (see formula (4)):

$$\sigma^2 = \frac{\Sigma(m_\nu - M)^2}{N} = \frac{\Sigma(m_\nu - M_0)^2}{N} - b^2,$$

from which the proposition follows a fortiori.
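
The step behind the second property may be written out in full. The expansion below is offered as a sketch using only the symbols already defined, with $b = M - M_0$; the middle term vanishes by the first property.

```latex
\begin{align*}
\frac{\Sigma(m_\nu - M_0)^2}{N}
  &= \frac{\Sigma\bigl[(m_\nu - M) + (M - M_0)\bigr]^2}{N} \\
  &= \frac{\Sigma(m_\nu - M)^2}{N}
     + 2b\,\frac{\Sigma(m_\nu - M)}{N} + b^2 \\
  &= \sigma^2 + b^2 \;\ge\; \sigma^2 .
\end{align*}
```

The mean-square deviation about any origin $M_0$ thus exceeds $\sigma^2$ by exactly $b^2$, and equality holds only when $M_0 = M$.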
 
79. Westergaard's Experiments. — The Danish statistician,
Harald Westergaard, in his "Statistikens Teori i Grundrids"
gives the following results of 10,000 observations divided into
100 equal sample sets of drawings of balls from a bag containing
an equal number of red and white balls (the ball was returned
to the bag after each drawing):

White: 33 34 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

Frequency: 0 1 1 2 2 2 3 3 4 5 6 5 11 9 5 10 4 8

White: 55 56 57 58 59 60 61 62 63

Frequency: 3 5 4 4 1 1 1.

The elements resulting from Westergaard's drawings clearly
represent a Bernoullian Series where the number of comparison
$s$ is equal to 100. Arranging the data in classes, taking 3 as
the class interval, the computation of the mean and the dispersion
is easily performed by means of the Charlier self-checking
scheme.
 
Bernoullian Series. Number of White Balls in 100 Drawings
(Westergaard).

s = 100, N = 100, M_0 = 49, w = 3.

   m        x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
 33-35     -5       1       - 5         25          16
 36-38     -4       0         0          0           0
 39-41     -3       5       -15         45          20
 42-44     -2       8       -16         32           8
 45-47     -1      15       -15         15           0
 48-50      0      25         0          0          25
 51-53     +1      19       +19         19          76
 54-56     +2      16       +32         64         144
 57-59     +3       8       +24         72         128
 60-62     +4       2       + 8         32          50
 63-65     +5       1       + 5         25          36
 Sum              100    (-51 +88)     329         503
 
 
Control Check.

   Σ x^2 F(x)  = 329
  2Σ xF(x)     =  74
   Σ F(x)      = 100
   Sum         = 503 = Σ(x + 1)^2 F(x)

$$b = w(88 - 51) : 100 = w \times 0.37 = 1.11,$$

or $M = M_0 + b = 50.11$,

$$\sigma^2 = w^2[329 : 100 - b^2]\,{}^{1} = w^2(3.29 - 0.137) = 28.377,$$

or $\sigma = 5.33$.

Giving due allowance for the respective mean errors of the mean
and the deviation we have finally:²

$$M = 50.11 \pm 0.536, \qquad \sigma = 5.33 \pm 0.378.$$
 
We shall now compare these values with the corresponding
theoretical values of the Bernoullian series. The a priori probabilities
of drawing red and white are in this example $p = q = \tfrac{1}{2}$.
Hence we have as the theoretical values for the mean and the
dispersion:

$$M_B = 100 \times \tfrac{1}{2} = 50, \qquad \sigma_B = \sqrt{100 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 5.$$

A comparison between the observed and the theoretical ideal
values, taking into account the proper mean errors, shows a
very close agreement as far as the dispersion is concerned while
the difference in the mean is about 1/5 of the mean error. A
computation of the Lexian Ratio and the Charlier Coefficient of
Disturbancy yields the following results:

$$L = 1.072; \qquad 100\rho = 3.68.$$

Taking into account the proper mean errors due entirely to
the fluctuation of sampling we find, however, that our theoretical
results and formulas of the previous chapters have been verified
in an absolutely satisfactory manner.
 
¹ b is expressed in units of w.

² For mean errors of M and σ see Addenda.

80. Charlier's Experiments. — In the above mentioned brochure,
"Grunddragen," Charlier gives the results of a long series
of card drawings illustrating the Bernoullian, the Poisson and
the Lexian Series. As an example showing the frequency distribution
in a Bernoullian Series Charlier made 10,000 individual
drawings (with replacements) from an ordinary whist deck and
recorded the number of black and red cards drawn in this manner.
Arranging the drawings in sample sets of 10 individual drawings,
M. Charlier gives the following table:
 
Bernoullian Series. Number (m) of Black Cards in Sample Sets of 10.
s = 10, N = 1,000, M_0 = 5, w = 1.

   m      x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
   0     -5       3      - 15       +  75       +   48
   1     -4      10      - 40       + 160       +   90
   2     -3      43      -129       + 387       +  172
   3     -2     116      -232       + 464       +  116
   4     -1     221      -221       + 221            0
   5      0     247         0           0       +  247
   6     +1     202      +202       + 202       +  808
   7     +2     115      +230       + 460       + 1,035
   8     +3      34      +102       + 306       +  544
   9     +4       9      + 36       + 144       +  225
  10     +5       0         0           0            0
 Sum:         1,000      - 67      +2,419       +3,285
 
Control Check.

   Σ x^2 F(x)  = + 2,419
  2Σ xF(x)     = -   134
   Σ F(x)      = + 1,000
   Sum         = + 3,285 = Σ(x + 1)^2 F(x)

From the above values we obtain:

$$b = -67 : 1{,}000 = -0.067; \qquad \sigma^2 = 2{,}419 : 1{,}000 - b^2 = 2.415.$$

Making due allowance for mean errors we have thus:

$$M = 5 - 0.067 = 4.933 \pm 0.050; \qquad \sigma = 1.554 \pm 0.035.$$

For the theoretical mean and dispersion we obtain the following
values ($p = q = \tfrac{1}{2}$):

$$M_B = 5; \qquad \sigma_B = 1.581,$$

which gives the following values for the Lexian Ratio and the
Charlier coefficient:

$$L = .983, \qquad 100\rho \text{ is imaginary.}$$
 
These results would indicate a slightly subnormal series. Taking
into account the fluctuations due to sampling, for which the
mean error serves as a measure, the results become normal and
serve again as a verification of the theory.
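
The two criteria quoted for this series can be illustrated by a small Python sketch. The definitions are taken as assumptions here, since they are given in earlier chapters and not in this section: the Lexian Ratio as L = σ : σ_B and the Charlier coefficient as 100ρ with ρ = √(σ² − σ_B²) : M. The function names are hypothetical.

```python
import math

def lexian_ratio(sigma, sigma_b):
    """L = sigma / sigma_B: observed over Bernoullian dispersion."""
    return sigma / sigma_b

def charlier_coefficient(mean, sigma, sigma_b):
    """100 * rho with rho = sqrt(sigma^2 - sigma_B^2) / M (assumed form).

    Returns None when sigma < sigma_B, i.e. when rho is imaginary.
    """
    excess = sigma ** 2 - sigma_b ** 2
    if excess < 0:
        return None
    return 100 * math.sqrt(excess) / mean

s, p = 10, 0.5
sigma_b = math.sqrt(s * p * (1 - p))    # Bernoullian dispersion for s = 10

print(round(lexian_ratio(1.554, sigma_b), 3))
print(charlier_coefficient(4.933, 1.554, sigma_b))
```

Under these assumed definitions a ratio below unity and an imaginary ρ are two expressions of the same fact, namely that the observed dispersion falls short of the Bernoullian one.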
 
Poisson Series. — As an illustration of the frequency distribution
in a Poisson Series Charlier made the following experiment:
From an ordinary whist deck a single card was drawn and the
color noted. Before the second drawing a spade was eliminated
from the deck and replaced by a heart from another deck of
cards, so that the deck then contained 12 spades, 13 clubs, 13
diamonds and 14 hearts; from this deck another card was drawn
and the color noted. Then another spade was eliminated and a
heart substituted. From this deck, containing 11 spades, 13
clubs, 13 diamonds and 15 hearts, a card was again drawn. The
drawings were in this manner continued until all the spades were
replaced by hearts. The same operation was applied to the
clubs, which were replaced by diamonds. After 27 drawings
the deck contained only red cards. Altogether 100 sample sets
of 27 drawings were made with the following results:
 
Poisson Series. Number (m) of Black Cards in Sample Sets of 27.
s = 27, N = 100, M_0 = 7, w = 1.

   m      x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
   3     -4       2      -  8       + 32          18
   4     -3       6      - 18       + 54          24
   5     -2      14      - 28       + 56          14
   6     -1      14      - 14       + 14           0
   7      0      22         0          0          22
   8     +1      17      + 17       + 17          68
   9     +2      14      + 28       + 56         126
  10     +3       8      + 24       + 72         128
  11     +4       1      +  4       + 16          25
  12     +5       1      +  5       + 25          36
  13     +6       1      +  6       + 36          49
 Sum:           100      + 16       +378         510

Control Check.

   Σ x^2 F(x)  = + 378
  2Σ xF(x)     = +  32
   Σ F(x)      = + 100
   Sum         = + 510 = Σ(x + 1)^2 F(x)

The calculation of the mean and the dispersion with their
respective mean errors yields the following result:

$$b = +0.16, \qquad M = 7.16 \pm 0.211,$$

$$\sigma^2 = 3.78 - (0.16)^2 = 3.754, \qquad \sigma = 1.937 \pm 0.149.$$

The theoretical Poisson values according to the formulas of
§ 67 are:

$$M_P = 6.75, \qquad \sigma_P = 2.111.$$

If we now take the arithmetic mean of the various probabilities
of drawing a black card we find that $p_0 = \tfrac{1}{4}$. If all the
drawings had been performed with a constant probability we
should according to the Bernoullian scheme have:

$$M_B = 27 \times \tfrac{1}{4} = 6.75, \qquad \sigma_B = \sqrt{27 \times \tfrac{1}{4} \times \tfrac{3}{4}} = 2.25.$$

These results verify the formulas as obtained under the discussion
of the Poisson Series. ($M_P = M_B$, $\sigma_P < \sigma_B$.)
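
The theoretical values quoted for Charlier's deck can be retraced with a few lines of Python. The formulas of § 67 are not reproduced in this section, so the sketch assumes the usual results for independent trials with varying probabilities, M_P = Σp and σ_P² = Σpq; the function names are hypothetical.

```python
import math

def poisson_series(probs):
    """Mean and dispersion of the number of successes in one sample set whose
    trials have the (possibly different) success probabilities in probs."""
    mean = sum(probs)
    sigma = math.sqrt(sum(p * (1 - p) for p in probs))
    return mean, sigma

def bernoullian(s, p):
    """Mean and dispersion for s trials with constant probability p."""
    return s * p, math.sqrt(s * p * (1 - p))

# Charlier's deck: 26 black cards of 52 at the first drawing, one fewer at
# each following drawing, none at the 27th.
probs = [k / 52 for k in range(26, -1, -1)]
p0 = sum(probs) / len(probs)          # mean probability of drawing black

m_p, sigma_p = poisson_series(probs)
m_b, sigma_b = bernoullian(len(probs), p0)
print(round(m_p, 2), round(sigma_p, 3))
print(round(m_b, 2), round(sigma_b, 3))
```

The two means coincide, while the Poisson dispersion falls below the Bernoullian one by the amount contributed by the spread of the individual probabilities about their mean.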
 
Lexian Series. — In testing the Lexian Series Charlier first
took 10 samples of 10 individual drawings in each sample from
an ordinary whist deck. The number of black cards thus
drawn was recorded. After this, 10 samples of the same magnitude
were taken from a deck containing 25 black and 27 red
cards; and then 10 samples from a deck with 24 black and 28 red
cards. Of the total 270 samples (until the deck contains only
red cards) Charlier gives the first 100, which gave the following
result:
 
Lexian Series. Number (m) of Black Cards in 10 Drawings.
s = 10, N = 100, M_0 = 4.

   m      x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
   1     -3       4       -12       +  36       +  16
   2     -2       9       -18       +  36       +   9
   3     -1      19       -19       +  19           0
   4      0      21         0           0       +  21
   5     +1      23       +23       +  23       +  92
   6     +2      10       +20       +  40       +  90
   7     +3      12       +36       + 108       + 192
   8     +4       2       + 8       +  32       +  50
 Sum:           100       +38       + 294       + 470

Control Check.

   Σ x^2 F(x)  = + 294
  2Σ xF(x)     = +  76
   Σ F(x)      = + 100
   Sum         = + 470 = Σ(x + 1)^2 F(x)
 
The final computations (with mean errors) give:

$$b = +0.38, \qquad M = 4.38 \pm 0.167,$$

$$\sigma^2 = 294 : 100 - b^2 = +2.796, \qquad \sigma = 1.672 \pm 0.118.$$

The mean probability in all trials was:

$$p_0 = 21.50 : 52 = 0.4135, \quad \text{or} \quad M_B = sp_0 = 4.135,$$

$$\sigma_B = \sqrt{sp_0q_0} = 1.557.$$

A calculation of the mean and the dispersion according to the
formulas under the Lexian Series (see § 74) gives according to
Charlier:

$$M_L = 4.135, \qquad \sigma_L = 1.643.$$
 
 
This shows that the dispersion in a Lexian Series is greater than
the corresponding Bernoullian dispersion. The Lexian Ratio
$L = \sigma_L : \sigma_B$ has the value 1.06. The series according to the
terminology of Lexis has a hypernormal dispersion, although
a very small one. Charlier in "Grunddragen" (§ 30) says that
when arranging the material in 27 samples, each sample containing
100 single trials, the Lexian Ratio has the value $L = 3.82$,
indicating a greater hypernormal dispersion than in the smaller
samples.
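
Charlier's theoretical Lexian values can be reproduced under an assumption about the formula of § 74, which is not restated here: the sketch takes σ_L² = s p₀q₀ + s(s − 1) σ_p², where σ_p² is the mean-square deviation of the set probabilities about their mean p₀. This form is chosen because it reproduces the figures quoted above; the function name is hypothetical.

```python
import math

def lexian_series(s, probs):
    """Theoretical mean, Bernoullian dispersion and Lexian dispersion when
    equally many sample sets of s trials are drawn with each probability
    in probs.

    Assumed form: sigma_L^2 = s*p0*q0 + s*(s - 1)*var(p).
    """
    n = len(probs)
    p0 = sum(probs) / n
    var_p = sum((p - p0) ** 2 for p in probs) / n
    sigma_b2 = s * p0 * (1 - p0)
    sigma_l = math.sqrt(sigma_b2 + s * (s - 1) * var_p)
    return s * p0, math.sqrt(sigma_b2), sigma_l

# First ten decks of Charlier's experiment: 26, 25, ..., 17 black cards of 52.
probs = [k / 52 for k in range(26, 16, -1)]
m_l, sigma_b, sigma_l = lexian_series(10, probs)
print(round(m_l, 3), round(sigma_b, 3), round(sigma_l, 3))
print(round(sigma_l / sigma_b, 2))
```

Because the second term grows with s(s − 1), the same spread of probabilities produces a far larger ratio when the sample sets are larger, which agrees with the remark on samples of 100 trials.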
 
81. Experiments by Bonynge and Fisher. — As an additional
verification of the Bernoullian, Poisson and Lexian Series my
co-editor, Mr. Bonynge, and myself have repeated the experiments
of Westergaard and Charlier in a slightly modified form.

Bernoullian Series. — In 20 sample sets, each set containing
500 individual drawings, from an ordinary whist deck, I counted
the number of diamonds drawn in each sample. My records gave
the following scheme:

Bernoullian Series. Number of Diamonds (m) in 20 Sample Sets of
500 Drawings.
s = 500, N = 20, M_0 = 125.
 
   m      m - M_0    (m - M_0)^2
  123      -  2            4
  143      + 18          324
  124      -  1            1
  133      +  8           64
  142      + 17          289
  130      +  5           25
  117      -  8           64
  122      -  3            9
  132      +  7           49
  109      - 16          256
  130      +  5           25
  139      + 14          196
  138      + 13          169
  129      +  4           16
  136      + 11          121
  121      -  4           16
  135      + 10          100
  124      -  1            1
  135      + 10          100
  116      -  9           81
 Sum:    -44  +122      1,910

The results with their respective mean errors are as follows:

$$M = 128.9 \pm 2.01, \qquad \sigma = 8.962 \pm 1.416.$$
 
 
The theoretical Bernoullian mean and the dispersion have the
values:

$$M_B = 125, \qquad \sigma_B = \sqrt{spq} = \sqrt{500 \times \tfrac{1}{4} \times \tfrac{3}{4}} = 9.682,$$

where $p = \tfrac{1}{4}$ denotes the a priori probability of drawing a
diamond.
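
The observed values and their mean errors for the diamond counts may be checked as follows. The mean-error formulas are given in the Addenda and not in this section, so the sketch assumes the forms σ : √N for the mean and σ : √(2N) for the dispersion, which agree with the figures printed above to within rounding. The function name is hypothetical.

```python
import math

# Diamonds counted in each of the 20 sample sets of 500 drawings.
diamonds = [123, 143, 124, 133, 142, 130, 117, 122, 132, 109,
            130, 139, 138, 129, 136, 121, 135, 124, 135, 116]

def summary_with_mean_errors(values, m0):
    """Mean, its mean error, dispersion and its mean error, computed with a
    provisional mean m0.

    Assumed forms of the mean errors: sigma / sqrt(N) for the mean and
    sigma / sqrt(2 N) for the dispersion.
    """
    n = len(values)
    b = sum(m - m0 for m in values) / n
    sigma = math.sqrt(sum((m - m0) ** 2 for m in values) / n - b * b)
    return m0 + b, sigma / math.sqrt(n), sigma, sigma / math.sqrt(2 * n)

mean, err_mean, sigma, err_sigma = summary_with_mean_errors(diamonds, 125)
print(round(mean, 1), round(err_mean, 2))
print(round(sigma, 2), round(err_sigma, 2))
```

With the theoretical values 125 and 9.682, the observed mean lies about two mean errors from its expectation and the observed dispersion well within one.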
 
 Again I counted the number of aces (irrespective of color) 
 which appeared in 100 sample sets of 100 individual drawings 
 from the same deck of cards. The records arranged in classes 
 gave the following scheme: 
 
Number of Aces (m) in 100 Sample Sets of 100 Individual Drawings.

s = 100, N = 100, M_0 = 8, w = 1.

   m      x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
   2     -6       1       - 6         36          25
   3     -5       8       -40        200         128
   4     -4       8       -32        128          72
   5     -3       7       -21         63          28
   6     -2       9       -18         36           9
   7     -1      21       -21         21           0
   8      0      13         0          0          13
   9     +1      15       +15         15          60
  10     +2       3       + 6         12          27
  11     +3       9       +27         81         144
  12     +4       1       + 4         16          25
  13     +5       2       +10         50          72
  14     +6       2       +12         72          98
  15     +7       0         0          0           0
  16     +8       0         0          0           0
  17     +9       1       + 9         81         100
 Sum:           100       -55        811         801

Control Check.

   Σ x^2 F(x)  = + 811
  2Σ xF(x)     = - 110
   Σ F(x)      = + 100
   Sum         = + 801 = Σ(x + 1)^2 F(x)
 
$$b = -55 : 100 = -0.55,$$

$$M = M_0 + b = 7.45 \pm 0.279 \text{ (with mean error)},$$

$$\sigma^2 = w^2\left[\frac{\Sigma x^2F(x)}{\Sigma F(x)} - b^2\right] = 7.8075,$$

or

$$\sigma = 2.794 \pm 0.198 \text{ (with mean error)}.$$

The theoretical Bernoullian values are:

$$M_B = 100 \times \tfrac{1}{13} = 7.69, \qquad \sigma_B = \sqrt{100 \times \tfrac{1}{13} \times \tfrac{12}{13}} = 2.663.$$

A comparison between the empirical and the theoretical a priori
values exhibits a close correspondence.
 
 
 Poisson Series. — As an illustration of the Poisson Series 
 Mr. Bonynge made the following experiment. A sample set of 
 20 single drawings of balls from an urn (one ball being drawn at a 
 time) was made under the following conditions: 
 
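In drawing No.  1 the urn contained 20 white and 20 black balls.
In drawing No.  2 the urn contained 21 white and 19 black balls.
In drawing No.  3 the urn contained 22 white and 18 black balls.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
In drawing No. 20 the urn contained 39 white and  1 black ball.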
 
 Altogether Bonynge took 500 sample sets which arranged in 
 classes give the following scheme: 
 
Poisson Series. Number of Black Balls (m) in 500 Sample Sets of
20 Individual Drawings (Bonynge).

s = 20, N = 500, M_0 = 5.

   m      x     F(x)     xF(x)     x^2 F(x)   (x+1)^2 F(x)
   0     -5       2      - 10         50          32
   1     -4       9      - 36        144          81
   2     -3      35      -105        315         140
   3     -2      52      -104        208          52
   4     -1      86      - 86         86           0
   5      0     109         0          0         109
   6     +1      85      + 85         85         340
   7     +2      69      +138        276         621
   8     +3      30      + 90        270         480
   9     +4      16      + 64        256         400
  10     +5       6      + 30        150         216
  11     +6       1      +  6         36          49
 Sum Σ:         500      + 72       1876        2520
 
Hence we have:

$$b = 0.144, \quad M = 5.144, \quad \sigma^2 = 3.732, \quad \sigma = 1.932.$$

The theoretical Poisson values are:

$$M_P = 5.25, \qquad \sigma_P = 1.86 \quad \text{(see formulas, § 74)}.$$

The mean of the various probabilities of drawing a black ball is
$p_0 = \tfrac{21}{80}$. According to the Bernoullian scheme we should then
have the following values for the mean and the dispersion:

$$M_B = 20 \times \tfrac{21}{80} = 5.25, \qquad \sigma_B = \left(20 \times \tfrac{21}{80} \times \tfrac{59}{80}\right)^{1/2} = 1.968.$$

These values confirm the Poisson theorems ($M_P = M_B$, $\sigma_P < \sigma_B$).
 Lexian Series. — As additional illustration of the Lexian Series
 
 
I took 20 sample sets, each set containing 500 drawings of a
single ball from an urn (with replacements). The contents of
the urn varied from set to set as follows:

Sample set No.  1: 20 white and 20 black balls.
Sample set No.  2: 21 white and 19 black balls.
Sample set No.  3: 22 white and 18 black balls.
. . . . . . . . . . . . . . . . . . . . . . . .
Sample set No. 20: 39 white and  1 black ball.

In the 21st set all the black balls were eliminated and the urn
contained white balls only. This set, however, was not taken into
consideration in calculating the mean and the dispersion.
 
Lexian Series. Number (m) of Black Balls in 20 Sample Sets of
500 Individual Drawings (Fisher).

s = 500, N = 20, M_0 = 130.

 No. of Set     m      (m - M_0)    (m - M_0)^2
      1        251       + 121         14641
      2        246       + 116         13456
      3        222       +  92          8464
      4        216       +  86          7396
      5        193       +  63          3969
      6        176       +  46          2116
      7        183       +  53          2809
      8        173       +  43          1849
      9        156       +  26           676
     10        135       +   5            25
     11        140       +  10           100
     12        127       -   3             9
     13        115       -  15           225
     14         96       -  34          1156
     15         78       -  52          2704
     16         69       -  61          3721
     17         55       -  75          5625
     18         43       -  87          7569
     19         29       - 101         10201
     20         19       - 111         12321
 Sum Σ:               -539  +661       99012
 
$$b = (661 - 539) : 20 = 6.6, \qquad M = M_0 + b = 136.0 \pm 15.86.$$

$$\sigma^2 = 99012 : 20 - b^2 = 4913.4, \qquad \sigma = 70.098 \pm 11.09 \text{ (with mean errors)}.$$

The theoretical Lexian values are:

$$M_L = 131.25, \qquad \sigma_L = 72.676 \quad \text{(see § 74)}.$$
 
 
If the series represented a true Bernoullian Series, we should
have

$$M_B = 500 \times \tfrac{21}{80} = 131.25, \qquad \sigma_B = \sqrt{500 \times \tfrac{21}{80} \times \tfrac{59}{80}} = 9.839.$$

These values confirm the Lexian Theorem ($M_L = M_B$, $\sigma_L > \sigma_B$).
A computation of the Charlier Coefficient of Disturbancy from
the observed values gives:

$$100\rho = 50.80,$$

whereas the theoretical value is 55.38, showing a decidedly
hypernormal dispersion, a result which was to be expected since
the probability of drawing black varies from $\tfrac{1}{2}$ to $\tfrac{1}{40}$ in the
various sets of samples.
 
 All the above experiments show a completely satisfactory 
 verification of the various theorems of the previous chapters 
 and may perhaps serve as a vindication of the followers of 
 Laplace, who like him hold that an a priori foundation for 
 probability judgments is indispensable. 
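
Experiments of the kind described in this chapter are easily imitated by machine. The following Python sketch is an illustration only and not a record of any of the drawings above: it simulates a Bernoullian Series with a pseudo-random generator, and the seed, the number of sample sets and the function name are arbitrary assumptions. The mean errors are taken in the assumed forms σ : √N and σ : √(2N).

```python
import math
import random

def simulate_bernoullian(s, p, n_sets, seed=0):
    """Draw n_sets sample sets of s trials each (drawing with replacement)
    and return the observed mean and dispersion of the success counts."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(s)) for _ in range(n_sets)]
    mean = sum(counts) / n_sets
    sigma = math.sqrt(sum((m - mean) ** 2 for m in counts) / n_sets)
    return mean, sigma

s, p, n_sets = 100, 0.5, 2000
mean, sigma = simulate_bernoullian(s, p, n_sets)
theory_mean, theory_sigma = s * p, math.sqrt(s * p * (1 - p))

# Mean errors under the assumed forms sigma/sqrt(N) and sigma/sqrt(2N).
err_mean = theory_sigma / math.sqrt(n_sets)
err_sigma = theory_sigma / math.sqrt(2 * n_sets)
print(f"M = {mean:.2f} (theory {theory_mean:.0f} +/- {err_mean:.2f})")
print(f"sigma = {sigma:.2f} (theory {theory_sigma:.0f} +/- {err_sigma:.2f})")
```

Replacing the constant probability by one that changes from trial to trial, or from set to set, turns the same sketch into an imitation of the Poisson or the Lexian experiments.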
 
 
 CHAPTER XII. 
 
 CONTINUATION OF THE APPLICATION OF THE THEORY OF 
 PROBABILITIES TO HOMOGRADE STATISTICAL SERIES. 
 
82. General Remarks. — In this chapter it is our intention to
discuss the application of the theory of probabilities to homograde
statistical series with special reference to vital statistics. We
owe the reader an apology, however, inasmuch as in the former
paragraphs we have employed the term statistics without defining
its meaning in a rigorous manner. A definition may perhaps
appear superfluous since statistics nowadays is almost a household
word. The term unfortunately is often employed as a mere
phrase without any understanding of its real meaning. This
applies especially to that band of self-styled statisticians, mere
dilettanti, who, with an energy which undoubtedly could be
better employed otherwise, attempt to investigate and analyze
mass phenomena regardless of method and system. When
investigations are undertaken by such dilettanti the common
gibe that "statistics will prove anything" becomes, alas, only
too true and proves at least that "like other mathematical tools
they can be wielded effectively only by those who have taken the
trouble to understand the way they work."¹
 
By the science of statistics we understand the recording and
subsequent quantitative analysis of observed mass phenomena.

By mathematical statistics (also called statistical methods) we
understand the quantitative determination and measurement of the
effect of a complex of causes acting on the object under investigation
as furnished by previously recorded observations as to certain
attributes among a collective body of individual objects.

Practical statistics, if such a name may be used, then simply
becomes the mechanical collection of statistical data, i.e., the
recording of the observed attributes of each individual. In no
way do we wish to underestimate the importance of this process

¹ See Nunn, "Exercises in Algebra" (London, 1914), pages 432-33.
 
 
which is as important for the statistical analysis as is the gathering
of structural materials for the erection of a large building.

Mathematical statistics is thus the tool we must use in the final
analysis of the statistical data. It is a very effective and powerful
tool when used properly by the investigator. At the same time
it is not an automatic calculating machine in which we need only
put the material and read off the result on a dial. A person
without any knowledge whatsoever about the nature of logarithms
may in a few hours be taught how to use a logarithmic
table in practical computations, but it would be foolish to view
the formulas and criteria from probabilities when applied to
statistical data in the same light as a table of logarithms in
calculating work. Such formulas and criteria must be used with
caution and discretion and only by those who have taken the
trouble to make a thorough study of probabilities and master
their real meaning and their relation to mass phenomena. If
put in the hands of mere amateurs the formulas become as
dangerous a toy as a razor to a child.

It is not our intention to give in this work a description of the
technique of the collection of the material, which depends to a
large extent on local social conditions and for which it is difficult
to give a set of fixed rules. In the following we shall treat the
mathematical methods of statistics exclusively, and furthermore
make the theory of probabilities the basis of our investigations.
 
83. Analogy between Statistical Data and Mathematical
Probabilities. — Let us for the moment imagine a closed community
with a stationary population from year to year and let us
denote the size of such a population by $s$. Let us furthermore
suppose we were given a series of numbers:

$$m_1, m_2, m_3, \cdots m_N,$$

denoting the number of children born in various years in this
community. The ratios

$$\frac{m_1}{s}, \frac{m_2}{s}, \frac{m_3}{s}, \cdots \frac{m_N}{s}$$

may then be looked upon as probabilities of a childbirth in
various years. As Charlier justly remarks, "such an identification
of a statistical ratio with a mathematical probability is
 
 
at first sight a mere analogy which possibly may have very little
in common with the observed statistical phenomena, but a
closer scrutiny shows the great importance for statistics of such
a view." If such ratios could be regarded as mathematical
probabilities wherein the various $m$'s were identical to favorable
cases in $s$ total trials, the mean and the dispersion could be
determined a priori from the Bernoullian Theorem. The founders
of mathematical statistics regarded the identification of an
ordinary statistical series with a Bernoullian Series almost as
axiomatic. This view is found even among some leading writers
of the present time. Among others we apparently find this
traditional view held by the eminent English actuary, G. King, in his
classic "Text Book." In Chapter II of this well-known standard
actuarial treatise a probability is defined as follows: "If an event
may happen in $\alpha$ ways and fail in $\beta$ ways, all these ways being
equally likely, the probability of the happening of the event is
$\alpha \div (\alpha + \beta)$." With this definition as a basis King then
deduces the elementary formulas of the addition and multiplication
theorems. He then continues: "Passing now to the mortality
table, if there be $l_x$ persons living at age $x$, and if of these $l_{x+n}$ survive
to age $x + n$, then the probability that a life aged $x$ will survive
$n$ years is $l_{x+n} \div l_x = {}_np_x$." And again "the probability that a
life aged $x$ and a life aged $y$ will both survive $n$ years is ${}_np_x \times {}_np_y$."¹
From the above it would appear that the author unreservedly
assumes a one-to-one correspondence between the $l_{x+n}$ survivors
and "favorable ways" as known from ordinary games of chance
and a similar correspondence between the original $l_x$ persons and
"equally possible cases." A simple consideration will show that
there exists no a priori reason for such a unique correspondence
between ordinary empirical death rates and mathematical
probabilities. None of the original $l_x$ persons can be considered as
 
¹ Mr. H. Moir in his "Primer of Insurance" tried to avoid the difficulty by
giving a wholly new definition of "equally likely events." According to
Moir "events may be said to be 'equally likely' when they recur with
regularity in the long run." Apart from the half metaphorical term "in the long
run" Mr. Moir fails to state what he means by the expression "with
regularity." If the statement is to be understood as regular repetitions of a certain
event in various sample sets, it is evident that we may obtain a regular
recurrence of the observed absolute frequencies in a Poisson Series, where, as
we know, the events are not equally likely. (A. F.)
 
being "equally likely" in the sense of games of chance.
Numerous factors such as heredity, environment, climatic and
economic conditions, etc., here play a vital part in the various
complexes embracing the original $l_x$ persons.
 
The belief in an absolute identity of mathematical probabilities
and statistical frequency ratios seems to have originated with
Gauss. The great German mathematician, or rather the
dogmatic faith in his authority as a mathematician, thus proved
for a number of years a veritable stumbling block to a
fruitful development of mathematical statistics. Gauss and his
followers maintained that all statistical mass phenomena could
be made to conform with the law of errors as exhibited by the
so-called Gaussian Normal Error Curve. If certain statistical
series exhibited discrepancies they claimed that such deviations
arose from the limited number of observations. The deviations
would become less marked if the number of observed values was
enlarged and would eventually disappear as the number of
observations approached infinity as its ultimate value. The
Gaussian dogma held sway despite the fact that the Danish actuary,
Oppermann, and the French mathematicians, Bienaymé and
Cournot, had pointed out that several statistical series, despite
all efforts to the contrary, offered a persistent defiance to the
Gaussian law. The first real attack on the dogma laid down so
authoritatively by Gauss was delivered by the French actuary,
Dormoy, in certain investigations relating to the French census.
It was, however, only after the appearance of the already
mentioned brochure by Lexis, "Die Massenerscheinungen, etc.," that
a correct idea was gained about the real nature of statistical
series.
 
 The Lexian theory was expounded in the previous chapters of 
 this work, and we are therefore ready to enter upon the investi- 
 gations of a few selected mass observations from the domain of 
 vital statistics. 
 
 84. Number of Comparison and Proportional Factors. — In 
 the mathematical treatment of the Lexian theory of dispersion 
 we tacitly assumed that the total number of individual trials in 
 a sample set or the number of comparison, s, remained constant 
 from set to set. In the observations on games of chance it
 
remained in our power to arrange the actual experiments in such
a manner that $s$ would be constant. In actual social statistical
series such simple conditions do not exist. In comparing the
number of births in a country with the total population it is
readily noticed that the population does not remain constant
but varies from year to year. For this reason the various
numbers $m$ denoting the births are not directly comparable with
one another. We may, however, easily form a new series of the form:
 
$$\frac{s}{s_1}\, m_1, \quad \frac{s}{s_2}\, m_2, \quad \frac{s}{s_3}\, m_3, \quad \dots, \quad \frac{s}{s_N}\, m_N,$$

wherein the various numbers $m_1, m_2, m_3, \dots$, corresponding to
the numbers of comparison $s_1, s_2, s_3, \dots$, are reduced to a constant
number of comparison $s$. This series is by Charlier called a
reduced statistical series. Such a reduction requires, in many
cases, a certain correction. However, when the general ratios
$s \div s_k$ ($k = 1, 2, 3, \dots, N$) are close to unity the reduced series
may be treated as a directly observed series. In most of the
following examples taken from Scandinavian statistical tabular
works the proportional factor $s \div s_k$ is close to unity as shown in
the table below. For Sweden I have, following Charlier, assumed
a stationary population $s = 5,000,000$. The corresponding
Danish $s$ I have taken as 2,500,000.

Proportional Factors for a Hypothetical Stationary Population in
Sweden and Denmark Equal to 5,000,000 and 2,500,000 Respectively.

              Sweden.                            Denmark.
 Year.   Inhabitants.   s : s_k.        Year.   Inhabitants.   s : s_k.
 1876     4,429,713     1.1288          1888     2,143,000     1.1666
 1877     4,484,542     1.1150          1889     2,161,000     1.1569
 1878     4,531,863     1.1033          1890     2,179,000     1.1473
 1879     4,578,901     1.0919          1891     2,195,000     1.1390
 1880     4,565,668     1.0952          1892     2,210,000     1.1312
 1881     4,572,245     1.0936          1893     2,226,000     1.1230
 1882     4,579,115     1.0919          1894     2,248,000     1.1121
 1883     4,603,595     1.0861          1895     2,276,000     1.0984
 1884     4,644,448     1.0765          1896     2,306,000     1.0841
 1885     4,682,769     1.0677          1897     2,338,000     1.0694
 1886     4,717,189     1.0600          1898     2,371,000     1.0544
 1887     4,734,901     1.0560          1899     2,403,000     1.0404
 1888     4,748,257     1.0530          1900     2,432,000     1.0280
 1889     4,774,409     1.0472          1901     2,462,000     1.0154
 1890     4,784,981     1.0449          1902     2,491,000     1.0036
 1891     4,802,751     1.0410          1903     2,519,000     0.9925
 1892     4,806,865     1.0402          1904     2,546,000     0.9819
 1893     4,824,150     1.0365          1905     2,574,000     0.9713
 1894     4,873,183     1.0261          1906     2,603,000     0.9604
 1895     4,919,260     1.0165          1907     2,635,000     0.9488
 1896     4,962,568     1.0076          1908     2,668,000     0.9370
 1897     5,009,632     0.9981          1909     2,702,000     0.9252
 1898     5,062,918     0.9875          1910     2,737,000     0.9134
 1899     5,097,402     0.9809          1911     2,800,000     0.8929
 1900     5,136,441     0.9734          1912     2,830,000     0.8834

The above figures are taken from "Sveriges officielle statistik"
and "Statistisk Aarbog for Danmark" for 1913 (Précis de
Statistique, 1913).
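
The reduction to a constant number of comparison can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `proportional_factors` and `reduce_series` are not from the text, the three Danish populations are those printed in the table, and the yearly counts are hypothetical values chosen solely to show the mechanics.

```python
def proportional_factors(s, populations):
    """Return the factor s / s_k for each observed number of comparison s_k."""
    return [s / s_k for s_k in populations]


def reduce_series(s, populations, counts):
    """Reduce the observed counts m_k to the constant number of comparison s."""
    return [s / s_k * m_k for s_k, m_k in zip(populations, counts)]


# Danish populations for 1888-1890 as printed in the table.
danish_population = [2_143_000, 2_161_000, 2_179_000]
danish_factors = proportional_factors(2_500_000, danish_population)
print([round(f, 4) for f in danish_factors])

# Hypothetical yearly counts (NOT from the text), used only to show the reduction.
hypothetical_counts = [66_000, 67_000, 68_000]
reduced = reduce_series(2_500_000, danish_population, hypothetical_counts)
print([round(m) for m in reduced])
```

Rounded to four decimals, the factors should agree with the corresponding entries of the column $s : s_k$.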
 
 85. Child Births in Sweden. — From Charlier's "Grunddragen" 
 I select the following example showing the number of children 
 born in Sweden in the period from 1881-1900 as reduced to a 
 stationary population of 5,000,000. 
 
Number of Children Born in Sweden as to Calendar Year (Charlier).
s = 5,000,000, N = 20, M_0 = 140,000.

 Year.       m.       m - M_0.     (m - M_0)^2.
 1881     145,230     + 5,230      27,352,900
 1882     146,640     + 6,640      44,089,600
 1883     144,320     + 4,320      18,662,400
 1884     149,360     + 9,360      87,609,600
 1885     146,600     + 6,600      43,560,000
 1886     148,270     + 8,270      68,392,900
 1887     148,020     + 8,020      64,320,400
 1888     143,680     + 3,680      13,542,400
 1889     138,300     - 1,700       2,890,000
 1890     139,600     -   400         160,000
 1891     141,070     + 1,070       1,144,900
 1892     134,830     - 5,170      26,728,900
 1893     136,540     - 3,460      11,971,600
 1894     134,840     - 5,160      26,625,600
 1895     136,820     - 3,180      10,112,400
 1896     135,330     - 4,670      21,808,900
 1897     132,750     - 7,250      52,562,500
 1898     134,820     - 5,180      26,832,400
 1899     131,320     - 8,680      75,342,400
 1900     134,460     - 5,540      30,691,600

 Sum:          + 53,190 - 50,390   654,401,400
 
From which we obtain:

$$b = (+53,190 - 50,390) : 20 = 140,$$
$$M = M_0 + b = 140,140,$$
$$\sigma^2 = 654,401,400 : 20 - b^2 = 32,700,470, \quad \text{or} \quad \sigma = 5,718.$$

The empirical probability of a birth ($p_0$) is
$p_0 = M : s = 0.02803$, so that $q_0 = 1 - p_0 = 0.97197$ and the
Bernoullian dispersion

$$\sigma_B = \sqrt{s p_0 q_0} = 369.0.$$

The actual observed dispersion (5,718) is thus much greater
than the Bernoullian. The birth series is considerably
hypernormal. The Lexian ratio has the value

$$L = 5,718 : 369.0 = 15.50,$$

while the Charlier coefficient of disturbancy is:

$$100\rho = 4.07.$$

Both the values of $L$ and $\rho$ show that the birth series by no
means can be compared with the ordinary games of chance but
is subject to outward perturbing influences.
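
The computation above can be retraced with a short Python sketch. The function name `lexis_and_charlier` is illustrative, and the coefficient of disturbancy is taken as $100\sqrt{\sigma^2 - \sigma_B^2} : M$, which reproduces the figure quoted here. One assumption should be noted: the 1882 entry is taken as 146,640, the value implied by the printed square 44,089,600 and by the printed sum of the positive deviations.

```python
import cmath
from math import sqrt


def lexis_and_charlier(counts, s):
    """Return (M, sigma, sigma_B, L, 100*rho) for a homograde series.

    counts holds the numbers m_k, all reduced to the number of comparison s.
    100*rho is returned as a complex number so that an imaginary value
    (sigma smaller than sigma_B, a subnormal series) remains visible.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((m - mean) ** 2 for m in counts) / n
    p0 = mean / s
    bernoulli_variance = s * p0 * (1 - p0)
    sigma = sqrt(variance)
    sigma_b = sqrt(bernoulli_variance)
    rho100 = 100 * cmath.sqrt(variance - bernoulli_variance) / mean
    return mean, sigma, sigma_b, sigma / sigma_b, rho100


# Reduced Swedish births 1881-1900 (1882 taken as 146,640, see the note above).
swedish_births = [
    145_230, 146_640, 144_320, 149_360, 146_600, 148_270, 148_020, 143_680,
    138_300, 139_600, 141_070, 134_830, 136_540, 134_840, 136_820, 135_330,
    132_750, 134_820, 131_320, 134_460,
]

mean, sigma, sigma_b, lexis, rho100 = lexis_and_charlier(swedish_births, 5_000_000)
print(mean, round(sigma, 1), round(sigma_b, 1), round(lexis, 2), round(rho100.real, 2))
```

Working directly from the mean, rather than from the provisional value $M_0$ and the correction $b$, gives the same dispersion; the provisional mean is only a device for easing hand computation.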
 
 86. Child Births in Denmark. — The following example shows 
the corresponding birth series for Denmark in the 25-year period
from 1888-1912 as reduced to a stationary population of 2,500,000.
The computation of the various parameters follows:

$$b = (39,713 - 30,287) : 25 = +377,$$
$$M = M_0 + b = 73,377,$$
$$\sigma^2 = 281,208,156 : 25 - b^2 = 11,106,197.2,$$
$$\sigma_B^2 = s p_0 q_0 = 71,223 \quad (p_0 = M : s = 0.0293508),$$
$$L = \frac{\sigma}{\sigma_B} = 12.5,$$
$$100\rho = 100\sqrt{\sigma^2 - \sigma_B^2} : M = 4.52.$$
 
Number of Children Born in Denmark as to Calendar Year.
s = 2,500,000, N = 25, M_0 = 73,000.

 Year.       m.      m - M_0.     (m - M_0)^2.
 1888     78,659     + 5,659      32,024,281
 1889     77,956     + 4,956      24,561,936
 1890     76,154     + 3,154       9,947,716
 1891     77,377     + 4,377      19,158,129
 1892     74,059     + 1,059       1,121,481
 1893     76,965     + 3,965      15,721,225
 1894     75,956     + 2,956       8,740,636
 1895     75,649     + 2,649       7,017,201
 1896     76,183     + 3,183      10,131,489
 1897     74,404     + 1,404       1,971,216
 1898     75,570     + 2,570       6,604,900
 1899     74,236     + 1,236       1,527,696
 1900     74,146     + 1,146       1,313,316
 1901     74,341     + 1,341       1,798,281
 1902     73,058     +    58           3,364
 1903     71,802     - 1,198       1,435,204
 1904     72,359     -   641         410,881
 1905     70,981     - 2,019       4,076,361
 1906     71,280     - 1,720       2,958,400
 1907     70,516     - 2,484       6,170,256
 1908     71,433     - 1,567       2,455,489
 1909     70,597     - 2,403       5,774,409
 1910     68,777     - 4,223      17,833,729
 1911     66,016     - 6,984      48,776,256
 1912     65,952     - 7,048      49,674,304

 Sum:         + 39,713 - 30,287   281,208,156
 
 Practically the same deductions hold true for this Danish 
 series as for the Swedish series. We meet again a hypernormal 
 series subject to perturbing influences. The closeness of the 
 two values of the Charlier coefficient of disturbancy indicates 
 that the number of births in Sweden and Denmark apparently 
 are subject to the same outward disturbing influences. 
 
 87. Danish Marriage Series. — The following table shows the 
 number of marriages in Denmark from 1888-1912. 
 
 
Number of Marriages in Denmark.
s = 2,500,000, N = 25, M_0 = 18,000.

 Year.       m.      m - M_0.     (m - M_0)^2.
 1888     17,605     -   395         156,025
 1889     17,622     -   378         142,884
 1890     17,181     -   819         670,761
 1891     17,017     -   983         966,289
 1892     17,012     -   988         976,144
 1893     17,676     -   324         104,976
 1894     17,445     -   555         308,025
 1895     17,736     -   264          69,696
 1896     18,239     +   239          57,121
 1897     18,676     +   676         456,976
 1898     18,870     +   870         756,900
 1899     18,661     +   661         436,921
 1900     19,015     + 1,015       1,030,225
 1901     17,870     -   130          16,900
 1902     17,712     -   288          82,944
 1903     17,791     -   209          43,681
 1904     17,895     -   105          11,025
 1905     17,947     -    53           2,809
 1906     18,592     +   592         350,464
 1907     19,072     + 1,072       1,149,184
 1908     18,750     +   750         562,500
 1909     18,453     +   453         205,209
 1910     18,255     +   255          65,025
 1911     17,749     -   251          63,001
 1912     18,034     +    34           1,156

 Sum:          - 5,742 + 6,617     8,686,841
 
Hence we have:

$$b = (6,617 - 5,742) : 25 = 35, \qquad M = M_0 + b = 18,035,$$
$$\sigma^2 = (8,686,841 : 25) - b^2 = 346,249, \qquad \sigma = 588.43,$$
$$\sigma_B = 133.81, \qquad L = 4.41, \qquad 100\rho = 5.73.$$
 
 We encounter again a hypernormal series with quite large 
perturbations. For Sweden Charlier has computed the coefficient
of disturbancy for marriages in the period 1876-1900 and
 found it to be 5.49. A comparison with the same quantity for 
 the above Danish data shows that the perturbing influences 
 for the two countries are about the same. 
 
 88. Stillbirths. — As another example from vital statistics I 
 give the number of stillbirths in Denmark from 1888-1912 as 
 compared with a hypothetical number of 70,000 births per annum. 
 
Number of Stillbirths in Denmark as Reduced to a Stationary Number
of 70,000 Births per Annum.
s = 70,000, N = 25, M_0 = 1,700.

 Year.      m.     m - M_0.    (m - M_0)^2.
 1888     1,861     + 161        25,921
 1889     1,924     + 224        50,176
 1890     1,830     + 130        16,900
 1891     1,779     +  79         6,241
 1892     1,811     + 111        12,321
 1893     1,788     +  88         7,744
 1894     1,719     +  19           361
 1895     1,753     +  53         2,809
 1896     1,714     +  14           196
 1897     1,811     + 111        12,321
 1898     1,797     +  97         9,409
 1899     1,737     +  37         1,369
 1900     1,696     -   4            16
 1901     1,732     +  32         1,024
 1902     1,694     -   6            36
 1903     1,685     -  15           225
 1904     1,682     -  18           324
 1905     1,705     +   5            25
 1906     1,620     -  80         6,400
 1907     1,723     +  23           529
 1908     1,694     -   6            36
 1909     1,665     -  35         1,225
 1910     1,658     -  42         1,764
 1911     1,658     -  42         1,764
 1912     1,638     -  62         3,844

 Sum:           - 310 + 1,184   161,216
 
Actual computation gives:

$$b = (1,184 - 310) : 25 = 34.96, \qquad M = 1,734.96,$$
$$\sigma^2 = 161,216 : 25 - b^2 = 5,226.44, \qquad 100\rho = 3.407.$$

The series is again hypernormal. We shall show presently,
when discussing the disturbing influences, that this series after
the elimination of the secular perturbations actually represents a
normal series. In the meantime we give a few examples relating
to accident statistics.
 
 89. Coal Mine Fatalities. — The following table gives the 
 number of deaths from accidents in coal mines in various countries 
 in the period 1901-1910 together with the number of compari- 
 son s. 
 
 Year.   Belgium.   Austria.   England.   France.   Germany.   Japan.    United States.
   s =   140,000     68,000    900,000    180,000   500,000    110,000      610,000

 1901      164         81       1,224       218      1,170       263         1,982
 1902      150         73       1,116       196        995       188         2,263
 1903      160         50       1,134       184        960       278         1,952
 1904      130         62       1,116       193        900       239         2,135
 1905      127         99       1,215       187        930       354         2,214
 1906      133         70       1,161     1,262        985       578         2,944
 1907      144         73       1,179       198      1,240       399         2,977
 1908      150         58       1,188       171      1,355       262         2,220
 1909      133         73       1,287       210      1,021       667         2,440
 1910      133         63       1,530       194        985       245         2,391

This gives the following values for the Charlier coefficient:

                 100 rho.
 Belgium           2.55
 Austria          13.85
 England           4.71
 France           34.19
 Germany           9.27
 Japan            44.12 ¹
 U. S. A.         12.07

¹ I doubt whether the Japanese data as given by the Bureau of Mines are
reliable.
 
The comparatively large values of $\rho$ show that the fatal
accidents in coal mines are subject to violent perturbations. The
disturbing influences are greatest for France where the Charlier
coefficient is above 34, which immediately shows that some
powerful disturbing influence has made itself felt. Looking over
the table we find a very large number of deaths for the year 1906.
The extremely heavy death rate in this year was caused by the
Courrières mine explosion, in which 1,099 persons lost their lives
and which marks probably the most fatal disaster in the whole history
of coal-mining. Eliminating this catastrophe from the data in
the table given above we find indeed that the coefficient of
disturbancy becomes imaginary, indicating very stable conditions
in French mines. Thus eliminating the more fatal catastrophes
we get at least for France a subnormal series for the everyday
accidents. In order better to illustrate the influence of the
elimination of the most disturbing catastrophes I submit the
following two series as reduced to a stationary $s = 630,000$ of
fatal coal mine accidents in the United States in the period
1900-1914 as recorded by the Bureau of Mines. The first series
shows the total number of deaths $m_k$, the second series gives the total
deaths $m_k'$ per year after eliminating all such accidents in which
5 or more men were killed.
 
Number of Deaths from Accidents in Coal Mines in United States.
s = 630,000, N = 15.

 Year.    m_k.     m_k'.          Year.    m_k.     m_k'.
 1900    2,173    1,843           1908    2,293    1,967
 1901    2,048    1,863           1909    2,520    2,053
 1902    2,337    1,837           1910    2,470    2,085
 1903    2,016    1,768           1911    2,350    1,984
 1904    2,205    1,911           1912    2,060    1,839
 1905    2,286    1,964           1913    2,350    1,957
 1906    2,111    2,075           1914    2,070    1,810
 1907    3,074    2,190
The first series gives a coefficient of disturbancy equal to 11.06
while the same quantity for the second series has the value 5.51.
Despite the fact that the coefficient of disturbancy is reduced
about 50 per cent, there still remain disturbing influences, which
clearly shows that conditions in American mines are not so stable
as in the mines of France, Belgium and England.
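
The two coefficients can be checked from the tabulated series with the following Python sketch. The helper name `charlier_100rho` is illustrative, and the coefficient is again assumed to be $100\sqrt{\sigma^2 - \sigma_B^2} : M$ with $\sigma_B^2 = s p_0 q_0$ and $p_0 = M : s$.

```python
import cmath


def charlier_100rho(counts, s):
    """Charlier coefficient of disturbancy, 100 * sqrt(sigma^2 - sigma_B^2) / M.

    A complex result with non-zero imaginary part signals a subnormal series.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((m - mean) ** 2 for m in counts) / n
    bernoulli_variance = mean * (1 - mean / s)  # s * p0 * q0 with p0 = M / s
    return 100 * cmath.sqrt(variance - bernoulli_variance) / mean


# United States, 1900-1914, reduced to s = 630,000 (values as tabulated).
all_deaths = [2173, 2048, 2337, 2016, 2205, 2286, 2111, 3074,
              2293, 2520, 2470, 2350, 2060, 2350, 2070]
without_large_accidents = [1843, 1863, 1837, 1768, 1911, 1964, 2075, 2190,
                           1967, 2053, 2085, 1984, 1839, 1957, 1810]

rho_all = charlier_100rho(all_deaths, 630_000)
rho_minor = charlier_100rho(without_large_accidents, 630_000)
print(round(rho_all.real, 2), round(rho_minor.real, 2))
```

Both values remain real, so even the series freed from the larger accidents is still hypernormal, in agreement with the remark in the text.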
 
 90. Reduced and Weighted Series in Statistics. — So far all 
 our problems in statistical analysis have been related to series 
where the value of $s$ was constant or where the ratio $s : s_k$ was
so close to unity that it might be used as a factor of
proportionality. We shall now consider the case where this ratio differs
greatly from unity. As an illustration of this kind of series I
choose the number of fatal coal mine accidents in various states
of the American Federation together with the number of people
engaged in coal mining in these states. The figures as taken from
the report of the Bureau of Mines relate to the year of 1914.¹
 
Number of Persons Engaged in Mining (s_k) and Number Killed
(m_k) in 20 States During the Year 1914.
s = 1,000, N = 20.

       State.                               s_k.     m_k.   p_0 s_k.   |m_k - p_0 s_k|.
  1    Alabama                            24,552     128       73           55
  2    Colorado                           10,550      75       31           44
  3    Illinois                           79,529     141      237           96
  4    Indiana                            22,110      44       66           22
  5    Iowa                               15,757      37       47           10
  6    Kansas                             12,500      33       37            4
  7    Kentucky                           26,332      61       79           18
  8    Maryland                            5,675      18       17            1
  9    Missouri                           10,418      19       31           12
 10    New Mexico                          4,021      18       12            6
 11    Ohio                               45,815      62      136           74
 12    Oklahoma                            8,948      31       27            4
 13    Pennsylvania (Anthracite Mines)   175,745     595      524           71
 14    Pennsylvania (Bituminous Mines)   172,196     402      513          111
 15    Tennessee                           9,580      26       29            3
 16    Texas                               4,900      11       15            4
 17    Virginia                            9,162      27       27            0
 18    Washington                          5,730      17       17            0
 19    W. Virginia                        74,786     371      223          148
 20    Wyoming                             8,353      51       25           26

       Sum:                              726,659   2,167                   709
 
It will be noted that the population engaged in mining varies
greatly from state to state. In making a simple reduction to a
common number of comparisons by a proportional factor it is
evident, however, that we would give the same weight to the
observation from New Mexico with a population of miners equal to
 
¹ Catastrophes in the Eccles Mine in West Virginia and in the Royalton
Mine of Illinois are eliminated.
 
4,021 as to the mining population of the state of Pennsylvania
where over 340,000 persons are engaged in the same industry.
This procedure is faulty. Let us imagine for the moment two sets
of drawings from a bag containing white and black balls. The
first sample set contained 10,000 drawings and the second set
only 100 drawings. If these series were reduced to a common
number of comparison $s = 1,000$ we should have

$$\frac{1,000}{10,000}\, m_1 \quad \text{and} \quad \frac{1,000}{100}\, m_2$$

($m_1$ and $m_2$ standing for the number of white balls) as the number
of white balls drawn in sample sets of 1,000 single drawings.

But these values are not equally reliable. The mean error in
the second series is in fact 10 times as large as the mean error in
the first series, since the mean error of the reduced number
$\frac{s}{s_k} m_k$ is $s\sqrt{pq : s_k}$ and $\sqrt{10,000 : 100} = 10$. In order to
overcome this difficulty we ask the reader to consider the
following series:

The element $\frac{s}{s_1} m_1$ is repeated $s_1$ times,
the element $\frac{s}{s_2} m_2$ is repeated $s_2$ times,
the element $\frac{s}{s_3} m_3$ is repeated $s_3$ times,
. . . . . . . . . . . . . . .
the element $\frac{s}{s_N} m_N$ is repeated $s_N$ times.

In this way we obtain a series with $s_1 + s_2 + s_3 + \dots + s_N$
elements which may be termed a reduced and weighted series
since the elements with the larger $s_k$ appear oftener than those with
the smaller values of $s_k$. We shall now see if it is possible to
determine the expected value of the mean and the dispersion if the
series is supposed to follow the Bernoullian Law.
 
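The mean is defined by the following relation:

$$M = \Big[\underbrace{\frac{s}{s_1} m_1 + \dots + \frac{s}{s_1} m_1}_{s_1\ \text{terms}} + \underbrace{\frac{s}{s_2} m_2 + \dots + \frac{s}{s_2} m_2}_{s_2\ \text{terms}} + \dots + \underbrace{\frac{s}{s_N} m_N + \dots + \frac{s}{s_N} m_N}_{s_N\ \text{terms}}\Big] : [s_1 + s_2 + \dots + s_N].$$

Each group of $s_k$ equal terms contributes $s\, m_k$, so that the
numerator equals $s \sum m_k$. Denoting the average empirical
probability by $p_0$ we have $\sum m_k : \sum s_k = p_0$ and

$$M = s p_0.$$

As to the dispersion it takes on the following form:

$$\sigma^2 = \Big[ s_1 \Big(\frac{s}{s_1} m_1 - s p_0\Big)^2 + \dots + s_N \Big(\frac{s}{s_N} m_N - s p_0\Big)^2 \Big] : [s_1 + s_2 + \dots + s_N] = \frac{\sum s_k \big(\frac{s}{s_k} m_k - s p_0\big)^2}{\sum s_k} = \frac{\sum \frac{s^2}{s_k} (m_k - s_k p_0)^2}{\sum s_k}.$$

In finding the theoretical dispersion, assuming a Bernoullian
distribution for which $p_0$ may be used as an approximation of
the mathematical a priori probability, we ask the reader to
examine the general term of the expression for $\sigma^2$, viz.:

$$\frac{s^2}{s_k} (m_k - s_k p_0)^2 : \sum s_k.$$

If the individual trials follow the Bernoullian Law the expected
value of the factor $(m_k - s_k p_0)^2$ takes the form:

$$\varepsilon[(m_k - s_k p_0)^2] = \sum (m_k - s_k p_0)^2 \varphi(m_k) = s_k p_0 q_0.$$

This brings the general term for $\sigma^2$ to the form:

$$\frac{s^2}{\sum s_k}\, p_0 q_0 = \frac{s}{\sum s_k}\, s p_0 q_0.$$

Thus the expected value of $\sigma^2$ according to the Bernoullian
distribution may be written as follows:

$$\sigma_B^2 = \sum_{k=1}^{N} \frac{s}{\sum s_k}\, s p_0 q_0 = \frac{N s}{\sum s_k}\, s p_0 q_0, \quad \text{or:} \quad \sigma_B^2 = f^2 s p_0 q_0,$$

where as before $f^2 = N s : \sum s_k$ and $f = \sqrt{N s : \sum s_k}$.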
 
These formulas give us the means of computing the Lexian
Ratio and the Charlier coefficient of disturbancy in the ordinary
way. Some of the computations require, however, a great
amount of arithmetical work and the goal is reached more
easily by making use of the mean deviation $\vartheta$ (in § 74a).
We found there the following relation:

$$\sigma = 1.2533\, \vartheta.$$

In the weighted series it is readily seen that the value of $\vartheta$
will be of the form:

$$\vartheta = \frac{\sum s_k \big| \frac{s}{s_k} m_k - s p_0 \big|}{\sum s_k} = \frac{s \sum |m_k - s_k p_0|}{\sum s_k}.$$

If the series may be assumed to follow a Bernoullian
distribution we have

$$\sigma_B = 1.2533\, \vartheta.$$

From the above formulas it is readily noticed that we may find
the mean and the dispersion directly from the observed series
without a preliminary reduction to a common number of
comparison $s$. This is in fact the method used in the above example
of coal-mine accidents in various states. We have:

$$p_0 = \sum m_k : \sum s_k = 2,167 : 726,659 = 0.002982,$$
$$\vartheta = \frac{s \sum |m_k - s_k p_0|}{\sum s_k} = 1,000 \times 709 : 726,659 = 0.9757,$$
$$\sigma = 1.2533 \times \vartheta = 1.223,$$
$$\sigma_B^2 = f^2 s p_0 q_0 = \frac{20,000}{726,659} \times 1,000 \times 0.997 \times 0.003 = 0.0817,$$
$$100\rho = \frac{100 \sqrt{\sigma^2 - \sigma_B^2}}{M} = 40 \text{ approx.}$$
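
The weighted computation lends itself to a short Python sketch. The function name `weighted_series_summary` is illustrative; the data are the twenty pairs $(s_k, m_k)$ of the table, and the products $s_k p_0$ are carried unrounded, so the results differ slightly in the last figure from those obtained with the rounded column.

```python
from math import sqrt

# (s_k, m_k): persons engaged in mining and number killed in 1914, as tabulated.
states = [
    (24_552, 128), (10_550, 75), (79_529, 141), (22_110, 44), (15_757, 37),
    (12_500, 33), (26_332, 61), (5_675, 18), (10_418, 19), (4_021, 18),
    (45_815, 62), (8_948, 31), (175_745, 595), (172_196, 402), (9_580, 26),
    (4_900, 11), (9_162, 27), (5_730, 17), (74_786, 371), (8_353, 51),
]


def weighted_series_summary(data, s):
    """Return (p0, theta, sigma, sigma_B^2, 100*rho) for a weighted series."""
    total_s = sum(s_k for s_k, _ in data)
    total_m = sum(m_k for _, m_k in data)
    p0 = total_m / total_s
    mean = s * p0
    # Mean deviation of the reduced and weighted series.
    theta = s * sum(abs(m_k - s_k * p0) for s_k, m_k in data) / total_s
    sigma = 1.2533 * theta  # relation between dispersion and mean deviation
    sigma_b_sq = len(data) * s / total_s * s * p0 * (1 - p0)
    # sqrt would fail for a subnormal series; here sigma^2 exceeds sigma_B^2.
    rho100 = 100 * sqrt(sigma ** 2 - sigma_b_sq) / mean
    return p0, theta, sigma, sigma_b_sq, rho100


p0, theta, sigma, sigma_b_sq, rho100 = weighted_series_summary(states, 1_000)
print(round(p0, 6), round(theta, 4), round(sigma, 3), round(sigma_b_sq, 4), round(rho100, 1))
```

The coefficient of disturbancy should come out in the neighborhood of 40, as stated above.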
 
 The large value of the Charlier coefficient of disturbancy 
 clearly shows that conditions in coal mines by no means are 
 uniform in the whole union but vary greatly according to the 
locality. An actual computation shows in fact that in a few
states such as Michigan and Iowa we find an imaginary
coefficient of disturbancy whereas states such as Ohio and West Virginia
exhibit marked hypernormal series with a large coefficient of
 disturbancy. The establishment of this fact is of some im- 
 portance in connection with accident assurance. Many sta- 
 tisticians seem to be of the opinion that a standard accident table 
 computed from the data of the whole union ought to serve as 
 the basis for assurance premiums. Such a table would assume 
 uniform conditions all over the union. The enormously high 
 value of p as computed above shows the fallacy of such a view. 
 
 91. Secular and Periodical Fluctuations. — In the last para- 
 graphs we have just learned how to detect the presence of dis- 
 turbing influences in a statistical series. A value of the Lexian 
 ratio differing from unity or a value of the Charlier coefficient of 
 disturbancy differing from zero indicates the presence of fluc- 
 tuations in the chances for the event or phenomena under in- 
 vestigation. After having established the presence of such 
 fluctuations it is the duty of the statistician to trace the sources 
 of the disturbing influences. This is in general done by means of 
 the theory of correlation, which will be discussed in the second 
 volume of this work. 
 
 It is, however, possible to classify the disturbances under two 
categories which by Charlier are termed secular and periodical
variations.¹ The periodical fluctuations are in general difficult
 to discuss on account of the variations in the period of the dis- 
 turbing forces. In many cases we are in absolute ignorance 
 about the length of such a period and therefore unable to subject 
 the series to a mathematical analysis. If the length of the period 
 is known it is indeed not difficult to determine the periodical 
 disturbances. This is often the case in series giving the occur- 
 rence of a certain disease in various months. In statistics giving 
 the frequency of malaria in a community the observed cases are 
 
¹ Lexis uses the terms "evolutionary" ("symptomatic") and "periodical"
("oscillating") for such fluctuations.
 
 nearly all limited to the warmer months and infrequent in the 
 winter months. 
 
In the secular fluctuations due to certain outward influences
working continually in the same direction it is quite easy to
calculate the rate of such variations.
 
Let $\beta$ denote the increase (decrease) of the original probabilities
($p_1, p_2, p_3, \dots, p_N$) from set to set in the given statistical series
so that

$$p_2 - p_1 = \beta, \quad p_3 - p_2 = \beta, \quad \dots, \quad p_N - p_{N-1} = \beta.$$

We then have:

$$p_k = p_1 + (k - 1)\beta. \tag{1}$$

The mean probability has the value:

$$p_0 = \frac{p_1 + p_2 + p_3 + \dots + p_N}{N} = \frac{p_1 + (p_1 + \beta) + (p_1 + 2\beta) + \dots + (p_1 + (N - 1)\beta)}{N} = p_1 + \frac{N - 1}{2}\beta. \tag{2}$$

Eliminating $p_1$ from (1) and (2) we have:

$$p_k - p_0 = \Big(k - \frac{N + 1}{2}\Big)\beta.$$

If the observed and reduced numbers $m_1, m_2, m_3, \dots, m_N$ may be
regarded as approximately coinciding with $sp_1, sp_2, sp_3, \dots, sp_N$
we may write the last relation as follows:

$$m_k - M = \Big(k - \frac{N + 1}{2}\Big) s\beta. \tag{3}$$

In order to obtain an expression for $s\beta$ in known quantities we
must eliminate the quantity $k$. Multiplying both sides of the
equation (3) by $k - (N + 1)/2$ we have:

$$(m_k - M)\Big(k - \frac{N + 1}{2}\Big) = \Big(k - \frac{N + 1}{2}\Big)^2 s\beta.$$

Summing this expression for all values $k$ from $k = 1$ to $k = N$
we have:

$$\sum \Big(k - \frac{N + 1}{2}\Big)(m_k - M) = s\beta \sum \Big(k - \frac{N + 1}{2}\Big)^2. \tag{4}$$

The following expressions from the summation of series are
well known to the reader from elementary algebra:

$$\sum k = \tfrac{1}{2} N(N + 1), \qquad \sum k^2 = \tfrac{1}{6} N(N + 1)(2N + 1).$$

Substituting these values in (4) we obtain after a few simple
transformations the following expression for $s\beta$:

$$s\beta = \frac{12}{N(N^2 - 1)} \sum \Big(k - \frac{N + 1}{2}\Big)(m_k - M).$$
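
The "few simple transformations" amount to the identity $\sum (k - \frac{N+1}{2})^2 = \frac{N(N^2 - 1)}{12}$. The Python sketch below checks that identity exactly with rational arithmetic and then applies the resulting formula for $s\beta$; the function names and the short linear test series are illustrative only and do not come from the text.

```python
from fractions import Fraction


def centered_square_sum(n):
    """Exact value of the sum of (k - (n + 1)/2)**2 for k = 1, ..., n."""
    center = Fraction(n + 1, 2)
    return sum((k - center) ** 2 for k in range(1, n + 1))


def secular_slope(reduced_counts):
    """s*beta = 12 / (N (N^2 - 1)) * sum of (k - (N + 1)/2) (m_k - M)."""
    n = len(reduced_counts)
    mean = Fraction(sum(reduced_counts), n)
    center = Fraction(n + 1, 2)
    weighted = sum((k - center) * (m - mean)
                   for k, m in enumerate(reduced_counts, start=1))
    return Fraction(12, n * (n * n - 1)) * weighted


# The divisor that appears for a series of N = 25 sets.
print(centered_square_sum(25))

# A hypothetical, exactly linear series rising by 3 per set.
print(secular_slope([10, 13, 16, 19]))
```

For a series that is exactly linear in $k$ the formula returns the common difference, which is a convenient check on any implementation.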
 
Secular Annual Decrease of Number of Stillbirths in Denmark.
s = 70,000, N = 25, M = 1,735.

 Year.    k.     m_k.     m_k - M.    k - (N+1)/2.    (k - (N+1)/2)(m_k - M).
 1888     1     1,861      + 126         - 12              - 1,512
 1889     2     1,924      + 189         - 11              - 2,079
 1890     3     1,830      + 105         - 10              - 1,050
 1891     4     1,779      +  44         -  9              -   396
 1892     5     1,811      +  76         -  8              -   808
 1893     6     1,788      +  53         -  7              -   371
 1894     7     1,719      -  16         -  6              +    96
 1895     8     1,753      +  18         -  5              -    90
 1896     9     1,714      -  21         -  4              +    84
 1897    10     1,811      +  76         -  3              -   228
 1898    11     1,797      +  62         -  2              -   124
 1899    12     1,737      +   2         -  1              -     2
 1900    13     1,696      -  39            0                    0
 1901    14     1,732      -   3         +  1              -     3
 1902    15     1,694      -  41         +  2              -    82
 1903    16     1,685      -  50         +  3              -   150
 1904    17     1,682      -  53         +  4              -   212
 1905    18     1,705      -  30         +  5              -   150
 1906    19     1,620      - 115         +  6              -   690
 1907    20     1,723      -  12         +  7              -    84
 1908    21     1,694      -  41         +  8              -   328
 1909    22     1,665      -  70         +  9              -   630
 1910    23     1,658      -  77         + 10              -   770
 1911    24     1,658      -  77         + 11              -   847
 1912    25     1,638      -  97         + 12              - 1,164

                                                    Sum:   - 11,590
 
As an example illustrating secular fluctuations I take the
previously discussed series of stillbirths in Denmark.
We have in this case

$$\sum \Big(k - \frac{N + 1}{2}\Big)(m_k - M) = -11,590, \qquad \frac{N(N^2 - 1)}{12} = 1,300,$$

hence:

$$s\beta = -11,590 : 1,300 = -8.92.$$

From this we may draw the conclusion that the number of
stillbirths in Denmark per 70,000 births per annum on the average
is decreased by 8.92.

If the fluctuations are of an essentially secular character we may
write

$$m = M + (k - 13)(-8.92)$$

as the number of stillbirths per annum. Apart from accidental
fluctuations due to sampling we should therefore obtain a nearly
normal series for the 25-year period if we calculated the number
of stillbirths each year according to the expression
$m_k - (k - 13)(-8.92)$. Such a computation is given below:
 
Number of Stillbirths in Denmark Freed from Secular Fluctuations.
s = 70,000, N = 25.

 Year.    k.    m_k - (k - 13)(-8.92).        Year.    k.    m_k - (k - 13)(-8.92).
 1888     1           1,754                   1900    13           1,696
 1889     2           1,826                   1901    14           1,741
 1890     3           1,741                   1902    15           1,712
 1891     4           1,699                   1903    16           1,712
 1892     5           1,740                   1904    17           1,718
 1893     6           1,726                   1905    18           1,750
 1894     7           1,666                   1906    19           1,675
 1895     8           1,708                   1907    20           1,785
 1896     9           1,678                   1908    21           1,765
 1897    10           1,784                   1909    22           1,745
 1898    11           1,779                   1910    23           1,747
 1899    12           1,728                   1911    24           1,756
                                              1912    25           1,745

A computation of the characteristics of this series gives:

$$M = 1,735, \qquad \sigma = 37.09, \qquad \sigma_B = 41.0, \qquad 100\rho \text{ imaginary}.$$

The dispersion is now slightly subnormal and the coefficient
of disturbancy is imaginary whereas in the original series in
§ 88 it had a value equal to 3.4.
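
The correction $m_k - (k - 13)(-8.92)$ can be applied mechanically, as in the Python sketch below. The names are illustrative, the slope $-8.92$ is simply the value quoted in the text, and the stillbirth series is the one tabulated in § 88 (with 1,620 for 1906 and 1,658 for 1911).

```python
from math import sqrt

# Reduced stillbirths 1888-1912 (s = 70,000), as tabulated in § 88.
stillbirths = [1861, 1924, 1830, 1779, 1811, 1788, 1719, 1753, 1714, 1811,
               1797, 1737, 1696, 1732, 1694, 1685, 1682, 1705, 1620, 1723,
               1694, 1665, 1658, 1658, 1638]
S_BETA = -8.92   # yearly change quoted in the text
CENTER = 13      # (N + 1) / 2 for N = 25


def free_from_trend(counts, slope, center):
    """Subtract the secular term (k - center) * slope from each m_k."""
    return [m - (k - center) * slope for k, m in enumerate(counts, start=1)]


adjusted = free_from_trend(stillbirths, S_BETA, CENTER)
mean = sum(adjusted) / len(adjusted)
sigma = sqrt(sum((m - mean) ** 2 for m in adjusted) / len(adjusted))
sigma_b = sqrt(mean * (1 - mean / 70_000))
print([round(m) for m in adjusted[:3]], round(sigma, 1), round(sigma_b, 1))
```

The dispersion of the adjusted series should fall a little below the Bernoullian value, which is what makes the coefficient of disturbancy imaginary.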
 
 92. Cancer Statistics. — Mr. F. L. Hoffman in his treatise " The 
Mortality from Cancer Throughout the World" gives some very
interesting statistics on mortality from cancer in various localities.
Through the kindness of Mr. Hoffman I am able to submit the
following series relating to cancer among males in the City of
New York (Manhattan and Bronx Boroughs):
 
Deaths from Cancer ($m_k$) in the City of New York as Reduced to a
Stationary Population of 1,000,000.

s = 1,000,000, N = 25, M = 560.

Year.    k.   m_k.   m_k - M.   k - 13.   (k - 13)(m_k - M).
1889      1   377     -183       -12          2,196
1890      2   476      -84       -11            924
1891      3   410     -150       -10          1,500
1892      4   444     -116        -9          1,044
1893      5   462      -98        -8            784
1894      6   423     -137        -7            959
1895      7   442     -118        -6            708
1896      8   493      -67        -5            335
1897      9   505      -55        -4            220
1898     10   515      -45        -3            135
1899     11   513      -47        -2             94
1900     12   547      -13        -1             13
1901     13   595      +35         0              0
1902     14   540      -20        +1            -20
1903     15   580      +20        +2             40
1904     16   609      +49        +3            147
1905     17   639      +79        +4            316
1906     18   619      +59        +5            295
1907     19   658      +98        +6            588
1908     20   631      +71        +7            497
1909     21   683     +123        +8            984
1910     22   710     +150        +9          1,350
1911     23   710     +150       +10          1,500
1912     24   721     +161       +11          1,771
1913     25   718     +158       +12          1,896
                                       Sum:  18,276

A computation of the dispersion and the Charlier coefficient
of disturbancy gives a value of $100\rho$ in the neighborhood of 18,
indicating marked fluctuations. An inspection of the series shows
immediately that there is a marked increase in the rate of death
from cancer. Working out the secular disturbances in the ordinary
manner we find:

$$\frac{\sum (k - 13)(m_k - M)}{\sum (k - 13)^2} = \frac{18{,}276}{1{,}300} = 14.06,$$

indicating an increase of death from cancer of about 14 persons
per annum for a population of 1,000,000. Eliminating the secular
disturbances in the same manner as above, we now get a coefficient
of disturbancy equal to $0.983i$ ($i = \sqrt{-1}$), practically a normal
dispersion when taking into account the mean error due to
sampling.
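The secular increase of about 14 deaths per annum can be recomputed from the tabulated cancer series. The sketch below divides the sum of the products $(k - 13)(m_k - M)$ by the sum of the squares $(k - 13)^2$ and then removes the trend from each yearly figure. Reading the "ordinary manner" as this quotient is an assumption, and the function names are illustrative.

```python
# Deaths from cancer m_k per 1,000,000, k = 1 (1889) to k = 25 (1913),
# copied from the table in the text.
DEATHS = [377, 476, 410, 444, 462, 423, 442, 493, 505, 515, 513, 547, 595,
          540, 580, 609, 639, 619, 658, 631, 683, 710, 710, 721, 718]
M = 560  # rounded mean used in the table


def secular_slope(series, mean):
    """Return (numerator, denominator, slope) of the yearly secular change."""
    n = len(series)
    mid = (n + 1) / 2                      # 13 for the 25-year period
    num = sum((k - mid) * (m - mean) for k, m in enumerate(series, start=1))
    den = sum((k - mid) ** 2 for k in range(1, n + 1))
    return num, den, num / den


def remove_trend(series, slope):
    """Series freed from secular fluctuations: m_k - (k - mid) * slope."""
    mid = (len(series) + 1) / 2
    return [m - (k - mid) * slope for k, m in enumerate(series, start=1)]


if __name__ == "__main__":
    num, den, slope = secular_slope(DEATHS, M)
    print(num, den, round(slope, 2))
    print([round(v) for v in remove_trend(DEATHS, slope)])
```

Because the deviations $k - 13$ sum to zero, the slope is unaffected by the use of the rounded mean 560 in place of the exact one.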
 
 93. Application of the Lexian Dispersion Theory in Actuarial 
 Theory. Conclusion. — The Russian actuary, Jastremsky, has 
 applied the Lexian Dispersion Theory in testing the influence of 
medical selection in life assurance.¹ The research by Jastremsky
revolves about the following question: Is medical selection a
phenomenon independent of the age of the assured? Let $q_x^{(t)}$
denote the observed rate of mortality after t years' duration of
assurance. In the same manner $q_x^{(i)}$ denotes the rate of mortality
of a life aged x after i or more years of duration ($i \geq 5$).
Forming the ratio $q_x^{(t)} : q_x^{(i)}$ for all ages x we obtain a certain
homograde series for which we may compute the Lexian Ratio
and the Charlier Coefficient and thus determine if the fluctuations
are due to sampling only or dependent on the age of the assured.
Space does not allow us to give a detailed account of the very
interesting research by Jastremsky as applied to the Austro-Hungarian
Mortality Table (Vienna, 1909), and we shall limit
ourselves to quoting his final results as to the Lexian Ratio, L,
for Whole Life Assurances and Endowment Assurances:

        Whole Life Assurances.    Endowment Assurances.
  t              L                         L
  1            0.88                      1.01
  2            0.89                      0.96
  3            1.12                      1.05
  4            1.05                      0.98
  5            1.07                      0.91

The above values of L all lie close to unity and the series may
therefore be considered as a Bernoullian Series where the fluctuations
are due to sampling entirely. Or in other words, the ratio
$\psi_t = q_x^{(t)} : q_x^{(i)}$ is a quantity independent of the age of the
assured.

¹ Jastremsky, "Der Auslese-Koeffizient," Zeitschr. f. d. ges. Vers.-Wiss.,
Band XII, 1912.
 
The great majority of statistical series may be subjected to an
analysis similar to that given in the preceding chapters. The
characteristics described previously, the Lexian Ratio and the
coefficient of disturbancy, tell us the magnitude of possible
fluctuations from sample to sample. In many cases we may by means
of the secular coefficient of disturbancy partly or wholly
eliminate such fluctuations, due to secular causes, and thus be in
a better position to study the periodical fluctuations.
 
 A statistical research may be likened to the navigation of a 
 difficult waterway, full of hidden rocks and skerries out of sight 
 to the navigator. The amateur statistician, sailing the ocean 
 in a blind and happy-go-lucky manner, often comes to grief on 
 those rocks and suffers a total shipwreck. The skillful navigator, 
 the mathematically trained statistician, is always on the lookout 
 for the sea marks. In the Lexian Ratio and the Charlier Coef- 
 ficient of Disturbancy he recognizes a beacon light, often signal- 
 ling "Danger ahead." He stops his engines. In case he does 
 not possess the particular charts giving the exact location of the 
hidden reefs his prudence advises him to call a pilot to bring his
 ship safely in harbor. On the other hand, if he has reliable 
 charts and knows his profession thoroughly he may venture 
 forth and do his own piloting, by a study of the charts. It is 
 to the study of such charts — i. e., a special study of the higher 
 statistical characteristics — that we shall turn our attention in 
the second part of this treatise. The reader who has followed us
 up to this point may perhaps feel discouraged by realizing how 
 little he has gained in knowledge after having learned a mass of 
 technical detail and formulas. We can quite appreciate and 
 understand this feeling. So far, he has perhaps chiefly been 
 impressed by the treacherous and misleading character of sta- 
 tistical mass phenomena, but to recognize a danger signal and 
 thus avoid the pitfalls is one of the fundamental essentials in 
 safe navigation in statistical research.
 
 PART II 
 
 FREQUENCY CURVES AND 
 HETEROGRADE STATISTICS
 
 CHAPTER XIII. 
 
 THE THEORY OF ERRORS AND FREQUENCY CURVES AND ITS 
 APPLICATION TO STATISTICAL SERIES. HISTORICAL NOTES. 
 
 94. General Remarks. The Hypothesis of Elementary Er- 
 rors. — In the previous chapters we have discussed the elementary 
 statistical parameters, the mean and the dispersion, together with 
 the Lexian ratio and the Charlier coefficient of disturbancy and 
 their application to the mathematical analysis of the homograde 
 series. We shall now extend this discussion to the parameters of 
 higher orders, such as the skewness and the excess, and also give 
 the theory for a mathematical analysis of the other great domain 
 of statistical series, the heterograde series. 
 
The main reason for the separate treatment of the homograde
statistical series is their close analogy to ordinary
mathematical probabilities. Whenever the number of comparison,
s, may be regarded as equivalent to the total number of equally
possible cases in ordinary a priori probabilities and the observed
occurrences of the attribute (event) as the favorable number of
cases, m, among the total number of possible events, s, we are
justified in regarding the ratio m : s in the light of a mathematical
probability. For this reason all homograde series may be
explained as being subject to the same mathematical laws as those
governing ordinary a priori probabilities, which are fully explained
by the series of Bernoulli, Poisson and Lexis and the various
combinations of such series. Moreover, in all such series it is
possible to compute both the mean and the dispersion by the
indirect or combinatorial process instead of the direct or physical
process.
 
The nucleus of the three fundamental series, the Bernoullian,
the Poisson and the Lexian, as well as their various combinations
is found in the development of the point binomial $(p + q)^s$ of
the Bernoullian Theorem as described in Chapter IX, where the
general term expressing the probability of the occurrence of an
event E $\alpha$ times and of the complementary event $\bar{E}$ $\beta = s - \alpha$
times is given by the formula

$$\varphi(\alpha) = \frac{s!}{\alpha!\,\beta!}\, p^{\alpha} q^{\beta}.$$

The numerical computation of this exact expression becomes,
however, too unwieldy for large values of s and we shall therefore
try to replace it with a more flexible approximation, preferably
by a continuous function or by a rapidly convergent infinite series.
On page 101 we gave such an approximation formula for the
maximum value of $\varphi(\alpha)$, denoted by the symbol $T_m$. We wish,
however, to find a simpler expression for the more general term as
well. This further development necessitates the determination of
several higher characteristics or parameters than those expressed
by the mean and the dispersion. If we should succeed in this task
the homograde series can be fully explained by the theory of
mathematical probabilities and placed upon a sound a priori
basis.
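To see what replacing the exact term by a continuous function amounts to, the following minimal sketch evaluates the general term of $(p + q)^s$ through logarithms and sets it beside the familiar normal (Laplacean) density having the same mean $sp$ and dispersion $\sqrt{spq}$. The normal density is used here only as a stand-in for the kind of approximation the text has in view; it is not claimed to be the formula of page 101, and the function names are illustrative.

```python
import math


def binomial_term(s, alpha, p):
    """Exact term s!/(alpha! beta!) * p^alpha * q^beta, with 0 < p < 1.

    Logarithms of the factorials (via lgamma) keep the computation
    manageable even when s is large.
    """
    q = 1.0 - p
    beta = s - alpha
    log_term = (math.lgamma(s + 1) - math.lgamma(alpha + 1)
                - math.lgamma(beta + 1)
                + alpha * math.log(p) + beta * math.log(q))
    return math.exp(log_term)


def normal_approximation(s, alpha, p):
    """Normal density with mean s*p and dispersion sqrt(s*p*q) at alpha."""
    q = 1.0 - p
    sigma = math.sqrt(s * p * q)
    z = (alpha - s * p) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    s, p = 1000, 0.5
    for alpha in (470, 485, 500, 515, 530):
        print(alpha, binomial_term(s, alpha, p), normal_approximation(s, alpha, p))
```

The two columns agree closely near the mean; the agreement weakens in the tails and for skew cases (p far from 1/2), which is where the higher characteristics mentioned in the text come in.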
 
 The question now arises whether the a priori theory of mathe- 
 matical probabilities will furnish a similar basis for the second 
 domain of statistics, the heterograde series. We are of course able 
 to compute by means of the direct or physical process both the 
 mean and the dispersion in various heterograde series, such as 
 measurements on heights of adult males, number of fin rays in 
 fishes or number of telephone calls over a trunk line in a given 
interval of time. But are we also able, as in the case of the
homograde series, to forecast those parameters by means of the
criteria of the Bernoullian, Poisson and Lexian series, i.e. by
the indirect or combinatorial process?
 
A simple consideration will soon lead us to the admission that
no a priori reasoning or simple theorem like that of Bernoulli
will enable us to forecast the mean stature of Danish,
Norwegian, Swedish or English adult males or the mean number of
telephone calls over a trunk wire in a given interval of time. And
while we can by the physical or direct process compute both the
mean and the dispersion from previously collected statistical data,
we have no way of knowing whether such parameters, purely
empirical in form and nature, have any real significance beyond
that of abstract mathematical calculations. Nor do such empirical
parameters offer explanations similar to those of the homograde
series. We are for instance able to predict the probability that in a
series of 1000 successive drawings (with replacements) from a
deck of whist the number of drawn aces will fall between 100 and
120. But we are not able by a priori reasoning or by mathematical
deduction to forecast the probability that among 1000
Scandinavian adult males, all chosen at random, the height
of an arbitrarily selected individual will fall between 170 and 175
centimeters.
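The card example can be worked out directly, which is what distinguishes it from the stature example. The sketch below assumes a 52-card deck holding 4 aces, so that p = 1/13, reads "between 100 and 120" as including both limits, and sums the exact binomial terms in rational arithmetic. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_range_probability(s, p, low, high):
    """Exact probability that low <= number of successes <= high in s trials."""
    q = 1 - p
    return sum(comb(s, a) * p ** a * q ** (s - a) for a in range(low, high + 1))


if __name__ == "__main__":
    p_ace = Fraction(4, 52)   # assumption: 4 aces among 52 cards
    prob = binomial_range_probability(1000, p_ace, 100, 120)
    print(float(prob))
```

Nothing beyond the composition of the deck enters the calculation. No comparable a priori datum exists for the heights of men, which is the point the paragraph makes.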
 
Experience has, however, shown that the heterograde series
show grouping tendencies around the mean value similar to those
encountered in the homograde series. As an example we may
compare the Bernoullian series of black cards in sample sets of 10
as collected by M. Charlier and shown on page 138 and the Poisson
series of black balls in sample sets of 20 collected by Mr. Bonynge
(shown on page 143) with a series of measurements relating to the
heights of Danish conscripts for the year 1916. Such a comparison
is given below.
 
Charlier's Data          Bonynge's Data          Danish Anthropometric Data

  m     x    F(x)          m     x    F(x)       Height class     x     F(x)
  0    -5      3           0    -5      2        140-145 cm.     -5       32
  1    -4     10           1    -4      9        146-150  "      -4       44
  2    -3     43           2    -3     35        151-155  "      -3      243
  3    -2    116           3    -2     52        156-160  "      -2     1284
  4    -1    221           4    -1     86        161-165  "      -1     3777
  5     0    247           5     0    109        166-170  "       0     5742
  6    +1    202           6    +1     85        171-175  "      +1     4796
  7    +2    115           7    +2     69        176-180  "      +2     2129
  8    +3     34           8    +3     30        181-185  "      +3      588
  9    +4      9           9    +4     16        186-190  "      +4       81
 10    +5      0          10    +5      6        over 191 "      +5       11
                          11    +6      1
            ----                      ----                            -----
            1000                       500                            18727
 
 The grouping tendency or the clustering around the mean value 
 is manifest in all three series; but while this tendency in the case 
 of the two homograde series as offered in the experiments by 
 Charlier and Bonynge may be fully explained by means of the 
 theorems of mathematical probabilities no such reasoning is 
sufficient to explain the clustering tendency of the heterograde
 series relating to Danish conscripts. The calculus of probabilities 
 in itself would not be sufficient to explain the grouping tendency of 
 the variates in a heterograde series unless a general hypothesis 
 will aid us in explaining the variation among several heterograde 
 objects in respect to a specific attribute. 
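The clustering around the mean in the three series can be put in numbers by the direct process. The sketch below computes the total, the mean and the dispersion of each series in units of the class deviation x. The frequencies are copied from the comparison table; the zero entered for m = 10 in Charlier's series is inferred from the printed total of 1000, and the function name is illustrative.

```python
import math

# Frequencies F(x) keyed by the class deviation x.
CHARLIER = dict(zip(range(-5, 6),
                    [3, 10, 43, 116, 221, 247, 202, 115, 34, 9, 0]))
BONYNGE = dict(zip(range(-5, 7),
                   [2, 9, 35, 52, 86, 109, 85, 69, 30, 16, 6, 1]))
DANISH = dict(zip(range(-5, 6),
                  [32, 44, 243, 1284, 3777, 5742, 4796, 2129, 588, 81, 11]))


def mean_and_dispersion(freq):
    """Return (total, mean, dispersion) of a frequency table {x: F(x)}."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / total
    return total, mean, math.sqrt(variance)


if __name__ == "__main__":
    for name, freq in (("Charlier", CHARLIER),
                       ("Bonynge", BONYNGE),
                       ("Danish conscripts", DANISH)):
        total, mean, sigma = mean_and_dispersion(freq)
        print(f"{name:18s} N = {total:6d}  mean = {mean:+.3f}  sigma = {sigma:.3f}")
```

The arithmetic is the same for all three series. What differs is that only for the first two can the resulting parameters be predicted beforehand from the composition of the deck or urn.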
 
Thus the question which now confronts us is whether it is
possible to establish a simple hypothesis which will enable us to
extend the principles of the mathematical theory of probabilities
to the domain of the heterograde series and to build up a theory
similar to that of the homograde series. The great Laplace was
the first to solve this problem, and his investigations and analysis
on this important subject are indeed some of the most important,
but also some of the most difficult to follow, in his Théorie des
Probabilités. The hypothesis employed by Laplace in explaining
the phenomena of variation in a heterograde series is the hypothesis
of elementary errors. The hypothesis was later on somewhat
simplified by the German astronomer and engineer, Hagen, and
it has of late years been further developed through the elegant
researches of the Scandinavian astronomer and statistician,
M. Charlier. According to the Laplacean-Hagen-Charlier
theory every variate or individual deviation from a certain norm is
generated as the sum of a mass of small and unknown quantities,
generally infinite in number, which are known as elementary errors
(deviations). The word error must of course be taken in a different
sense from that which we usually associate with the word. In precision
measurements we are actually dealing with true or natural errors
arising from imperfections of the instruments and the observer,
but it would of course not be right to regard a deviation of say
5 centimeters from the mean stature of a population group as an
error in the usual sense of the word. Used in its wider sense as an
expression for deviations the term will, however, be readily
understood, and it is in this sense we shall use it in the following pages.
Expressed in mathematical symbols the hypothesis of elementary
errors may be presented as follows. Let $x_k$ (where k =
1, 2, 3, . . . s denotes the kth error source among a total of s
sources) represent the magnitude of a statistical variate expressed
as a deviation (error) from a certain norm, then

$$f_k(r) \qquad (k = 1, 2, 3, \ldots s)$$

may be regarded as the probability that $x_k$ assumes the value r.
As to this particular elementary error probability function Laplace
makes no other assumptions than those which follow directly from
the definition of a mathematical probability. That is to say

$$0 < f_k(r) < 1,$$

where k = 1, 2, 3, . . . s and r = ± 1, ± 2, ± 3, . . . ± ∞.

Since it is certain that one or more of the above values of r,
whether positive or negative, are bound to occur, we have evidently
the relation

$$\sum_{r=-\infty}^{r=+\infty} f_k(r) = 1.$$

The epoch-making analysis of Laplace lies in the determination
of the unknown function $f_k(r)$ from such simple and general
assumptions.
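How a sum of many elementary errors takes on a regular form can be illustrated numerically. The sketch below is only an illustration under stated assumptions and does not follow Laplace's analysis. Each of s hypothetical sources has its own probability function $f_k(r)$, here restricted for simplicity to the three values r = -1, 0, +1 and chosen arbitrarily. The exact distribution of the sum is obtained by repeated convolution and compared with a normal curve of the same mean and dispersion. All names and numerical choices are hypothetical.

```python
import math


def convolve(f, g):
    """Probability function of the sum of two independent integer errors."""
    out = {}
    for r1, p1 in f.items():
        for r2, p2 in g.items():
            out[r1 + r2] = out.get(r1 + r2, 0.0) + p1 * p2
    return out


def elementary_error(k, s):
    """Hypothetical f_k(r) on r = -1, 0, +1; every source is shaped differently."""
    a = 0.10 + 0.30 * k / (s - 1)   # probability of r = -1, varies with k
    b = 0.25                        # probability of r = +1
    return {-1: a, 0: 1.0 - a - b, 1: b}


def sum_of_errors(s):
    """Exact distribution of x_1 + x_2 + ... + x_s."""
    total = {0: 1.0}
    for k in range(s):
        total = convolve(total, elementary_error(k, s))
    return total


def moments(dist):
    """Mean and dispersion of a probability function {r: probability}."""
    mean = sum(r * p for r, p in dist.items())
    variance = sum(p * (r - mean) ** 2 for r, p in dist.items())
    return mean, math.sqrt(variance)


def normal_density(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    dist = sum_of_errors(60)
    mean, sigma = moments(dist)
    gap = max(abs(p - normal_density(r, mean, sigma)) for r, p in dist.items())
    print("mean =", mean, " sigma =", sigma, " largest gap =", gap)
```

Although no single source is symmetrical or normal in form, the distribution of the sum lies very close to the normal curve, which is the behaviour the hypothesis of elementary errors is meant to account for.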
 
 95. Application to Statistical Series. Definitions. — The La- 
 placean-Charlier hypothesis of elementary errors opens the way 
 for a mathematical analysis of a vast number of statistical data 
 and series, which we shall briefly discuss in the following para- 
 graph. First of all we submit therefore the following definition of a 
 statistical object. 
 
 A number of similar objects (a species) which can be arranged in 
 numerical order according to the measurable variation of a certain 
 observed attribute (character), also called a variate, is known as a 
 statistical object, eventually as a statistical series. 
 
It is readily seen that this definition covers a wide range of
subjects and that statistical methods, instead of being applicable to
social and economic problems only, are equally useful in botany,
zoology, biology and even in astronomy, physics or chemistry.
Moreover, since the deviation of an individual variate of the
statistical series as measured from a certain arbitrarily chosen
norm evidently may be regarded as the sum of several elementary
errors (the word error to be taken in its wider sense), it is evident
that the statistical object can be subjected to a mathematical
analysis on the basis of the theory of errors.
 
 A simple consideration will also convince the reader that the 
 above definition covers not alone the heterograde series but also 
 the homograde series. For instance in the Bernoullian and Poisson 
 series as presented in the experiments by Charlier and Bonynge 
on page 170, the number m, which gives the number of favorable
 events in each sample set, may be considered as a statistical 
 variate and F{x) as a statistical series. 
 
This simple fact is of the utmost importance since it makes
it possible to treat both the homograde and heterograde
series on the common basis of elementary errors and links
in the case of the homograde series the a priori mathematical
probabilities with the a posteriori probabilities. Such connection
is of special interest in the further treatment of the celebrated
Rule of Bayes.
 
While thus the homograde and heterograde series may be
viewed from a common viewpoint it is, however, necessary to
point out a distinct difference in the nature of the statistical
variates themselves. In one case we find the variate (the measurable
attribute) expressed in whole numbers only, such as the number
of fin rays in fishes, petals in flowers or the occurrence of a specified
color in card drawings. The variates are in such cases known as
integral variates. The observations on tail fin rays of flounders by
the Danish biological station on page 131 offer such an example.
As a further illustration of integral variates we choose the following
statistical series from the observations of the English physicists,
Rutherford and Geiger. Messrs. Rutherford and Geiger
counted the numbers of alpha particles radiated from a bar of
polonium during a long series of intervals, each lasting one-eighth
of a minute. The table states the number of times, F(x), the
number of particles emitted in this interval had a given value, x.
 
  x     F(x)
  0      57
  1     203
  2     383
  3     525
  4     532
  5     408
  6     273
  7     139
  8      45
  9      27
 10      10
 11       4
 12       0
 13       1
 14       1
 
As an example of a very slight variation the Danish biologist and
botanist, W. Johannsen, quotes the following observations by
his colleague Professor Raunkjær of Copenhagen on the number of
involucral leaves of 100 samples chosen at random of Taraxacum
erythrospermum:
 
 No. of Leaves Frequency 
 X F{x) 
 
 13 99 
 
 14 1
 
 
In other cases it is not possible to express the measure of the
attribute in whole numbers. Thus measurements of stature,
chest circumference and weight of recruits, or measurements of the
percentages of sterility in wheat, barley, rye and oats will in
general possess all possible fractional values between two integral
numbers. Hence we must group the observations in classes, and
such classified variates are known as graduated variates.
 
The measurements of heights of Danish conscripts for the
year 1910 and shown on page 170 offer an illustration of graduated
variates. Another case is furnished in the number of deaths
by attained ages in a mortality table. In most mortality tables
the deaths are given by integral ages only and represent therefore
strictly speaking integral variates.
 
In biology we encounter numerous homograde series especially
in investigations on dimorphism or polymorphism. Johannsen of
Copenhagen produced from crossbreeding between a species of
beans with white blossoms and yellow seeds and a species with
violet blossoms and black seeds a bastard species with violet
blossoms and muddy colored seeds. The offspring (558 individual
plants) of this bastard showed the following variations:

      White Blossoms            Violet Blossoms
           160                       398
      Color of Seeds            Color of Seeds
     yellow    bronze          violet    black
       39       121             105       293

In respect to blossoms we have two alternatives, in respect to
seeds four alternatives.
 
As a few illustrations of the wide range of variable phenomena
which may be classified as statistical objects, we present the
following table:
 
Statistical Object.                        Sample.                        Attribute.                          Variate.     Series.

Drawings of diamonds from deck of cards.   Series of s drawings.          Frequency of Diamonds.              Integral.    Homograde.
Petals of Flowers.                         Individual flowers.            No. of Petals.                      Integral.    Heterograde.
Fin Rays of Fish.                          Individual fishes.             No. of Rays.                        Integral.    Heterograde.
Sex at Birth.                              Series of s births.            Frequency of Male Births.           Integral.    Homograde.
Age Distribution of Assured Persons.       The Policyholder.              Age.                                Graduated.   Heterograde.
Distribution of Amounts Assured.           The Policy.                    Amount in Dollars.                  Graduated.   Heterograde.
Anthropometric Measures of Recruits.       The Recruit.                   Chest Measure.                      Graduated.   Heterograde.
Precision Measures.                        Individual Measure.            Error.                              Graduated.   Heterograde.
Invalidity.                                The Individual Patient.        Duration in days of Invalidity.     Graduated.   Heterograde.
Income and Wages.                          The Individual Worker.         Yearly Income in Dollars.           Graduated.   Heterograde.
Cross of Flowers.                          Series of s Bastard Flowers.   Color of Blossoms, Seeds, etc.      Integral.    Homograde.
 
It is to be noted that in the primary observations on homograde
series the numbers are all abstract, whereas the heterograde series
consist of concrete numbers. Another peculiarity of the homograde
series is that they are always connected with the number of
comparison, s, which is absent in the heterograde series.
 
96. Compound Frequency Curves. — According to the Laplacean-Charlier
hypothesis any frequency curve may be considered as being generated as a
sum of independent frequency curves and represents therefore in the final
instance really a compound frequency curve. Mathematically speaking we
may therefore consider any frequency curve, no matter of what form, as
being represented by the symbolic relation

$$F(x) = \sum N_i \varphi_i(x) \quad \text{for } i = 1, 2, 3, \ldots$$

The functions $\varphi_i(x)$ may sometimes all be normal or Laplacean probability
curves. On the other hand, by assigning different values to $N_i$, or the areas
of the separate curves, we may obtain a compound curve of wavelike form.
Suppose for instance we had two samples of observations on the heights of
say 100,000 Japanese recruits and 10,000 Danish recruits, each individual's
measure written on a separate card. Suppose furthermore that we mixed those
110,000 cards in an urn, and then formed a new frequency distribution of this
mixture. This new frequency distribution would be a compound curve with
two strongly pronounced crests or maximums: one (the Japanese) clustering
around the value of 160 centimeters, while a smaller crest (the Danish) would
tend to cluster around the value of 170 centimeters.
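The recruit example can be sketched numerically by building the compound curve from two normal components with areas 100,000 and 10,000 and centres 160 and 170 centimeters. The text gives no dispersions, so the common dispersion passed to the hypothetical helper `recruits` is purely an assumption, as are the grid limits used to look for crests.

```python
import math


def normal_density(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def compound_curve(x, components):
    """F(x) = sum of N_i * phi_i(x) over components (N_i, mean_i, sigma_i)."""
    return sum(n * normal_density(x, m, s) for n, m, s in components)


def count_crests(components, lo=140.0, hi=190.0, step=0.1):
    """Count local maxima of the compound curve on a grid of heights (cm)."""
    steps = int(round((hi - lo) / step))
    ys = [compound_curve(lo + i * step, components) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


def recruits(sigma):
    """Two components with an assumed common dispersion sigma (in cm)."""
    return [(100_000, 160.0, sigma), (10_000, 170.0, sigma)]


if __name__ == "__main__":
    for sigma in (2.0, 6.0):
        print("assumed dispersion", sigma, "cm ->",
              count_crests(recruits(sigma)), "crest(s)")
```

With an assumed common dispersion of 2 cm the grid search finds two crests. With a wider assumed dispersion of 6 cm the smaller component merges into the shoulder of the larger one and only a single crest remains, so the visibility of the second crest depends on the spread that is assumed.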
 
Another instance is offered in the distribution of the frontal breadths of a
Naples specimen of the crab, Carcinus mænas, as measured by Weldon. Weldon
thought it very probable that this rather skew frequency distribution
was produced by a fusion of two distinct races or species of individuals, which
were clustered symmetrically around separate means. The distinguished
English biometrician, Karl Pearson, tested this hypothesis for him and
analysed the compound curve as two component curves representing respectively
58.55 and 41.45 per cent of the total area of the compound frequency
curve. Thus Weldon's hypothesis was verified by a mathematical analysis.
 
A quite different type of example is offered in the frequency distribution of
deaths by attained age as represented by the $d_x$ column in any ordinary mortality
table. The fact that the deaths in the $d_x$ column showed a marked
clustering tendency, strongly suggestive of the normal Laplacean curve, around
the age group 70-75 was already noted by Lexis, who in this way made a
very interesting attempt to determine what he called a Normalalter for
the age of man. Later on Italian statisticians took up the problem and analysed
the $d_x$ curve of Italian life tables as a sum of several normal frequency
curves. Karl Pearson was the third investigator to take up the problem in a
very fascinating essay in his Chances of Death. Pearson pictures Death as 5
marksmen shooting at a human target passing over the Bridge of Life. Each
marksman aims with different precision and skewness. The result is 5 component
skew curves.
 
Although the brilliant and perfect literary style of the eminent English
biometrician rouses the admiration and brings forth the reader's unstinted
praise, I cannot help being in accord with the distinguished American
biologist and statistician, Raymond Pearl, who in his 1920 Lowell Lectures,
although speaking in the highest terms of praise of Pearson's work,
characterized a mathematical analysis of this kind as being nothing more than
a highly interesting and neat graduation formula but wholly void of any
biological significance.
 
It is as a matter of fact a comparatively easy matter to break up any
death curve of a mortality table into separate mathematical components. As
an example of such a process I offer the illustration in Chapter XVI, where
I have broken up the recently published American $AM^{(5)}$ mortality table
into two curves of the Gram-Charlier type, obtaining as good results as the
Italian investigators and Pearson, who use 5 component curves.
 
 But as already pointed out by Lexis "a mere mathematical analysis in 
 component groups does not enlarge our knowledge of the causal relationships. 
 It would be a quite different matter, however, if it were possible to establish 
 clustering tendencies around definite ages for each of the more important 
 causes of death." 
 
An attempt to do this has been made by the present writer in his forthcoming
book An Elementary Treatise on Frequency Curves and their Application
to the Human Death Curve. I start with the hypothesis "that the frequency
distribution of deaths at attained ages classified according to certain groups
of causes of death among the survivors in a mortality table tend to cluster
around specific ages in such a manner that their frequency distribution can
be represented by a Gram-Charlier frequency curve." If this hypothesis can
be accepted as having a sound biological basis I have shown that it is possible
by a mathematical analysis resting on such hypothesis to construct mortality
tables from mortuary records by sex, attained age and cause of death, and
without any information about the number of lives exposed to risk at various ages.
This proposal has been met by a storm of protest from many American actuaries,
who claim that I have attempted the impossible. Final judgment
should be suspended, however, until the actual appearance of the work, which
I think must be judged from a biological rather than from a mathematical
point of view. The fact that the method has given good results in the construction
of many mortality tables among highly different races and occupations
must, I think, be attributed to purely biological causes and not to actuarial
or mathematical methods, which in the process have been employed
as a mere tool, as a means rather than as an end.
 
97. Early Writers. — The idea of frequency curves or frequency
distribution is probably very old. It very likely arose in the mind
of man when he began to make quantitative observations. Undoubtedly
the surveyors and engineers of the people of ancient
civilization had noticed that successive and independent measurements
of the same object often showed variations. On the other
hand we have no means of knowing if the ancient geometers and
mathematicians knew how to estimate and value such variations
from the true value of the object. It is probable that the great
Greek astronomers, such as Hipparchus and Aristarchus, in their
astronomical observations employed some rational method
of allowing for errors due to the instruments and the individual
observer, but no records are available to settle this question.

The great Danish astronomer, Tycho Brahe, the father of modern
astronomy, on the other hand made careful adjustments for
errors of observations and has left us records on the systematic
method of such adjustments.
 
 
However, it was not before the close of the eighteenth century
that the errors of observations were subjected to a mathematical
treatment. The first known writer on the mathematics of the
subject was the English actuary, Thomas Simpson, a most remarkable
self-taught mathematician who in 1757 issued his "Miscellaneous
Tracts on some curious and very interesting subjects in
Mechanics, Physical Astronomy and Speculative Mathematics."
In this interesting and instructive little book is found a chapter
entitled "An Attempt to shew the Advantage arising by Taking
the Means of a Number of Observations in Practical Astronomy."
 
 About 15 years later the French mathematician, Lagrange, took 
 up the ideas of Simpson in a memoir, which at the time caused 
 considerable notice in mathematical circles. Lagrange in his 
 treatment followed a course very much similar to that employed 
 by de Moivre in the discussion of the problem which bears his own 
 name. 
 
In 1778, Daniel Bernoulli in the scientific publications of the
Russian academy of Petrograd subjected the memoirs of Lagrange
to a searching criticism and proposed the first mathematical
formula for a frequency curve or curve of errors around the mean.
Bernoulli suggested as a law of error or frequency function, $\varphi(x)$,
the following expression:

$$\varphi(x) = +\sqrt{r^2 - x^2}, \quad \text{where } r \text{ is a constant.}$$

This equation represents a symmetrical semi-circle and gives, as
we shall have occasion to show at a later stage, a rough approximation
to the presumptive law of error.
 
A very important contribution to the theory was also made
by the American, Adrain, in his journal "The Analyst."
 
 98. Laplace and Gauss. — Laplace was the next mathematician 
to take up the subject of frequency curves in his monumental
work "Théorie analytique des probabilités." The great Frenchman
dealt with the subject in a manner which leaves little to
be desired. M. Charlier, the eminent Swedish astronomer and
statistician, has justly remarked that among the various deductions
of the law of errors, the exhaustive researches of Laplace occupy
beyond doubt a leading position because of their generality and
far reaching applications. On the other hand, the analysis of
Laplace is by no means easy to follow in all its details and the
4th chapter of the "Théorie Analytique des Probabilités" according
to a remark by Todhunter in his "History of Probabilities"
forms one of the most important but at the same time also one of
the most difficult parts of the great work.
 
No doubt the extreme difficulty of fully mastering the far
reaching but intricate analysis of Laplace was realized by the
mathematicians. Already his friend and disciple, Poisson, realized
this and issued in 1832 a note entitled "Sur la probabilité des
résultats moyens des observations." But the wealth of ideas in
Laplace's treatise and their wide range of application were not
fully recognized for almost a full century, until they were
taken up by Charlier, who more than anyone else has proven
their great worth as the most general and direct basis for a complete
theory of frequency functions and the associated problem of
correlation.
 
In the meantime Laplace's method had been supplanted by the
independent and contemporary researches of the great German
mathematician, Gauss. The method employed by Gauss in
deriving his law of error or frequency curve and the therewith
associated criteria of the method of least squares is undoubtedly
very simple and elegant and much easier to follow for the beginner
than the analysis of Laplace. Gauss in his studies confined himself
to the so-called precision errors, or errors arising from repeated
measurements by means of physical instruments, such as astronomical
or geodetic observations or measurements in experimental
physics or chemistry.
 
The ideas put forth by Gauss were followed up by a number of
astronomers and physicists, such as Bessel, Encke, Hansen, and
Hagen of Germany, Andrae, D'Arrest, and Gylden of Scandinavia,
Airy, Herschel, and Tait in England, Laurent in France,
and Newcomb and Chauvenet in America. And the Gaussian
methods are still used exclusively in preference to those introduced
by Laplace in most of our text-books on theory of errors and the
related subject of least squares.
 
One reason for this preference for the theory of Gauss, apart
from its simplicity of representation, is to be looked for in the fact
that until a comparatively recent date the majority of applications
of the theory of frequency curves or error curves had reference to
precision measurements. As pointed out by N. R. Jorgensen,¹ in
his excellent Danish treatise on "Frequency Surfaces and Correlation,"
it will, as a rule, be found that the Gaussian error law may be
regarded as an excellent method of approximation, which becomes
well-nigh perfect in the case of errors of precision measurements
with delicate instruments in the hands of carefully trained observers.
The Gaussian frequency curve may therefore be said to
fulfill all the requirements in praxis of a law of error, where we are
concerned with errors in the true sense of the word.

¹ Undersøgelser over Frequensflader og Korrelation (Copenhagen, 1916).
 
 99. Quetelet's Studies. — Matters became, however, quite differ- 
 ent when the biologists and economists began to employ math- 
 ematical analysis in their research work. It was the great Belgian 
 astronomer and statistician, Quetelet, who first introduced exact 
 measurements in the study of biological and anthropological 
 phenomena and showed that a number of collected statistical data 
 on heights, weights and chest measurements of recruits exhibited a 
close conformity to the Gaussian law of error, although the variation
among the individual objects as measured could not be considered
solely as errors in the original sense of the word.
 
 Investigations along this line were greatly accelerated by the 
 discoveries of Quetelet. All sorts of measurements were taken and 
 the rapidly growing collections of statistical data relating to 
 economic and social conditions as recorded by various govern- 
 mental statistical bureaus furnished material for further investiga- 
 tions. But unfortunately in all these investigations the Gaussian 
 error law came to act as a veritable Procrustean bed to which all 
 possible measurements should be made to fit. The belief in 
 authority so typical of modern German learning and which had 
 also spread to America was too great to question the supposed 
generality of the law discovered by the great Gauss. Statisticians
 could not conciliate themselves with the thought of the possible 
 presence of "skew" frequency curves, although numerous data 
 offered complete defiance to the Gaussian dogma and exhibited a 
 markedly skew frequency distribution. Supposedly great author- 
 ities argued naively that the reason the data did not fit the curve 
 of Gauss was that the observations were not numerous enough to 
eliminate the presence of skewness. In other words, skewness was
regarded as a by-product of sampling which, it was believed, could be
made to disappear completely if we could take an infinite number
of observations.
 
 Voices had, however, been raised against these energetic but 
 futile Procrustean efforts. Already Quetelet realized the existence 
 of skew frequency curves. This is clearly brought out in his 
correspondence on this subject with Mr. Bravais of the Ecole
Polytechnique of Paris as published in an appendix to his "Lettres
sur la Théorie des probabilités."
 
 100. Opperman, Gram, and Thiele. — Neither Quetelet nor 
 Bravais succeeded, however, in giving a complete mathematical 
 treatment of the theory of skew frequency curves. The first com- 
 plete mathematical demonstration of this aspect of the matter was 
 given by various Scandinavian investigators. A Danish actuary, 
Opperman, was probably the leading spirit in organizing the
revolt against the belief in authority as preached by the adherents
of the doctrine of Gauss. Opperman, who was a self-taught
mathematician, seems to have looked with suspicion on many of
the researches by German mathematicians of the latter half
of the nineteenth century. He was a great admirer of the early
Scotch and English mathematicians with whose works he was
thoroughly familiar, and it is said he took great delight in pointing
out how many of the lengthy and formidable German demonstrations
in the realm of the theory of functions had been demonstrated
in a more elementary and clearer manner by such men as
Wallis, Stirling, MacLaurin, Gregory, Briggs and Napier. As a
practical actuary and managing director of the Danish Government
Life Assurance Fund he had ample opportunity to notice
that many frequency distributions occurring in actuarial work
offered a notorious defiance to the frequency curve of Gauss.
Around Opperman there gathered a small group of young enthusiastic
students of mathematics, among whom we may especially
mention Gram and Thiele, and to whom he expounded his
ideas. Opperman himself wrote very little and always in a condensed
form. A reviewer of his work remarks that he rewrote his
essays several times so as to be able to represent on a single page
what other mathematicians usually required a dozen pages to
express. He has left very little material bearing on the theory of
frequency curves, but his discussions on this subject with his
younger disciples evidently bore fruit.
 
J. P. Gram was the first mathematician to show that the normal
symmetrical Gaussian error curve was but a special case of a more
general system of skew frequency curves which could be represented
by a series. In his very original doctor's thesis in Danish
on "The development of series by means of the method of least
squares" (Copenhagen, 1879)¹ he extended some theories originally
expounded by the Russian mathematician, Tchebycheff, to
the representation of frequency functions by means of a series.
By using the Gaussian curve

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - M)^2}{2\sigma^2}}$$

as a generating function Gram showed that an arbitrary frequency
function could be represented approximately by a series of the form

$$F(x) = c_0 \varphi(x) + c_1 \varphi'(x) + c_2 \varphi''(x) + c_3 \varphi'''(x) + c_4 \varphi''''(x) + \cdots$$

¹ Om Rækkeudviklinger.
 
In this development Gram established some far reaching properties
of infinite determinants and their relations to orthogonal
functions which later have become of much use in the recent
epoch-making researches on integral equations by the Swedish
mathematician and actuary, Fredholm.
 
 To Gram, therefore, belongs the honor of having been the first 
 mathematician to give a systematic theory for the development of 
 skew frequency curves. 
 
While Gram's later work as a managing director of a life insurance
company occupied most of his time and left but little opportunity
for purely mathematical work his friend, T. N. Thiele, began
to lecture at the University of Copenhagen on the general theory of
observations. The substance of these decidedly original lectures
was in 1889 published in book form under the title "A General
Theory of Observations."²

In several respects this work occupies a dual position to the
work of the great Laplace, although Thiele is set like flint against
the idea of basing the theory of probabilities on the conception of
an a priori probability. In his lectures he always maintained that
the greatest benefit derived from the study of the method of least
squares was that the student learned where not to use it. Among
the great achievements of Thiele is the introduction in the
theory of frequency curves of a certain system of statistical
characteristics to which he gave the name of semi-invariants and which
are practically identical to the system of moments later on introduced
by Pearson. By means of these semi-invariants Thiele
arrived at the same series as deduced by Gram. In Thiele's work
we also find a very original treatment of the theory of correlation
originally introduced by Bravais. But instead of the term correlation
he uses the words "bonded observations."

² Almindelig Iagttagelseslære.
 
Like Laplace's "Théorie analytique des probabilités,"
Thiele's original work and its subsequent abridged translation in
English are by no means easy reading, especially for the
beginner. It contains, however, like the work of Laplace a veritable
wealth of ideas and methods which remain unsurpassed in the
realm of mathematical statistics and no serious worker on the
general theory of observations can afford to neglect to study the
original works of Laplace and Thiele.
 
 101. Modern Investigations. — The investigations of Gram and 
 Thiele bring us up to the close of the nineteenth century. Their 
 ideas reached but a small number of students of mathematical 
 statistics because of the very limited knowledge of Scandinavian 
 languages among mathematical readers in general. But from the 
 beginning of the nineties other voices began to be heard against 
 the Gaussian dogma. In Germany it was Fechner who first 
entered the ranks of the opposition with his so-called "zweispaltiges
Gesetz." His work was continued by Lipps and the Leipzig
astronomer, Bruns, who by the publication in 1906 of his
"Kollektivmasslehre" gave an almost complete theory of frequency curves
 where we again find the series originally developed by Gram and 
 Thiele. 
 
 Although quite considerable valuable work has thus been done 
 in Germany along the lines of frequency curves it was, however, 
 in England that the renewal of the classical probability theory 
 took place with the renowned memoirs by the English math- 
 ematician, Karl Pearson, entitled "Contributions to the Math- 
 ematical Theory of Evolution" in Philosophical Transactions for 
 1895. 
 
 Since that year Pearson has produced a certain type of statistical 
 literature, of almost Gargantuan proportions. The quarterly 
 journal "Biometrika" of which he is the editor is devoted to the 
 mathematical study of biological problems. When Pearson first 
 introduced his famous types of curves (now more than 12 in num- 
 ber) the study of frequency distributions was greatly accelerated. 
The application of these curves to biological problems was apparently
so simple that they were used in a rather loose manner
by many biologists and anthropologists who had but little training
in mathematical analysis. Examples of such pseudo mathematical
analysis are especially found in the writings of the American
anthropologist, Franz Boas, which may be held up as a warning
to all statisticians to keep away from the higher mathematical
analysis of collected statistical data unless they are familiar with
the tools of the probability calculus.
 
 Such misuse can of course not be laid at the door of Mr. Pearson 
 who indeed has protested vigorously against the erroneous appli- 
 cation of his methods by investigators of the Boas type. On the 
 other hand, it is equally true that Mr. Pearson at times has relied 
too much on his mathematical formulas and violated the maxim
 of the Danish biologist, Johannsen, that "we must practice 
 biology with mathematics and not as mathematics." 
 
 The immense production of Pearson coupled with his well-nigh 
 perfect and forceful style of writing has to a certain extent over- 
 shadowed the researches of his compatriot, Edgeworth, whose 
 works according to the Danish actuary and mathematician, 
 Jorgensen, are greatly superior to those of Pearson both in scien- 
 tific rigor and in practical applications. Edgeworth has deduced 
 the previously mentioned series by Gram and Thiele in a very 
 elegant form in the Cambridge Philosophical Transactions (1904) 
 and he has in a series of articles in the Journal of the Royal Statis- 
 tical Society outdistanced many of his contemporaries among 
 the mathematical statisticians. Unfortunately Edgeworth's con- 
 tributions have not gotten the attention they deserve, probably 
because of the rather fragmentary and unsystematic manner in
 which they have appeared. Among the new methods introduced 
 by Edgeworth we may particularly mention the so-called method 
 of translation. 
 
 The Pearsonian types of frequency curves are represented by 
 formulas which in mathematical language are termed closed 
 expressions in contradistinction to the development in series. This 
 latter method is still being preferred by the Scandinavian math- 
 ematicians under the leadership of the Swedish astronomer Charlier. 
Charlier started his first investigation with a small brochure entitled
"Über das Fehlergesetz" in the Meddelanden for 1905, wherein
he followed the method originally introduced by Laplace and in a
most elegant way of deduction reached the series of Gram and
Thiele. He has since that year published a series of small monographs
on various aspects of mathematical statistics and their
application to stellar statistics which beyond doubt are destined to
become classics in the history of probabilities.
 
Charlier has shown that all frequency curves fall into two types
which he has designated as type A and type B. The A type is the
usual expansion of Gram and Thiele with the normal frequency
curve as the generating function. Type B which covers decidedly
skew frequency curves is given by the series

$$F(x) = c_0 \psi(x) + \frac{c_1}{1!} \Delta \psi(x) + \frac{c_2}{2!} \Delta^2 \psi(x) + \cdots$$

where

$$\psi(x) = \frac{e^{-\lambda}}{\pi} \int_0^{\pi} e^{\lambda \cos \omega} \cos\left[\lambda \sin \omega - x\omega\right] d\omega$$

is the generating function.
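
As a numerical illustration, the generating function of type B can be checked against the Poisson probabilities. The sketch below assumes the integral is taken in its usual form, $(e^{-\lambda}/\pi)\int_0^{\pi} e^{\lambda\cos\omega}\cos(\lambda\sin\omega - x\omega)\,d\omega$, and that $x$ is a non-negative integer. The function names, the value of $\lambda$ and the number of integration panels are illustrative choices, not taken from Charlier.

```python
import math


def psi_integral(x, lam, panels=2000):
    """Evaluate (e^-lam / pi) * integral_0^pi e^(lam cos w) cos(lam sin w - x w) dw
    with the composite trapezoidal rule."""
    h = math.pi / panels

    def integrand(w):
        return math.exp(lam * math.cos(w)) * math.cos(lam * math.sin(w) - x * w)

    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, panels))
    return math.exp(-lam) * total * h / math.pi


def poisson(x, lam):
    """Poisson probability e^-lam * lam^x / x! for integral x."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 2.5  # illustrative parameter value
largest_gap = max(abs(psi_integral(x, lam) - poisson(x, lam)) for x in range(7))
```

For integral values of the variate the two expressions should agree to within rounding error, so that `largest_gap` is negligibly small.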
 
 The decidedly constructive work begun by Charlier has been 
 ably supplemented by his talented disciple Wicksell and the 
Danish actuary, Jorgensen. Wicksell in 1920 issued in Swedish a
 series of lectures on mathematical statistics delivered during the 
 autumn of 1919 before the Swedish Assurance Society. He has 
 also written numerous excellent monographs on mathematical 
 statistics and their application to vital statistics. 
 
N. R. Jorgensen issued in 1916 his large octavo volume on
"Researches on Frequency Surfaces and Correlation"¹ which
beyond doubt is the most important work among the contributions
of Danish actuaries since the appearance of the memoirs of Gram
and Thiele. Jorgensen's systematic treatise has greatly furthered
the studies of the Scandinavian school both in theoretical and
practical aspects. A very important feature of his book is the
insertion of an extensive collection of numerical tables of various
functions which greatly facilitates the practical applications of the
theory. These tables, many of which are the results of his individual
efforts, hold equal rank with the well-known "Tables for
Biometricians and Statisticians" edited by Karl Pearson in 1914.

¹ Undersøgelser over Frequensflader og Korrelation.

Besides the writings of Charlier, Wicksell and Jorgensen a
number of Scandinavian mathematicians, actuaries and statisticians
have contributed valuable researches both on frequency
curves and correlation methods. We may especially mention such
men as Guldberg, Gyllenberg, Malmquist, Burrau and Lundquist.
In this group we might also include the Danish biologist, Johannsen,
whose writings on the theory of heredity are recognized as
standard texts on the application of the mathematical statistical
methods to problems dealing with inherited characteristics in
organic life.
 
 A very interesting attempt to develop a theory of frequency 
curves has been made by the Dutch astronomer, Kapteyn, in his
 "Skew Frequency Curves in Biology and Statistics" (Groningen, 
 1912). Kapteyn's theory which has much in common with 
 Edgeworth's method of translation introduces a new idea in the 
 generation of a frequency curve by making the size of the individ- 
 ual object depend not alone upon the sources influencing a collec- 
 tion of such individuals but also upon the size of the object at a 
 previously given time t. This idea of introducing the time factor 
 in the theory of probabilities is, however, more justly credited to 
 the French mathematician, Bachelier, whose large treatise on 
probabilities of which the first volume appeared a few years ago
 has introduced some new thoughts regarding the conception of 
 continuous probabilities which are bound to strongly influence 
 the whole theory. 
 
Before closing this necessarily brief and incomplete historical
note we wish to mention the close connection of the theory of
frequency curves with that of integral equations. Since the
appearance of the epoch-making memoirs by Fredholm the theory
of integral equations has occupied a central position in mathematical
analysis. This youngest branch of higher analysis has
already found numerous practical applications in physics and
chemistry and possesses equally important properties in the way of
solving numerous statistical problems. In fact, the whole theory of
frequency curves and correlations can be reduced to the solution
of a few integral equations whose constants contain all the
characteristic properties of the frequency distribution. On the basis
of this principle, a complete theory of frequency curves could be
presented on a single book page.
 
 CHAPTER XIV. 
 THE MATHEMATICAL THEORY OF FREQUENCY CURVES. 
 
 102. Frequency Distributions. — If N successive observations 
originating from the same essential circumstances or the same
source of causes are made in respect to a certain statistical variate,
$x$, and if the individual observations $o_i$ ($i = 1, 2, 3, \ldots, N$) are
permuted in their natural order in accordance with their magnitude
then this particular permutation is said to form a frequency
distribution of $x$ and is denoted by the symbol $F(x)$.
 
The relative frequencies of this specific permutation, that is the
ratio which each absolute frequency or group of frequencies bears
to the total number of observations, is called a relative frequency
function or probability function and is denoted by the symbol $\varphi(x)$.
 
If the statistical variate is continuous or a graduated variate,
such as heights of soldiers, ages at death of assured lives, physical
and astronomical precision measurements, etc., then

$$\varphi(z)\,dz$$

is the probability that the variate $x$ satisfies the following relation

$$z - \tfrac{1}{2}dz < x < z + \tfrac{1}{2}dz$$

or that $x$ falls between the above limits.
 
If the statistical variate assumes integral (discrete) values only
as for instance the number of alpha particles discharged from certain
radioactive metals and gases, such as polonium and helium,
number of fin rays in fishes, or number of flower petals in plants,
then $\varphi(z)$ is the probability that $x$ assumes the value $z$. From the
above definitions it follows directly that

(a) $F(z) = N\varphi(z)$ (Integral variates)

(b) $F(z)\,dz = N\varphi(z)\,dz$ (Graduated variates)

Interpreting the above results graphically we find that (a) will
be represented by a series of disconnected or discrete points while
(b) will be represented by a continuous curve.
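
Relation (a) can be illustrated with a short computation for an integral variate. This is only a minimal sketch: the petal counts below are invented for the purpose, and the names `frequency_distribution`, `F` and `phi` are illustrative.

```python
from collections import Counter


def frequency_distribution(observations):
    """Return the absolute frequencies F(z) and relative frequencies phi(z),
    keyed by the observed values z in ascending order."""
    n = len(observations)
    counts = Counter(observations)
    absolute = {z: counts[z] for z in sorted(counts)}
    relative = {z: absolute[z] / n for z in absolute}
    return absolute, relative


# Hypothetical numbers of flower petals, made up for illustration only.
petals = [5, 5, 6, 5, 7, 5, 6, 5, 5, 8]
N = len(petals)
F, phi = frequency_distribution(petals)
```

Here `F[z]` equals `N * phi[z]` for every observed value, and the relative frequencies add up to unity.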
 
As to the function $\varphi(z)$ we make for the present no other
assumptions than those following immediately from the customary
definition of a mathematical probability. That is to say the
function $\varphi(z)$ must be real and positive. Moreover, it must also
satisfy the relation

$$\int_{-\infty}^{+\infty} \varphi(z)\,dz = 1,$$

or, in the case of discrete variates:

$$\sum_{z=-\infty}^{z=+\infty} \varphi(z) = 1,$$

which is but the mathematical way of expressing the simple
hypothetical disjunctive judgment that the variate is sure to
assume some one or several values in the interval from $-\infty$ to
$+\infty$. The zero point may be arbitrarily chosen and need not
coincide with the natural zero of the number scale. Thus for
instance if we in the case of Danish recruits choose the zero point
of the frequency curve at 170 centimeters an observation of
180 centimeters would be recorded as + 10 and an observation of
160 centimeters as − 10.
 
 103. Parameters considered as Symmetric Functions, — In 
 regard to a frequency function we may assume a priori that it will 
 depend only upon the variate x and certain mathematical rela- 
 tions into which this variate enters with a number of constants 
$\lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots$, symbolically expressed by the notation

$$F(x, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots)$$

where the $\lambda$'s are the constants and $x$ the variate.
 
All these constants or parameters are naturally independent of
$x$ and represent some peculiar properties or characteristic essentials
of the frequency function as expressed in the original observations
$o_i$ ($i = 1, 2, 3, \ldots, N$). We may, therefore, say that each
constant or statistical parameter entering into the final mathematical
form for the frequency function is a function of the
observations $o_i$. This fact may be expressed in the following
symbolic form

$$\begin{aligned}
\lambda_1 &= S_1(o_1, o_2, o_3, \ldots, o_N) \\
\lambda_2 &= S_2(o_1, o_2, o_3, \ldots, o_N) \\
&\;\;\vdots \\
\lambda_N &= S_N(o_1, o_2, o_3, \ldots, o_N)
\end{aligned}$$
 
 
But from purely a priori considerations we are able to tell something
else about the functions $S_i$ ($i = 1, 2, 3, \ldots, N$). It is only
when permuting the various $o$'s in an ascending magnitude according
to the natural number scale that we obtain a frequency function.
This arrangement itself has, however, no influence upon
any one of the $o$'s which were generated before this purely arbitrary
permutation took place. The ultimate and previously measured
effects of the causes as reflected in each individual numerical
observation, $o_i$, depend only upon the origin of causes which form
the fundamental basis for the statistical object under investigation
and do not depend upon the order in which the individual $o$'s occur
in the series of observations.
 
Suppose for instance that the observations occurred in the
following order

$$o_1, o_2, o_3, \ldots, o_N.$$

By permuting these elements in their natural order we obtain the
frequency distribution $F(x)$. But the very same distribution
could have been obtained if the observations had occurred in any
other order as for instance

$$o_7, o_9, o_N, \ldots, o_3, \ldots, o_1,$$

so long as all of the individual $o$'s were retained in the original
records. Or, to take a concrete example, consider the study of the number
of policyholders according to attained ages in a life assurance
 office. We write the age of each individual policyholder on a small 
 card. When all the ages have been written on individual cards 
 they may be permuted according to attained age and the resulting 
 series is a frequency function of the age x. We may now mix these 
 cards just as we mix ordinary playing cards in a game of whist, 
 and we get another permutation in general different from the 
 order in which we originally recorded the ages on the cards. But 
 this new permutation can equally well be used to produce the 
 frequency function if we are only sure to retain all the cards and 
 do not add any new cards. 
 
The various functions $S(o_1, o_2, o_3, \ldots, o_N)$ are, therefore, symmetric
functions, that is functions which are left unaltered by
arbitrarily permuting the $N$ elements $o_i$, and no interchange whatever
of the values of the various $o$'s in those symmetric functions
can have any influence upon the final form of the frequency function
or frequency curve, $F(x)$.
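
The card illustration can be imitated in a few lines. The ages below are hypothetical and the helper names are illustrative. The sketch only shows that shuffling the cards leaves the frequency function, and any symmetric function such as the mean, unaltered, whereas a function that depends on the position of the cards is not protected in this way.

```python
import random
from collections import Counter

# Hypothetical attained ages written on the cards (illustrative data only).
ages = [34, 41, 29, 52, 41, 38, 29, 60]


def frequency_function(cards):
    """Absolute frequency of each age, listed in ascending order of age."""
    return sorted(Counter(cards).items())


def mean(cards):
    """A symmetric function: unaffected by the order of the cards."""
    return sum(cards) / len(cards)


def first_minus_last(cards):
    """Not a symmetric function: depends on where the cards happen to lie."""
    return cards[0] - cards[-1]


shuffled = ages[:]
random.Random(1).shuffle(shuffled)  # mix the cards; the seed is arbitrary
```

Comparing `frequency_function(ages)` with `frequency_function(shuffled)` gives identical results, provided no card is lost and none is added.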
 
 
We now introduce under the name of power sums a certain
well-known form of fundamental symmetrical functions defined by
the following relations

$$\begin{aligned}
s_0 &= o_1^0 + o_2^0 + o_3^0 + \cdots + o_N^0 = N \\
s_1 &= o_1 + o_2 + o_3 + \cdots + o_N = \sum o_i \\
s_2 &= o_1^2 + o_2^2 + o_3^2 + \cdots + o_N^2 = \sum o_i^2 \\
&\;\;\vdots \\
s_N &= o_1^N + o_2^N + o_3^N + \cdots + o_N^N = \sum o_i^N
\end{aligned}$$

Moreover, a well-known theorem in elementary algebra tells us that
every symmetric function may be expressed as a function of

$$s_1, s_2, s_3, \ldots, s_N.$$

From this theorem it follows a fortiori that we are able to express
the constants $\lambda$ in the frequency curve as functions of the power
sums of the observations. While such a procedure is possible,
theoretically at least, we should, however, in most cases find it a
very tedious and laborious task in actual practice. It, therefore,
remains to be seen whether it is possible to transform these symmetrical
functions of the power sums of the observations into some
other symmetrical functions, which are more flexible and workable
in practical computations and which can be expressed in terms of
the various values of $s$.
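
The power sums are readily computed. The following sketch uses five invented observations, and the function name `power_sums` is illustrative. It also shows that reversing the order of the observations leaves every power sum unchanged.

```python
def power_sums(observations, max_order):
    """Return [s_0, s_1, ..., s_max_order], where s_r is the sum of the
    r-th powers of the observations."""
    return [sum(o ** r for o in observations) for r in range(max_order + 1)]


# Invented observations, for illustration only.
obs = [1, 2, 2, 3, 4]
s = power_sums(obs, 4)
s_reversed = power_sums(list(reversed(obs)), 4)
```

For these observations `s[0]` is the number of observations, 5, and `s[1]` is their sum, 12.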
 
 104. Semi-Invariants of Thiele. — It is the great achievement 
 of Thiele to have been the first mathematician to realize this 
 possibility and make such a transformation by introducing into 
 the theory of frequency curves a peculiar system of symmetrical 
 functions which he called semi-invariants and denoted by the 
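symbols $\lambda_1, \lambda_2, \lambda_3, \ldots$

Starting with the power sums, $s_r$, Thiele defines these by the
following identity

$$s_0\, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \cdots} = s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \cdots \qquad (1)$$

which is supposed identical in respect to $\omega$.

Since $s_r = \sum o_i^r$ the right hand side of the equation may also
be written as $e^{o_1 \omega} + e^{o_2 \omega} + e^{o_3 \omega} + \cdots = \sum e^{o_i \omega}$.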
 
 
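Differentiating (1) with respect to $\omega$ we have

$$s_0\, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \cdots} \left[\lambda_1 + \frac{\lambda_2 \omega}{1!} + \frac{\lambda_3 \omega^2}{2!} + \cdots\right]$$

$$= \left[s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \cdots\right] \left[\lambda_1 + \frac{\lambda_2 \omega}{1!} + \frac{\lambda_3 \omega^2}{2!} + \cdots\right]$$

$$= s_1 + \frac{s_2}{1!}\omega + \frac{s_3}{2!}\omega^2 + \frac{s_4}{3!}\omega^3 + \cdots$$

Multiplying out and equating the various coefficients of equal
powers of $\omega$ we finally have:

$$\begin{aligned}
s_1 &= \lambda_1 s_0 \\
s_2 &= \lambda_1 s_1 + \lambda_2 s_0 \\
s_3 &= \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0 \\
s_4 &= \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0
\end{aligned}$$

where the coefficients follow the law of the binomial theorem.
Solving for $\lambda$ we have

$$\begin{aligned}
\lambda_1 &= s_1 : s_0 \\
\lambda_2 &= (s_2 s_0 - s_1^2) : s_0^2 \\
\lambda_3 &= (s_3 s_0^2 - 3 s_2 s_1 s_0 + 2 s_1^3) : s_0^3 \\
\lambda_4 &= (s_4 s_0^3 - 4 s_3 s_1 s_0^2 - 3 s_2^2 s_0^2 + 12 s_2 s_1^2 s_0 - 6 s_1^4) : s_0^4
\end{aligned}$$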
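
The relations between $s$ and $\lambda$ lend themselves to a simple recursive computation. The sketch below assumes that the binomial law continues for every order, that is $s_{n+1} = \sum_{k=0}^{n} \binom{n}{k} \lambda_{n+1-k}\, s_k$, and solves each relation in turn for the newest $\lambda$. The function name and the five invented observations are illustrative, and exact fractions are used merely to avoid rounding.

```python
from fractions import Fraction
from math import comb


def semi_invariants(s):
    """Given power sums [s_0, s_1, ..., s_m], return [lambda_1, ..., lambda_m].

    Uses s_(n+1) = sum_{k=0..n} C(n, k) * lambda_(n+1-k) * s_k and solves
    each relation for lambda_(n+1), the coefficient of s_0.
    """
    lam = [None]  # lam[r] holds lambda_r; index 0 is unused
    for n in range(len(s) - 1):
        known = sum(comb(n, k) * lam[n + 1 - k] * s[k] for k in range(1, n + 1))
        lam.append((Fraction(s[n + 1]) - known) / s[0])
    return lam[1:]


# Power sums s_0 ... s_4 of the invented observations 1, 2, 2, 3, 4.
s = [5, 12, 34, 108, 370]
lam1, lam2, lam3, lam4 = semi_invariants(s)
```

The first semi-invariant is the arithmetic mean, here 12/5, and the second, 26/25, is the mean square deviation from that mean, in agreement with the explicit formulas above.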
 
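The semi-invariants $\lambda$ in respect to an arbitrary origin and
unit are defined by the relation

$$s_0\, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \cdots} = e^{o_1 \omega} + e^{o_2 \omega} + e^{o_3 \omega} + \cdots$$

where $o_1, o_2, o_3, \ldots$ are the individual observations.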
 
Let us now change to another coordinate system with another
unit and origin defined by the following linear transformation

$$o_i' = a o_i + c$$

The semi-invariants in this new system are given by the relation

$$s_0\, e^{\frac{\lambda_1' \omega}{1!} + \frac{\lambda_2' \omega^2}{2!} + \frac{\lambda_3' \omega^3}{3!} + \cdots} = e^{o_1' \omega} + e^{o_2' \omega} + \cdots = e^{(a o_1 + c)\omega} + e^{(a o_2 + c)\omega} + \cdots$$
 
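Since the various values of $\lambda'$ do not depend upon the quantity
$\omega$ we may without changing the value of the semi-invariants replace
$\omega$ by $\omega : a$ in the above equations which give

$$s_0\, e^{\frac{\lambda_1' \omega}{a\,1!} + \frac{\lambda_2' \omega^2}{a^2\,2!} + \frac{\lambda_3' \omega^3}{a^3\,3!} + \cdots} = e^{o_1 \omega + \frac{c\omega}{a}} + e^{o_2 \omega + \frac{c\omega}{a}} + e^{o_3 \omega + \frac{c\omega}{a}} + \cdots$$

$$= e^{\frac{c\omega}{a}} \left[e^{o_1 \omega} + e^{o_2 \omega} + e^{o_3 \omega} + \cdots\right] = e^{\frac{c\omega}{a}}\, s_0\, e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \cdots}$$

Taking the logarithms on both sides of the equation we have

$$\frac{\lambda_1' \omega}{a\,1!} + \frac{\lambda_2' \omega^2}{a^2\,2!} + \frac{\lambda_3' \omega^3}{a^3\,3!} + \cdots = \frac{c\omega}{a} + \frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \cdots$$

Differentiating successively with respect to $\omega$ we have

$$\frac{\lambda_1'}{a} + \frac{\lambda_2' \omega}{1!\,a^2} + \frac{\lambda_3' \omega^2}{2!\,a^3} + \cdots = \frac{c}{a} + \lambda_1 + \frac{\lambda_2 \omega}{1!} + \frac{\lambda_3 \omega^2}{2!} + \cdots$$

$$\frac{\lambda_2'}{a^2} + \frac{\lambda_3' \omega}{1!\,a^3} + \frac{\lambda_4' \omega^2}{2!\,a^4} + \cdots = \lambda_2 + \frac{\lambda_3 \omega}{1!} + \frac{\lambda_4 \omega^2}{2!} + \cdots$$

$$\frac{\lambda_3'}{a^3} + \frac{\lambda_4' \omega}{1!\,a^4} + \cdots = \lambda_3 + \frac{\lambda_4 \omega}{1!} + \cdots$$

Letting $\omega = 0$ we therefore have

$$\frac{\lambda_1'}{a} = \frac{c}{a} + \lambda_1, \quad \text{or} \quad \lambda_1' = a\lambda_1 + c$$

$$\frac{\lambda_2'}{a^2} = \lambda_2, \quad \text{or} \quad \lambda_2' = a^2 \lambda_2$$

$$\frac{\lambda_3'}{a^3} = \lambda_3, \quad \text{or} \quad \lambda_3' = a^3 \lambda_3$$

from which we deduce the following relations

$$\lambda_1(ax + c) = a\lambda_1(x) + c$$

$$\lambda_r(ax + c) = a^r \lambda_r(x) \quad \text{for } r > 1$$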
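
These relations may be verified numerically. The sketch below is an illustration only: the observations and the constants $a$ and $c$ are invented, and the function name is illustrative. It computes the semi-invariants of the original and of the transformed observations with exact fractions and compares them.

```python
from fractions import Fraction
from math import comb


def semi_invariants_of(observations, order):
    """Return [lambda_1, ..., lambda_order] of the observations, computed
    from the power sums by the binomial relations between s and lambda."""
    s = [sum(Fraction(o) ** r for o in observations) for r in range(order + 1)]
    lam = [None]  # lam[r] holds lambda_r; index 0 is unused
    for n in range(order):
        known = sum(comb(n, k) * lam[n + 1 - k] * s[k] for k in range(1, n + 1))
        lam.append((s[n + 1] - known) / s[0])
    return lam[1:]


# Invented observations and an arbitrary change of unit (a) and origin (c).
obs = [1, 2, 2, 3, 4]
a, c = Fraction(5, 2), Fraction(-7)

lam = semi_invariants_of(obs, 4)
lam_new = semi_invariants_of([a * o + c for o in obs], 4)
```

Only the first semi-invariant is affected by the change of origin. The higher ones are merely multiplied by the corresponding power of $a$.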
 
We shall for the present leave the semi-invariants and only ask
the reader to bear in mind the above relations between $\lambda$ and $s$,
of which we shall later on make use in determining the constants
in the frequency curve $\varphi(x)$.

Before discussing the generation of the total frequency curve it
will, however, be necessary to demonstrate some auxiliary mathematical
formulae from the theory of definite integrals and integral
equations which will be of use in the following discussion as
mathematical tools with which to attack the collected statistical
data or the numerical observations.
 
 105. The Fourier Integral Equation. — One of these tools is 
 found in tiie celebrated integral theorem of Fourier, which was 
 the first integral equation to be successfully treated. We shall in 
 the following demonstration adhere to the elegant and simple 
 solution In' M. Charlier. Charlier in his proof supposes that a 
 function, F(co), is defined through the following convergent series. 
 
 F{oj) = a [f(o) 4- /( a)c'^'^' +/(2a)e''^"^ + . . . 
 
 + /(- a)^-'''^' + /(-2a)e-'«'^' + . . . ] 
 
 771 =0O 
 
 or F{co) = a^/(am)e'^"'"'' 
 
 TO = - OO (2) 
 
 where i = \/- 1 
 
We then see by a well known theorem of Cauchy¹ that the integral

$$I(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (3)$$

is finite and convergent. If we now let $ma = x$ and let $a = 0$
as a limiting value, $a$ becomes equal to $dx$ and $f(am) = f(x)$.
Consequently we may write

$$\lim_{a = 0} F(\omega) = I(\omega).$$
 
Multiplying (2) by $e^{-ra\omega i}\,d\omega$ and integrating between the
limits $-\pi/a$ and $+\pi/a$ we get on the left an expression of the
form $\int_{-\pi/a}^{+\pi/a} F(\omega)\,e^{-ra\omega i}\,d\omega$ and on the right a sum of
definite integrals of which, however, all but the term containing
$f(ra)$ as a factor will vanish. This particular term reduces to

$$a\int_{-\pi/a}^{+\pi/a} f(ra)\,d\omega \quad\text{or}\quad 2\pi f(ra).$$

¹ See Goursat: Mathematical Analysis (English Translation, New York),
page 364.
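Hence we have

$$f(ra) = \frac{1}{2\pi}\int_{-\pi/a}^{+\pi/a} F(\omega)\,e^{-ra\omega i}\,d\omega \qquad (4)$$

By letting $a$ converge toward zero and by the substitution
$ra = x$ this equation reduces to

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} I(\omega)\,e^{-x\omega i}\,d\omega \qquad (5)$$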
 
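We then have, if we introduce a new function $\varphi(\omega)$ defined by
the simple relation:

$$\sqrt{2\pi}\,\varphi(\omega) = \lim_{a = 0} F(\omega), \quad\text{or}$$

$$\varphi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (5a)$$

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \varphi(\omega)\,e^{-x\omega i}\,d\omega \qquad (5b)$$

Charlier has suggested the name conjugated Fourier function of
$f(x)$ for the expression $\varphi(\omega)$.

The equations (5a) and (5b) are known as integral equations of
the first kind. The expression $e^{x\omega i}$ (or $e^{-x\omega i}$) is known as the
nucleus of the equation. If in (5b) we know the value of $\varphi(\omega)$ we
are able to determine $f(x)$. Inversely, if we know $f(x)$ we may
find $\varphi(\omega)$ from (5a).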
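The reciprocity of (5a) and (5b) can be illustrated numerically. The sketch below, in Python with NumPy, replaces both integrals by simple sums over a fine grid; the grid limits, the test function and the names `conjugate` and `invert` are assumptions chosen for illustration only.

```python
import numpy as np

def conjugate(f_vals, x, omega):
    """(5a): phi(omega) = 1/sqrt(2 pi) * integral of f(x) exp(+i x omega) dx."""
    dx = x[1] - x[0]
    kernel = np.exp(1j * np.outer(omega, x))
    return kernel @ f_vals * dx / np.sqrt(2.0 * np.pi)

def invert(phi_vals, omega, x):
    """(5b): f(x) = 1/sqrt(2 pi) * integral of phi(omega) exp(-i x omega) d omega."""
    dw = omega[1] - omega[0]
    kernel = np.exp(-1j * np.outer(x, omega))
    return kernel @ phi_vals * dw / np.sqrt(2.0 * np.pi)

# Grids wide enough that the test function is negligible at the ends.
x = np.linspace(-12.0, 12.0, 1201)
omega = np.linspace(-12.0, 12.0, 1201)

f = np.exp(-(x - 1.0) ** 2 / 2.0)      # an arbitrary smooth test function
phi = conjugate(f, x, omega)           # its conjugated Fourier function
f_back = invert(phi, omega, x)         # recovered from (5b)

max_err = np.max(np.abs(f_back.real - f))
print(max_err)
```

The recovered values agree with the original function up to rounding, which is all the sketch is meant to show; it is not a substitute for the proof.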
 
106. Frequency Function as the Solution of an Integral Equation. — We
are now in a position to make use of the semi-invariants
of Thiele, which hitherto in our discussion have appeared as a
rather disconnected and alien member. On page 191 we saw
that the semi-invariants could be expressed by the relation

$$s_0\,e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \cdots} = \sum_i e^{o_i\omega}$$

where $o_i$ $(i = 1, 2, 3, \ldots)$ denotes the individual observations.
 
 
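The definition of the semi-invariants does not necessitate that
all the $o$'s must be different. If some of the $o$'s are exactly alike it
is self-evident that the term $e^{o_i\omega}$ must be repeated as often as
$o_i$ occurs among all of the observations. If therefore $N\varphi(o_i)$
denotes the absolute frequency of $o_i$ where $\varphi(o_i)$ is the relative
frequency function, then the definition of the semi-invariants
may be written as:

$$\sum_i \varphi(o_i)\; e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \cdots} = \sum_i \varphi(o_i)\,e^{o_i\omega}$$

For continuous variates, $x$, the above sums are transformed into
definite integrals of the form

$$e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \cdots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega}\,dx$$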
 
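Let us now substitute the quantity $\sqrt{-1}\,\omega$, or $i\omega$, for $\omega$ in the
above identity. We then have:

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \cdots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega i}\,dx$$

under the supposition that this transformation holds in the complex
region in which the function is defined.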
 
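In this equation the definite integrals are of special importance.
The factor $\int_{-\infty}^{+\infty}\varphi(x)\,dx$ is, of course, equal to unity according to
the simple considerations set forth on page 189. The integral
on the right hand side of the equation is, however, apart from the
constant factor $\sqrt{2\pi}$ nothing more than the $\varphi(\omega)$ function in the
conjugate Fourier function if we let $\varphi(x) = f(x)$, and

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \cdots} = \sqrt{2\pi}\,\varphi(\omega)$$

According to (5b) we may, therefore, write $f(x)$ or $\varphi(x)$ as

$$\varphi(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-x\omega i + \frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \cdots}\,d\omega$$

as the most general form of the frequency function $\varphi(x)$ expressed by
means of semi-invariants. (See Appendix.)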
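The general formula can be evaluated numerically for any finite list of semi-invariants. The following is a rough sketch in Python with NumPy, in which the integral over $\omega$ is replaced by a sum on a truncated grid; the function name `density_from_semi_invariants` and the parameters `w_max` and `n` are hypothetical choices, and the sketch presupposes that $\lambda_2$ is positive so that the integrand dies away.

```python
import numpy as np
from math import factorial

def density_from_semi_invariants(x, lam, w_max=40.0, n=8001):
    """phi(x) = 1/(2 pi) * integral of exp(-i x w + sum_r lam_r (i w)^r / r!) dw.

    lam is the list [lambda_1, lambda_2, ...]; the integral is approximated
    by a sum over an equally spaced grid from -w_max to +w_max.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = np.linspace(-w_max, w_max, n)
    dw = w[1] - w[0]
    exponent = np.zeros_like(w, dtype=complex)
    for r, lam_r in enumerate(lam, start=1):
        exponent += lam_r * (1j * w) ** r / factorial(r)
    integrand = np.exp(exponent[None, :] - 1j * np.outer(x, w))
    return integrand.sum(axis=1).real * dw / (2.0 * np.pi)

# With only lambda_1 and lambda_2 retained the result should be the normal curve.
x = np.linspace(-6.0, 8.0, 141)
lam1, lam2 = 1.0, 2.0
approx = density_from_semi_invariants(x, [lam1, lam2])
normal = np.exp(-(x - lam1) ** 2 / (2.0 * lam2)) / np.sqrt(2.0 * np.pi * lam2)
print(np.max(np.abs(approx - normal)))
```

Passing a longer list, for instance `[lam1, lam2, lam3]`, gives the corresponding skew curve under the same assumptions.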
 
 
107. The Normal or Laplacean Probability Function. — The
exactness with which $\varphi(x)$ is reproduced depends, of course, upon
the number of $\lambda$'s we decide to consider in the above formula.
As a first approximation we may omit all $\lambda$'s above the order 2
or all terms in the exponent with indices higher than 2. Bearing in
mind that $i^2 = -1$ we therefore have as a first approximation

$$\varphi_0(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{(\lambda_1 - x)\,i\omega - \frac{\lambda_2}{2}\omega^2}\,d\omega.$$
 
This definite integral was first evaluated by Laplace by means of
the following elegant analysis. Using the well known Eulerian
relation for complex quantities the above integral may be written as

$$\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos\left[(\lambda_1 - x)\omega\right]d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\sin\left[(\lambda_1 - x)\omega\right]d\omega$$

The imaginary member vanishes because the factor $e^{-\frac{\lambda_2}{2}\omega^2}$
is an even function and $\sin[(\lambda_1 - x)\omega]$ an uneven function, and
the area from $-\infty$ to $0$ will therefore equal the area from $0$ to
$+\infty$, but be opposite in sign, which reduces the total area from
$-\infty$ to $+\infty$, or the integral in question, to zero.
 
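In regard to the first term, similar conditions hold except that
$\cos[(\lambda_1 - x)\omega]$ is an even function and the integral may hence
be written as

$$I = 2\int_{0}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega \quad\text{where } r = \lambda_1 - x$$

Regarding the parameter $r$ as a variable and differentiating $I$ in
respect to this variable we have

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\int_{0}^{\infty}\left(-\lambda_2\,\omega\,e^{-\frac{\lambda_2}{2}\omega^2}\right)\sin(r\omega)\,d\omega$$

From this we have by partial integration:

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\left[e^{-\frac{\lambda_2}{2}\omega^2}\sin(r\omega)\right]_{0}^{+\infty} - \frac{2r}{\lambda_2}\int_{0}^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega = -\frac{rI}{\lambda_2}, \quad\text{or}\quad \frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}$$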
 
 
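From which we find

$$\log I = -\frac{r^2}{2\lambda_2} + \log A$$

where $\log A$ is a constant. Hence we have:

$$I = A\,e^{-\frac{r^2}{2\lambda_2}}$$

In order to determine $A$ we let $r = 0$ and we have

$$A = 2\int_{0}^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\,d\omega = \sqrt{\frac{2\pi}{\lambda_2}}$$

This finally gives, since $\varphi_0(x) = I : 2\pi$, the expression for $\varphi_0(x)$ in the following form:

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{(x - \lambda_1)^2}{2\lambda_2}}$$

as a preliminary approximation for the frequency curve $\varphi(x)$.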
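Laplace's evaluation lends itself to a direct numerical check. The sketch below, in Python with NumPy, approximates the integral $I$ by the trapezoidal rule and compares it with the closed form $\sqrt{2\pi/\lambda_2}\,e^{-r^2/2\lambda_2}$; it also tests the differential equation $dI/dr = -rI/\lambda_2$ by a central difference. The value of `lam2`, the truncation point `w_max` and the function names are assumptions made for illustration.

```python
import numpy as np

def laplace_integral(r, lam2, w_max=60.0, n=60001):
    """I(r) = 2 * integral from 0 to infinity of exp(-lam2 w^2 / 2) cos(r w) dw,
    approximated by the trapezoidal rule on [0, w_max]."""
    w = np.linspace(0.0, w_max, n)
    dw = w[1] - w[0]
    y = np.exp(-lam2 * w**2 / 2.0) * np.cos(r * w)
    return 2.0 * dw * (y.sum() - 0.5 * (y[0] + y[-1]))

def closed_form(r, lam2):
    """Result of the analysis in the text: A * exp(-r^2 / (2 lam2))."""
    return np.sqrt(2.0 * np.pi / lam2) * np.exp(-r**2 / (2.0 * lam2))

lam2 = 0.7
r_values = [0.0, 0.5, 1.0, 2.0]
numeric = np.array([laplace_integral(r, lam2) for r in r_values])
exact = np.array([closed_form(r, lam2) for r in r_values])

# Differential equation dI/dr = -r I / lam2, checked at r = 1.
h = 1e-4
dI_dr = (laplace_integral(1.0 + h, lam2) - laplace_integral(1.0 - h, lam2)) / (2.0 * h)
print(np.max(np.abs(numeric - exact)), dI_dr + laplace_integral(1.0, lam2) / lam2)
```

Dividing `numeric` by $2\pi$ gives the ordinates of the normal curve at the distance $r$ from $\lambda_1$.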
 
The first mathematical deduction of this approximate expression
for a frequency curve is found in the monumental work by Laplace
on Probabilities, and the function $\varphi_0(x)$ entering in the expression
$\varphi_0(x)\,dx$, which gives the probability that the variate will fall
between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$, is therefore known as the Laplacean
probability function or sometimes as the Normal Frequency
Curve of Laplace. The same curve was, as we have mentioned,
also deduced independently by Gauss in connection with his studies
on the distribution of accidental errors in precision measurements.

Laplace's probability function, $\varphi_0(x)$, possesses some remarkable
properties which it might be well worth while to consider.
Introducing a slightly different system of notation by writing
$\lambda_1 = M$ and $\sqrt{\lambda_2} = \sigma$, $\varphi_0(x)$ reduces to the following form:

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - M)^2 : 2\sigma^2}$$

which is the form introduced by Pearson.
 
The frequency curve, $\varphi_0(x)$, is here expressed in reference to a
Cartesian coordinate system with origin at the zero point of the
natural number system and whose unit of measurement is also
equivalent to the natural number unit. It is, however, not necessary
to use this system in preference to any other system. In fact,
we may choose arbitrarily any other origin and any other unit
standard without altering the properties of the curve. Suppose,
therefore, that we take $M$ as the origin and $\sigma$ as the unit of the
system. The frequency function then reduces to

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2 : 2} \quad\text{where } z = (x - M) : \sigma$$

Since the integral of $\varphi_0(z)$ from $-\infty$ to $+\infty$ equals unity
the following equation must necessarily hold.

$$\int_{-\infty}^{+\infty} e^{-z^2 : 2}\,dz = \sqrt{2\pi}$$

This latter result may, however, be deduced independently of
the fact that $\varphi_0(z)$ happens to be a probability function. The
above definite integral is a form well known from the calculus and
equals $\sqrt{2\pi}$. It serves therefore as an independent check of our
calculations.
 
108. Hermite's Polynomials. — The Laplacean Probability Curve
possesses, however, some other remarkable properties which are of
great use in expanding a function in a series. Starting with $\varphi_0(z)$
we may by repeated differentiation obtain its various derivatives.
Denoting such derivatives by $\varphi_1(z)$, $\varphi_2(z)$, $\varphi_3(z)$, ... respectively
we have the following relations.¹

$$\varphi_0(z) = e^{-z^2 : 2}$$
$$\varphi_1(z) = -z\,\varphi_0(z)$$
$$\varphi_2(z) = (z^2 - 1)\,\varphi_0(z)$$
$$\varphi_3(z) = -(z^3 - 3z)\,\varphi_0(z)$$
$$\varphi_4(z) = (z^4 - 6z^2 + 3)\,\varphi_0(z)$$

and in general for the $n$th derivative:

$$\varphi_n(z) = (-1)^n\left[z^n - \frac{n(n-1)}{2}z^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}z^{n-4} - \frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{2\cdot 4\cdot 6}z^{n-6} + \cdots\right]\varphi_0(z)$$

¹ In the following computations we have omitted temporarily the constant
factor $1 : \sqrt{2\pi}$ of $\varphi_0(z)$ and its derivatives.
 
 
It can be readily seen that the derivatives of $\varphi_0(z)$ are represented
throughout as products of polynomials of $z$ and the function
$\varphi_0(z)$ itself. The various polynomials

$$H_0(z) = 1$$
$$H_1(z) = z$$
$$H_2(z) = z^2 - 1$$
$$H_3(z) = z^3 - 3z$$
$$H_4(z) = z^4 - 6z^2 + 3$$

and so forth are generally known as Hermite's polynomials from
the name of the French mathematician, Hermite, who first introduced
these polynomials in mathematical analysis.

The following relations can be shown to exist for the polynomials as defined above

$$H_{n+1}(z) - zH_n(z) + nH_{n-1}(z) = 0$$

and

$$\frac{d^2H_n(z)}{dz^2} - z\,\frac{dH_n(z)}{dz} + nH_n(z) = 0$$

from which we successively may compute the various $H(z)$. (For the
derivatives $\varphi_n(z) = (-1)^nH_n(z)\varphi_0(z)$ the first relation takes the form
$\varphi_{n+1}(z) + z\varphi_n(z) + n\varphi_{n-1}(z) = 0$.)

A numerical 10 decimal place tabulation of the first six Hermite
polynomials for values of $z$ up to 4 and progressing by intervals
of 0.01 is given by Jørgensen in his aforementioned "Frekvensflader
og Korrelation."
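The recurrence affords a convenient way of generating the polynomials mechanically. The following is a small sketch in Python using NumPy's `Polynomial` class; the function name `hermite` is an illustrative assumption, and the recurrence is taken in the form $H_{n+1}(z) = zH_n(z) - nH_{n-1}(z)$, which reproduces the polynomials listed in the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

def hermite(n_max):
    """Return the list [H_0, ..., H_n_max] built from
    H_{n+1}(z) = z * H_n(z) - n * H_{n-1}(z), with H_0 = 1 and H_1 = z."""
    z = Polynomial([0, 1])
    H = [Polynomial([1]), z]
    for n in range(1, n_max):
        H.append(z * H[n] - n * H[n - 1])
    return H[: n_max + 1]

H = hermite(6)
for n, h in enumerate(H):
    # Coefficients are listed from the constant term upward.
    print(n, h.coef)

# Residual of the differential equation H_n'' - z H_n' + n H_n = 0 for n = 5.
z = Polynomial([0, 1])
residual = H[5].deriv(2) - z * H[5].deriv(1) + 5 * H[5]
```

The coefficient arrays are printed in ascending powers of $z$, so that $H_4$ appears as the constant term 3, then $-6$ for $z^2$, then 1 for $z^4$.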
 
 109. Orthogonal Functions. — There exist now some very 
 important relations between the Hermite polynomials and the 
derivatives of $\varphi_0(z)$, or between $H_n(z)$ and $\varphi_n(z)$.

Consider for the moment the two following series of functions

$$\varphi_0(z),\ \varphi_1(z),\ \varphi_2(z),\ \varphi_3(z),\ \varphi_4(z),\ \ldots$$

$$H_0(z),\ H_1(z),\ H_2(z),\ H_3(z),\ H_4(z),\ \ldots$$

where $\varphi_n(z) = (-1)^nH_n(z)\varphi_0(z)$ and where $\lim \varphi_n(z) = 0$ for $z = \pm\infty$.

We shall now prove that the two series $\varphi_n(z)$ and $H_n(z)$ form
a biorthogonal system in the interval $-\infty$ to $+\infty$, that is to say
that they are

(1) real and continuous in the whole plane

(2) no one of them is identically zero in the plane

(3) every pair of them, $\varphi_n(z)$ and $H_m(z)$, satisfy the relation

$$\int_{-\infty}^{+\infty}\varphi_n(z)H_m(z)\,dz = 0 \qquad (n \neq m)$$
 
 
We have the self-evident relation

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty}H_m(z)H_n(z)\varphi_0(z)\,dz = (-1)^{n+m}\int_{-\infty}^{+\infty}H_n(z)\varphi_m(z)\,dz$$

Since this relation holds for all values of $m$ and $n$ it is only necessary
to prove the proposition for $n > m$. For if it holds for $n > m$
it will according to the above relation also hold for $n < m$.
By partial integration we have:

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = \Big[H_m(z)\varphi_{n-1}(z)\Big]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty}H_m'(z)\varphi_{n-1}(z)\,dz$$

where $H_m'(z)$ is the first derivative of $H_m(z)$.

The first member on the right reduces to $0$ since $\varphi_{n-1}(z) = 0$
for $z = \pm\infty$ and because $H_m$ is of a lower order than $\varphi_n$. We
have therefore:

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = -\int_{-\infty}^{+\infty}H_m'(z)\varphi_{n-1}(z)\,dz$$

$$\int_{-\infty}^{+\infty}H_m'(z)\varphi_{n-1}(z)\,dz = -\int_{-\infty}^{+\infty}H_m''(z)\varphi_{n-2}(z)\,dz$$

$$\int_{-\infty}^{+\infty}H_m''(z)\varphi_{n-2}(z)\,dz = -\int_{-\infty}^{+\infty}H_m'''(z)\varphi_{n-3}(z)\,dz$$

Continuing this process we obtain finally an expression of the form

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = (-1)^{m+1}\int_{-\infty}^{+\infty}H_m^{(m+1)}(z)\,\varphi_{n-m-1}(z)\,dz$$

where $H_m^{(m+1)}(z)$ is the $(m+1)$st derivative of $H_m(z)$ and
$n - m - 1 \geq 0$. Since $H_m(z)$ is a polynomial of the $m$th degree
its $(m+1)$st derivative is zero, and we have finally that

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = 0$$

for all values of $m$ and $n$ where $n \neq m$.
 
 
For $m = n$ we proceed in exactly the same manner, but stop
at the $n$th integration. We have, therefore, by replacing $m$ by $n$
in the above partial integrations

$$\int_{-\infty}^{+\infty}H_n(z)\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty}H_n^{(n)}(z)\,\varphi_{n-n}(z)\,dz = (-1)^n\int_{-\infty}^{+\infty}H_n^{(n)}(z)\,\varphi_0(z)\,dz$$

The $n$th derivative of $H_n(z)$ is however nothing but a constant and
equal to $n!$. Hence we have finally

$$\int_{-\infty}^{+\infty}H_n(z)\varphi_n(z)\,dz = (-1)^n\,n!\int_{-\infty}^{+\infty}e^{-z^2 : 2}\,dz = (-1)^n\,n!\,\sqrt{2\pi}$$

The above analysis thus proves that the functions $H_m(z)$ and
$\varphi_n(z)$ are biorthogonal to each other for all values of $n$ different
from $m$ throughout the whole plane.
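These results may be confirmed numerically. The sketch below, in Python, uses NumPy's Gauss quadrature for the weight $e^{-z^2/2}$ (`hermegauss`) together with `hermeval`, whose basis polynomials coincide with the $H_n(z)$ of the text; the helper names `H` and `integral_Hm_phin` are illustrative assumptions, and $\varphi_0(z)$ is taken without the factor $1 : \sqrt{2\pi}$ as in the derivation above.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Nodes and weights for integrals of the form: integral of p(z) exp(-z^2/2) dz.
# With 20 nodes the rule is exact for polynomials p up to degree 39.
nodes, weights = hermegauss(20)

def H(n, z):
    """Hermite polynomial H_n(z) of the text (H_2 = z^2 - 1, and so on)."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return hermeval(z, coef)

def integral_Hm_phin(m, n):
    """Integral of H_m(z) * phi_n(z) dz with phi_n = (-1)^n H_n(z) exp(-z^2/2)."""
    return (-1) ** n * np.sum(weights * H(m, nodes) * H(n, nodes))

table = np.array([[integral_Hm_phin(m, n) for n in range(6)] for m in range(6)])
diagonal = np.array([(-1) ** n * factorial(n) * sqrt(2.0 * pi) for n in range(6)])
print(np.round(table, 6))
```

The off-diagonal entries of `table` vanish to rounding error, while the diagonal reproduces $(-1)^n\,n!\,\sqrt{2\pi}$.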
 
 110. The Frequency Function expressed as a Series. — We 
 can now make use of these relations between the infinite set of 
biorthogonal functions $H_n(z)$ and $\varphi_n(z)$ in solving the problem
of expanding an arbitrary function $\varphi(z)$ in a series of the form

$$\varphi(z) = c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \cdots,$$

the series to hold in the interval from $-\infty$ to $+\infty$.

If we know that $\varphi(z)$ can be developed into a series of this
form, which after multiplication by any continuous function can
be integrated term for term, then we are able to give a formal
determination of the coefficients $c$.

This formal determination of any one of the $c$'s, say $c_i$, consists
in multiplying the above series by $H_i(z)$ and integrating each
term from $-\infty$ to $+\infty$. All the terms except the one containing
the product $H_i(z)\varphi_i(z)$ vanish and we have for $c_i$

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{(-1)^i\,i!\,\sqrt{2\pi}}$$
 
It will be noted that this purely formal calculation of the coefficients
$c_i$ is very similar to the determination of the constants in a
Fourier Series, where as a matter of fact the system of functions

cos z, cos 2z, cos 3z, . . .
sin z, sin 2z, sin 3z, . . .

is biorthogonal in the interval $0 \leq z \leq 2\pi$.

But the reader must not forget that the above representation is
only a formal one, and we do not know if it is valid. To prove its
validity we must first show that the series is convergent and
secondly that it actually represents $\varphi(z)$ for all values of $z$.
 
This is by no means a simple task and it cannot be done by
elementary methods. A Russian mathematician, Vera Myller-Lebedeff,
has, however, given an elegant solution by means of some
well-known theorems from the Fredholm integral equations. She
has among other things proved the following criterion:

"Every function $\varphi(z)$ which together with its first two derivatives
is finite and continuous in the interval from $-\infty$ to $+\infty$
and which vanishes together with its derivatives for $z = \pm\infty$
can be developed into an infinite series of the form:

$$\varphi(z) = \sum_i c_i\,e^{-z^2 : 2}\,H_i(z)$$

where $H_i(z)$ is the Hermite polynomial of order $i$."
 
111. Derivation of Gram's Series. — It is, however, not our
intention to follow up this treatment, which is outside the scope of
an elementary treatise like this, and we shall in its place give an
approximate representation of the frequency function, $\varphi(z)$, by
a method which in many respects is similar to that introduced
by the Danish actuary Gram in his epoch-making work "Udviklingsraekker,"
which contains the first known systematic development
of a skew frequency function. Gram's problem in a somewhat
modified form may briefly be stated as follows: Being
given an arbitrary relative frequency function, $\varphi(z)$, continuous and
finite in the interval $-\infty$ to $+\infty$ (and which vanishes for $z = \pm\infty$),
to determine the constant coefficients $c_0, c_1, c_2, c_3, \ldots$ in such a way
that the series

$$\frac{c_0\varphi_0(z)}{\sqrt{\varphi_0(z)}} + \frac{c_1\varphi_1(z)}{\sqrt{\varphi_0(z)}} + \frac{c_2\varphi_2(z)}{\sqrt{\varphi_0(z)}} + \frac{c_3\varphi_3(z)}{\sqrt{\varphi_0(z)}} + \cdots$$

gives the best approximation to the quantity $\varphi(z) : \sqrt{\varphi_0(z)}$ in the
sense of the method of least squares. That is to say we wish to
determine the constants $c$ in such a manner that the sum of the
squares of the differences between the function and the approximate
series becomes a minimum. This means that the expression

$$\int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - \sum_i\frac{c_i\varphi_i(z)}{\sqrt{\varphi_0(z)}}\right]^2dz$$

must be a minimum.
 
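On the basis of this condition we have

$$\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} \approx \sum_i\frac{c_i\varphi_i(z)}{\sqrt{\varphi_0(z)}} = \sum_i(-1)^i\,c_iH_i(z)\sqrt{\varphi_0(z)}$$

where the unknown coefficients $c$ must be so determined that

$$I = \int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - \sum_i(-1)^i\,c_iH_i(z)\sqrt{\varphi_0(z)}\right]^2dz \quad\text{is a minimum.}$$

Taking the partial derivatives with respect to $c_i$ we have

$$\frac{\partial I}{\partial c_i} = -2\,\frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}\varphi(z)\sum_i(-1)^i\,c_iH_i(z)\,dz + \frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}\Big[\sum_i(-1)^i\,c_iH_i(z)\sqrt{\varphi_0(z)}\Big]^2dz$$

Now since, by the orthogonal relations of the preceding paragraphs,

$$\int_{-\infty}^{+\infty}\Big[\sum_i(-1)^i\,c_iH_i(z)\sqrt{\varphi_0(z)}\Big]^2dz = \int_{-\infty}^{+\infty}\Big\{c_0^2\,[H_0(z)]^2 + c_1^2\,[H_1(z)]^2 + \cdots + c_n^2\,[H_n(z)]^2\Big\}\varphi_0(z)\,dz$$

we get

$$\frac{\partial I}{\partial c_i} = -2\,(-1)^i\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz + 2c_i\int_{-\infty}^{+\infty}[H_i(z)]^2\varphi_0(z)\,dz$$

where the latter integral equals $(-1)^i\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz = i!\,\sqrt{2\pi}$.

Equating to zero and solving for $c_i$ we finally obtain the following
value for $c_i$

$$c_i = \frac{(-1)^i}{i!\,\sqrt{2\pi}}\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz \quad\text{for } i = 1, 2, 3, \ldots$$

This solution is gotten by the introduction of $\sqrt{\varphi_0(z)}$, which
serves to make all terms of the form $c_i\varphi_i(z) : \sqrt{\varphi_0(z)}$, equal to
$(-1)^i\sqrt{\varphi_0(z)}\,c_iH_i(z)$ $(i = 1, 2, 3, \ldots n)$, orthogonal to each other in the
interval $-\infty$ to $+\infty$.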
 
In all the above expansions of a frequency series we have used
the expression $\varphi_0(z) = e^{-z^2 : 2}$ as the generating function (see
footnote on page 199), while as a matter of fact the true value of
$\varphi_0(z)$ is given by the equation $\varphi_0(z) = e^{-z^2 : 2} : \sqrt{2\pi}$.

The definite integral on page 202

$$(-1)^i\int_{-\infty}^{+\infty}H_i(z)\varphi_i(z)\,dz = i!\int_{-\infty}^{+\infty}e^{-z^2 : 2}\,dz = i!\,\sqrt{2\pi}$$

will therefore have to be divided by $\sqrt{2\pi}$, and the value of the
general coefficient $c_i$ will henceforth be reduced to

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{(-1)^i\,i!}$$

where $H_i(z)$ is the Hermite polynomial of order $i$ defined by the
relation

$$H_i(z) = z^i - \frac{i(i-1)}{2}z^{i-2} + \frac{i(i-1)(i-2)(i-3)}{2\cdot 4}z^{i-4} - \frac{i(i-1)(i-2)(i-3)(i-4)(i-5)}{2\cdot 4\cdot 6}z^{i-6} + \cdots$$
 
 
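On this basis we obtain the following values for the coefficients
$c_0$ to $c_4$:

$$c_0 = \int_{-\infty}^{+\infty}\varphi(z)\,dz = 1$$

$$c_1 = (-1)^1\int_{-\infty}^{+\infty}\varphi(z)\,z\,dz : 1!$$

$$c_2 = (-1)^2\int_{-\infty}^{+\infty}(z^2 - 1)\,\varphi(z)\,dz : 2!$$

$$c_3 = (-1)^3\int_{-\infty}^{+\infty}(z^3 - 3z)\,\varphi(z)\,dz : 3!$$

$$c_4 = (-1)^4\int_{-\infty}^{+\infty}(z^4 - 6z^2 + 3)\,\varphi(z)\,dz : 4!$$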
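The formula for the coefficients can be tried out on a function whose expansion is known beforehand. In the following Python sketch a test curve is first built from assumed coefficients `true_c` and is then analysed again by means of $c_i = (-1)^i\int\varphi(z)H_i(z)\,dz : i!$, the integral being replaced by a sum over a fine grid. The grid, the chosen coefficients and the helper names are illustrative assumptions; $\varphi_0(z)$ carries here its true factor $1 : \sqrt{2\pi}$.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

z = np.linspace(-12.0, 12.0, 24001)
dz = z[1] - z[0]
phi0 = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)

def H(i, z):
    """Hermite polynomial H_i(z) of the text."""
    coef = np.zeros(i + 1)
    coef[i] = 1.0
    return hermeval(z, coef)

def gram_coefficient(phi, i):
    """c_i = (-1)^i / i! * integral of phi(z) H_i(z) dz (sum over the grid)."""
    return (-1) ** i * np.sum(phi * H(i, z)) * dz / factorial(i)

# Test curve phi = sum of c_i * phi_i with phi_i = (-1)^i H_i phi_0.
true_c = [1.0, 0.0, 0.0, -0.05, 0.02]
phi = sum(c * (-1) ** i * H(i, z) * phi0 for i, c in enumerate(true_c))

recovered = [gram_coefficient(phi, i) for i in range(5)]
print(np.round(recovered, 8))
```

The recovered values agree with the assumed ones, which is merely the biorthogonal property put to practical use.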
 
112. Absolute Frequencies. — While the above development of
an arbitrary frequency distribution has reference to $\varphi(z)$, or the
relative frequency function, it is, however, equally well adapted
to the representation of absolute frequencies as expressed by the
function, $F(z)$. If $N$ is the total number of individual observations,
or in other words the area of the frequency curve, we evidently
have

$$F(z) = N\varphi(z) \quad\text{or}\quad \int_{-\infty}^{+\infty}F(z)\,dz = N\int_{-\infty}^{+\infty}\varphi(z)\,dz = N.$$

Since $N$ is a constant quantity we may, therefore, write the
expansion of $F(z)$ as follows:

$$F(z) = N\left[c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \cdots\right], \qquad \varphi_0(z) = e^{-z^2 : 2} : \sqrt{2\pi},$$

where the coefficients $c_i$ have the value

$$c_i = \frac{(-1)^i}{i!\,N}\int_{-\infty}^{+\infty}F(z)H_i(z)\,dz, \quad\text{for } i = 1, 2, 3, \ldots$$

and where

$$N = \int_{-\infty}^{+\infty}F(z)\,dz$$
 
Since all the Hermite functions are polynomials in $z$ it can be
readily seen that the coefficients $c$ may be expressed as functions of
the power sums or of the previously mentioned symmetrical functions
$s$, where

$$s_r = \int_{-\infty}^{+\infty}z^rF(z)\,dz$$

These particular integrals, originally introduced by Thiele in the
development of the semi-invariants, have been called by Pearson
the "moments" of the frequency function, $F(z)$, and $s_r$ is called
the $r$th moment of the variate $z$ with respect to an arbitrary origin.
It can be readily seen that the moment of order zero or $s_0$ is

$$s_0 = \int_{-\infty}^{+\infty}z^0F(z)\,dz = N\int_{-\infty}^{+\infty}\varphi(z)\,dz = N$$

Hence we have for the first coefficient $c_0$

$$c_0 = \int_{-\infty}^{+\infty}F(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 1$$
 
We are, however, in a position to further simplify the expression
for $F(z)$.

As already mentioned we are at liberty to choose arbitrarily both
the origin and the unit of the Cartesian coordinate system for the
frequency curve without changing the properties of this curve.
Now by making a proper choice of this Cartesian system of reference
we can make the coefficients $c_1$ and $c_2$ vanish. In order to
obtain this object the origin of the system must be so chosen that

$$c_1 = -\int_{-\infty}^{+\infty}zF(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

This means that the semi-invariant $s_1 : s_0 = \lambda_1$ must vanish.
It can be readily seen that the above expression for $\lambda_1$ is nothing
more than the usual form for the mean value of a series of variates.
Moreover, we know that the algebraic sum (or in the case of continuous
variates, the integral) of the variates around the mean
value is always equal to zero. Hence by writing for $z$ the expression
$(z - M)$ where $M$ equals the mean value or $\lambda_1$, we can always
make $c_1$ vanish.
 
To attain our second object of making $c_2$ vanish we must choose
the unit of the coordinate system in such a way that

$$c_2 = \frac{1}{2!}\int_{-\infty}^{+\infty}F(z)H_2(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

which implies that

$$\left[\int_{-\infty}^{+\infty}F(z)\,z^2\,dz - \int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

or that $s_2 : s_0 - 1 = 0$, or when expressed in terms of the semi-invariants
that

$$\lambda_2 = (s_2s_0 - s_1^2) : s_0^2 = 1.$$

But by choosing the mean as the origin of the system the term
$s_1 : s_0$ is equal to $0$ and we have therefore $\lambda_2 = \sigma^2 = s_2 : s_0 = 1$.
Hence, by selecting as the unit of our coordinate system $\sqrt{\lambda_2}$
or $\sigma$, where $\sigma$ is technically known as the dispersion or standard
deviation of the series of variates, we can make the second coefficient
$c_2$ vanish.
In respect to the coefficients $c_3$ and $c_4$ we have now

$$c_3 = \frac{(-1)^3}{3!}\left[\int_{-\infty}^{+\infty}z^3F(z)\,dz - 3\int_{-\infty}^{+\infty}zF(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz$$

which reduces to $-\dfrac{1}{3!}\,\dfrac{s_3}{s_0}$, while

$$c_4 = \frac{(-1)^4}{4!}\left[\int_{-\infty}^{+\infty}z^4F(z)\,dz - 6\int_{-\infty}^{+\infty}z^2F(z)\,dz + 3\int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz,$$

which reduces to

$$\frac{1}{4!}\left[\frac{s_4}{s_0} - \frac{6s_2}{s_0} + \frac{3s_0}{s_0}\right] = \frac{1}{4!}\left[\frac{s_4}{s_0} - 3\right]$$

While the coefficients of higher order may be determined with
equal ease it will in general be found that the majority of moderately
skew frequency distributions can be expressed by means
of the first 4 parameters or coefficients.
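For a grouped frequency table the same computation can be carried through with sums in place of the integrals. The Python sketch below is only an illustration: the class marks `x` and frequencies `F` are invented figures, and the replacement of integrals by sums over the class marks is an assumption of the sketch rather than a statement of the text.

```python
import numpy as np

# Hypothetical grouped observations: class marks x and absolute frequencies F.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
F = np.array([3, 12, 28, 35, 26, 14, 6, 2], dtype=float)

N = F.sum()                                   # s_0, the total number of observations
M = (F * x).sum() / N                         # mean value, taken as the new origin
sigma = np.sqrt((F * (x - M) ** 2).sum() / N) # dispersion, taken as the new unit
z = (x - M) / sigma

# Power sums s_r of the variate z.
s = [(F * z**r).sum() for r in range(5)]

c0 = s[0] / N
c1 = -s[1] / s[0]
c2 = (s[2] / s[0] - 1.0) / 2.0
c3 = -(s[3] / s[0]) / 6.0          # -(1/3!) s_3 / s_0
c4 = (s[4] / s[0] - 3.0) / 24.0    # (1/4!) (s_4 / s_0 - 3)

# The same coefficients from the Hermite polynomials directly.
c3_direct = -(F * (z**3 - 3.0 * z)).sum() / (6.0 * N)
c4_direct = (F * (z**4 - 6.0 * z**2 + 3.0)).sum() / (24.0 * N)
print(c0, c1, c2, c3, c4)
```

With the origin at the mean and the dispersion as unit, `c1` and `c2` come out as zero to rounding error, so that only `c3` and `c4` remain to describe the departure from the normal form.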
 
 113. Coefficients expressed by Semi-Invariants. — We shall now 
show how the same results for the values of the coefficients may be
obtained from the definition of the semi-invariants. Since we
have proven that a frequency function, $F(z)$, may be expressed
by the series

$$F(z) = N\sum_i c_i\varphi_i(z)$$

we may from the definition of the semi-invariants write down the
following identity:

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \cdots} = N\int_{-\infty}^{+\infty}e^{z\omega}\left[c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \cdots\right]dz$$

where $N$ is the area of the frequency curve.
 
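The general term on the right hand side of the equation will be
of the form

$$N\,c_r\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz$$

where the integral may be evaluated by partial integration as
follows:

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = \Big[e^{z\omega}\varphi_{r-1}(z)\Big]_{-\infty}^{+\infty} - \omega\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz,$$

and where the first term on the right vanishes leaving

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = (-\omega)^1\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz$$

Continuing in the same manner we obtain by successive integrations

$$(-\omega)^1\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz = (-\omega)^2\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-2}(z)\,dz$$

$$(-\omega)^2\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-2}(z)\,dz = (-\omega)^3\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-3}(z)\,dz$$

from which we finally obtain the relation

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = (-\omega)^r\int_{-\infty}^{+\infty}e^{z\omega}\varphi_0(z)\,dz = \frac{(-\omega)^r}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{z\omega - z^2 : 2}\,dz$$

This latter integral may be written as

$$\frac{(-\omega)^r\,e^{\omega^2 : 2}}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-(z - \omega)^2 : 2}\,dz = (-\omega)^r\,e^{\omega^2 : 2}$$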
 
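Consequently the relation between the semi-invariants and the
frequency function may be written as follows:

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \cdots} = N\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \cdots\right]e^{\omega^2 : 2}, \quad\text{or}$$

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\omega^2}{2!}(\lambda_2 - 1) + \frac{\lambda_3\omega^3}{3!} + \cdots} = N\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \cdots\right]$$

By successive differentiation with respect to $\omega$ and by equating
the coefficients of equal powers of $\omega$ we get in a manner similar to
that shown on page 192 the following results:

$$c_0 = s_0 : N = s_0 : s_0 = 1$$

$$c_1 = -\lambda_1$$

$$c_2 = \frac{1}{2!}\left[(\lambda_2 - 1) + \lambda_1^2\right]$$

$$c_3 = -\frac{1}{3!}\left[\lambda_3 + 3(\lambda_2 - 1)\lambda_1 + \lambda_1^3\right]$$

$$c_4 = \frac{1}{4!}\left[\lambda_4 + 4\lambda_3\lambda_1 + 3(\lambda_2 - 1)^2 + 6(\lambda_2 - 1)\lambda_1^2 + \lambda_1^4\right]$$

If we now again choose the origin at $\lambda_1$ or let $\lambda_1 = 0$ and
choose $\sqrt{\lambda_2} = 1$ as the unit of our coordinate system we have:

$$c_0 = 1, \quad c_1 = 0, \quad c_2 = 0, \quad c_3 = -\frac{1}{3!}\lambda_3, \quad c_4 = \frac{1}{4!}\lambda_4$$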
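The expressions for $c_1$ to $c_4$ in terms of the semi-invariants may be compared with the direct definition $c_i = (-1)^i\sum H_i(z) : (i!\,N)$ on a set of observations that has deliberately not been reduced to the mean as origin or to the dispersion as unit. The Python sketch below does this; the simulated observations and all function names are assumptions made for the sake of the example.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def semi_invariants(obs):
    """First four semi-invariants (mean, second and third central moments,
    fourth central moment minus 3 * lambda_2**2)."""
    obs = np.asarray(obs, dtype=float)
    mean = obs.mean()
    d = obs - mean
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return mean, m2, m3, m4 - 3.0 * m2**2

def H(i, z):
    """Hermite polynomial H_i(z) of the text."""
    coef = np.zeros(i + 1)
    coef[i] = 1.0
    return hermeval(z, coef)

# Hypothetical observations with mean different from 0 and dispersion different from 1.
rng = np.random.default_rng(1)
z = 0.3 + 1.2 * rng.standard_normal(500) + 0.2 * rng.exponential(size=500)

l1, l2, l3, l4 = semi_invariants(z)
from_lambdas = np.array([
    1.0,
    -l1,
    ((l2 - 1.0) + l1**2) / 2.0,
    -(l3 + 3.0 * (l2 - 1.0) * l1 + l1**3) / 6.0,
    (l4 + 4.0 * l3 * l1 + 3.0 * (l2 - 1.0) ** 2 + 6.0 * (l2 - 1.0) * l1**2 + l1**4) / 24.0,
])

# Direct definition: c_i = (-1)^i / i! times the average of H_i over the observations.
direct = np.array([(-1) ** i * H(i, z).mean() / factorial(i) for i in range(5)])
print(np.round(from_lambdas, 6), np.round(direct, 6))
```

The two sets of values coincide, the relations between the $c$'s and the $\lambda$'s being purely algebraic.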
 
114. Change of Origin and Unit. — The theoretical development
of the above formulae explicitly assumes that the variate, $z$,
is measured in terms of the dispersion or $\sqrt{\lambda_2(z)}$ and with $\lambda_1(z)$
as the origin of the coordinate system. In practice the observations
or statistical data are, however, invariably expressed with
reference to an arbitrarily chosen origin (in the majority of cases
the natural zero of the number scale) and expressed in terms of
standard units, such as centimeters, grams, years, integral numbers,
etc.

Let us denote the general variate in such arbitrarily selected
systems of reference by $x$. Our problem then consists in transforming
the various semi-invariants, $\lambda_1(x)$, $\lambda_2(x)$, $\lambda_3(x)$, $\lambda_4(x)$,
... to the system of reference with $\lambda_1(z)$ as its origin and $\sqrt{\lambda_2(z)}$
as its unit. Such a transformation may always be brought about
by means of the linear substitution

$$z = ax + b$$

which in a purely geometrical sense implies both a change of origin
and unit. On page 193 we proved the following general properties
of the semi-invariants

$$\lambda_1(z) = \lambda_1(ax + b) = a\,\lambda_1(x) + b$$
$$\lambda_r(z) = \lambda_r(ax + b) = a^r\lambda_r(x)$$

Let us now write $\lambda_1(x) = M$ and $\lambda_2(x) = \sigma^2$, we then have the
following relations:

$$\lambda_1(z) = aM + b$$
$$\lambda_2(z) = a^2\sigma^2$$

Since the coordinate system of reference must be chosen in such
a manner that $\lambda_1(z) = 0$ and $\sqrt{\lambda_2(z)} = 1$ we have

$$aM + b = 0$$
$$a\sigma = 1$$

from which we obtain $a = \dfrac{1}{\sigma}$ and $b = -\dfrac{M}{\sigma}$, which brings $z$ on the
form $z = (x - M) : \sigma$, while $\varphi_0(z)$ becomes

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - M)^2 : 2\sigma^2}$$

Moreover, we have $\lambda_r(z) = \lambda_r(x) : \sigma^r$ for all values of $r$ greater
than 2. We are now able to epitomize the computations of the
semi-invariants under the following simple rules:
 
 
(1) Compute $\lambda_1(x)$ in respect to an arbitrary origin. The
numerical value of this parameter with opposite sign is the origin
of the frequency curve.

(2) Compute $\lambda_r(x)$ for all values of $r$ equal or greater than 2.
The numerical values of those parameters divided with $\left(\sqrt{\lambda_2(x)}\right)^r$
or $\sigma^r$, for $r = 2, 3, 4, \ldots$ are the semi-invariants of the frequency
curve.
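The two rules, followed by the formation of the series itself, can be sketched in a few lines of Python. In the sketch the observations are simulated figures, the names `standardised_semi_invariants` and `gram_charlier` are hypothetical, and the series is carried only to the fourth order with $c_3 = -\lambda_3 : 3!$ and $c_4 = \lambda_4 : 4!$; the moments of the resulting curve are then computed on a grid as a check.

```python
import numpy as np

def standardised_semi_invariants(x):
    """Rules (1) and (2): mean M, dispersion sigma, and lambda_3, lambda_4
    divided by sigma**3 and sigma**4 respectively."""
    x = np.asarray(x, dtype=float)
    M = x.mean()
    d = x - M
    lam2 = (d**2).mean()
    sigma = np.sqrt(lam2)
    lam3 = (d**3).mean()
    lam4 = (d**4).mean() - 3.0 * lam2**2
    return M, sigma, lam3 / sigma**3, lam4 / sigma**4

def gram_charlier(z, lam3, lam4):
    """phi_0 + c_3 phi_3 + c_4 phi_4 with c_3 = -lam3/3!, c_4 = lam4/4!,
    phi_3 = -H_3 phi_0 and phi_4 = H_4 phi_0."""
    phi0 = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    H3 = z**3 - 3.0 * z
    H4 = z**4 - 6.0 * z**2 + 3.0
    return phi0 * (1.0 + lam3 / 6.0 * H3 + lam4 / 24.0 * H4)

# Hypothetical measurements in some standard unit, for example centimeters.
rng = np.random.default_rng(2)
x = 150.0 + rng.gamma(shape=9.0, scale=2.0, size=2000)

M, sigma, lam3, lam4 = standardised_semi_invariants(x)

# Moments of the fitted curve, computed by a sum over a fine grid in z.
z = np.linspace(-12.0, 12.0, 24001)
dz = z[1] - z[0]
curve = gram_charlier(z, lam3, lam4)
moments = [np.sum(z**r * curve) * dz for r in range(5)]
print(M, sigma, lam3, lam4)
```

The curve so formed has area 1, mean 0 and dispersion 1, while its third and fourth moments are $\lambda_3$ and $3 + \lambda_4$; to return to the original unit one substitutes $z = (x - M) : \sigma$.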
 
Remarks on Nomenclature and Tables. — We shall now briefly discuss
some of the geometrical properties of the Laplacean probability curve
$\varphi_0(z) = e^{-z^2 : 2}$ and its derivatives, $\varphi_i(z) = (-1)^iH_i(z)\varphi_0(z)$, for $i = 3, 4, 5, \ldots$
Writing $\varphi_0(z)$ and its derivatives as:

$$\varphi_0(z) = e^{-z^2 : 2} : \sqrt{2\pi}$$
$$\varphi_1(z) = -z\,\varphi_0(z)$$
$$\varphi_2(z) = (-1)^2(z^2 - 1)\,\varphi_0(z)$$
$$\varphi_3(z) = (-1)^3(z^3 - 3z)\,\varphi_0(z)$$
$$\varphi_4(z) = (-1)^4(z^4 - 6z^2 + 3)\,\varphi_0(z)$$

we readily notice that both $\varphi_0(z)$ and all its derivatives of even order are
even functions of $z$ while all the derivatives of uneven order are uneven
functions.

The Laplacean probability function which occurs as a factor in all the
expressions is in itself a single valued positive function with a maximum
point at $z = 0$ and points of inflection at $z = \pm 1$ and approaches the abscissa
axis asymptotically in both positive and negative direction. At
$z = 0$ we have $\varphi_0(z) = 1 : \sqrt{2\pi} = 0.3989$. At plus or minus one, $\varphi_0(z)$ is less
than 0.25, at $z = \pm 2$, $\varphi_0(z)$ is nearly 0.05, at plus or minus 3 about 0.004
and at $z = \pm 4$ only 0.0001.

In regard to the third derivative, $\varphi_3(z) = -H_3(z)\varphi_0(z)$, we find that it possesses
a maximum or minimum in the neighborhood of $z = +0.7$ and $z = -0.7$
respectively, it crosses the abscissa axis in the neighborhood of the
points $z = \pm 1.75$ and approaches the abscissa asymptotically in both positive
and negative direction.

The fourth derivative has a major maximum point at $z = 0$, it crosses the
abscissa axis from positive to negative direction in the neighborhood of
$z = \pm 0.75$, attains a minimum at about $z = \pm 1.35$, it crosses again the
abscissa (this time from negative to positive direction) in the neighborhood
of $z = \pm 2.3$, attains a secondary or minor maximum around $z = \pm 2.86$
and begins then to decline until it ultimately approaches the abscissa axis
asymptotically.
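The positions quoted above may be recovered from the roots of the Hermite polynomials, since the zeros of $\varphi_n(z)$ are the roots of $H_n(z)$ and its turning points are the roots of $H_{n+1}(z)$. The Python sketch below computes them; the polynomial $H_5(z) = z^5 - 10z^3 + 15z$ is obtained from the recurrence relation and, like the helper names, does not appear in the text itself.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients in ascending powers of z.
H3 = Polynomial([0, -3, 0, 1])          # z^3 - 3z
H4 = Polynomial([3, 0, -6, 0, 1])       # z^4 - 6z^2 + 3
H5 = Polynomial([0, 15, 0, -10, 0, 1])  # z^5 - 10z^3 + 15z

def positive_roots(p):
    """Positive real roots of p, in ascending order."""
    r = np.real(p.roots())
    return np.sort(r[r > 1e-9])

def phi0(z):
    """Laplacean probability function with its factor 1 : sqrt(2 pi)."""
    return np.exp(-np.asarray(z, dtype=float) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

zeros_phi3 = positive_roots(H3)      # where the third derivative crosses the axis
turning_phi3 = positive_roots(H4)    # its turning points (also the zeros of phi_4)
turning_phi4 = positive_roots(H5)    # turning points of the fourth derivative

print(np.round(zeros_phi3, 3), np.round(turning_phi3, 3), np.round(turning_phi4, 3))
print(np.round(phi0([0, 1, 2, 3, 4]), 4))
```

By symmetry the same values with negative sign apply on the other side of the origin.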
 
These geometrical properties of the Laplacean frequency curve and its
derivatives are, however, much more readily visualized in the accompanying
diagram which needs no further explanation. We wish, however,
to call the attention of the reader to the wavelike form of the various
curves, which is strongly reminiscent of the form of functions encountered
in harmonic analysis or in the expansions in Fourier Series, an analogy
which we had occasion to mention in the discussion of the orthogonal
properties of the Hermite polynomials and the derivatives of the Laplacean
function.

FIGURE 2

In order to facilitate practical numerical calculations it is, however,
necessary to have an extensive set of numerical tables for $\varphi_0(z)$ and its
derivatives. This fact was already noted by Laplace who more than 25
years prior to the publication of the memoirs by Gauss on the normal
error curve advocated the construction of a table of numerical values of
the integral*

$$\frac{1}{\sqrt{2\pi}}\int_{0}^{z}e^{-\frac{1}{2}z^2}\,dz$$
 
The first set of such tables was constructed by the astronomer, Kramp,
and modified forms of these tables are found in nearly all treatises on least
squares and standard texts on probabilities. The most recent set of
tables of this integral are those of Sheppard (in Tables for Biometricians,
edited by Karl Pearson) where the variate $z$ is expressed in units of $\sigma$,
or the dispersion. Sheppard has also computed a table of the numerical
values of $\varphi_0(z)$.

* This fact, as pointed out by Pearson, definitely establishes Laplace's priority of
discovery of the probability curve.

In order to use the Gram-Charlier expansion in serial form, it is, however,
necessary to compute tables for the derivatives up to the fourth
order. A brief table of the first 6 derivatives is already found in Thiele's
earlier treatises. Charlier was, however, the first to supply an extensive
set of tables to 4 decimal places for values of $z$ up to 4 and progressing
by intervals of 0.01 in his Researches on the Theory of Probabilities in the
Meddelande for 1904. The most detailed tables are those of Jørgensen
in his Frekvensflader og Korrelation which gives the values of $\varphi_0(z)$ and its
first 6 derivatives to 7 decimal places for values of $z$ up to 4 and progressing
by intervals of 0.01.† The German astronomer Bruhns has in his
Kollektivmasslehre given a set of tables to 4 decimal places of the values
of the definite integrals

$$\int\varphi_i(z)\,dz \quad\text{for } i = 0, 1, 2, 3, 4, 5$$

The Gram-Charlier series gives us the frequency function in the form
$F(z) = \sum c_i\varphi_i(z)$ where the various coefficients $c_i$ are expressed as moments
or semi-invariants. As we have already pointed out the derivatives of
uneven order are uneven functions and the derivatives of even order are
even functions. The addition of such terms as $c_3\varphi_3(z)$, $c_5\varphi_5(z)$, ... tends
therefore to produce asymmetry or skewness from the normal form, while
addition of the terms $c_4\varphi_4$, $c_6\varphi_6$, ... does not alter the symmetrical form
but tends to make it topheavy or flatten it around the neighborhood of the
origin or mean value of the variate $z$. The coefficient $c_3 : 3!$ (or $\lambda_3 : 3!$) is
technically known as the skewness, and $c_4 : 4!$ (or $\lambda_4 : 4!$) as the excess of the
curve. No particular names have as yet been proposed for the semi-invariants
of higher order.

† I intend to publish a similar set of 5 decimal place tables in the second volume of
this treatise.
 
 PART III 
 
 PRACTICAL APPLICATIONS OF 
 THE THEORY
 
Note: In the following pages the factorial $\underline{|n} = 1 \cdot 2 \cdot 3 \cdots n$ is replaced
by the symbol $n!$ and the exponent $-n^2:2$ in the exponential expression $e^{-n^2:2}$
must be interpreted as $n^2:2$ and not as $1:2n^2$.
 
 CHAPTER XV. 
 
 THE NUMERICAL DETERMINATION OF THE PARAMETERS 
 
 115. General Remarks. — The previous investigations on 
 frequency functions have all been more or less of a purely 
 theoretical nature. In the present chapter we now propose 
 to show how the parameters are determined in actual practice 
from the individual observations or statistical summaries.
 
The determination of these unknown co-efficients or parameters
can, as emphasized by Jørgensen in his Frekvensflader
og Korrelation, be looked at from two points of view. We
 may either consider the series as infinite in which case the ques- 
 tion of determining the co-efficients becomes a problem in the 
 Theory of Functions; or we may decide to consider a finite 
 number of terms in the series and determine the coefficients so 
 that the sum of the squares of the deviations of the resulting 
 function from the observed statistical data becomes a minimum 
 in the sense of the method of least squares. In this case the 
 coefficients and not the moments or semi-invariants are repre- 
 sentative of the observations. This latter method is the classi- 
 cal method as used by Gram in his fundamental research on 
 the expansion of frequency functions in series. A brief state- 
 ment of the essential differences of the two methods may, how- 
 ever, be of advantage to the reader. 
 
The method of moments requires that the areas of the definite
integrals of the form

$$\int_{-\infty}^{+\infty} x^r F(x)\,dx$$

must equal the areas of the observations which are expressed as
power sums of the form

$$\sum_{x=-\infty}^{x=+\infty} x^r F(x)$$

while the method of least squares requires that

$$\int \left[F(x) - \sum c_i \varphi_i(x)\right]^2 dx$$
 
 
 must equal a minimum but does not necessarily impose any re- 
 strictions as to the condition of equality of the observed and the 
 computed areas as derived from the mathematical formula. 
 
The problem of determining the parameters in the sense of
 the method of least squares is therefore essentially a simple 
 problem in maximum and minimum and is not necessarily — 
 as some critics have imagined it to be — invariably interwoven 
 with the law of errors as expressed by the Laplacean probability 
 curve. It is, of course, true that the law of errors can be proved 
 by regarding the principles of the method of least squares as 
 an axiom, and inversely by accepting the law of errors as an 
 axiom, i. e. by assuming the deviations from the observations 
 and the functional law or mathematically determined frequency 
 curve as being due to chance or random sampling, we may 
 prove that the sum of the squares of such deviations actually 
 becomes a minimum. This peculiar relation is, however, not 
 necessary or required when we view the determination of the 
 constants as a simple problem in maxima and minima. 
 
 116. Remarks on Certain Criticisms. — Many English and American 
 
actuaries have of late shown a tendency to ignore the method of least
squares and prefer to rely entirely upon the method of moments. Thus
Palin Elderton in his otherwise useful and instructive work on Frequency
Curves and Correlation states that the method is of little practical use,
while Mr. D. Caradog Jones in his newly published First Course in Statistics
claims the method of least squares "which is the traditional way of
approaching all such problems, is shown to be impracticable in a large
number of cases, either because the resulting equations cannot be solved,
or, when they are capable of solution, because the labour involved would
be colossal." This objection falls, however, to the ground in the case
of the expansion of a frequency function in serial form because the
unknown parameters, with the exception of the origin (the mean) and the
unit (the dispersion) of the co-ordinate system, all appear as coefficients
in true linear equations and hence are eminently adaptable to the
treatment by least squares.
 
The attitude of these writers is probably due to the fact that they work
exclusively with the Pearsonian type of frequency curves where the
function, F(z), is given as a closed expression rather than as an expansion
in serial form. In nearly all Pearson's curve types there appear not more
than four constants, which in a measure accounts for the often successful
application of the method of moments, although several of the examples
presented by Mr. Jones in his book can scarcely be said to recommend
Pearson's theory. On the other hand, it is a great drawback not
to be able to have more than four constants at our disposal. Personally
I have encountered a large number of statistical series where the Pearsonian
theory fails. This same fact is also noted by Jørgensen who on
page -V.) of his Frekvensflader og Korrelation states that "jeg kender flere
Iagttagelsesrækker, hvor Pearson's Teori svigter totalt" (I know several
series of observations where Pearson's theory fails totally).
In the purely theoretical development it matters but little whether we
use moments or least squares in the expansion of a frequency function in
a series; a fact which is readily seen from our previous demonstrations.
In the purely practical work we have, however, this fact to consider,
 
that the method of moments works exclusively with areas expressed as
definite integrals, which are often difficult to determine in extremely
skew distributions. And it is only by successive approximations that we
in this manner reach a plausible result. Moreover, unless the observations
are very numerous, it is almost hopeless to compute the moments
of higher order than the fourth, because of the very large errors arising
from random sampling. Charlier in one of his monographs asserts that
it is generally useless to compute moments of higher order than the
second when the number of individual observations in the statistical
series is less than 1000. Thiele gives the following brief rules:
 
 For the first and second semi-invariants rely exclusively on the observed 
 data. 
 
 For semi-invariants of higher order than 6 rely exclusively on theoretical 
 considerations. 
 
 For intermediate semi-invariants (between the 2d and 6th) rely partly 
 upon theory and partly upon the observations. 
 
Caradog Jones, on the other hand, lustily ventures forth with moments
of the fourth order, based upon 241, and in some instances even as low as
180 individual observations. It is, therefore, no wonder that some of his
results exhibit a somewhat poor "fit" with the original data. Another
criticism which may be lodged against the method of moments as used
by some adherents of the Pearsonian school, rather than by Pearson himself,
is that it works with unweighted observations, and the values of the
extremities of the frequency curves are given the same weights as the
more numerous observations in the immediate neighborhood of the mean.
 
A second objection, raised among others by Elderton, is that the expansion
in serial form sometimes gives rise to negative frequencies at the
extreme tail ends of the curve, due of course to the fact that we have used
a limited number of terms of the series. From purely practical considerations
this objection counts little, because the observations at the
extremities are very few in number. It matters, for instance, but little
in ordinary calculations of assurance premiums whether the upper limit
of a mortality table is at 90 or at 100, and when Pearson from his curves
actually has attempted to put an upper limit to the duration of human
life, he has, to borrow an expression from the Danish biologist, Johannsen,
begun to handle biology as mathematics and not with mathematics. In
this connection it may also be noted that the Pearson Type I curve gives
imaginary values beyond certain limits. When now certain followers of
the Pearsonian school have considered this as an advantage and tried to
interpret the limits as possible values of repeated or presumptive observations,
it seems that such disciples have stretched their point a bit too far.
It is not possible to see why negative results should be less plausible than
imaginary results. Every student of ordinary algebra knows that the
"imaginary" quantities are just as valid as the so-called "real" quantities,
and it is probably the choice of this unhappy and ill-chosen nomenclature
which has given rise to the above extravagant claims of some of the
followers of Pearson.
 
Finally some English and American actuaries have objected to the
arbitrary choice of the parameters in the Gram or Charlier expansions.
Unless I have completely misunderstood Mr. Elderton this is one of
his chief criticisms against Charlier's method. With my best intentions
I cannot agree to this and will even go so far as to say that Mr. Elderton's
criticism really speaks in favor of the methods put forth by the
Scandinavian scholars. As we have repeatedly emphasized in the preceding
paragraphs, the arbitrary choice of $c_1$ and $c_2$ amounts mathematically to
the choice of an arbitrary origin and unit in the Cartesian co-ordinate
system to which surely no mathematician will make objections. Neither
can objections be raised from the point of view of common sense. We
might as well object to the meter as a unit of measure in preference to the
yard, or to reckoning the solar time from the Greenwich meridian instead
of the meridian of Paris.
 
 
The failure of the method of moments to compute with any degree of
accuracy moments of higher order in the case of the majority of ordinary
observations is probably the reason why some actuaries, especially in
America, have maintained that the Gram or Charlier A type of frequency
curves is not powerful enough to represent more than moderately
skew frequency distributions.
 
In spite of the incontrovertible fact that the most recent researches in
the theory of integral equations have demonstrated beyond doubt that
any frequency curve can be developed in convergent series by Hermite
polynomials in conjunction with the normal Laplacean frequency curve,
an American actuary, Mr. Merwyn Davis, has taken the "bull by the
horns" so to speak and boldly gone on record with the positive statement
that "the Charlier series fails completely in cases of appreciable skewness."
With all due respect for this young matador who has so boldly entered
the ring to challenge the work of some of the most eminent mathematicians
in the realm of integral equations I feel, however, that if Mr. Davis
has actually succeeded in "throwing the bull" it is only in the sense as
implied in the colloquial slang of his native America. In fact, we shall
presently in some of our examples take up the challenge of Mr. Davis
and show that the series he so curtly rejects can, by means of a simple
transformation, be used on decidedly skew frequency distributions with
even greater success than the Pearsonian curve types.
 
 With these preliminary remarks we shall now proceed to 
 give several examples of the application of the Gram or Lapla- 
 cean — Charlier frequency series, employing either the method 
 of moments or the method of least squares in the numerical 
 determination of the constants, although preference will be 
 given to the latter method in cases of appreciable skewness or 
 excess. 
 
 It is, however, not our intention to go into details of the 
 method of least squares and its relation to error laws, except 
 in its connection with the problem of maximum and minimum. 
 Any number of standard treatises are now available on the sub- 
 ject, however, to which we may refer interested readers.* 
 
 117. Charlier's Scheme of Computations. — The general 
 formulae for the semi-invariants were given on page (192). 
 In practical work it is, however, of importance to proceed along 
 systematic lines and to furnish an automatic check for the cor- 
 rectness of the computations. Several systems facilitating such 
 work have been proposed by various writers but the most 
 simple and elegant is probably the one proposed by M. Charlier 
 and which is shown in detail with the necessary control checks 
 on the following pages. Charlier employs moments, while we 
 in the following demonstration shall prefer the use of the semi- 
 invariants. 
 
* A particularly attractive presentation in English is found in David Brunt's
Combination of Observations (Cambridge, 1918).
 
 
If we define the power sums of the relative frequencies F(x) as

$$m_r = \int_{-\infty}^{+\infty} x^r F(x)\,dx : \int_{-\infty}^{+\infty} F(x)\,dx \quad \text{for } r = 0, 1, 2, 3, \ldots$$

we find that the expressions for the semi-invariants as given on
page (192) may be written as follows:

$$\lambda_1 = m_1$$

$$\lambda_2 = m_2 - m_1^2$$

$$\lambda_3 = m_3 - 3m_2m_1 + 2m_1^3$$

$$\lambda_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4$$
 
The advantage of the Charlier scheme for the computation
of the semi-invariants lies in the fact that it furnishes an automatic
check of the final results. If we expand the expression
$(x+1)^4F(x)$ we have:

$$x^4F(x) + 4x^3F(x) + 6x^2F(x) + 4xF(x) + F(x)$$

or

$$\sum (x+1)^4F(x) = s_4 + 4s_3 + 6s_2 + 4s_1 + s_0,$$

which serves as an independent control check of the computations.
Moreover, another check is furnished by the relation

$$m_4 = \lambda_4 + 4m_1\lambda_3 + 6m_1^2\lambda_2 + 3\lambda_2^2 + m_1^4.$$
 
In order to illustrate the scheme we choose the following age
distribution of 1130 pensioned functionaries in a large American
Public Utility corporation.

Ages       No. of Pensioners      Ages       No. of Pensioners
35-39               1             65-69            286
40-44               6             70-74            248
45-49              17             75-79            128
50-54              48             80-84             38
55-59             118             85-89             13
60-64             224             over 90            3
 
Choosing the age of 67 as a provisional origin the Charlier
scheme is shown in detail on next page.

The computation below gives the numerical values of the
frequency function which now may be written as follows:

$$F(x) = 1130\,[\varphi_0(x) + .0258\,\varphi_3(x) + .0158\,\varphi_4(x)]$$

where

$$\varphi_0(x) = \frac{1}{1.624\sqrt{2\pi}}\; e^{-\frac{1}{2}\left[\frac{x + .0195}{1.624}\right]^2}$$
 
Ages        x    F(x)    xF(x)   x²F(x)    x³F(x)     x⁴F(x)   (x+1)⁴F(x)
35-39      -6       1       -6       36      -216      1,296        625
40-44      -5       6      -30      150      -750      3,750      1,536
45-49      -4      17      -68      272    -1,088      4,352      1,377
50-54      -3      48     -144      432    -1,296      3,888        768
55-59      -2     118     -236      472      -944      1,888        118
60-64      -1     224     -224      224      -224        224          0
65-69       0     286        0        0         0          0        286
Sum               700     -708    1,586    -4,518     15,398      4,710

70-74      +1     248      248      248       248        248      3,968
75-79      +2     128      256      512     1,024      2,048     10,368
80-84      +3      38      114      342     1,026      3,078      9,728
85-89      +4      13       52      208       832      3,328      8,125
90-94      +5       2       10       50       250      1,250      2,592
over 95    +6       1        6       36       216      1,296      2,401
Sum               430      686    1,396     3,596     11,248     37,182

s_r             1,130      -22    2,982      -922     26,646     41,892
m_r            1.0000   -.0195   2.6378    -.8156    23.5699

$\lambda_1 = m_1 = -.0195$
$\lambda_1^2 = m_1^2 = .0004$
$\lambda_1^3 = m_1^3 = .0000$
$\lambda_1^4 = m_1^4 = .0000$

$m_2 = 2.6378$
$-m_1^2 = -.0004$
$\lambda_2 = 2.6374 = \sigma^2$
$\sqrt{\lambda_2} = 1.6240 = \sigma$
$4.2831 = \sigma^3$
$6.9558 = \sigma^4$

$s_4 = 26,646$
$4s_3 = -3,688$
$6s_2 = 17,892$
$4s_1 = -88$
$s_0 = 1,130$
Sum $= 41,892$

$m_2m_1 = -.0513$, $m_3m_1 = .0159$, $m_2^2 = 6.9580$, $m_2m_1^2 = .0010$

$m_3 = -.8156$
$-3m_2m_1 = .1539$
$2m_1^3 = .0000$
$\lambda_3 = -.6617$

$m_4 = 23.5699$
$-4m_3m_1 = -.0636$
$-3m_2^2 = -20.8740$
$12m_2m_1^2 = .0127$
$-6m_1^4 = .0000$
$\lambda_4 = 2.6450$

$\lambda_4 = 2.6450$
$4m_1\lambda_3 = .0516$
$6m_1^2\lambda_2 = .0060$
$3\lambda_2^2 = 20.8677$
$m_1^4 = .0000$
Sum $= 23.5703 = m_4$

$c_3 = \lambda_3 : \sigma^3 = -.1545$
$c_4 = \lambda_4 : \sigma^4 = .3803$
$-c_3 : 3! = .0258$
$c_4 : 4! = +.0158$
 
118. Comparison Between Observed Data and Theoretical
Values. — The next step is now to work out the numerical
values of F(x) for various values of x and compare such
 values with the ones originally observed. This process is shown 
 in detail in the following scheme:
 
 
(1)    (2)     (3)        (4)      (5)       (6)       (7)      (8)      (9)    (10)   Obs.
 x    x-λ₁   (x-λ₁):σ    φ₀(z)    φ₃(z)     φ₄(z)

-7   -6.98   -4.300     .0001   +.0058    +.0170   +.0001   +.0003   .0005
-6   -5.98   -3.682     .0005   +.0170    +.0470   +.0015   +.0008   .0028      2     1
-5   -4.98   -3.067     .0036   +.0710    +.1267   +.0018   +.0020   .0074      5     6
-4   -3.98   -2.451     .0198   +.1458    +.0602   +.0038   +.0009   .0245     17    17
-3   -2.98   -1.835     .0741   +.0500    -.4345   +.0013   -.0068   .0686     48    48
-2   -1.98   -1.219     .1897   -.3502    -.7036   -.0090   -.0111   .1696    118   118
-1   -0.98   -0.603     .3326   -.5287    +.3160   -.0136   -.0050   .3140    219   224
 0   +0.02   +0.012     .3989   +.0143   +1.1963   +.0004   +.0189   .4182    291   286
+1   +1.02   +0.628     .3273   +.5359    +.2584   +.0138   +.0041   .3452    241   248
+2   +2.02   +1.244     .1835   +.3325    -.7157   +.0086   -.0113   .1808    126   128
+3   +3.02   +1.860     .0707   -.0605    -.4094   -.0015   -.0065   .0627     44    38
+4   +4.02   +2.475     .0186   -.1443    +.0703   -.0037   +.0011   .0212     15    13
+5   +5.02   +3.091     .0034   -.0680    +.1241   -.0018   +.0020   .0036      3     2
+6   +6.02   +3.707     .0004   -.0165    +.0456   -.0004   +.0007   .0007      1     1
+7   +7.02   +4.322     .0001   -.0050    +.0162   -.0001   +.0003   .0003
 
Column (1) gives the values of the variate x reckoned from
the provisional origin, or the centre of the age interval 65-69.
Column (2) is x less the first semi-invariant, whereby the origin is shifted
to the mean, $\lambda_1$. Column (3) represents the final linear
transformation: $z = (x - \lambda_1) : \sigma$.

Columns (4), (5) and (6) are copied directly from the standard
tables of Jørgensen or Charlier. Column (7) is (5) multiplied
by 0.0258 or the product $-[c_3\varphi_3(z)] : 3!$, while (8) is $[c_4\varphi_4(z)] : 4!$.

Column (9) is the sum of (4), (7) and (8). If we now distribute
the area $N = s_0$ or 1130 pro rata according to (9), we
 in 5-year age intervals and shown in column (10) alongside 
 which we have inserted the originally observed values. Evi- 
 dently the fit is satisfactory. It will be noted that the final 
 frequency series is expressed in units of 5-year age intervals. 
 This, however, is only a formal representation. By subdividing 
 the unit intervals of column (1) in 5 equal parts, and by com- 
 puting all the other columns accordingly, we get the theoretical 
 frequency series expressed in single year age intervals. 
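The columns described above can also be generated directly from the closed forms of the derivatives instead of from printed tables. The following is a minimal sketch, assuming the constants quoted in the text (N = 1130, mean -0.0195, dispersion 1.624, coefficients 0.0258 and 0.0158) and the expressions $\varphi_3 = -(z^3 - 3z)\varphi_0$ and $\varphi_4 = (z^4 - 6z^2 + 3)\varphi_0$; the function name `column9` is illustrative. Individual entries may differ somewhat from a table worked by hand from four-place tables.

```python
import math

N, lam1, sigma = 1130, -0.0195, 1.624   # area, mean and dispersion from the text
k3, k4 = 0.0258, 0.0158                 # coefficients -c3:3! and c4:4! from the text

def column9(x):
    """phi0(z) + k3*phi3(z) + k4*phi4(z) at z = (x - lam1) / sigma."""
    z = (x - lam1) / sigma
    p0 = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    p3 = -(z ** 3 - 3.0 * z) * p0             # third derivative of phi0
    p4 = (z ** 4 - 6.0 * z ** 2 + 3.0) * p0   # fourth derivative of phi0
    return p0 + k3 * p3 + k4 * p4

xs = list(range(-7, 8))
col9 = [column9(x) for x in xs]
scale = N / sum(col9)                         # distribute the area N pro rata
fitted = {x: scale * c for x, c in zip(xs, col9)}

for x in xs:
    print(f"{x:+3d}  {fitted[x]:8.1f}")
```

Replacing `range(-7, 8)` by steps of one fifth of a class and rescaling accordingly gives the single-year series mentioned in the text.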
 
 119. The Principle of Method of Least Squares.— The 
 
 following paragraph purports to give a brief exposition of the 
 determination of the coefficients in the Gram or Laplacean — 
 Charlier series in the sense of the method of least squares as a 
 strict problem of maxima and minima, wholly independent of 
 the connection between the method of least squares and the 
 error laws of precision measurements.* 
 
*In the following demonstration I am adhering to the brief and lucid exposition
of the Argentinean actuary, U. Broggi, in his excellent Traité d'Assurances sur la Vie.
 
 
The simple problem in maxima and minima which forms the
fundamental basis of the method of least squares is the following:
Let m unknown quantities be determined by observations
in such a manner that they are not observed directly but enter
into certain known functional relations, $f_i(x_1, x_2, x_3, \ldots, x_m)$,
containing the unknown independent variables, $x_1, x_2, x_3, \ldots, x_m$.
Let furthermore the number of observations on such functional
relations be n (where n is greater than m). The problem is
then to determine the most plausible system of the values of
the unknown x's from the observed system

$$f_1(x_1, x_2, x_3, \ldots, x_m) = o_1$$

$$f_2(x_1, x_2, x_3, \ldots, x_m) = o_2$$

$$\cdots$$

$$f_n(x_1, x_2, x_3, \ldots, x_m) = o_n$$

where $f_1, f_2, \ldots, f_n$ are the known functional relations and
$o_1, o_2, \ldots, o_n$ their observed values. Such equations are known
as observation equations.
 
 In order to further simplify our problem we shall also assume 
 that 
 
 1 All the equations of the system have the same weight, and 
 
 2 All the equations are reduced to linear form. 
 
By these assumptions the problem is reduced to finding m unknowns
from n linear equations.

$$a_1x_1 + b_1x_2 + \ldots = o_1$$

$$a_2x_1 + b_2x_2 + \ldots = o_2$$

$$a_3x_1 + b_3x_2 + \ldots = o_3$$

$$\cdots$$

$$a_nx_1 + b_nx_2 + \ldots = o_n$$
 
Since n is greater than m we find the problem over-determined,
and we therefore seek to determine the unknown quantities,
$x_1, x_2, \ldots, x_m$ in such a way that the sum of the squares
of the differences between the functional relations and the observed
values becomes a minimum. This implies that the
expression

$$\sum_{i=1}^{i=n} (a_ix_1 + b_ix_2 + \ldots - o_i)^2 = \psi(x_1, x_2, \ldots, x_m)$$

must be a minimum, or the simultaneous existence of the equations

$$\frac{\partial\psi}{\partial x_1} = 0, \quad \frac{\partial\psi}{\partial x_2} = 0, \quad \ldots, \quad \frac{\partial\psi}{\partial x_m} = 0. \qquad (I)$$
 
If we now introduce the following notation

$$a_ix_1 + b_ix_2 + \ldots - o_i = \lambda_i \quad \text{for } i = 1, 2, 3, \ldots, n,$$

the m equations in the above system (I) evidently take on
the following form

$$\lambda_1a_1 + \lambda_2a_2 + \ldots + \lambda_na_n = 0$$

$$\lambda_1b_1 + \lambda_2b_2 + \ldots + \lambda_nb_n = 0$$

$$\cdots$$

If we now again re-substitute the expressions for $\lambda$ in terms of
the linear relations

$$a_ix_1 + b_ix_2 + \ldots - o_i = \lambda_i \quad \text{for } i = 1, 2, 3, \ldots, n,$$

and collect the coefficients of $x_1, x_2, \ldots, x_m$, these equations
may be expressed in the following symbolical form:

$$[aa]x_1 + [ab]x_2 + \ldots - [ao] = 0$$

$$[ab]x_1 + [bb]x_2 + \ldots - [bo] = 0$$

$$\cdots$$

$$[ak]x_1 + [bk]x_2 + \ldots + [kk]x_m - [ko] = 0$$

where

$$[aa] = a_1^2 + a_2^2 + \ldots$$

$$[ab] = a_1b_1 + a_2b_2 + \ldots$$

is the Gaussian notation for the homogeneous sum products.
 
 The above equations are known as normal equations, and it is 
 readily seen that there is one normal equation corresponding 
 to each unknown. Our problem is therefore reduced to the 
 solution of a system of simultaneous linear equations of m un- 
 knowns. If m is a small number, or, what amounts to the same 
 thing, there are only two or three unknowns the solution can be 
 carried on by simple algebraic methods or determinants. If 
 the number of unknowns is large these methods become very 
 laborious and impractical. It is one of the achievements of the 
 great German mathematician, Gauss, to have given us a 
 method of solution which reduces this labor to a minimum and
 
 which proceeds along well defined systematic and practical 
 lines. The method is known as the Gaussian algorithmus of 
 successive elimination. 
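The formation of the normal equations from a set of observation equations can be sketched in a few lines. This is a minimal illustration, assuming equal weights and linear observation equations as stipulated above; the function name `normal_equations` and the small straight-line example are assumptions made here and are not taken from the text.

```python
def normal_equations(rows, obs):
    """Form the Gaussian sum products from linear observation equations.

    rows[i] holds the coefficients (a_i, b_i, ...) of the i-th equation and
    obs[i] its observed value o_i.  Returns (N, rhs) where N[j][k] is the
    sum product of columns j and k, e.g. N[0][1] = [ab], and rhs[j] is the
    sum product of column j with the observations, e.g. rhs[0] = [ao].
    """
    m = len(rows[0])
    N = [[sum(r[j] * r[k] for r in rows) for k in range(m)] for j in range(m)]
    rhs = [sum(r[j] * o for r, o in zip(rows, obs)) for j in range(m)]
    return N, rhs

# Hypothetical example: three observations of x1 + t*x2 at t = 0, 1, 2.
rows = [[1, 0], [1, 1], [1, 2]]
obs = [1, 2, 4]
N, rhs = normal_equations(rows, obs)
print(N, rhs)
```

The matrix `N` is symmetric, which is why only its upper triangle needs to be written down in the elimination that follows.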
 
 120. Gauss' Solution of Normal Equations. — For the sake 
 of simplicity we shall limit ourselves to a system of four normal 
 equations of the form 
 
$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$

$$[ab]x_1 + [bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$

$$[ac]x_1 + [bc]x_2 + [cc]x_3 + [cd]x_4 - [co] = 0$$

$$[ad]x_1 + [bd]x_2 + [cd]x_3 + [dd]x_4 - [do] = 0$$

The generalization to an arbitrary number of unknowns offers
no difficulties, however.

On account of their symmetrical form the above equations
may also be written in the more convenient form, viz.:

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$

$$[bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$

$$[cc]x_3 + [cd]x_4 - [co] = 0$$

$$[dd]x_4 - [do] = 0$$

From the first equation we find

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4$$

Substituting this value in the following equations and by the
introduction of the new symbol

$$[ik] - \frac{[ai]}{[aa]}[ak] = [ik.1]$$

we now obtain a new system of equations of a lower order and
of the form

$$[bb.1]x_2 + [bc.1]x_3 + [bd.1]x_4 - [bo.1] = 0$$

$$[cc.1]x_3 + [cd.1]x_4 - [co.1] = 0$$

$$[dd.1]x_4 - [do.1] = 0$$

Solving for $x_2$ we have

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4$$
 
Substituting in the following equations and writing

$$[ik.1] - \frac{[bi.1]}{[bb.1]}[bk.1] = [ik.2]$$

we have

$$[cc.2]x_3 + [cd.2]x_4 = [co.2]$$

$$[dd.2]x_4 = [do.2], \text{ or}$$

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4$$

Moreover, by writing

$$[ik.2] - \frac{[ci.2]}{[cc.2]}[ck.2] = [ik.3],$$

we have finally

$$[dd.3]x_4 = [do.3]$$

This gives us the final reduced normal equation of the lowest
order. By successive substitution we therefore have:

$$x_4 = \frac{[do.3]}{[dd.3]}$$

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4$$

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4$$

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4$$
 
 as the ultimate solution of the unknowns. 
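The successive elimination and back substitution just described can be sketched as follows. This is a minimal illustration, assuming a system written as N x = rhs (the normal equations with the observation terms moved to the right-hand side) and no zero pivots, which holds for well-conditioned normal equations; the function name `gauss_solve` and the numerical example are hypothetical.

```python
def gauss_solve(N, rhs):
    """Solve N x = rhs by successive elimination and back substitution.

    After the p-th elimination step the remaining coefficients correspond
    to the reduced sum products [ik.p] of the text.  No pivoting is done,
    so a zero on the diagonal would raise ZeroDivisionError.
    """
    n = len(rhs)
    # Augmented working copy: each row carries its right-hand side last.
    A = [[float(v) for v in row] + [float(r)] for row, r in zip(N, rhs)]
    for p in range(n - 1):
        for i in range(p + 1, n):
            factor = A[i][p] / A[p][p]
            for k in range(p, n + 1):
                A[i][k] -= factor * A[p][k]
    # Back substitution, starting from the last (lowest order) equation.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (A[i][n] - tail) / A[i][i]
    return x

# Hypothetical symmetric system with four unknowns.
N = [[4, 2, 0, 0], [2, 5, 1, 0], [0, 1, 3, 1], [0, 0, 1, 2]]
rhs = [8, 15, 15, 11]
solution = gauss_solve(N, rhs)
print(solution)
```

The sketch eliminates on the full square matrix for clarity; exploiting the symmetry, as the text does, halves the arithmetic but leads to the same unknowns.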
 
 121. Arithmetical Application of Method. — The example 
 
 in paragraph 117 gave an illustration of the application of the 
 method of moments. As previously stated this method works 
 quite well in cases of moderate skewness, but is less successful 
 in extremely skew curves and where the excess is large. We 
 shall now give an illustration of the calculation of the para- 
 meters by the method of least squares. The example we choose 
 is the well-known statistical series by the distinguished Dutch
 
botanist, deVries, on the number of petals in the flowers of Ranunculus
bulbosus.* This is also one of the classical examples of Karl
 Pearson in his celebrated original memoirs on skew variation. 
 Although the observations of deVries lend themselves more 
 readily to the method of logarithmic transformation, which we 
 shall discuss in a following chapter, we have deliberately chosen 
to use it here for two specific reasons. Firstly it is a most striking
illustration in refutation of the incautious criticism of the
 Gram-Charlier series by the aforementioned Mr. Davis. Sec- 
 ondly (and this is the more important reason) it offers an ex- 
 cellent drill for the student in the practical applications of the 
 method of least squares because it gives in a very brief compass 
 all the essential arithmetical details. The observations of 
 deVries are as follows : 
 
No. of Petals      x      F(x) = o
      5            0        133
      6            1         55
      7            2         23
      8            3          7
      9            4          2
     10            5          2
 
 where F(x) denotes the absolute frequencies. The observed 
 frequency distribution is well nigh as skew as it can be and rep- 
 resents in fact a one-sided curve, and should therefore — if the 
 statement by Mr. Davis is correct — show an absolute defiance 
 to a graduation by the Gram-Charlier series. 
 
 The process we shall use in the attempted mathematical 
 representation of the above series is a combination of the method 
 of semi-invariants and the method of least squares. Following 
 Thiele's advice we determine the first two semi-invariants in 
the generating function directly from the observations while
 the coefficients of this function and its derivatives are deter- 
 mined by the least square method. 
 
 Choosing the provisional origin at 5, we obtain the following 
values for the crude moments.
 
*Undoubtedly many readers will think that I have spent an unusually heavy amount
of arithmetic on such a simple example. This criticism is true, and in actual curve
fitting practice we would of course resort to a logarithmic transformation. The
example is in this particular instance chosen as a drill for the student. I may, however,
remark that if one were to use Pearsonian curves the arithmetical work would be even
more formidable than through the application of least squares, because we would
have to resort to mechanical quadrature formulas in order to compute the areas in
Pearson's curves.
 
 
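$s_0 = 222$, $s_1 = 140$, $s_2 = 292$, $s_3 = 806$, $s_4 = 2,752$, $s_5 = 10,790$,
$s_6 = 46,072$, $s_7 = 207,226$, from which we find that

$\lambda_0 = 1$, $\lambda_1 = 0.631$, $\lambda_2 = 0.917$, $\lambda_3 = 1.644$, $\lambda_4 = 3.377$, $\lambda_5 = 5.972$,
$\lambda_6 = -2.911$, $\lambda_7 = -122.638$.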
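The first few of these figures are easily retraced. The following is a minimal sketch, assuming the deVries frequencies with the provisional origin at 5 petals and the semi-invariant formulas of paragraph 117; only the orders up to the fourth are computed here, and the variable names are illustrative.

```python
# deVries' series: x = number of petals less 5, F(x) = observed frequency.
petals = {0: 133, 1: 55, 2: 23, 3: 7, 4: 2, 5: 2}

# Crude power sums s_0 ... s_4 about the provisional origin.
s = [sum(x ** r * f for x, f in petals.items()) for r in range(5)]
m = [sr / s[0] for sr in s]

lam1 = m[1]
lam2 = m[2] - m[1] ** 2
lam3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
lam4 = (m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2
        + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4)

print("power sums:", s)
print("semi-invariants:", [round(v, 3) for v in (lam1, lam2, lam3, lam4)])
print("dispersion:", round(lam2 ** 0.5, 4))
```

Only `lam1` and `lam2` are carried forward in what follows; the higher semi-invariants are shown merely for comparison with the values quoted above.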
 
All these semi-invariants with the exception of the two
first are, however, so greatly influenced by random sampling
in the small observation series that it is hopeless to use them in
the determination of the constants in the Gram-Charlier series.
In fact an actual calculation does not give a very good result
beyond that of a first rough approximation. The generating
function, on the other hand, may be expressed by the aid of
the two first semi-invariants as follows:

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2:2}$$

where z is given by the linear transformation:

$$z = (x - 0.631) : 0.9576, \quad (\sqrt{\lambda_2} = 0.9576).$$

We now propose to express the observed function F(x) or
$\varphi(z)$ by a Gram-Charlier series of the form:

$$F(x) = \varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + k_5\varphi_5(z) + k_6\varphi_6(z).$$
 
 In this equation we know the values of the generating func- 
 tion and its derivatives for various values of the variate z as 
found in the tables of Jørgensen and Charlier, while the quantities
k are unknowns. On the other hand we know 6 specific
 values of F{x) as directly observed in deVries's observation 
 series. We are thus dealing with a system of typical linear 
 observation equations of the forms described in paragraphs 
 119 and 120 and which lend themselves so admirably to the 
 treatment by the method of least squares. 
 
From the above linear relation between x and z we can directly
compute the following table for the transformed variate z.

  x         z
  0      -0.688
  1      +0.402
  2      +1.493
  3      +2.583
  4      +3.674
  5      +4.764
 
 
The numerical values of $\varphi_0(z)$ and its derivatives as corresponding
to the above values of z can be taken directly from the
standard tables of Jørgensen and Charlier. We may therefore
write down the following observation equations (a)

   φ₀(z)         φ₃(z)         φ₄(z)          φ₅(z)          φ₆(z)
 .3148 k₀   -  .5472 k₃   +  .1207 k₄   +  2.2728 k₅   +   .9591 k₆   -  133 = 0
 .3679 k₀   +  .4198 k₃   +  .7566 k₄   -  1.9836 k₅   -  2.9860 k₆   -   55 = 0
 .1308 k₀   +  .1506 k₃   -  .7073 k₄   +   .4540 k₅   +  2.8600 k₆   -   23 = 0
 .0145 k₀   -  .1346 k₃   +  .1062 k₄   +   .2642 k₅   -  1.2186 k₆   -    7 = 0
 .0005 k₀   -  .0180 k₃   +  .0486 k₄   -   .1070 k₅   +   .1482 k₆   -    2 = 0
 .0000 k₀   -  .0005 k₃   +  .0020 k₄   -   .0043 k₅   +   .0546 k₆   -    2 = 0
 
 for which we now propose to determine the unknown values of 
 k by the least square method. 
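The coefficients of the observation equations can be reproduced without tables. The following is a minimal sketch, assuming the derivatives are evaluated as $\varphi_n(z) = (-1)^n He_n(z)\varphi_0(z)$ with the Hermite polynomials built by recurrence, and using the values of z exactly as tabulated above; the function name `phi` is illustrative. Agreement with the printed coefficients should be expected to about three decimals for the first rows, and less closely for the largest values of z.

```python
import math

def phi(n, z):
    """n-th derivative of exp(-z^2/2)/sqrt(2*pi), i.e. (-1)^n He_n(z) phi_0(z)."""
    he_prev, he = 1.0, (z if n > 0 else 1.0)
    for k in range(1, n):
        # Recurrence He_{k+1}(z) = z*He_k(z) - k*He_{k-1}(z).
        he_prev, he = he, z * he - k * he_prev
    return (-1) ** n * he * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

z_values = [-0.688, 0.402, 1.493, 2.583, 3.674, 4.764]   # as tabulated in the text
orders = (0, 3, 4, 5, 6)                                  # k0, k3, k4, k5, k6
observed = [133, 55, 23, 7, 2, 2]

# One row of coefficients per observation equation.
rows = [[phi(n, z) for n in orders] for z in z_values]

for row, o in zip(rows, observed):
    print("  ".join(f"{c:+8.4f}" for c in row), f"  - {o} = 0")
```

The list `rows`, together with `observed`, is all that is needed to form the normal equations of paragraph 119.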
 
 While this method may of course be applied directly to the 
 above data, it will generally be found of advantage to start 
with some approximate values of the k's. It is found in practice
that this approximate step saves considerable labour in
 the formation and ultimate solution of the normal equations. 
 
 Although the first approximation in the case of numerous un- 
 knowns must be in the nature of a more or less shrewd guess, 
 which facility can only be attained by constant practice in rou- 
 tine mathematical computing, we are, however, in this specific 
instance able to tell something about the nature of the coefficients
from purely a priori considerations. We know for instance
from the form of the Gram-Charlier series that the
 coefficient ko of the generating function must be nearly equal 
 to the area of the curve, which in this particular instance is 222. 
 Moreover, a mere glance at the observed series tells us that it 
 has a decidedly large skewness in negative direction from the 
 mean coupled with a tendency of being "top heavy," indicating 
 positive excess. We can therefore assume as a first approxima- 
 tion that the coefficients of the derivatives of uneven order are 
 negative and the coefficients of derivatives of even order are 
 positive. Again, it is also seen that the coefficients of the de- 
 rivatives of higher order than the fourth must be relatively 
 small in comparison with the coefficients of the derivatives of 
 lower order, otherwise the series would not be rapidly con- 
 vergent. 
 
 From such purely common sense a priori considerations we 
 therefore guess the following first approximations, viz.: 
 
$k'_0 = 222,\quad k'_3 = -100,\quad k'_4 = 20,\quad k'_5 = -5,\quad k'_6 = 5.$
 
 
The probable values of the various k's may be written as

$k_i = r_i k'_i$ for i = 0, 3, 4, 5 and 6,

and our problem is therefore to find the correction factor $r_i$ with
which the approximate value $k'_i$ must be multiplied so as to
give $k_i$.

Applying the various values of $k'_i$ to the original observation
equations in (a) we obtain the following schedule for the numerical
factors of $r_i$, all figures being multiplied by 10 and rounded to
the nearest whole number.
 
    a       b       c       d       e        o       s

  699    +547    + 24    -114    + 48   -1,330    -126
  817    -420    +151    + 99    -149   -  550    - 52
  290    -151    -141    - 23    +143   -  230    -112
   32    +135    + 21    - 13    - 61   -   70    + 44
    1    + 18    + 10    +  5    +  7   -   30    + 11
    0    +  0    +  0    +  0    +  3   -   10    -  7

 1839    +129    + 65    - 46    -  9   -2,220    -242

where the additional control column s, the sum of the entries in each
row, serves as a check.
 
 The subsequent formation of the various sum-products and 
 normal equations is shown in the following schedules together 
 with the s columns as a check. 
 
        aa          ab          ac          ad          ae            ao          as

   488,601    +382,353    + 16,776    - 79,686    + 33,552    -  929,670    - 88,074
   667,489    -343,140    +123,367    + 80,883    -121,733    -  449,350    - 42,484
    84,100    - 43,790    - 40,890    -  6,670    + 41,470    -   66,700    - 32,480
     1,024    +  4,320    +    672    -    416    -  1,952    -    2,240    +  1,408
         1    +     18    +     10    +      5    +      7    -       30    +     11
         0    +      0    +      0    +      0    +      0    -        0    +      0

 1,241,215    -    239    + 99,935    -  5,884    - 48,656    -1,447,990    -161,619
 
 
        bb          bc          bd          be          bo          bs

   299,209    + 13,128    - 62,358    + 26,256    -727,510    - 68,922
   176,400    - 63,420    - 41,580    + 62,580    +231,000    + 21,840
    22,801    + 21,291    +  3,473    - 21,593    + 34,730    + 16,912
    18,225    +  2,835    -  1,755    -  8,235    -  9,450    +  5,940
       324    +    180    +     90    +    126    -    540    +    198
         0    +      0    +      0    +      0    -      0    +      0

   516,959    - 25,986    -102,130    + 59,134    -471,770    - 24,032
 
 
 
 
        cc          cd          ce          co          cs

       576    -  2,736    +  1,152    - 31,920    -  3,024
    22,801    + 14,949    - 22,499    - 83,050    -  7,852
    19,881    +  3,243    - 20,163    + 32,430    + 15,792
       441    -    273    -  1,281    -  1,470    +    924
       100    +     50    +     70    -    300    +    110
         0    +      0    +      0    -      0    -      0

    43,799    + 15,233    - 42,721    - 84,310    +  5,950
 
        dd          de          do          ds

    12,996    -  5,472    +151,620    + 14,364
     9,801    - 14,751    - 54,450    -  5,148
       529    -  3,289    +  5,290    +  2,576
       169    +    793    +    910    -    572
        25    +     35    -    150    +     55
         0    +      0    -      0    +      0

    23,520    - 22,684    +103,220    + 11,275
 
 
        ee          eo          es

     2,304    - 63,840    -  6,048
    22,201    + 81,950    +  7,748
    20,449    - 32,890    - 16,016
     3,721    +  4,270    -  2,684
        49    -    210    +     77
         9    -     30    -     21

    48,733    - 10,750    - 16,944
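The formation of the sum-products and of the control column can also be sketched mechanically. The following is only an illustrative sketch and not part of the original computation: the names SCHEDULE, with_check_column and sum_products are chosen here for illustration, and the figures are those of the schedule of numerical factors given earlier (columns a, b, c, d, e and the constant term o).

```python
# Schedule of numerical factors: columns a, b, c, d, e and the constant term o.
SCHEDULE = [
    (699,  547,   24, -114,   48, -1330),
    (817, -420,  151,   99, -149,  -550),
    (290, -151, -141,  -23,  143,  -230),
    ( 32,  135,   21,  -13,  -61,   -70),
    (  1,   18,   10,    5,    7,   -30),
    (  0,    0,    0,    0,    3,   -10),
]


def with_check_column(rows):
    """Append the control column s (sum of the row) to every row."""
    return [tuple(row) + (sum(row),) for row in rows]


def sum_products(rows):
    """Return the symmetric table of sum-products for every pair of columns."""
    width = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(width)]
            for i in range(width)]


rows = with_check_column(SCHEDULE)
P = sum_products(rows)   # P[0][0] is [aa], P[0][1] is [ab], ..., P[0][6] is [as]

# The check: each sum-product with s must equal the sum of the other
# sum-products in the same line, e.g. [as] = [aa]+[ab]+[ac]+[ad]+[ae]+[ao].
checks_ok = all(P[i][6] == sum(P[i][:6]) for i in range(6))
```

Here P[0][:6] reproduces the line of totals of the first schedule of sum-products, and checks_ok confirms the control column in the way the s columns are used in the text.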
 
 We may now write the normal equations in schedule form as 
 follows : 
 
                          Original Normal Equations

(a)   1,241,215    -    239    + 99,935    -  5,884    - 48,656    -1,447,990
(1)                +      0    -     19    +      1    +      9    +      278
(b)                +516,959    - 25,986    -102,130    + 59,134    -  471,770
(2)                            +  8,046    -    474    -  3,917    -  116,582
(c)                            + 43,799    + 15,233    - 42,721    -   84,310
(3)                                        +     28    +    231    +    6,865
(d)                                        + 23,520    - 22,684    +  103,220
(4)                                                    +  1,907    +   56,761
(e)                                                    + 48,733    -   10,750

(5)                -.00019     +.08051     -.00474     -.03920     -1.16659
 
The sum-products from the observation equations are shown
in the rows marked (a), (b), (c), (d) and (e). The row marked
(5) and printed in italics is formed by dividing each of the
figures in row (a) by 1,241,215. The row marked (1) contains
the products of the figures in row (a) multiplied by the
factor -.00019. Row (2) contains the products of the factor 0.08051
and the figures in row (a), while row (3) contains the products of the
factor -0.00474 and the figures in row (a). The products in
row (4) are formed in the same manner by means of the factor
-0.03920.

We next subtract row (1) from row (b), row (2) from row (c),
row (3) from row (d), and so forth, which results in the following
schedule, which is known as the first reduction equation.
 
                      First Reduction Equation

(a)   +516,959    - 25,967    -102,131    + 59,125    -472,048
(1)               +  1,304    +  5,130    -  2,970    + 23,711
(b)               + 35,753    + 15,707    - 38,804    + 32,272
(2)                           + 20,177    - 11,681    + 93,258
(c)                           + 23,492    - 22,915    + 96,355
(3)                                       +  6,763    - 53,988
(d)                                       + 46,826    - 67,511

(4)               -.05023     -.19756     +.11437     -.91313
 
 The above equations are treated in a similar manner as the 
 original normal equations, and we have therefore the 2d reduc- 
 tion equation of the form: 
 
                 Second Reduction Equation

(a)   + 34,451    + 10,577    - 35,834    +  8,561
(1)               +  3,247    - 11,002    +  2,628
(b)               +  3,315    - 11,234    +  3,097
(2)                           + 37,273    -  8,905
(c)                           + 40,064    - 13,523

(3)               +.30702     -1.04014    +.24850

                 Third Reduction Equation

(a)   +     68    -    232    +    469
(1)               +    791    -  1,600
(b)               +  2,791    -  4,618

(2)               -3.41170    +6.89706

                 Fourth Reduction Equation

(a)   +  2,000    -  3,018
 
 The solution for the unknown r's may now be shown as fol- 
 lows: 
 
$$
\begin{aligned}
r_6 &= 3018 : 2000 = 1.5090\\
r_5 &= -6.8971 - (1.5090)(-3.4117) = -1.7488\\
r_4 &= -0.2485 - (-1.7488)(0.3070) - (1.5090)(-1.0401) = 1.8580\\
r_3 &= 0.9131 - (1.8580)(-0.0502) - (-1.7488)(-0.1976) - (1.5090)(0.1144) = 0.4884\\
r_0 &= 1.1666 - (0.4884)(-0.0002) - (1.8580)(0.0805) - (-1.7488)(-0.0047) - (1.5090)(-0.0394) = 1.0679
\end{aligned}
$$
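The successive reductions and the back-substitution may be sketched as an ordinary elimination. The sketch below is illustrative only: the names NORMAL, CONSTANTS and solve_normal_equations are assumptions made here, the figures are the sum-products of the original normal equations, and the arithmetic is carried in floating point, so the results agree with the hand computation only to about three figures.

```python
# Sum-products [aa] ... [ee] of the normal equations (symmetric matrix).
NORMAL = [
    [1241215,    -239,  99935,   -5884, -48656],
    [   -239,  516959, -25986, -102130,  59134],
    [  99935,  -25986,  43799,   15233, -42721],
    [  -5884, -102130,  15233,   23520, -22684],
    [ -48656,   59134, -42721,  -22684,  48733],
]
# Constant terms [ao], [bo], [co], [do], [eo].
CONSTANTS = [-1447990, -471770, -84310, 103220, -10750]


def solve_normal_equations(matrix, constants):
    """Solve matrix * r + constants = 0 by elimination and back-substitution."""
    n = len(constants)
    aug = [[float(v) for v in matrix[i]] + [float(constants[i])]
           for i in range(n)]
    # Forward reduction: one "reduction equation" per pivot row.
    for p in range(n):
        for i in range(p + 1, n):
            factor = aug[i][p] / aug[p][p]
            for j in range(p, n + 1):
                aug[i][j] -= factor * aug[p][j]
    # Back-substitution, starting from the last unknown.
    r = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] + sum(aug[i][j] * r[j] for j in range(i + 1, n))
        r[i] = -acc / aug[i][i]
    return r


r0, r3, r4, r5, r6 = solve_normal_equations(NORMAL, CONSTANTS)

# Corrected coefficients k_i = r_i * k'_i from the first approximations.
K_APPROX = (222, -100, 20, -5, 5)
k0, k3, k4, k5, k6 = (r * k for r, k in zip((r0, r3, r4, r5, r6), K_APPROX))
```

The fourth pivot is small in comparison with the figures from which it is formed, so the last two unknowns are sensitive to rounding in the earlier reductions; carrying more figures than the hand schedule is the reason for using floating point here.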
 
 
From the above values of r and by means of the relation

$k_i = r_i k'_i$ for i = 0, 3, 4, 5 and 6

we can easily determine the most probable values of $k_i$, by
which the coefficients of the original observation equations as shown
on page 228 must be multiplied so as to satisfy the observed values of F(x)
in the sense implied in the method of least squares.
This results in the following arrangement:
 
    z       k0φ0(z)   k3φ3(z)   k4φ4(z)   k5φ5(z)   k6φ6(z)   Σkiφi    Obs.

 -0.688      74.6     +26.7     + 4.4     +19.7     + 7.2     132.6     133
 +0.402      87.2     -20.5     +28.1     -17.2     -22.4      55.2      55
 +1.493      31.0     - 7.3     -26.2     + 3.9     +21.5      22.9      23
 +2.583       3.4     + 6.6     + 3.9     + 2.3     - 9.1       7.1       7
 +3.674       0.2     + 0.9     + 1.8     - 0.9     + 1.0       3.0       2
 +4.764       0.0     + 0.0     + 0.1     + 0.0     + 0.0       0.1       2
 
 The agreement between the calculated values and the origi- 
 nally observed series leaves evidently little to be desired in the 
 way of a satisfactory "fit." 
 
If we limited ourselves to three terms of the series and put

$F(x) = \varphi(z) = \sum k_n\varphi_n(z)$ for n = 0, 3 and 4

and then determined $k_0$, $k_3$ and $k_4$ by the method of least squares,
the final result would be of the form:

$\varphi(z) = 264.2\,\varphi_0(z) - 89.9\,\varphi_3(z) - 5.2\,\varphi_4(z),$

for which the calculated values of the frequency function would
be as follows:
 
   x      F(x)     Obs.    Pearson

   5     131.6     133      136.9
   6      55.2      55       48.5
   7      24.5      23       22.6
   8      15.5       7        9.6
   9       1.6       2        3.4
  10       0.2       2        0.8
 
The fit is evidently not so close as when we use 6 terms, but
it is by no means a poor fit and does not require nearly so much
arithmetical work as the larger number of terms in the frequency
series. In this connection it is of interest to compare
the present graduation with the result reached by Pearson,
which is also shown in the above table. Taken all in all there
is no doubt in my mind that the serial expansion gives far better
results than the Pearsonian methods and does not entail nearly
so much labour as these.
 
Note on Adjusted Moments. - The theoretical moments are given
in the form of definite integrals while the observations always give us the
moments in the form

$$s_r = \sum_{n=-\infty}^{n=+\infty} n^r \int_{n-\frac{1}{2}a}^{n+\frac{1}{2}a} \varphi(x)\,dx$$

where a is the class interval of the observations. In order to determine
the semi-invariants it is, however, required to know the continuous
moments

$$M_r = \int x^r \varphi(x)\,dx$$
 
The values of s being given in the form of finite sums and not as definite
integrals are therefore subject to certain adjustments if we wish to
express them as continuous moments. The necessary adjustments can,
however, easily be performed by well-known formulas from the theory
of mechanical quadrature if the frequency function and its derivatives
vanish for $x = -\infty$ and $x = +\infty$. The English mathematician, Sheppard,
has among others developed the following simple formulas for the transition
from s to M:

$$M_0 = s_0,\quad M_1 = s_1,\quad M_2 = s_2 - \frac{a^2}{12}s_0,\quad M_3 = s_3 - \frac{a^2}{4}s_1,$$

$$M_4 = s_4 - \frac{a^2}{2}s_2 + \frac{7a^4}{240}s_0,\quad M_5 = s_5 - \frac{5a^2}{6}s_3 + \frac{7a^4}{48}s_1.$$
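As an illustration, Sheppard's adjustments for the moments up to the fifth order, in their usual textbook form $M_2 = s_2 - \frac{a^2}{12}s_0$, $M_3 = s_3 - \frac{a^2}{4}s_1$, $M_4 = s_4 - \frac{a^2}{2}s_2 + \frac{7a^4}{240}s_0$, $M_5 = s_5 - \frac{5a^2}{6}s_3 + \frac{7a^4}{48}s_1$, may be sketched as a small function. The name sheppard_adjust and the sample power sums are hypothetical and serve only to show the arithmetic.

```python
def sheppard_adjust(s, a=1.0):
    """Return the continuous moments M0..M5 from the grouped power sums
    s0..s5 for a class interval a, using Sheppard's adjustments."""
    s0, s1, s2, s3, s4, s5 = s
    a2 = a * a
    a4 = a2 * a2
    return [
        s0,
        s1,
        s2 - a2 / 12.0 * s0,
        s3 - a2 / 4.0 * s1,
        s4 - a2 / 2.0 * s2 + 7.0 * a4 / 240.0 * s0,
        s5 - 5.0 * a2 / 6.0 * s3 + 7.0 * a4 / 48.0 * s1,
    ]


# Hypothetical grouped power sums with unit class interval.
M = sheppard_adjust([1.0, 0.0, 1.0, 0.0, 3.0, 0.0], a=1.0)
```

The adjustments affect only the moments of the second and higher orders, and they presuppose that the frequency function and its derivatives vanish at both ends of the range.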
 
The Sheppard adjustments again emphasize the fact that the method
of moments works with curve areas instead of curve ordinates, which
necessarily must lead to some sort of mechanical quadrature formula
unless we are able to evaluate the indefinite integrals of the expressions
for the frequency functions. If we use curve ordinates to calculate the
specific numerical values of Pearson's frequency functions we are liable
to encounter large errors. This fact is among other things pointed out
by Caradog Jones who in mentioning the use of ordinates points out that
"it must be remembered the resulting values are only a first approximation
to the observed frequencies and a better series is obtained if, by
using some good quadrature formula, we calculate the AREAS for the
successive groups between the curve, the bounding ordinates, and the
axis of x." This is one of the great drawbacks to the otherwise elegant
Pearsonian types of frequency curves because it entails a large amount
of arithmetical work to compute specific numerical values from the final
formulas as determined by Pearson's curve types.
 
Any reader who will take the trouble to consult the original memoirs
by Pearson and Elderton and the recently published treatise by Mr.
Caradog Jones will there find ample evidence of the large amount of
tedious arithmetical work involved in the application of mechanical
quadrature formulas. The recently suggested finite difference equation
formulas by the American mathematician, Carver, while emphasizing the
difficulty of applying the Pearson system, do not tend materially to shorten
the arithmetical work, and Mr. Carver must in the final instance
resort to mechanical quadrature.*
 
All these difficulties are, however, eliminated in the case of the determination
of the frequency function in serial form by means of the method
of least squares where we work equally well with ordinates as with areas.
A further advantage of the Gram-Charlier expansion in series is found
in the fact that standard tables of the generating function and its derivatives
as well as the definite integrals of these functions have been published
by Bruhns, Charlier and Jørgensen. Speaking from a purely personal
point of view I wish to state that through a long and varied experience
in practical curve fitting to the most diverse kinds of statistical data
I have had occasion to use both the Pearsonian and the Gram-Charlier
type of curves, and while I fully recognize the theoretical elegance and
apparent simplicity of the Pearson system, I feel nevertheless that from
the point of view of the practical computer the older system as devised
by the Scandinavian investigators is to be preferred in comparison with
the methods advocated by the followers of the distinguished founder of
the English Biometric school.
 
*Mr. Carver's able and interesting analysis by means of finite difference equations
is, however, to a great extent anteceded by the much earlier Danish memoirs of Opperman
and Gram where the finite difference equation methods are discussed.
 
 CHAPTER XVI 
 
LOGARITHMICALLY TRANSFORMED FREQUENCY
FUNCTIONS
 
 122. Transformation of the Variate. — While it is always 
 possible to express all frequency curves by an expansion in 
 Hermite polynomials, the numerical labor when carried on by 
 the method of least squares often involves a large amount of 
 arithmetical work if we wish to retain more than four or five 
 terms of the series. Other methods lessening the arithmetical 
 work and making the actual calculations comparatively simple 
 have been offered by several authors and notably by Thiele, 
 who in his works discusses several such methods. Among 
 those we may mention the method of the so-called free func- 
 tions and orthogonal substitution, the methods of correlates 
 and the adjustment by elements. The chapters on these 
 methods in Thiele's work are among some of the most import- 
 ant, but also some of the most difficult in the whole theory of 
 observations and have not always been understood and appre- 
 ciated by the mathematicians, chiefly on account of Thiele's 
 peculiar style of writing. A close study of the Danish scholar's 
 investigations is, however, well worth while, and Thiele's work 
 along these lines may still in the future become as epochmaking 
 in the theory of probability as some of the researches of the 
 great Laplace. The theory of infinite determinants as used by 
 M. Fredholm in the solution of integral equations is another 
 powerful tool which offers great advantages in the way of 
 rapid calculation. All these methods require, however, that 
 the student must be thoroughly familiar with the difficult 
 theory upon which such methods rest, and they have for this 
 reason been omitted in an elementary work such as the present 
 treatise. 
 
 We wish, however, to mention another method which in the 
 majority of cases will make it possible to employ the Gram or 
 Laplacean — Charlier curves in cases with extreme skewness 
 or excess. We have here reference to the method of logarithmic 
 transformation of the variate, x. 
 
 
123. The General Theory of Transformation. - One of
the simplest transformations is the previously mentioned linear
transformation of the form $z = f(x) = ax + b$, by which we can make
two constants, $c_1$ and $c_2$, vanish. Other transformations suggest
themselves, however, such as $f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$,
$f(x) = \log x$ and so forth. For this reason I propose to give a
brief development of the general method of transformations of
the statistical variates, mainly following the methods of Charlier
and Jørgensen.
 
Stated in its most general form our problem is: If a frequency
curve of a certain variate is given by F(x) what will be
the frequency curve of a certain function of x, say f(x)?

The equation of the frequency curve is $y = F(x)$, which
means that $F(x)\,dx$ is the probability that x falls in the interval
between $x - \frac{1}{2}dx$ and $x + \frac{1}{2}dx$. The probability that a new
variate z after the transformation $z = f(x)$, or $x(z) = x$, falls in the
interval $z - \frac{1}{2}dz$ and $z + \frac{1}{2}dz$ is therefore simply

$F[x(z)]\,x'(z)\,dz = F(x)\,dx,$

which gives in symbolic form the equation of the transformed
frequency curve.

The frequency for $z = f(x)$ is of course the same as for x. The
ordinates of the frequency curve, or rather the areas between
corresponding ordinates, are therefore not changed, but the
abscissa axis is replaced by f(x). Equidistant intervals of x
will therefore not as a rule, except in the linear transformation,
correspond to equidistant intervals of f(x).
 
If, for instance, the frequency curve F(x) is the Laplacean
normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2 : 2\sigma^2}$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have evidently

$$F(z) = \frac{1}{\sigma\sqrt{2\pi}}\cdot\frac{e^{-z : 2\sigma^2}}{2\sqrt{z}}$$
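The statement that the areas between corresponding ordinates are unchanged by the transformation can be checked numerically for the example $z = x^2$ on the branch $x = \sqrt{z}$. The following is only a sketch: the value of sigma, the interval and the helper names are assumptions made for the illustration.

```python
import math


def normal_density(x, sigma):
    """Laplacean normal curve with mean zero and dispersion sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def transformed_density(z, sigma):
    """Frequency curve of z = x**2 on the branch x = sqrt(z): F[x(z)] * x'(z)."""
    x = math.sqrt(z)
    return normal_density(x, sigma) / (2.0 * x)


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


sigma = 1.5                 # assumed dispersion
x_lo, x_hi = 0.5, 2.0       # assumed interval of the original variate

# Area under F(x) between x_lo and x_hi ...
area_x = simpson(lambda x: normal_density(x, sigma), x_lo, x_hi)
# ... and area under the transformed curve between the corresponding z values.
area_z = simpson(lambda z: transformed_density(z, sigma), x_lo ** 2, x_hi ** 2)
```

The two areas agree although the interval from 0.25 to 4 in z is far from equal in length to the interval from 0.5 to 2 in x, which is the point made above about equidistant intervals.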
 
124. Logarithmic Transformation. - Of the various transformations
the logarithmic is of special importance. It happens
that even if the variate x forms an extremely skew frequency
distribution its logarithms will be nearly normally distributed.
This fact was already noted by the eminent German psychologist,
Fechner, and also mentioned by Bruhns in his Kollektivmasslehre.
But neither Fechner nor Bruhns has given a
satisfactory theoretical explanation of the transformation, and both
have limited themselves to using it as a practical rule of thumb.
 
Thiele discusses the method under his adjustment by elements,
but in a rather brief manner. The first satisfactory
theory of logarithmic transformation seems to have been given
by Jørgensen and later on by Wicksell.* Jørgensen
begins with the transformation of the normal Laplacean frequency
curve. Letting $z = \log x$ and bearing in mind that the
frequency of x equals that of $\log x$ we have

$z = f(x) = \log x$, or $x = x(z) = e^z$ and $dx = e^z\,dz$
 
The continuous power sums or moments of the rth order
around an arbitrary origin take on the form

$$M_r = \frac{N}{n\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^r e^{-\frac{1}{2}\left(\frac{z-m}{n}\right)^2} dx = \frac{N}{n\sqrt{2\pi}}\int_{0}^{\infty} x^r e^{-\frac{1}{2}\left(\frac{\log x-m}{n}\right)^2} dx$$

The change in the lower limit in the second integral from
$-\infty$ to zero arises simply from the fact that the logarithm of
zero equals minus infinity and the point $-\infty$ is thus by the
transformation moved up to zero.

By a straightforward transformation (see appendix) we may
write the above integral as

$$M_r = \frac{N}{\sqrt{2\pi}}\,e^{m(r+1)+\frac{1}{2}n^2(r+1)^2}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}t^2}\,dt = N e^{m(r+1)+\frac{1}{2}n^2(r+1)^2}$$
 
Changing from moments to semi-invariants by means of the
well-known relations

$$
\begin{aligned}
\lambda_0 &= M_0\\
\lambda_1 &= M_1 : M_0\\
\lambda_2 &= (M_2M_0 - M_1^2) : M_0^2\\
\lambda_3 &= (M_3M_0^2 - 3M_2M_1M_0 + 2M_1^3) : M_0^3
\end{aligned}
$$

we have

$$\lambda_1 = e^{m+1.5n^2}$$

$$\lambda_2 = e^{2m+3n^2}\left(e^{n^2}-1\right)$$

*The law of errors leading to the geometric mean as the most probable value of the
variate, as discovered by Sir Donald McAlister in 1879, may, however, be considered
as a forerunner of Jørgensen's work.
 
 These equations give the semi-invariants expressed in terms 
 of m and n. On the other hand if we know the semi-invariants 
 from statistical data or are able to determine these semi-in- 
 variants by a priori reasoning we may find the parameters m 
 and n. 
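A numerical check of these relations may be sketched as follows. The sketch assumes that the moments of the logarithmic curve have the closed form $M_r = N e^{m(r+1)+\frac{1}{2}n^2(r+1)^2}$, compares it with a direct quadrature carried out in $z = \log x$, and then forms the first two semi-invariants. The parameter values m = 0.5 and n = 0.3 and all function names are chosen here for illustration only.

```python
import math


def moment_closed_form(r, m, n, N=1.0):
    """Assumed closed form M_r = N * exp(m(r+1) + n^2 (r+1)^2 / 2)."""
    return N * math.exp(m * (r + 1) + 0.5 * n * n * (r + 1) ** 2)


def moment_numeric(r, m, n, N=1.0, steps=20000):
    """Trapezoidal estimate of
    N/(n sqrt(2 pi)) * integral_0^inf x^r exp(-((log x - m)/n)^2 / 2) dx,
    evaluated in z = log x, where x^r dx = exp((r+1) z) dz."""
    lo = m - 12.0 * n
    hi = m + 12.0 * n + (r + 1) * n * n

    def g(z):
        return math.exp((r + 1) * z - 0.5 * ((z - m) / n) ** 2)

    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, steps))
    return N * total * h / (n * math.sqrt(2.0 * math.pi))


def semi_invariants(m, n):
    """First two semi-invariants from the moments M0, M1, M2."""
    M0, M1, M2 = (moment_closed_form(r, m, n) for r in range(3))
    lam1 = M1 / M0
    lam2 = (M2 * M0 - M1 ** 2) / M0 ** 2
    return lam1, lam2


m, n = 0.5, 0.3                      # illustrative parameters
lam1, lam2 = semi_invariants(m, n)
```

With these definitions lam1 equals $e^{m+1.5n^2}$ and lam2 equals $\lambda_1^2(e^{n^2}-1)$, so that m and n can be recovered from the first two semi-invariants once the origin is fixed.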
 
125. The Mathematical Zero. - A point which we must
bear in mind is that the above semi-invariants on account
of the transformation are calculated around a zero point which
corresponds to a fixed lower limit of the observations.

Very often the observations themselves indicate such a lower
limit beyond which the frequencies of the variate vanish.
In the case of persons engaged in factory work there is in most
countries a well-defined legal age limit below which it is illegal
to employ persons for work. Another example is offered in
the number of alpha particles radiated from certain radioactive
metals. Since the number of particles radiated in a certain
interval of time must either be zero or a whole positive number
it is evident that -1 must be the lower limit because we can have
no negative radiations. Analogous limits exist in the age
limit for divorces and in the amount of moneys assessed in the
way of income tax.

The lower limit allows, however, of a more exact mathematical
determination by means of the following simple considerations.
It is evident that this lower limit must fall below
the mean value of the frequency curve. Let us suppose that
it is located at a certain point, a, at a distance of $\eta$ units from
the mean, so that $\lambda_1(x) - \eta = a$, and let us furthermore as a
beginning place the origin at $\lambda_1(x)$, in which case $\lambda_1$ of course
equals zero. By shifting the origin to a, which implies a
translation of $\eta$ units in negative direction, the original variate
(x) is transformed into $x + \eta$, and $\lambda_1$ will now equal $\eta$ while
the semi-invariants of higher order remain the same as before
the transformation because of the well-known relation

$\lambda_r(x - \eta) = \lambda_r(x)$ for $r > 1$
 
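We may therefore write the previously given relations
between the $\lambda$'s and m and n as follows:

$$\lambda_1 = \eta = e^{m+1.5n^2}$$

$$\lambda_2 = \eta^2\left(e^{n^2}-1\right)\quad\text{or}\quad e^{n^2} = 1 + \lambda_2 : \eta^2$$

$$\lambda_3 = \eta^3\left(e^{3n^2} - 3e^{n^2} + 2\right)$$

which reduces to $\lambda_3\eta^3 - 3\lambda_2^2\eta^2 - \lambda_2^3 = 0$.

The solution of this cubic equation which has one real and
two imaginary roots gives us the value of $\eta$ or $\lambda_1 - a$ and thus
determines the mathematical zero or lower limit. We have in
fact:

$n^2 = \log(1 + \lambda_2 : \eta^2)$ and

$m = \log\eta - 1.5n^2$, while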
 
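126. Logarithmically Transformed Frequency Series. -
We have already shown that the generalized frequency curve
could be written as

$$F(x) = c_0\varphi_0(x) - \frac{c_1\varphi_1(x)}{1!} + \frac{c_2\varphi_2(x)}{2!} - \frac{c_3\varphi_3(x)}{3!} + \dots$$

where the Laplacean probability function

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-M)^2 : 2\sigma^2}$$

is the generating function with M and $\sigma$ as its parameters.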
 
 
The suggestion now immediately arises to use an analogous
series in the case of the logarithmic transformation. In this
case the frequency curve, F(x), with a lower limit would be
expressed as follows:

$$F(x) = k_0\Phi_0(x) - \frac{k_1\Phi_1(x)}{1!} + \frac{k_2\Phi_2(x)}{2!} - \frac{k_3\Phi_3(x)}{3!} + \dots$$

while the generating function now is

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2}$$

where m and n are the parameters.
 
 Using the usual definition of semi-invariants we then have 
 
 Xico + Xoj 2 -f-XgajS-f- 
 
 TT 2r ^r SiOJ S2C0- Ssco'' 
 
 o 
 
 The general term on the right hand side integral is of the form 
 
 /» 
 
 o 
 
where the integral may be evaluated by partial integration as
follows:
 
 e""* ,(a;) dx = e'^'^.-.ix) ] - co J e""*,-i(x) dx 
 
 o ^ o • 
 
 Since both <I>o(x) and all its derivatives are supposed to 
 vanish for x=0 and a: = oo the first term to the right becomes 
 zero and 
 
 By successive integrations we then obtain the following 
 recursion formula
 
 
 o o 
 
 o 
 
 O O 
 
 Or finally 
 
 o 
 
 Expanding e^'^ in a power series we have 
 
 
 ■V ^TT 
 
 o o 
 
 The general term in this expansion is of the form 
 
 V^2- r! J ' 
 o 
 
 ' x'^e "'■ " -• dx 
 
 which according to the formulas given on page (237) reduces to: 
 (_^)-'e'"''"+'^+'^"'^'-+'^V:r! 
 Hence we may write 
 
 r = 00 
 
 / e^'^^J,x)dx = {-o)y y'jB ' ^ '^'- ' ^ ' co'':r! 
 
 o r =0 
 
 Consequently the relation between the semi-invariants and 
 the frequency function 
 
 Fix) =kofo{x) -^\4^x(x) +^'*2(x) --%3(x) + . . . .
 
 
 can be expressed by the following recursion formula 
 -If —2!" "31 , Sioj 820;- S?a) 
 
 V =0 s = o r = o 
 
 The constants k are here expressed in terms of the unadjusted 
 moments or power sums, s. It is readily seen that the Sheppard 
 corrections for adjusted moments, M, also apply in this case. 
 We are, therefore, able to write down the values of the k's from 
 the above recursion formula in the following manner 
 
 M4 = A:4e'"+^'^"' +4^:36-'"+-"' +6A:2e^'"+*-^"' +4ii:ie'*'"+^'^' 
 
It is easy to see that it is not possible to determine the generating
function's parameters m and n from the observations.
These parameters, like M and $\sigma$ in the case of the Laplacean
normal probability curve, must be chosen arbitrarily. If
m and n are selected so as to make $k_1$ and $k_2$ vanish we have
 M2 = A;oe=''"+*-^"' 
 the solution of which gives 
 
 ^n2 ^ M.M, ,^ _ _M7_ ^ Mo'M2 
 
 while
 
 
This theory requires the computation of a set of tables of the
generating function

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course use the
ordinary tables for the normal curve $\varphi_0(z)$ when we consider

$$z = \frac{\log x - m}{n}$$

I have calculated a set of tables of the derivatives of $\Phi_0(x)$
and hope to be able to publish the manuscript thereof in the
second volume of this treatise.
 
127. Parameters Determined by Least Squares. - The
above development is based upon the theory of functions and
the theory of definite integrals. We shall now see how the
same problem may be attacked by the method of least squares
after we have determined by the usual method of moments the
values of m and n in the generating function $\varphi_0(z)$.

Viewed from this point of vantage our problem may be
stated as follows:

Given an arbitrary frequency distribution of the variate z
with $z = (\log x - m) : n$ and where x is reckoned from a zero
point or origin a, which is situated $\eta$ units below the mean and
defined by the relation

$\eta^3\lambda_3 - 3\eta^2\lambda_2^2 = \lambda_2^3$, where $\eta = \lambda_1 - a$;

to develop F(z) into a frequency series of the form

$F(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + \dots + k_n\varphi_n(z),$

where the k's must be determined in such a way that the expression

$$\sum_{i=0}^{i=n} k_i\varphi_i(z)$$

gives the best approximation to F(z) in the sense of the method
of least squares.

Stated in this form the frequency function is reduced to the
ordinary series of Gram or the A type of the Charlier series,
already treated in the earlier chapters.
 
 
128. Application to Graduation of a Mortality Table. -
As an illustration of the application of the theory to a practical problem we
present the following frequency distribution by 5-year age intervals
of the number of deaths (or $\sum d_x$ by quinquennial grouping)
in the recently published American-Canadian Mortality of
Healthy Males, based on a radix of 100,000 entrants at age 15.

Frequency Distribution of Deaths by Attained Ages in
American-Canadian Mortality Table
 
  Ages        Σd_x     1st Component    2d Component

 15- 19      1,801            120           1,681
 20- 24      1,996            230           1,766
 25- 29      2,089            440           1,649
 30- 34      2,120            790           1,330
 35- 39      2,341          1,370             971
 40- 44      2,911          2,270             641
 45- 49      3,937          3,570             367
 50- 54      5,527          5,400             127
 55- 59      7,723          7,722               1
 60- 64     10,383         10,383
 65- 69     12,987         12,987
 70- 74     14,535         14,535
 75- 79     13,807         13,807
 80- 84     10,328         10,328
 85- 89      5,464          5,464
 90- 94      1,757          1,757
 95- 99        278            278
100-104         16             16

 Total     100,000         91,467           8,533
 
The curve represented by the $\sum d_x$ column is evidently a composite
frequency function compounded of several series. From
a purely mathematical point of view the compound curve may
be considered as being generated in an infinite number of ways
as the summation of separate component frequency curves.
From the point of view of a practical graduation it is, however,
easy to break this compound death curve up into two separate
components. A mere glance at the $d_x$ curve itself suggests a
major skew frequency curve with a maximum point somewhere
in the age interval from 70 to 75 and a minor curve (practically
one-sided) for the younger ages.

Let us therefore break the $\sum d_x$ column up into the two so far
perfectly arbitrary parts as shown in the above table and then
try to fit those two distributions to logarithmically transformed
A curves.
 
 
 Starting with the first component the straightforward com- 
 putation of the semi-invariants is given in the table below with 
 the provisional mean chosen at age 67. 
 
Frequency Distribution of Deaths in American Mortality Table

First Component

  Ages        x       F(x)      xF(x)     x²F(x)       x³F(x)

104-100     - 7         16        112        784        5,488
 99- 95     - 6        278      1,668     10,008       60,048
 94- 90     - 5      1,757      8,785     43,925      219,625
 89- 85     - 4      5,464     21,856     87,424      349,696
 84- 80     - 3     10,328     30,984     92,952      278,856
 79- 75     - 2     13,807     27,614     55,228      110,456
 74- 70     - 1     14,535     14,535     14,535       14,535
 69- 65       0     12,987          0          0            0

    Σ               59,172    105,554    304,856    1,038,704

 64- 60     + 1     10,383     10,383     10,383       10,383
 59- 55     + 2      7,723     15,446     30,892       61,784
 54- 50     + 3      5,400     16,200     48,600      145,800
 49- 45     + 4      3,570     14,280     57,120      228,480
 44- 40     + 5      2,270     11,350     56,750      283,750
 39- 35     + 6      1,370      8,220     49,320      295,920
 34- 30     + 7        790      5,530     38,710      270,970
 29- 25     + 8        440      3,520     28,160      225,280
 24- 20     + 9        230      2,070     18,630      167,670
 19- 15     +10        120      1,200     12,000      120,000

    Σ               32,296     88,199    350,565    1,810,037

   s_r              91,468    -17,355    655,421      771,333
 
Computing the semi-invariants by means of the usual formulas
in paragraph 104, we have:

$\lambda_1 = -17355 : 91468 = -0.18974$, or mean at age 67 + 5(0.19) or at
age 67.95

$\lambda_2 = 655421 : 91468 - \lambda_1^2 = 7.1296$

$\lambda_3 = 771333 : 91468 - 3\lambda_1(655421 : 91468) + 2\lambda_1^3 = 12.4981$
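The computation of the power sums and semi-invariants of the first component may be sketched as below. The dictionary F simply restates the frequencies of the table (x counted in 5-year classes from the provisional mean, class 65-69, with positive x toward the younger ages); the names F, power_sum and mean_age are introduced here for illustration.

```python
# Frequencies of the first component, keyed by x (5-year classes from the
# provisional mean at age 67; positive x runs toward the younger ages).
F = {
    -7: 16, -6: 278, -5: 1757, -4: 5464, -3: 10328, -2: 13807, -1: 14535,
     0: 12987,
     1: 10383, 2: 7723, 3: 5400, 4: 3570, 5: 2270,
     6: 1370, 7: 790, 8: 440, 9: 230, 10: 120,
}


def power_sum(freq, r):
    """Unadjusted power sum s_r = sum of x**r * F(x)."""
    return sum(x ** r * f for x, f in freq.items())


s0, s1, s2, s3 = (power_sum(F, r) for r in range(4))

# Semi-invariants from the power sums.
lam1 = s1 / s0
lam2 = s2 / s0 - lam1 ** 2
lam3 = s3 / s0 - 3 * lam1 * s2 / s0 + 2 * lam1 ** 3

# x increases toward the younger ages, so a negative lam1 places the
# mean above the provisional mean of 67.
mean_age = 67 - 5 * lam1
```

No Sheppard adjustment is applied in this sketch; it reproduces the unadjusted figures of the text.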
 
In order to determine the mathematical zero or the origin
we have to solve the following cubic:

$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3$, or

$12.498\eta^3 - 152.511\eta^2 = 362.47$

the positive root of which is equal to 12.39. The zero point
is therefore found to be situated 12.39 5-year units from the
mean or at age 67.95 + 5(12.39), i.e. very nearly at age 130,
which we henceforth shall select as the origin of the co-ordinate
system of the first component. We have furthermore
 
$12.39 = e^{m+1.5n^2}$, and $7.1296 = e^{2m+3n^2}\left(e^{n^2}-1\right) = (12.39)^2\left(e^{n^2}-1\right)$,

the solution of which gives $n^2 = 0.04436$, n = 0.2106, m = 2.4504,
all on the basis of a 5-year interval as unit. If we wish to
change to a single calendar year unit we must add the natural
logarithm of 5, or 1.6094, to the above value of m, which gives us
m = 4.0598, while n remains the same. The above computations
furnish us with the necessary material for the logarithmic
transformation of the variate x which now may be written as

$z = [\log(130 - x) - 4.0598] : 0.2106,$

where x is the original variate or the age at death.
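The positive root of the cubic $\lambda_3\eta^3 - 3\lambda_2^2\eta^2 - \lambda_2^3 = 0$ can be found by any elementary root-finding method. The sketch below uses simple bisection; the function name zero_distance and the bracketing interval are assumptions made for the illustration, and the semi-invariants are those of the first component in 5-year units.

```python
def zero_distance(lam2, lam3, lo=1e-6, hi=1e6, iterations=200):
    """Positive root eta of lam3*eta**3 - 3*lam2**2*eta**2 - lam2**3 = 0,
    found by bisection (assumes lam2 > 0 and lam3 > 0)."""
    def cubic(eta):
        return lam3 * eta ** 3 - 3 * lam2 ** 2 * eta ** 2 - lam2 ** 3

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0:
            hi = mid      # root lies below mid
        else:
            lo = mid      # root lies above mid
    return 0.5 * (lo + hi)


eta = zero_distance(7.1296, 12.4981)   # distance of the zero point from the mean
origin_age = 67.95 + 5 * eta           # converted to years of age
```

With positive $\lambda_2$ and $\lambda_3$ the coefficients of the cubic show a single change of sign, so there is exactly one positive root and the bisection cannot settle on a spurious one.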
 
Having thus accomplished the logarithmic transformation we may henceforth write the generating function as

$\varphi_0(x) = \frac{1}{0.2106\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log(130 - x) - 4.0598}{0.2106}\right]^2}$, or $\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$.

We express now F(x) by the following equation,

$F(x) = k_0\varphi_0(x) + k_3\varphi_3(x) + k_4\varphi_4(x) + \ldots$,

or in terms of the transformed z:

$\varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + \ldots$,

and proceed to determine the numerical values of k by the method of least squares.
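The generating function and its derivatives are taken from printed tables in the text. As a cross-check they can also be evaluated in closed form. The sketch below assumes that $\varphi_3$ and $\varphi_4$ denote the third and fourth derivatives of the normal density $\varphi_0$; the function names are illustrative.

```python
import math


def z_of_age(x, origin=130.0, m=4.0598, n=0.2106):
    """Logarithmic transformation z = [log(130 - x) - 4.0598] : 0.2106 (natural logs)."""
    return (math.log(origin - x) - m) / n


def phi0(z):
    """Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def phi3(z):
    """Third derivative of phi0."""
    return -(z ** 3 - 3 * z) * phi0(z)


def phi4(z):
    """Fourth derivative of phi0."""
    return (z ** 4 - 6 * z ** 2 + 3) * phi0(z)


z = z_of_age(60)
print(round(z, 3), round(phi0(z), 4), round(phi3(z), 4), round(phi4(z), 4))
```

Small differences from the tabulated entries are to be expected, since the tables were entered with z rounded to three decimals.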
 
129. Formation of Observation Equations. — The values of $\varphi_0(z)$ and its 3rd and 4th derivatives may be written down directly from the tables of Jørgensen or Charlier for various values of z as shown in detail in the following scheme.*

 (1)     (2)      (3)      (4)      (5)      (6)     (7)     (8)      (9)
Age x     z      φ0(z)    φ3(z)    φ4(z)    k0(3)   k3(4)   k4(5)   F_I(x)

  15   +3.257   +.0020   -.0491   +.1029     +14     +10     -1        23
  16    3.213    .0023    .0537    .1084      17      10      1        26
  17    3.170    .0026    .0586    .1140      19      12      1        30
  18    3.127    .0030    .0637    .1208      22      14      1        35
  19    3.085    .0034    .0688    .1249      25      15      1        39
  20    3.041    .0039    .0744    .1290      29      16      1        44
  21    2.999    .0044    .0803    .1331      32      17      1        48
  22    2.955    .0051    .0858    .1361      38      18      1        57
  23    2.911    .0057    .0919    .1382      42      19      1        60
  24    2.866    .0065    .0981    .1391      48      21      1        68
  25    2.821    .0074    .1044    .1390      54      22      1        75
  26    2.776    .0085    .1104    .1367      63      24      1        86
  27    2.730    .0096    .1168    .1328      71      25      1        95
  28    2.683    .0110    .1229    .1264      81      26      1       106
  29    2.637    .0123    .1286    .1159      91      27      1       117
  30    2.587    .0140    .1340    .1072     103      28      1       130
  31    2.542    .0158    .1387    .0943     116      29      1       144
  32    2.494    .0178    .1420    .0763     131      30      1       160
  33    2.445    .0201    .1462    .0576     149      31      1       179
  34    2.396    .0226    .1486    .0340     166      32      1       198
  35    2.346    .0255    .1496   +.0039     188      32     -0       220
  36    2.296    .0286    .1489   -.0275     210      32     +0       242
  37    2.245    .0320    .1464    .0622     236      31      1       268
  38    2.193    .0360    .1423    .0983     265      30      1       296
  39    2.142    .0402    .1393    .1399     296      29      1       326
  40    2.089    .0450    .1281    .1864     331      27      2       360
  41    2.036    .0502    .1170    .2355     369      25      2       396
  42    1.982    .0559    .1030    .2875     411      22      3       436
  43    1.928    .0621    .0859    .3412     457      18      3       478
  44    1.873    .0690    .0656    .3965     508      14      4       526
  45    1.822    .0757    .0442    .4474     557       9      4       570
  46    1.762    .0845   -.0156    .5060     622      +3      5       630
  47    1.704    .0934   +.0154    .5596     687      -3      6       690
  48    1.647    .1028    .0487    .6082     758      10      6       754
  49    1.589    .1129    .0853    .6419     832      18      6       820
  50    1.529    .1239    .1255    .6893     913      27      7       893
  51    1.471    .1352    .1599    .7132     994      34      7       967
  52    1.409    .1479    .2114    .7349   1,089      45      7     1,051
  53    1.348    .1609    .2565    .7430   1,185      54      7     1,138
  54    1.286    .1745    .3022    .7307   1,288      63      7     1,231
  55    1.224    .1886    .3467    .7062   1,391      74      7     1,324
  56    1.160    .2035    .3907    .6642   1,501      83      7     1,425
  57    1.095    .2190    .4320    .6037   1,612      92      6     1,526
  58    1.030    .2347    .4688    .5257   1,730      99      5     1,636
  59    0.963    .2509    .5008    .4180   1,847     106      4     1,745
  60    0.896    .2672    .5257    .2911   1,965     112      3     1,856
  61    0.828    .2832    .5426    .1831   2,083     115      2     1,970
  62    0.758    .2994    .5489   -.0350   2,201     116     +0     2,085
  63    0.689    .3146    .5474   +.1187   2,318     116     -1     2,201
  64    0.617    .3298    .5329    .2839   2,428     113      3     2,312
  65    0.543    .3443    .5056    .4537   2,532     107      5     2,420
  66    0.470    .3572    .4666    .6156   2,627      99      6     2,522
  67    0.396    .3689    .4152    .7686   2,716      88      8     2,620
  68    0.319    .3792    .3505    .9098   2,789      74      9     2,706
  69    0.243    .3873    .2768   1.0262   2,848      59     10     2,779
  70    0.164    .3937    .1918   1.1176   2,900      41     11     2,848
  71    0.084    .3975    .0999   1.1757   2,929      21     12     2,896
  72   +0.000    .3989   +.0119   1.1968   2,937      -3     12     2,922
  73   -0.080    .3976   -.0952   1.1777   2,929     +20     12     2,937
  74    0.164    .3937    .1918   1.1176   2,900      41     11     2,930
  75    0.249    .3868    .2829   1.0180   2,848      60     10     2,898
  76    0.348    .3755    .3762    .8592   2,767      80      9     2,848
  77    0.425    .3645    .4368    .7043   2,686      93      7     2,772
  78    0.516    .3493    .4912    .5146   2,569     104      5     2,668
  79    0.608    .3316    .5303    .3069   2,444     112      3     2,553
  80    0.702    .3118    .5502   +.0892   2,296     117     -1     2,412
  81    0.798    .2902    .5473   -.1204   2,134     116     +1     2,251
  82    0.896    .2672    .5257    .3130   1,965     112      3     2,080
  83    0.996    .2436    .4859    .4380   1,788     103      4     1,895
  84    1.098    .2185    .4302    .5899   1,612      91      6     1,709
  85    1.203    .1934    .3614    .6943   1,420      77      7     1,504
  86    1.309    .1694    .2854    .7358   1,244      60      7     1,311
  87    1.418    .1460    .2048    .7340   1,075      43      7     1,125
  88    1.529    .1240    .1255    .6893     913      27      7       947
  89    1.644    .1034   -.0505    .6106     758     +11      6       775
  90    1.762    .0845   +.0156    .5060     622      -3      5       624
  91    1.882    .0679    .0693    .3874     500      15      4       489
  92    2.004    .0536    .1090    .2663     394      23      3       374
  93    2.132    .0397    .1380    .1485     292      29      1       264
  94    2.260    .0310    .1478   -.0483     228      31     +0       197
  95    2.393    .0227    .1477   +.0325     167      31     -0       135
  96    2.530    .0163    .1399    .0905     120      30      1        89
  97    2.673    .0100    .1207    .1295      76      26      1        49
  98    2.821    .0074    .1044    .1386      54      22      1        29
  99    2.968    .0050    .0842    .1353      37      18      1        18
 100   -3.124    .0028    .0640    .1203      21      14      1         6

*The values of z, co-ordinate with those of x, are computed for all integral values of x from 15 to 100 in accordance with the previously established relation, viz. z = [log (130 - x) - 4.0598] : 0.2106. All logarithms on base e. Signs are shown only in the first row of each column and on either side of a change of sign.
 
Since the original observations of $d_x$ are given in 5-year age intervals it becomes necessary to sum the numerical values of $\varphi_0(z)$ and its derivatives by quinquennial age groupings so as to form the required observation equations. We find thus for instance in the age interval 55-59 the following observation equation (the summation to take place from x = 55 to x = 59)

$k_0\Sigma\varphi_0(z) + k_3\Sigma\varphi_3(z) + k_4\Sigma\varphi_4(z) = \Sigma\varphi(z) = d_x$, or

$1.0967\,k_0 + 2.1390\,k_3 - 2.9178\,k_4 = 7722$.
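The coefficients of this observation equation are simply column sums of the single-age values for ages 55 to 59. A minimal sketch, using the tabulated entries (with $\varphi_4$ negative over this stretch of the table) and illustrative variable names:

```python
# Tabulated values for ages 55, 56, 57, 58, 59
phi0_55_59 = [0.1886, 0.2035, 0.2190, 0.2347, 0.2509]
phi3_55_59 = [0.3467, 0.3907, 0.4320, 0.4688, 0.5008]
phi4_55_59 = [-0.7062, -0.6642, -0.6037, -0.5257, -0.4180]

# One coefficient per unknown k0, k3, k4: the sum over the quinquennial group
coefficients = tuple(round(sum(col), 4)
                     for col in (phi0_55_59, phi3_55_59, phi4_55_59))
print(coefficients)
```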
 
Similar equations are formed for the other age intervals, resulting in the following tabular representation of the coefficients to the various k's and the observed values of the frequency distribution, $\varphi(z)$.

TABLE I

Tabular Arrangements of Numerical Data in the Observation Equations

 Ages        φ0         φ3          φ4         φ(z)

 15-19      .0130     - .2930     + .5730       120
 20-24      .0256     - .4305     + .5758       230
 25-29      .0488     - .5833     + .6508       440
 30-34      .0903     - .7104     + .3694       790
 35-39      .1623     - .7265     - .3240     1,370
 40-44      .2822     - .4996     -1.4471     2,270
 45-49      .4673     + .0896     -2.7631     3,570
 50-54      .7424     +1.0555     -3.6111     5,400
 55-59     1.0967     +2.1390     -2.9178     7,722
 60-64     1.4942     +2.6975     - .1066    10,383
 65-69     1.8369     +2.0147     +3.7739    12,987
 70-74     1.9814     + .0166     +5.7854    14,535
 75-79     1.8077     -2.1174     +3.6030    13,807
 80-84     1.3307     -2.5393     -1.3721    10,328
 85-89      .7362     -1.0276     -3.4640     5,464
 90-94      .2729     + .4571     -1.3925     1,757
 95-99      .0609     + .5714     + .6125       278
 100-       .0068     + .1714     + .3890        16

From the above table we notice that we have 18 observation equations from which to determine the three unknown parameters $k_0$, $k_3$ and $k_4$. The number of equations being greater than the number of unknowns we make use of the method of least squares. While a direct application of this principle of course is feasible, it will, however, be found easier to start with an approximate solution for $k_0$, $k_3$ and $k_4$ and then apply the method of least squares. It will be found that in the three age intervals 65-69, 70-74 and 75-79 where the observations are most numerous the observations will be approximately satisfied by the following preliminary values of k, viz.:

$k'_0 = 7300$, $k'_3 = -340$ and $k'_4 = -50$.

Multiplying the above values of k with their respective columns in Table I, or in other words forming the products $k'_0\varphi_0$, $k'_3\varphi_3$ and $k'_4\varphi_4$, we obtain a new table of the following form:*

*Last figure omitted.
 
TABLE II

                                                 Control
                                                 Column
  Ages         a        b        c        o         s

 15- 19        10       10      - 3     -   12       5
 20- 24        19       15      - 3     -   23       8
 25- 29        36       20      - 3     -   44       9
 30- 34        66       24      - 2     -   79       9
 35- 39       119       25        2     -  137       9
 40- 44       206       17        7     -  227       3
 45- 49       343      - 3       14     -  357     - 3
 50- 54       542      -36       18     -  540     -16
 55- 59       801      -73       15     -  772     -29
 60- 64     1,091      -92        1     -1,038     -38
 65- 69     1,341      -69      -19     -1,299     -46
 70- 74     1,446      - 1      -29     -1,454     -38
 75- 79     1,320       72      -18     -1,381     - 7
 80- 84       971       86        7     -1,033      31
 85- 89       537       35       17     -  546      43
 90- 94       202      -16        7     -  176      17
 95- 99        45      -20      - 3     -   28     - 6
100-104         5      - 6      - 2     -    2     - 5

            9,100      -12        6     -9,148     -54
 
The formation of the sum-products [aa], [ab], .... [ao], [bb], [bc], [bo] and [cc], [co] proceeds now in routine manner as shown in the following tables:

TABLE III

      [aa]         [ab]        [ac]          [ao]         [as]

        100          100      -    30     -      120          50
        361          285      -    57     -      437         152
      1,296          720      -   108     -    1,584         324
      4,356        1,584      -   132     -    5,214         594
     14,161        2,975          238     -   16,303       1,071
     42,436        3,502        1,442     -   46,762         618
    117,649     -  1,029        4,802     -  122,451     - 1,029
    293,764     - 19,512        9,756     -  292,680     - 8,672
    641,601     - 58,473       12,015     -  618,372     -23,229
  1,190,281     -100,372        1,091     -1,132,458     -41,458
  1,798,281     - 92,529      -25,479     -1,741,959     -61,686
  2,090,916     -  1,446      -41,934     -2,102,484     -54,948
  1,742,400       95,040      -23,760     -1,822,920     - 9,240
    942,841       83,506        6,797     -1,003,043      30,101
    288,369       18,795        9,129     -  293,202      23,091
     40,804     -  3,232        1,414     -   35,552       3,434
      2,025     -    900      -   135     -    1,260     -   270
         25     -     30      -    10     -       10     -    25

  9,211,666     - 71,016      -44,961     -9,236,811    -141,122
 
TABLE IV

     [bb]        [bc]        [bo]        [bs]

      100       -   30     -    120         50
      225       -   45     -    345        120
      400       -   60     -    880        180
      576       -   48     -  1,896        216
      625           50     -  3,425        225
      289          119     -  3,859         51
        9       -   42        1,071          9
    1,296       -  648       19,440        576
    5,329       -1,095       56,356      2,117
    8,464       -   92       95,496      3,496
    4,761        1,311       89,631      3,174
        1           29        1,454         38
    5,184       -1,296      -99,432     -  504
    7,396          602      -88,838      2,666
    1,225          595      -19,110      1,505
      256       -  112        2,816     -  272
      400           60          560        120
       36           12           12         30

   36,572       -  690       48,931     13,797

TABLE V

     [cc]        [co]        [cs]

        9           36      -   15
        9           69      -   24
        9          132      -   27
        4          158      -   18
        4       -  274          18
       49       -1,589          21
      196       -4,998      -   42
      324       -9,720      -  288
      225      -11,580      -  435
        1       -1,038      -   38
      361       24,681         874
      841       42,166       1,102
      324       24,858         126
       49       -7,231         217
      289       -9,282         731
       49       -1,232         119
        9           84          18
        4            4          10

    2,756       45,244       2,349
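The sum-products and the control column can be checked mechanically. The following sketch takes the columns a, b, c, o of Table II as data and verifies that each control sum equals the sum of the other sum-products in its row, e.g. [as] = [aa] + [ab] + [ac] + [ao]. The variable names are illustrative.

```python
# Columns a, b, c, o of Table II, one tuple per quinquennial age group
rows = [
    (10, 10, -3, -12), (19, 15, -3, -23), (36, 20, -3, -44), (66, 24, -2, -79),
    (119, 25, 2, -137), (206, 17, 7, -227), (343, -3, 14, -357),
    (542, -36, 18, -540), (801, -73, 15, -772), (1091, -92, 1, -1038),
    (1341, -69, -19, -1299), (1446, -1, -29, -1454), (1320, 72, -18, -1381),
    (971, 86, 7, -1033), (537, 35, 17, -546), (202, -16, 7, -176),
    (45, -20, -3, -28), (5, -6, -2, -2),
]
names = "abco"

# Sum-products [xy] for every pair of columns
sp = {x + y: sum(r[i] * r[j] for r in rows)
      for i, x in enumerate(names) for j, y in enumerate(names)}

# Control column s = a + b + c + o, and its sum-products [as], [bs], [cs]
for i, x in enumerate(names[:3]):
    sp[x + "s"] = sum(r[i] * sum(r) for r in rows)

for x in "abc":
    # control identity: [xs] = [xa] + [xb] + [xc] + [xo]
    assert sp[x + "s"] == sum(sp[x + y] for y in names)

print(sp["aa"], sp["ab"], sp["ac"], sp["ao"], sp["as"])
```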
 
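From the above tables we may now write down the following scheme for the solution of the normal equations by means of the Gaussian algorithmus.

TABLE VI

Scheme for the Solution of Normal Equations

                                a            b            c             o

normal equation a            921,167      - 7,102      - 4,496      - 923,681

products to subtract                           55           35          7,122
normal equation b                           3,657           69          4,893
products to subtract                                        22          4,508
normal equation c                                          276          4,525

quotients by [aa]                         -.00771      -.00488       -1.00273
reduced equation b                          3,602      -   104       -  2,229
products to subtract                                         3             64
reduced equation c                                         254             16
quotients by reduced [bb]                              -.02887       - .61882
twice reduced equation c                                   251             48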
 
Solving for the unknowns we have now

$r_4 = 48 : 251 = .19123$

$r_3 = .61882 - (.19123)(-.02887) = .62434$

$r_0 = 1.00273 - (.62434)(-.00771) - (.19123)(-.00488) = 1.00847$

We therefore find the following numerical values for the probable values of $k_0$, $k_3$ and $k_4$:

$k_0 = r_0k'_0 = (1.00847)(7300) = 7,361.8$
$k_3 = r_3k'_3 = (.62434)(-340) = -212.2$
$k_4 = r_4k'_4 = (.19123)(-50) = -9.6$
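The elimination can be repeated without intermediate rounding. The sketch below solves the normal equations (last figure omitted, as in the scheme) by ordinary Gaussian elimination; the helper `solve_linear` is an illustrative name, and the right-hand sides are taken as the negatives of [ao], [bo], [co].

```python
def solve_linear(A, y):
    """Gaussian elimination with back substitution for a small system A r = y."""
    n = len(y)
    M = [list(map(float, A[i])) + [float(y[i])] for i in range(n)]
    for i in range(n):                       # forward elimination
        for j in range(i + 1, n):
            factor = M[j][i] / M[i][i]
            for k in range(i, n + 1):
                M[j][k] -= factor * M[i][k]
    r = [0.0] * n
    for i in reversed(range(n)):             # back substitution
        tail = sum(M[i][k] * r[k] for k in range(i + 1, n))
        r[i] = (M[i][n] - tail) / M[i][i]
    return r


normal = [[921167, -7102, -4496],
          [-7102, 3657, -69],
          [-4496, -69, 276]]
rhs = [923681, -4893, -4525]                 # minus [ao], [bo], [co]

r0, r3, r4 = solve_linear(normal, rhs)
k0, k3, k4 = 7300 * r0, -340 * r3, -50 * r4  # scale by the preliminary values
print(round(r0, 5), round(r3, 5), round(r4, 5))
print(round(k0, 1), round(k3, 1), round(k4, 1))
```

Since the hand scheme rounds every product to whole numbers, the value of $r_4$ obtained here may differ from the printed one in the third decimal; $r_0$ and $r_3$ are far less sensitive.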
 
The next step is then to form the three columns $k_0\varphi_0(z)$, $k_3\varphi_3(z)$ and $k_4\varphi_4(z)$ for the individual ages from 15 and upwards. The formation of $\Sigma k_i\varphi_i(z)$ gives us finally the separate values by integral ages of the first component curve or $F_I(x)$.

If we now subtract this component from the originally observed values of the compound curve, $d_x$, we obtain the following values (arranged in quinquennial age groups) for the second component, $F_{II}(x)$.
 
 Ages      F_II(x)

  0- 4         50  }
  5- 9        440  }  Hypothetical Values
 10-14      1,210  }
 15-19      1,650
 20-24      1,719
 25-29      1,610
 30-34      1,309
 35-39        989
 40-44        715
 45-49        473
 50-54        247
 55-59         67

           10,479
 
It is of course possible to fit this particular curve type directly to a logarithmically transformed Gram or Charlier A type of frequency curves, although this will require 5 or 6 terms of the series. But still greater obstacles would be encountered if we were to attempt a graduation by means of Pearsonian curve types. The goal can, however, be reached quite readily by the introduction of a certain hypothetical device. Any reader familiar with the various types of frequency curves will readily notice that the above frequency distribution of $F_{II}(x)$ represents a truncated curve from which the curve segment corresponding to ages below 15 has been eliminated. We may therefore substitute the following, so far hypothetical, values for the missing curve segment: for ages 0-4 the value of 50, for ages 5-9 the value of 440, and for the ages 10-14 the value of 1,210. The thus reconstructed histograph (shown in the above table) may now be fitted to a logarithmically transformed Gram or Charlier curve in the usual routine manner. The computations of the relative moments $m_r$ result in the following values, for a provisional origin at the age of 17 and a unit interval of 5 years:

$m_1 = 1.8380$, $m_2 = 8.6342$, $m_3 = 39.8630$.

From which we find

$\lambda_1 = 1.838$, $\lambda_2 = 5.2560$, $\lambda_3 = 4.5824$. (Mean at age 26.19.)

The equation

$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3$

then becomes

$4.582\,\eta^3 - 82.876\,\eta^2 = 145.200$,

the real root of which is $\eta = 19.0$ (on basis of a 5 year unit).
 
We furthermore find that $n = 0.120$ and $m = 2.9227 + \log_e 5 = 4.5321$, which finally brings about the transformation of the variate x by means of the formula

$z = [\log_e(x + 68.8) - 4.532] : 0.12$,

where x is expressed in unit intervals of 1 year.

The further determination of the coefficients $k_0$, $k_3$ and $k_4$ by means of the method of least squares results in the values: $k_0 = 947.4$, $k_3 = -63.4$ and $k_4 = -30.0$. Multiplying these values with their respective values of $\varphi_0(z)$, $\varphi_3(z)$ and $\varphi_4(z)$ and forming the corresponding sums we finally obtain the second component curve.
 
FIGURE 3

Diagram showing graduation of $d_x$ column in the AM(5) table by a compound frequency curve of the Gram-Charlier types.
 
The sum of $F_I(x)$ and $F_{II}(x)$ as shown in the following table and also in the figure gives us the final compound frequency curve or the $d_x$ curve, from which it now is a simple matter to form the $l_x$ or $\Sigma d_x$ column and its co-ordinated column of $q_x$.
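The passage from the $d_x$ column to $l_x$ and $q_x$ is a running sum from the oldest age downwards followed by a division. A minimal sketch, with the illustrative function name `life_columns` and the graduated deaths for ages 95 to 100 as sample input:

```python
def life_columns(d):
    """d: deaths d_x by single ages, youngest first, ending at the last age with deaths.

    Returns (l, q1000) where l_x is the sum of d from age x upwards
    and q1000 is 1000 * d_x / l_x.
    """
    l = []
    running = 0
    for deaths in reversed(d):      # accumulate from the oldest age downwards
        running += deaths
        l.append(running)
    l.reverse()
    q1000 = [1000.0 * dx / lx for dx, lx in zip(d, l)]
    return l, q1000


d_95_to_100 = [135, 89, 49, 29, 18, 6]
l, q1000 = life_columns(d_95_to_100)
print(l)
print([round(q, 2) for q in q1000])
```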
 
Graduation of American Male Mortality Table (AM(5)) by means of a Compound Frequency Curve

 Age    F_I(x)   F_II(x)     d_x       l_x     1000 q_x

  15       21      302       323     100532       3.21
  16       26      319       345     100209       3.44
  17       30      332       362      99864       3.62
  18       35      342       377      99502       3.79
  19       39      350       389      99125       3.92
  20       44      354       398      98736       4.03
  21       48      354       402      98338       4.11
  22       57      352       409      97936       4.18
  23       60      349       409      97527       4.19
  24       68      343       411      97118       4.23
  25       75      336       411      96707       4.25
  26       86      327       412      96296       4.28
  27       95      317       413      95884       4.31
  28      106      308       414      95471       4.33
  29      117      297       414      95057       4.36
  30      130      285       415      94643       4.38
  31      144      275       419      94228       4.45
  32      160      261       421      93809       4.49
  33      179      249       428      93388       4.58
  34      198      238       436      92960       4.69
  35      220      226       446      92524       4.82
  36      242      213       455      92078       4.94
  37      268      201       469      91623       5.12
  38      296      188       484      91154       5.31
  39      326      175       501      90670       5.53
  40      360      164       524      90169       5.81
  41      396      152       548      89645       6.11
  42      436      141       577      89097       6.47
  43      478      128       606      88520       6.80
  44      526      117       643      87914       7.32
  45      570      107       677      87271       7.76
  46      630       96       726      86594       8.39
  47      690       87       777      85868       9.05
  48      754       78       832      85091       9.78
  49      820       69       889      84259      10.55
  50      893       61       954      83370      11.44
  51      967       53      1020      82416      12.37
  52     1051       47      1098      81396      13.49
  53     1138       41      1179      80298      14.68
  54     1231       36      1267      79119      16.01
  55     1324       30      1354      77852      17.39
  56     1425       25      1450      76498      18.95
  57     1526       22      1548      75048      20.62
  58     1636       18      1654      73500      22.41
  59     1745       15      1760      71846      24.54
  60     1856       12      1868      70086      26.65
  61     1970       11      1981      68218      29.04
  62     2085        9      2092      66237      31.59
  63     2201        7      2208      64145      34.42
  64     2312        6      2318      61937      37.41
  65     2420        5      2425      59619      40.67
  66     2522        4      2526      57194      43.62
  67     2620        3      2623      54668      47.97
  68     2706        2      2708      52045      52.02
  69     2779        2      2781      49337      56.37
  70     2848        1      2849      46556      61.19
  71     2896               2896      43707      66.25
  72     2922               2922      40811      71.60
  73     2937               2937      37889      77.51
  74     2930               2930      34952      83.93
  75     2898               2898      32022      90.51
  76     2848               2848      29124      97.80
  77     2772               2772      26276     105.48
  78     2668               2668      23504     113.53
  79     2553               2553      20836     122.50
  80     2412               2412      18283     131.95
  81     2251               2251      15871     141.84
  82     2080               2080      13620     152.76
  83     1895               1895      11540     164.21
  84     1709               1709       9645     177.19
  85     1504               1504       7936     189.52
  86     1311               1311       6432     203.82
  87     1125               1125       5121     219.68
  88      947                947       3996     236.99
  89      775                775       3049     254.18
  90      624                624       2274     274.41
  91      489                489       1650     296.36
  92      374                374       1161     322.14
  93      264                264        787     335.45
  94      197                197        523     376.67
  95      135                135        326     414.11
  96       89                 89        191     465.97
  97       49                 49        102     480.39
  98       29                 29         53     547.16
  99       18                 18         24     750.00
 100        6                  6          6    1000.00
 
It will be of interest to compare these latter values with the original values of $q_x$ as derived by Mr. Henderson's graduation. Such a comparison is shown in the appended table for quinquennial ages.

 Ages     Henderson's q_x     Fisher's q_x

  15            3.46               3.21
  20            3.92               4.03
  25            4.31               4.25
  30            4.46               4.38
  35            4.78               4.82
  40            5.84               5.81
  45            7.94               7.76
  50           11.58              11.44
  55           17.47              17.39
  60           26.68              26.65
  65           40.66              40.67
  70           61.47              61.19
  75           91.94              90.51
  80          135.74             131.95
  85          197.07             189.52
  90          280.35             274.41
  95          387.76             414.11
 100          562.50            1000.00
 
I think that every unbiased critic will admit that there exists a satisfactory agreement between the two tables in spite of the fact that we have worked throughout with basic data in 5-year age groups. Moreover, the actual arithmetical work in the case of the graduation by means of compounded Gram or Charlier curves is much simpler than the usual methods of graduation by Makeham's formula and mechanical interpolation formulas as employed by Mr. Henderson.* Another point speaking in favor of the frequency curve graduation is that our resulting functions are continuous functions for which standard tables of definite integrals have been prepared. It is therefore possible to use the elegant and continuous method originally introduced by Mr. Woolhouse in the computation of premiums and policy values. Unfortunately this is not the place to treat this interesting phase of the question, although we may in passing mention that a graduation of the kind here presented is, in practical computations of policy values and premiums, even easier to work with than the renowned graduation formula by Makeham, especially in the case of life contingencies involving 2 or more lives.

*I do not wish to imply these remarks as a criticism of the able graduation by Henderson, however.

130. Additional Examples. — As another illustration I present the following frequency distribution (arranged in groups of 3-year intervals) of the ages of a group of 19,274 male employees of the Bell System of the American Telephone and Telegraph Company, which most kindly has been furnished to me through the courtesy of this company.
 
Age Distribution of Male Employees in the Bell System

 Ages      x     F(x)        Ages      x     F(x)

 13-15     0        1        46-48    11      380
 16-18     1        9        49-51    12      272
 19-21     2      745        52-54    13      186
 22-24     3    2,264        55-57    14      141
 25-27     4    3,828        58-60    15      110
 28-30     5    3,801        61-63    16       72
 31-33     6    2,711        64-66    17       43
 34-36     7    1,918        67-69    18       17
 37-39     8    1,339        70-72    19       14
 40-42     9      884        73-75    20        3
 43-45    10      533        76-78    21        2
 
Choosing the provisional lower limit at age 14 we find the following values for the crude moments or power sums s.

$s_0 = 19,274$, $s_1 = 112,363$, $s_2 = 794,771$, $s_3 = 6,790,761$

The values of the semi-invariants are

$\lambda_1 = 5.830$, $\lambda_2 = 7.2478$, $\lambda_3 = 27.4191$.

The resulting cubic equation is therefore

$27.419\,\eta^3 - 157.592\,\eta^2 = 380.731$

for which the solution is $\eta = 6.1185$.

We have furthermore

$6.1185 = e^{m + 1.5n^2}$

$7.2478 = e^{2m + 3n^2}(e^{n^2} - 1)$, or

$n^2 = 0.1768$, $n = 0.4205$ and $m = 1.5462$.

On the basis of an interval of one year we have therefore:

$z = [\log(x - 13.1) - 2.645] : 0.421$*

as the value of the variate in the generating function $\varphi_0(z)$.

*We have m = 1.5462 + log 3 = 2.645.
 
The values of $k_0$, $k_3$ and $k_4$ as determined by the method of least squares are $k_0 = 3064.4$, $k_3 = 45.1$, $k_4 = 80.5$, on basis of one year interval.

A comparison between the calculated and observed values (the latter being shown by single ages) is given in the attached diagram, which evidently is satisfactory for all practical purposes. I wish here to mention that an attempt by some of the statistical assistants of the A. T. & T. Co. to fit the above data by means of the Pearsonian curves proved futile. Personally I have not as yet made an attempt to verify this negative result.

FIGURE 4

Diagram showing comparison between observed and theoretical frequency distribution of active group of male employees of the Bell System.
 
As a final illustration we quote from Jørgensen's monograph an application of the logarithmic transformation of the previously discussed observations on the number of petal flowers in Ranunculus Bulbosus. Since the variate in this instance is integral, the observations themselves clearly indicate that there must be a lower limit, or biological zero so to speak, at 4 petal flowers.

The crude moments are then

$s_0 = 222$, $s_1 = 362$, $s_2 = 794$, from which we obtain

m = 0.0440
n = 0.5445
$k_0$ = 183.2,

so that the formula reads

$F(x) = \frac{183.2}{0.5445\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log(x - 4) - 0.044}{0.5445}\right]^2}$

The detailed calculations according to this formula are shown below:

     (1)              (2)            (3)        (4)       (5)
 log nat (x-4)    log (x-4) - m     (2):n      φ0(z)      F(x)

    .0000           - .0440        - .0810     .3989     131.8
    .6932           + .6492        +1.1926     .1965      66.1
   1.0986            1.0546         1.9373     .0612      20.6
   1.3863            1.3423         2.4658     .0191       6.4
   1.6094            1.5654         2.8756     .0064       2.2
   1.7918            1.7478         3.2107     .0024       0.8

A closer fit could of course be had by adding additional terms to the series, but even with one term the agreement between calculated and observed values is quite satisfactory for all practical purposes.
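The one-term formula for the Ranunculus data is short enough to evaluate directly. The sketch below assumes the constants quoted in the text (lower limit 4, m = 0.0440, n = 0.5445, $k_0$ = 183.2) and natural logarithms; the function name is illustrative.

```python
import math


def ranunculus_fit(x, k0=183.2, lower_limit=4, m=0.0440, n=0.5445):
    """One-term logarithmically transformed curve F(x) for x > lower_limit."""
    z = (math.log(x - lower_limit) - m) / n
    return k0 / (n * math.sqrt(2 * math.pi)) * math.exp(-0.5 * z * z)


# Number of petals x = 5, 6, ..., 10
fitted = {x: round(ranunculus_fit(x), 1) for x in range(5, 11)}
print(fitted)
```

The results should follow column (5) of the scheme closely; small differences in the first decimal can arise because the printed values of $\varphi_0(z)$ were read from a four-place table.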
 
 CHAPTER XVII 
 
 Frequency Curves and Their Relation to the Bernoullian Series 
 
131. The Bernoullian Series. — In Chapter IX it was shown that the general term

$\varphi(x) = \binom{s}{x} p^x q^{s-x}$

in the point binomial $(p + q)^s$, where p is the a priori probability for the happening of an event E in a single trial, represents the probability that E will happen x times and the complementary event, $\bar{E}$, s - x times. We also found that the maximum term in the Bernoullian expansion of the point binomial could be written as:

$T_m = \frac{1}{\sqrt{2\pi spq}}$

when s is a large number.
 
We wish now also to find a more simple expression for the general term, $\varphi(x)$, instead of the laborious expression involving factorials of high order.

It is evident that $\varphi(x)$ represents a frequency function of an integral variate x which can assume all positive integral values from 0 to s, and which satisfies the property of all frequency curves that

$\Sigma\varphi(x) = \Sigma\binom{s}{x} p^x q^{s-x} = (p + q)^s = 1$.

We may therefore write $\varphi(x)$ in the form of a Gram-Charlier frequency series as

$\varphi(x) = \Sigma c_i\varphi_i(x)$ for i = 0, 3, 4, 5 . . .

This involves the computation of the semi-invariants $\lambda_r$ for r = 0, 3, 4, 5 . . .
 
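By the definition of the semi-invariants we have:

$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = \Sigma\varphi(x)e^{x\omega}$,

where x = 0, 1, 2, 3, . . . s and $\Sigma\varphi(x) = (p + q)^s = 1$, or

$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = \varphi(0) + \varphi(1)e^{\omega} + \varphi(2)e^{2\omega} + \ldots + \varphi(s)e^{s\omega}$

which for $\omega = 0$ reduces to $s_0 = (p + q)^s = 1$.

Taking the logarithm on both sides of the above equation we have

$f(\omega) = \frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots = s\log(pe^{\omega} + q)$.

Now it is easily seen that

$D_\omega f(\omega) = \frac{spe^{\omega}}{pe^{\omega} + q}$, or $D_\omega f(\omega)\,(pe^{\omega} + q) = spe^{\omega}$,

from which we find

$D_\omega f(\omega)\,q = pe^{\omega}[s - D_\omega f(\omega)]$

$D_\omega^2 f(\omega)\,q = pe^{\omega}[s - D_\omega f(\omega) - D_\omega^2 f(\omega)]$

$D_\omega^3 f(\omega)\,q = pe^{\omega}[s - D_\omega f(\omega) - 2D_\omega^2 f(\omega) - D_\omega^3 f(\omega)]$

$D_\omega^4 f(\omega)\,q = pe^{\omega}[s - D_\omega f(\omega) - 3D_\omega^2 f(\omega) - 3D_\omega^3 f(\omega) - D_\omega^4 f(\omega)]$

where

$D_\omega^r f(\omega) = \frac{d^r}{d\omega^r}\left[\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots\right]$.

Letting $\omega = 0$ we have therefore successively

$\lambda_1 q = p(s - \lambda_1)$

$\lambda_2 q = p(s - \lambda_1 - \lambda_2)$

$\lambda_3 q = p(s - \lambda_1 - 2\lambda_2 - \lambda_3)$

$\lambda_4 q = p(s - \lambda_1 - 3\lambda_2 - 3\lambda_3 - \lambda_4)$

or

$\lambda_1 = sp$

$\lambda_2 = spq = \sigma^2$

$\lambda_3 = spq(q - p)$

$\lambda_4 = spq(1 - 6pq)$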
 
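The generating function $\varphi_0(x)$ in the Gram-Charlier series may hence be written as

$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \lambda_1)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi spq}}\, e^{-\frac{(x - sp)^2}{2spq}}$

while the coefficients $c_3$ and $c_4$ of the third and fourth derivatives of $\varphi_0(x)$ according to the formulas from paragraphs 113 and 114 take on the form

$c_3 = \frac{(-1)^3}{3!\,\sigma^3}\, spq(q - p) = -(\sqrt{spq})^{-1}(q - p) : 3!$

$c_4 = \frac{(-1)^4}{4!\,\sigma^4}\, spq(1 - 6pq) = (spq)^{-1}(1 - 6pq) : 4!$

which serve as measures for skewness and excess.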
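The closed expressions $\lambda_1 = sp$, $\lambda_2 = spq$, $\lambda_3 = spq(q - p)$ and $\lambda_4 = spq(1 - 6pq)$ can be checked numerically against the point binomial itself. The sketch below assumes the standard relations between semi-invariants and central moments (the third semi-invariant equals the third central moment, the fourth equals $\mu_4 - 3\mu_2^2$); the function names are illustrative.

```python
from math import comb


def binomial_semi_invariants(s, p):
    """First four semi-invariants computed from the exact terms of (p + q)^s."""
    q = 1.0 - p
    pmf = [comb(s, x) * p ** x * q ** (s - x) for x in range(s + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))
    mu2, mu3, mu4 = (sum((x - mean) ** r * w for x, w in enumerate(pmf))
                     for r in (2, 3, 4))
    return mean, mu2, mu3, mu4 - 3 * mu2 ** 2


def closed_form(s, p):
    """The expressions derived in the text."""
    q = 1.0 - p
    return s * p, s * p * q, s * p * q * (q - p), s * p * q * (1 - 6 * p * q)


numeric = binomial_semi_invariants(100, 0.05)
formula = closed_form(100, 0.05)
print([round(v, 6) for v in numeric])
print([round(v, 6) for v in formula])
```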
 
Since p and q are proper fractions whose product never can exceed ¼ it is readily seen that for large values of s both $c_3$ and $c_4$ will become very small quantities unless either p or q should be so small that the product sp (or sq) itself would be small even when s is a large number, a case which we presently shall discuss in detail.

Apart from this exception the expression for the frequency function

$F(x) = \varphi_0(x) + c_3\varphi_3(x) + c_4\varphi_4(x) + \ldots$

approaches therefore the normal probability curve of Laplace whenever s is a large number.

When, on the other hand, s in the point binomial $(p + q)^s$ is not a large number both $c_3$ and $c_4$ play an important role as the necessary correction factors. (For p = q = ½ all the semi-invariants of uneven order vanish.)
 
 
 The normal form of the point binomial, or $\varphi_0(x)$, was already
 established by Laplace who also worked out a more accurate
 expression for skew binomials, which expression can be shown
 to represent the two terms $\varphi_0(x)$ and $c_3\varphi_3(x)$.
 
 As an illustration of the above formulas we shall now try to 
 express a few Bernoullian point binomials by means of a 
 Gram-Charlier series. 
 
 Let us for instance try to express $(.05 + .95)^{100}$ by a
 Gram-Charlier series. We have in this case the following values for
 the parameters

 $\lambda_1 = 5.0$, $\sqrt{\lambda_2} = \sigma = 2.1795$, $c_3 = -0.0688$, $c_4 = 0.00625$
 
 The substitution of these values in the Gram-Charlier series
 results in the following relative frequency distribution:

   x     F(x)         x     F(x)
   0    .0084         8    .0614
   1    .0312         9    .0343
   2    .0763        10    .0179
   3    .1356        11    .0081
   4    .1812        12    .0031
   5    .1865        13    .0009
   6    .1522        14    .0003
   7    .1028        15    .0001
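
 A distribution of this kind can be reproduced with a few lines of
 Python. The sketch below is only illustrative: the function name is
 hypothetical, and it assumes that the derivatives are taken with respect
 to the standardized variate $z = (x - \lambda_1)/\sigma$, so that
 $\varphi_3 = -(z^3 - 3z)\varphi_0$ and $\varphi_4 = (z^4 - 6z^2 + 3)\varphi_0$,
 and that the whole series is divided by $\sigma$ to return to the scale of x.

```python
import math


def gram_charlier_binomial(x, s, p):
    """Three-term Gram-Charlier approximation to the binomial term at x.

    Assumes the standardized form F(x) = [phi0(z) + c3*phi3(z) + c4*phi4(z)] / sigma.
    """
    q = 1.0 - p
    mean = s * p
    var = s * p * q
    sigma = math.sqrt(var)
    c3 = -(q - p) / (6.0 * sigma)              # -lambda_3 / (3! sigma^3)
    c4 = (1.0 - 6.0 * p * q) / (24.0 * var)    # lambda_4 / (4! sigma^4)
    z = (x - mean) / sigma
    phi0 = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    phi3 = -(z ** 3 - 3.0 * z) * phi0          # third derivative of phi0
    phi4 = (z ** 4 - 6.0 * z ** 2 + 3.0) * phi0  # fourth derivative of phi0
    return (phi0 + c3 * phi3 + c4 * phi4) / sigma


if __name__ == "__main__":
    for x in range(16):
        print(x, round(gram_charlier_binomial(x, 100, 0.05), 4))
```

 Since the coefficients are here computed to full precision rather than
 rounded as in the text, the last decimal of a printed value may
 occasionally differ by a unit.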
 
 A similar calculation in the case of the Bernoullian binomial
 $(0.1+0.9)^{100}$ gives

 $\lambda_1 = 10$, $\sigma = 3$, $c_3 = -.0445$, $c_4 = 0.0021$

 with the following distribution:

   x     F(x)         x     F(x)
   0    .0000        12    .0984
   1    .0004        13    .0732
   2    .0020        14    .0502
   3    .0065        15    .0322
   4    .0162        16    .0194
   5    .0333        17    .0109
   6    .0581        18    .0058
   7    .0875        19    .0027
   8    .1145        20    .0012
   9    .1318        21    .0005
  10    .1338        22    .0002
  11    .1211        23    .0001
 
 132] POISSON'S EXPONENTIAL 265
 
 We shall presently have occasion to compare these distribu- 
 tions with those obtained from a direct expansion of the point 
 binomial. 
 
 132. Poisson's Exponential. The Law of Small Numbers.
 In certain statistical series it frequently happens that
 the semi-invariants of higher order than zero all are equal, or
 that

 $$\lambda_1 = \lambda_2 = \lambda_3 = \cdots = \lambda_r = \lambda.$$
 
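 We shall for the present limit our discussion to homograde
 series where the variate is always positive and integral, and
 where therefore the definition of the semi-invariants is of the
 form:

 $$e^{\frac{\lambda\omega}{1!} + \frac{\lambda\omega^2}{2!} + \frac{\lambda\omega^3}{3!} + \cdots} = \psi(0)e^{0\cdot\omega} + \psi(1)e^{1\cdot\omega} + \psi(2)e^{2\omega} + \psi(3)e^{3\omega} + \cdots, \text{ or}$$

 $$e^{\lambda(e^{\omega}-1)} = e^{-\lambda}\, e^{\lambda e^{\omega}} = \sum \psi(x)\, e^{x\omega} \quad\text{for } x = 0, 1, 2, 3, \ldots,$$

 which also can be written as

 $$e^{-\lambda}\left(1 + \frac{\lambda e^{\omega}}{1!} + \frac{\lambda^2 e^{2\omega}}{2!} + \cdots\right) = \psi(0) + \psi(1)e^{\omega} + \psi(2)e^{2\omega} + \cdots$$

 The coefficient of $e^{r\omega}$ gives the relative frequency or the
 probability for the occurrence of x = r, and we hence find
 that

 $$\varphi(x) = \psi(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$$

 This is the famous Poisson Exponential, so called after the
 French mathematician, Poisson, who first derived this expression
 in his Recherches sur la probabilité des jugements, but in
 an entirely different manner than the one we have indicated
 above.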
 
 The Poisson Exponential opens now a way to the treatment
 of the point binomial in the exceptional cases where the product
 sp (or sq) is small even when s is a very large number, or,
 more strictly speaking, where

 $$\lim sp = \lambda,$$

 $\lambda$ being a finite number.
 
 
 Under such conditions p (or q) must approach zero and its 
 complementary probability q (or p) must approach unity as 
 their limiting values. 
 
 The expressions for the semi-invariants as given in paragraph
 131, i. e.

 $$\lambda_2 = spq$$

 $$\lambda_3 = spq\,(q-p)$$

 $$\lambda_4 = spq\,(1-6pq)$$
 
 will under these conditions all approach the limit sp, and the 
 general term in the Bernoullian expansion of the point binomial 
 can therefore be expressed by means of the Poisson exponen- 
 tial. 
 
 In all cases where the semi-invariants of various orders hap- 
 pen to be equal, or very nearly equal, the formula by Poisson 
 will be preferable in place of the more general expansion by the 
 Gram-Charlier series. 
 
 As an illustration we may select the simple binomial
 $(.001 + .999)^{100}$ where the semi-invariants have the following values:

 $\lambda_1 = 0.1$, $\lambda_2 = 0.0999$, $\lambda_3 = 0.0997002$, $\lambda_4 = 0.0993012$,

 and therefore may be considered as being nearly equal.

 The general term, $\varphi(x)$, in this particular point binomial can
 therefore be written as a Poisson exponential of the form:

 $$\varphi(x) = \psi(r) = \frac{e^{-0.1}\,(0.1)^{r}}{r!} \quad\text{for } r = 0, 1, 2, 3, \ldots$$
 
 The Russian statistician, Bortkewitsch, has given in his
 interesting and scholarly brochure Das Gesetz der kleinen Zahlen
 (1898) a four decimal place table of the Poisson exponential
 $e^{-\lambda}\lambda^{r} : r!$ for values of $\lambda$ from 0.1 to 10.0. The English
 biometrician, Soper, in 1914 published a 6 decimal place table from
 $\lambda = 0.1$ to $\lambda = 15.0$. This table is found in Pearson's well-known
 Tables for Biometricians. For the above mentioned Bernoullian
 point binomial $(0.001+0.999)^{100}$, corresponding to the
 Poisson exponential $e^{-0.1}(0.1)^{r} : r!$, we find from Soper's table the
 following values of $\psi(r)$.
 
 
   r     ψ(r)
   0    .904837
   1    .090484
   2    .004524
   3    .000151
   4    .000004
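
 Values of this kind no longer require a printed table. As a minimal
 sketch (the function name is illustrative, not from the text), the
 Poisson exponential can be evaluated directly:

```python
import math


def poisson_exponential(r, lam):
    """psi(r) = exp(-lam) * lam**r / r! for a non-negative integer r."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


if __name__ == "__main__":
    for r in range(5):
        print(r, f"{poisson_exponential(r, 0.1):.6f}")
```

 With $\lambda = 0.1$ the printed figures agree to six decimals with the
 values quoted from Soper's table.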
 
 While the exponential of Poisson requires, theoretically at
 least, that the semi-invariants must all be of the same magnitude,
 it will, however, often be found that this exponential will
 give a fair approximation to the true observed values of the
 frequency curve in cases where the semi-invariants $\lambda_1, \lambda_2, \lambda_3,
 \lambda_4, \ldots$ do not differ greatly from each other. In this connection
 it is of interest to compare the fits of Poisson's exponential
 and the Gram-Charlier series with the true values in the binomial
 expansion in the three examples we have given above.
 Through the courteous efforts of my translator and co-editor,
 Miss C. Dickson, the three point binomials $(0.001+0.999)^{100}$,
 $(0.05+0.95)^{100}$ and $(0.10+0.90)^{100}$ have been expanded directly
 and the results as compared with the forms of Poisson and of
 Gram-Charlier are shown in the following tables:
 
 Values of φ(x) in Various Point Binomials

 TABLE I  (0.001+0.999)^100

   x    Binomial    Poisson
   0     .9048       .9048
   1     .0906       .0905
   2     .0045       .0045
   3     .0001       .0002
   4     .0000       .0000
 
 TABLE II  (0.05+0.95)^100

   x    Binomial    Gram-Charlier    Poisson
   0     .0059         .0084          .0067
   1     .0312         .0312          .0337
   2     .0812         .0763          .0842
   3     .1396         .1356          .1404
   4     .1781         .1812          .1755
   5     .1800         .1865          .1755
   6     .1500         .1522          .1462
   7     .1060         .1028          .1044
   8     .0649         .0614          .0653
   9     .0349         .0343          .0363
  10     .0167         .0179          .0181
  11     .0072         .0081          .0082
  12     .0028         .0031          .0034
  13     .0003         .0009          .0013
  14     .0001         .0001          .0005
 
 
 TABLE III  (0.1+0.9)^100

   x    Binomial    Gram-Charlier    Poisson
   0     .0001         .0000          .0000
   1     .0003         .0004          .0005
   2     .0016         .0020          .0023
   3     .0059         .0065          .0076
   4     .0159         .0162          .0189
   5     .0339         .0333          .0378
   6     .0596         .0581          .0630
   7     .0889         .0875          .0901
   8     .1148         .1145          .1125
   9     .1304         .1318          .1251
  10     .1319         .1339          .1251
  11     .1199         .1211          .1137
  12     .0988         .0984          .0948
  13     .0743         .0732          .0729
  14     .0513         .0502          .0521
  15     .0327         .0322          .0347
  16     .0193         .0194          .0217
  17     .0106         .0109          .0128
  18     .0054         .0058          .0071
  19     .0026         .0027          .0037
  20     .0012         .0012          .0019
  21     .0005         .0005          .0009
  22     .0002         .0002          .0004
  23     .0000         .0000          .0002
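
 The direct expansion and the Poisson column of such tables can be
 recomputed as follows. This is a sketch only, with illustrative function
 names; it uses `math.comb`, which assumes Python 3.8 or later.

```python
import math


def binomial_term(x, s, p):
    """General term of the point binomial (p + q)^s at x occurrences."""
    return math.comb(s, x) * p ** x * (1.0 - p) ** (s - x)


def poisson_term(x, lam):
    """Poisson exponential with parameter lam at x."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def max_abs_error(s, p, upto):
    """Largest absolute difference between the two forms for x = 0..upto."""
    return max(
        abs(binomial_term(x, s, p) - poisson_term(x, s * p))
        for x in range(upto + 1)
    )


if __name__ == "__main__":
    for p, upto in ((0.001, 4), (0.05, 14), (0.10, 23)):
        print(p, f"{max_abs_error(100, p, upto):.4f}")
```

 The largest discrepancy grows with p, which is what one expects when
 the semi-invariants of the binomial depart further from equality.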
 
 These tables need no further explanation and demonstrate
 the great graduating ability of the Poisson function in the
 case of point binomials. For although the Gram-Charlier
 functions give better results in the last example, we must on
 the other hand not forget that we had to determine 4 parameters,
 $\lambda_1$, $\sigma$, $c_3$ and $c_4$, while Poisson's exponential requires only
 the determination of the single parameter $\lambda$. The great drawback
 of the Poisson exponential lies in the fact that it is a discrete
 function, which exists only for positive integral values of
 the variate. It therefore does not lend itself so readily to
 integration as the Laplacean probability function.

 It is the great achievement of the eminent Swedish astronomer
 and statistician, Charlier, to have been the first to attempt
 to find a continuous function possessing the same powerful and
 flexible characteristics as the Poisson exponential. Charlier
 has introduced as a generating function a certain curve type
 which is expressed by the following formula:
 
 133] THE LAW OF SMALL NUMBERS 269 
 
 TT 
 
 \P(x) =4 m,n(^) = / e'^^'^^cos [nsino)—xo)]do^ 
 
 o 
 
 Using the above expression as a generating function Charlier 
 has shown that any frequency curve can be expressed by the 
 series 
 
 F(x)=V^(x)-^-A^(x)+|jA^i//(x)-^jAV(x)+ 
 
 At the time when Charlier introduced this function in the
 theory of frequency curves in a little pamphlet Weiteres über das
 Fehlergesetz he gave only a rather short sketch of the method.
 He and the Danish actuary, Jørgensen, have, however, of late
 years further developed the method so as to make it useful in
 practical computations. Jørgensen has in this respect done some
 very neat work in supplying a method for determining the constants,
 k, in the above series by means of semi-invariants.
 
 We intend to treat these investigations in the forthcoming 
 second volume of this treatise. In the meantime, however, 
 students who encounter skew frequency distributions in their 
 work will in nearly all cases be able to overcome the practical 
 difficulties of a graduation by means of the logarithmic trans- 
 formation of the variate, treated in the earlier chapters of this 
 book. 
 
 133. The Law of Small Numbers. — In the case of integral 
 variates we wish, however, to call the reader's attention to cer- 
 tain properties of the Poisson exponential, or probability func- 
 tion, which have been apparently overlooked or misunderstood 
 by several writers, especially among the English biometricians. 
 Somehow or other the impression among those writers has been 
 that the frequency curve or probability function of Poisson is 
 invariably connected with the expansion of the point binomial. 
 Thus for instance Mr. Yule in his well-known Introduction to 
 the Theory of Statistics treats it as such, while Lucy Whitaker 
 in an article entitled "On the Poisson Law of Small Numbers" 
 in Biometrika for April, 1914, subjects the whole theorem to 
 a scathing criticism. We quote the following sentence from
 Miss Whitaker's article: "It might be supposed, although
 erroneously, that the Poisson-Exponential formula was capable
 
 
 of great accuracy in addition to its great simplicity. But this 
 is to neglect the fundamental assumptions on which it is based, 
 namely : 
 
 (1) That the data actually correspond to a binomial 
 
 (2) That in the binomial q is small and n large." 
 
 It is true that Poisson in deriving his formula started from
 the Bernoullian point binomial so as to meet the cases where
 the normal probability curve of Laplace failed to give a close
 approximation, that is in the case of the limiting value when
 either p or q becomes very small, but s is large enough so that
 sp or sq remain finite; and it is probably this fact which has
 prompted the above remarks of Miss Whitaker. But this is
 really to put the cart before the horse. In the case of integral
 variates we can, as shown in the preceding paragraphs, derive
 the Poisson probability function as a general form of frequency
 distributions whose semi-invariants are all equal, and it is only 
 incidental that this property is possessed by the special binomial 
 limit in the case mentioned above. But this property is not a 
 general property of the binomial any more than it is the property 
 of the same point binomial function to result in a Laplacean 
 normal probability curve when the exponent s is large and 
 
 Looked at wholly from the point of view of frequency func- 
 tions it is, however, not necessary to resort to the binomial 
 limit as a base for the derivation of Poisson's formula, which 
 can be derived directly from the definition of the semi-invariants 
 when those peculiar parameters are considered as being of 
 equal magnitude irrespective of their order. Now the question 
 presents itself whether the Poisson probability curve might 
 not, like its fellow brother, the Laplacean probability curve, 
 be used as a generating function in expanding certain types of 
 frequency distributions in serial form. It is to this question 
 that the discussion is devoted in the following chapter.
 
 CHAPTER XVIII 
 
 POISSON-CHARLIER FREQUENCY CURVES FOR INTEGRAL VARIATES 
 
 134. Charlier's B Curve. We have already seen in the previous
 chapters that the Gram-Charlier frequency curve could be
 written as

 $$F(x) = \sum c_i\varphi_i(x) = \sum c_i H_i(x)\varphi_0(x) \quad\text{for } i = 0, 1, 2, 3, \ldots$$

 where $\varphi_0(x)$ is the generating Laplacean probability function.
 The idea now immediately suggests itself to use a similar
 method of expansion in the case of the Poisson probability
 function and to employ this exponential as a generating function
 in the same manner as the Laplacean function. We are,
 however, in the present case of the Poisson exponential dealing
 with a generating function which so far has been defined for
 positive integral values only and, therefore, represents a discrete
 function. For this reason it will be impossible to express
 the series as the sum-products of the successive derivatives of
 the generating function and their correlated parameters c. We
 can, however, in the case of integral variates express the series
 by means of finite differences and write F(x) as follows:

 $$F(x) = c_0\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \cdots \qquad (I)$$

 where $\psi(x) = e^{-m}m^{x} : x!$ for x = 0, 1, 2, 3, ..., and

 $$\sum\psi(x) = 1,$$

 $$\Delta\psi(x) = \psi(x) - \psi(x-1),$$

 $$\Delta^2\psi(x) = \Delta\psi(x) - \Delta\psi(x-1) = \psi(x) - 2\psi(x-1) + \psi(x-2).$$

 The series (I) is known as the Poisson-Charlier frequency
 series or Charlier's B type of frequency curves.

 The semi-invariants of these frequency series are given by the
 following relation:

 $$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \cdots} = \sum_{x=0}^{\infty}\left[c_0\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \cdots\right] e^{x\omega}$$
 
 
 272 POISSON-CHARLIER FREQUENCY CURVES [134 
 
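 Expanding and equating the coefficients of equal powers
 of $\omega$ we have:

 $$\lambda_0 = 1 = c_0\sum\psi(x) \quad\text{or}\quad c_0 = 1$$

 $$\lambda_1 = \sum x\,[\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \cdots] \qquad (II)$$

 $$\lambda_1^2 + \lambda_2 = \sum x^2\,[\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \cdots]$$

 We now have

 $$\sum\psi(x) = 1, \text{ and}$$

 $$\sum x\psi(x) = \sum m\, e^{-m} m^{x-1} : (x-1)! = m\sum\psi(x-1) = m.$$

 We also find from well-known formulas of the calculus of
 finite differences that*

 $$\sum x^2\psi(x) = m^2 + m$$

 $$\sum x\,\Delta\psi(x) = -1$$

 $$\sum x^2\Delta\psi(x) = -(2m+1)$$

 $$\sum x^2\Delta^2\psi(x) = 2$$

 Substituting these values in (II), and noting that $\sum x\,\Delta^2\psi(x) = 0$,
 we obtain

 $$\lambda_1 = m - c_1$$

 $$\lambda_1^2 + \lambda_2 = m^2 + m - (2m+1)c_1 + 2c_2$$

 By letting $m = \lambda_1$ we can make the coefficient $c_1$ vanish, which
 results in

 $$c_2 = \tfrac{1}{2}\,[\lambda_2 - m]$$

 where the two semi-invariants $\lambda_1$ and $\lambda_2$ are calculated around
 the natural zero of the number scale as origin.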
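
 The finite-difference sums used in this derivation can be confirmed
 numerically. The sketch below is illustrative only: the helper names are
 not from the text, the infinite sums are truncated at a hypothetical
 limit of 100 terms (ample for moderate m), and $\psi(x)$ is taken as zero
 for negative x.

```python
import math


def psi(x, m):
    """Poisson exponential; taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m ** x / math.factorial(x)


def backward_difference(f, order):
    """Return x -> Delta^order f(x), where Delta f(x) = f(x) - f(x - 1)."""
    if order == 0:
        return f
    lower = backward_difference(f, order - 1)
    return lambda x: lower(x) - lower(x - 1)


def weighted_sum(power, order, m, terms=100):
    """Truncated sum over x of x**power * Delta^order psi(x)."""
    d = backward_difference(lambda x: psi(x, m), order)
    return sum(x ** power * d(x) for x in range(terms))


def eccentricity(lam1, lam2):
    """c2 = (lambda_2 - m) / 2 with the modulus m set equal to lambda_1."""
    return 0.5 * (lam2 - lam1)


if __name__ == "__main__":
    m = 3.88
    print(weighted_sum(2, 0, m), m * m + m)
    print(weighted_sum(1, 1, m), weighted_sum(2, 1, m), weighted_sum(2, 2, m))
```

 The same helpers give $\sum x\,\Delta^2\psi(x) = 0$, which is why $c_2$ does
 not enter the equation for $\lambda_1$.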
 
 For the above discussion we have limited ourselves to the
 determination of the three constants m, $c_0$ and $c_2$. It is easy,
 however, to find the higher parameters $c_3, c_4, c_5, \ldots$ from
 the relations between the moments of the Poisson function and
 the semi-invariants of order 3, 4, 5, etc. Charlier usually

 *These formulas can also be derived from the definition of the semi-invariants and
 the well-known relations between moments and semi-invariants as given on page 237,
 when we remember that according to our definition all semi-invariants in the Poisson
 exponential are equal to m.
 
 135] 
 
 NUMERICAL EXAMPLES 
 
 273 
 
 calls the parameter m the modulus and C2 the eccentricity of the 
 B curve. 
 
 135. Numerical Examples. As an illustration of the application
 of the Poisson-Charlier series we select the previously
 mentioned series of observations on alpha particles radiated from
 a bar of Polonium as determined by Rutherford and Geiger
 and shown on page 174 of this treatise. We are here dealing
 with integral variates which can assume positive values only
 and the observations are therefore eminently adaptable to the
 treatment by Poisson-Charlier curves. Selecting the natural
 zero as the origin of the co-ordinate system we find that the
 first two semi-invariants are of the form

 $\lambda_1 = 3.8754$, $\lambda_2 = 3.6257$, and we therefore have:

 $m = \lambda_1 = 3.88$; $c_2 = \tfrac{1}{2}[\lambda_2 - m] = -0.125$

 The equation for the frequency distribution of the total N =
 2608 elements therefore becomes

 $$F(x) = N\,[\psi_{3.88}(x) + (-0.125)\,\Delta^2\psi_{3.88}(x)].$$

 The table below gives the values as fitted to the curve, F(x):
 
 Alpha Particles Discharged from Film of Polonium
 (Rutherford and Geiger)
 N = 2608, m = 3.88, c2 = -0.125

  (1)     (2)         (3)        (4)        (5)          (6)
   x      ψ(x)       Δ²ψ(x)     N×(2)    N×(3)×c2     (4)+(5)
   0    .020668    +.020668      53.9      - 6.7         47
   1    .080156    +.038820     209.0      -12.7        196
   2    .155455    +.015811     405.4      - 5.2        400
   3    .201015    -.029739     524.2      + 9.7        533
   4    .194967    -.051608     508.5      +16.8        525
   5    .151265    -.037654     394.5      +12.3        407
   6    .097850    -.009714     254.9      + 3.2        258
   7    .054249    +.009814     141.2      - 3.2        138
   8    .026316    +.015668      68.7      - 5.1         64
   9    .011351    +.012968      29.6      - 4.2         25
  10    .004407    +.008021      11.5      - 2.6          9
  11    .001555    +.004092       4.1      - 1.2          3
  12    .000503    +.001800       1.3      - 0.6          1
  13    .000150    +.000699       0.4      - 0.2
  14    .000042    +.000245       0.1      - 0.1
  15    .000010    +.000076       0.0      - 0.0
  16    .000003    +.000025
  17    .000001    +.000005
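
 The fitted column can be recomputed from the two parameters alone. The
 following Python sketch uses the rounded values m = 3.88 and
 $c_2 = -0.125$ given in the text; the function names are illustrative.
 Small differences in the last decimals of $\psi(x)$ against the printed
 column are to be expected, since the modulus used for that column does
 not appear to have been rounded to exactly 3.88.

```python
import math


def psi(x, m):
    """Poisson exponential; taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m ** x / math.factorial(x)


def charlier_b(x, m, c2):
    """Two-term Poisson-Charlier series psi(x) + c2 * Delta^2 psi(x)."""
    second_difference = psi(x, m) - 2.0 * psi(x - 1, m) + psi(x - 2, m)
    return psi(x, m) + c2 * second_difference


N, M, C2 = 2608, 3.88, -0.125
fitted = [N * charlier_b(x, M, C2) for x in range(30)]

if __name__ == "__main__":
    for x, value in enumerate(fitted[:13]):
        print(x, round(value))
```

 Because the second differences sum to zero, the fitted frequencies
 still add up to N.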
 
 
 

 
 
 Bateman has in Philosophical Transactions (1902) given a 
 theoretical frequency distribution of the above series of observa- 
 tions wherein he develops the Poisson probability function, 
 being ignorant of the previous demonstration by Poisson. In 
 a later note he mentions that the formula was given by the 
 French mathematician in his work on probabilities, published 
 in 1837. 
 
 Bateman's calculation includes, however, only the first term
 of the Poisson-Charlier series and is, therefore, not so close as
 the above fit.

 As a second example we offer our old friend, the distribution
 of flower petals in Ranunculus bulbosus. Selecting the zero
 point at x = 5 and computing the semi-invariants in the usual
 manner we obtain the following equation for the frequency
 curve.

 $$F(x) = 222\,\psi(x) + 31.5\,\Delta^2\psi(x), \qquad m = 0.631$$
 
 A comparison between calculated and observed values follows
 below:

   x     F(x)     Obs.
   5    134.9     133
   6     51.6      55
   7     22.5      23
   8      9.5       7
   9      2.9       2
  10      0.6       2
 
 136. Transformation of the Variate. For integral variates
 we have shown that the Poisson frequency curve possesses
 the important property that all its semi-invariants are equal.
 Now while a frequency distribution of a certain integral variate,
 x, may perhaps not possess this property, it may, however,
 very well happen after a suitable linear transformation has been
 made, that the variate thus transformed will be subject to the
 laws of Poisson's function.

 Let z = ax - b represent the linear transformation which is
 subject to the above laws with a series of semi-invariants all
 equal to m.

 These semi-invariants according to the properties set forth
 in paragraph (104) are therefore
 
 136] TRANSFORMATION OF THE VARIATE 275 
 
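 $$m = \lambda_1(z) = a\lambda_1(x) - b$$

 $$m = \lambda_2(z) = a^2\lambda_2(x)$$

 $$m = \lambda_3(z) = a^3\lambda_3(x)$$

 and our problem is to find the unknown parameters a, b and m.
 Simple algebraic methods, which it will not be necessary to
 dwell upon, give the following results:

 $$a = \lambda_2 : \lambda_3, \qquad m = \lambda_2^3 : \lambda_3^2, \qquad b = a\lambda_1 - m.$$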
 
 As a numerical illustration of this transformation we choose
 from Jørgensen a series of observations by Davenport on the
 frequency distribution of glands in the right foreleg of 2000
 female swine.

 No. of Glands     0    1    2    3    4    5    6    7    8    9   10
 Frequency        15  209  365  482  414  277  134   72   22    8    2

 The values of the three first semi-invariants are

 $\lambda_1 = 3.501$, $\lambda_2 = 2.825$, $\lambda_3 = 2.417$,

 a = 2.825 : 2.417 = 1.168

 m = 2.825³ : 2.417² = 3.859

 b = (1.168)(3.501) - 3.859 = 0.230.
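
 The three parameters can be recomputed from the observed frequencies.
 The sketch below is illustrative only (the function names are not from
 the text); it takes the semi-invariants as the mean and the second and
 third central moments, each with divisor N.

```python
def semi_invariants_from_counts(counts):
    """First three semi-invariants of counts[x] observations at x = 0, 1, 2, ..."""
    n = sum(counts)
    mean = sum(x * f for x, f in enumerate(counts)) / n
    second = sum(f * (x - mean) ** 2 for x, f in enumerate(counts)) / n
    third = sum(f * (x - mean) ** 3 for x, f in enumerate(counts)) / n
    return mean, second, third


def poisson_linear_transform(lam1, lam2, lam3):
    """Constants a, m, b of z = a*x - b making the first three semi-invariants of z equal to m."""
    a = lam2 / lam3
    m = lam2 ** 3 / lam3 ** 2
    b = a * lam1 - m
    return a, m, b


GLANDS = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]

if __name__ == "__main__":
    lam1, lam2, lam3 = semi_invariants_from_counts(GLANDS)
    print(lam1, lam2, lam3)
    print(poisson_linear_transform(lam1, lam2, lam3))
```

 Carried out without intermediate rounding, the result differs from the
 rounded figures of the text only in the third decimal.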
 
 The new variable then becomes z = ax - b and the transformed
 Poisson probability function takes on the form:

 $$\psi(z) = \frac{e^{-m}m^{z}}{z!}$$

 In general, however, we will find that z is not a whole number
 and the expression z! therefore has no meaning from the point
 of view of factorials at least. This difficulty may, however,
 be overcome through the introduction of the well-known
 Gamma Function, $\Gamma(z+1)$, which holds true for any positive
 
 
 or negative real value of z and which in the case of integral
 values of z reduces to $\Gamma(z+1) = z!$

 Hence we can write the transformed Poisson probability
 function as

 $$\psi(z) = \frac{e^{-m}m^{z}}{\Gamma(z+1)}$$

 Tables to 7 decimal places of the Gamma Function, or rather
 for the expression $-\log\Gamma(z+1)$, have been computed by Jørgensen
 in his aforementioned book from z = -3 to z = 15,
 progressing by intervals of 0.01.

 By means of this table and the tables of ordinary logarithms
 it is now easy to find the values of $\psi(z)$ in the case of the example
 relating to the number of glands in female swine. The detailed
 computation is shown below.*
 
  (1)     (2)          (3)           (4)            (5)              (6)      (7)
   x    z = ax-b   -log Γ(z+1)    log m^z    (3)+(4)+log e^-m       ψ(z)     F(x)
   0     -.230       .9209         .8651        .1101-2            .0129     30.1
   1     +.938       .0108         .5500        .8849-2            .0767    179.2
   2     2.106       .6555         .2350        .2146-1            .1639    382.9
   3     3.274       .0679         .9199        .3119-1            .2051    479.1
   4     4.442       .3216         .6048        .2505-1            .1780    415.8
   5     5.610       .4547         .2897        .0685-1            .1171    273.6
   6     6.778       .4904         .9746        .7891-2            .0615    143.7
   7     7.946       .4446         .6595        .4282-2            .0268     62.6
   8     9.114       .3285         .3444        .9970-3            .0099     23.1
   9    10.282       .1506         .0294        .5041-3            .0032      7.5
  10    11.450       .9177         .7143        .9561-4            .0009      2.1
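
 The logarithmic work of such a table can be replaced by a direct
 evaluation with the log-gamma function of the Python standard library.
 The sketch below is illustrative; it uses the rounded constants
 a = 1.168, m = 3.859, b = 0.230 of the text and pro-rates the 2000
 observations according to $\psi(z)$, as described for column 7.

```python
import math


def psi_gamma(z, m):
    """Transformed Poisson function exp(-m) * m**z / Gamma(z + 1), valid for z > -1."""
    return math.exp(-m + z * math.log(m) - math.lgamma(z + 1.0))


A, M, B, TOTAL = 1.168, 3.859, 0.230, 2000

z_values = [A * x - B for x in range(11)]
psi_values = [psi_gamma(z, M) for z in z_values]
scale = TOTAL / sum(psi_values)          # pro-rate the observations over column 6
fitted = [scale * p for p in psi_values]

if __name__ == "__main__":
    for x, (z, p, f) in enumerate(zip(z_values, psi_values, fitted)):
        print(x, round(z, 3), round(p, 4), round(f, 1))
```

 Working with `math.lgamma` avoids overflow of the Gamma Function itself
 and takes the place of the printed table of $-\log\Gamma(z+1)$.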
 
 137. The Bernoullian Series Expressed by B Curves.
 In the case of the Bernoullian point binomial we have $\lambda_1 = sp$ and
 $\lambda_2 = spq$. If we now wish to express the general term, $\varphi(x)$, in
 the binomial by a Poisson-Charlier series we evidently have

 $$\varphi(x) = \psi(x) + c_2\Delta^2\psi(x) \quad\text{where}\quad \psi(x) = e^{-m}m^{x} : x!$$

 Now $m = \lambda_1 = sp$, and $c_2 = \tfrac{1}{2}(\lambda_2 - m) = \tfrac{1}{2}(spq - sp) = -\tfrac{1}{2}sp^2$.
 As an illustration of this expansion we may again look at the

 *The characteristics of the logarithms have been omitted in this table (except in
 column 5) and only the positive mantissas are shown. Column 7 represents the 2000
 individual observations pro rated according to column 6.
 
 138] REMARKS ON MR. KEYNES' CRITICISM 277 
 
 point binomial $(0.1+0.9)^{100}$, discussed on page (268). There
 we have:

 $m = 10$, $c_2 = -\tfrac{1}{2}$, while

 $$\varphi(x) = \psi_{10}(x) - \tfrac{1}{2}\Delta^2\psi_{10}(x).$$
 
 The actual computation by means of this formula results in
 the following tabular representation.

   x    Poisson-Charlier       x    Poisson-Charlier
   0        .0000             12        .0986
   1        .0002             13        .0744
   2        .0016             14        .0514
   3        .0059             15        .0330
   4        .0159             16        .0197
   5        .0340             17        .0108
   6        .0599             18        .0055
   7        .0893             19        .0025
   8        .1149             20        .0011
   9        .1301             21        .0005
  10        .1314             22        .0001
  11        .1194             23        .0000
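
 A two-parameter fit of this kind can be reproduced directly. The
 Python sketch below is illustrative only (the function names are not
 from the text); it takes $m = sp$ and $c_2 = -\tfrac{1}{2}sp^2$ and
 compares the result with the direct expansion of the point binomial.

```python
import math


def psi(x, m):
    """Poisson exponential; taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m ** x / math.factorial(x)


def bernoulli_b_curve(x, s, p):
    """Poisson-Charlier approximation psi(x) + c2 * Delta^2 psi(x) to (p + q)^s."""
    m = s * p
    c2 = -0.5 * s * p * p
    second_difference = psi(x, m) - 2.0 * psi(x - 1, m) + psi(x - 2, m)
    return psi(x, m) + c2 * second_difference


def binomial_term(x, s, p):
    """Direct expansion of the point binomial, for comparison."""
    return math.comb(s, x) * p ** x * (1.0 - p) ** (s - x)


if __name__ == "__main__":
    for x in range(24):
        print(x, round(bernoulli_b_curve(x, 100, 0.1), 4),
              round(binomial_term(x, 100, 0.1), 4))
```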
 
 A comparison of this series with the actual expansion of the 
 point binomial by Miss Dickson on page (268) leaves little to 
 be desired in the way of exactitude. The fit to the true binomial 
 is even closer than that of Gram-Charlier series in spite of the 
 fact that only two parameters enter into the determination of 
 Poisson-Charlier's curves while four parameters are required 
 for the Gram-Charlier curves. 
 
 138. Remarks on Mr. Keynes' Criticisms. From the above discussion
 it is evident that the Bernoullian point binomial can always be
 represented without difficulty by either the Gram-Charlier or the
 Poisson-Charlier frequency curves. This point is of interest in connection with
 some wholly misleading and erroneous statements regarding Laplace's
 analysis of the Bernoullian Theorem by Mr. J. M. Keynes, in his recently
 issued Treatise on Probability.

 On page 358 Mr. Keynes points out the asymmetry in the Bernoullian
 expansion and claims that the want of symmetry is generally being
 overlooked, "and it is not uncommon to assume that the probability of a given
 divergence less than pm is equal to that of the same divergence in excess
 of pm, and, in general, that the probability of the frequency's exceeding
 pm in a set of m trials is equal to that of falling short of pm."

 No real mathematician, and least of all Laplace, has ever claimed the
 presence of symmetry as being general in the case of the Bernoullian
 
 
 series. Those who have fallen into that error are economists and
 statisticians who like Mr. Keynes are ignorant of the true assumptions underlying
 Bernoulli's and Laplace's demonstrations. Every mathematician knows
 for instance that the annual loss ratios for total permanent disability
 benefits in workmen's compensation assurance on the basis of a small sample
 payroll of say 100,000 Kroner can assume all possible real values from
 zero and upwards, and losses in excess of 100,000 are indeed possible.
 Statistics from certain Scandinavian industries show that the average
 annual loss ratio in respect to total permanent disability is about 1 Krone
 for each 100,000 Kroner of payroll exposure. We know, however, of a
 certain instance in the case of a small industrial establishment with a
 payroll of nearly 100,000 Kroner where the losses in a single year were
 more than 20,000 Kroner. In the great majority of cases, however, the
 annual losses are nil. We are, therefore, dealing with a decidedly skew
 Poisson-Charlier frequency curve.

 Mr. Keynes' example is in fact less striking. He considers the case of
 throwing aces in 60 successive throws with a die, and he remarks that the
 ace cannot appear less often than not at all, whereas it may well appear
 more than 20 times. This, of course, is self evident and realized by
 Laplace in his analysis, and there is no valid reason for dwelling at length
 on such simple matters. But the English scholar is evidently greatly
 impressed by these very simple considerations for he continues in a most
 charming and naive manner as follows:

 "The actual measurement of this want of symmetry and the determination
 of the conditions, in which it can be safely neglected, involves laborious
 mathematics of which I am only acquainted with one direct investigation,
 that published in the Proceedings of the London Mathematical Society
 by Mr. T. C. Simmons."

 How this charming and naive statement reminds one of the playful
 sophistries of a bright and impish, but not necessarily bad, small boy,
 trying to offer some excuse and explanation for his mischievous pranks,
 while he at the same time is wholly unmindful of the fact that his
 explanations and excuses are the most damning evidence of his own guilt.

 Here we have Mr. Keynes, a successful writer of economic subjects, posing
 as a critic of such intellectual giants in the realm of mathematical science
 as Bernoulli, Laplace and Poisson (which of course presupposes that he
 must have read very carefully the various writings of those old masters);
 who calmly and in the most innocent manner admits that of the "laborious"
 mathematics involved in this question he is only acquainted with a
 rather clumsy demonstration published in 180().

 While I have not the slightest doubt as to the veracity of these facts
 so far as Mr. Keynes is concerned, it will be of interest to see what the
 actual historical facts are. Now in so far as the measurement of the
 asymmetry or skewness of the Bernoullian point binomial is concerned
 this was already performed by Laplace himself, an accomplishment which
 in itself creates a degree of doubt in the reader's mind as to whether Mr.
 Keynes really has studied Laplace with the necessary care required of one
 who poses as a critic of the great Frenchman. Harald Westergaard, the
 
 
 eminent Danish scholar, whose fame as a statistician surely rests on a far 
 more secure foundation than that of Keynes takes in his Slatislikens Teori i 
 Grundrids (Copenhagen, 1915), special pains to j)oint out that Laplace 
 was the first to give a mathematical measure of the sktewness in a Ber- 
 noullian frequency distribution. 
 
 The Danish actuary. Gram, long before the intellect of our recent 
 English critic saw the light, derived his general series for frequency cm'ves, 
 which of course also applies to the Bernoullian case. Thiele in his Al- 
 mindelig lagttagclseslaerc, at a time when the young Keynes probably 
 was being piloted by his nursemaid or governess, discussed the same series 
 from the point of view of semi-invariants. Later on Charlier continued 
 in dii'ect line from where Laplace and Poisson concluded their labours. 
 
 The necessary corrections to the generating functions, whether these 
 be Laplace's or Poisson's probabilitj^ curves, as derived by these Scandi- 
 navian writers are given in Chapters XVII and XVIII of this treatise. 
 Not a single one of these demonstrations requires the "laborious" mathe- 
 matics as mentioned by Mr. Keynes. The fact that our romancing 
 English economist evidently is in blissful ignorance of the fundamental 
 w^ork by the Scandinavian school is, however, no excuse for his superfi- 
 cial knowledge of the expansion of statistical series, since much of this 
 work has appeared ^n English. 
 
 Mr. Keynes' misconception of the real significance of the Law of Small 
 Numbers and his criticism of Bortkewicz may possibly also be traced to 
 his apparent ignorance of the work of the Scandinavian authors. His 
criticism moves practically along the same lines as the views held
 by Miss Whitaker and described on page 270 of this treatise. He, like 
 Miss Whitaker, fails to realize that the generating Poisson probability 
 function arises from the more general fact that all its semi-invariants are 
 equal, rather than from the more special fact that one of the limiting values 
 of the point binomial reduces to a Poisson frequency curve. The very 
 fact that Mr. Keynes in his large volume never mentions the semi-invari- 
 ants leads one to inquire whether those important statistical parameters 
 remain a closed book to him. * 
 
*Similar immature views to those held by Keynes and Miss Whitaker are also
expressed by Mr. A. Mowbray in the May, 1920, Proceedings of the Casualty Actuarial
Society of America (page 107). Judging from his tedious and laborious analysis
Mowbray evidently never has heard of the semi-invariants.
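
The remark that all semi-invariants of the Poisson function are equal can be checked numerically. The following is only a sketch: the parameter value `lam = 2.5`, the truncation at 200 terms and the function names are illustrative assumptions, and the semi-invariants are obtained from the raw moments by the standard moment-cumulant recursion rather than by any method given in the text.

```python
import math

def poisson_raw_moments(lam, order, terms=200):
    """Raw moments m_0..m_order of a Poisson law by direct (truncated) summation."""
    moments = [0.0] * (order + 1)
    p = math.exp(-lam)              # P(X = 0)
    for x in range(terms):
        for k in range(order + 1):
            moments[k] += (x ** k) * p
        p *= lam / (x + 1)          # P(X = x + 1) obtained from P(X = x)
    return moments

def semi_invariants(moments):
    """Semi-invariants (cumulants) from raw moments via
    m_n = sum_{k=1..n} C(n-1, k-1) * kappa_k * m_{n-k}."""
    order = len(moments) - 1
    kappa = [0.0] * (order + 1)
    for n in range(1, order + 1):
        correction = sum(
            math.comb(n - 1, k - 1) * kappa[k] * moments[n - k]
            for k in range(1, n)
        )
        kappa[n] = moments[n] - correction
    return kappa[1:]

lam = 2.5                            # illustrative value, not from the text
kappas = semi_invariants(poisson_raw_moments(lam, 5))
print(kappas)                        # the first five semi-invariants
```

Under these assumptions each of the five printed values agrees with `lam` to within rounding error.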
 
ABRIDGED TABLE OF LAPLACE'S PROBABILITY FUNCTION AND ITS DERIVATIVES

  z     φ0(z)    φ3(z)     φ4(z)     φ5(z)     φ6(z)    ∫φ0(z)dz

 0.00   .3989   +0.0000   +1.1968   -0.0000   -5.9841   .5000
 0.05   .3984     .0597    1.1894     .2983    5.9319   .5199
 0.10   .3970     .1187    1.1671     .5915    5.7763   .5398
 0.15   .3945     .1762    1.1304     .8743    5.5208   .5596
 0.20   .3910     .2315    1.0799    1.1420    5.1711   .5793
 0.25   .3867     .2840    1.0165    1.3900    4.7351   .5987
 0.30   .3814     .3330     .9413    1.6142    4.2223   .6179
 0.35   .3752     .3779     .8556    1.8111    3.6439   .6368
 0.40   .3683     .4184     .7607    1.9777    3.0122   .6554
 0.45   .3605     .4539     .6583    2.1117    2.3414   .6736
 0.50   .3520     .4841     .5501    2.2114    1.6448   .6915
 0.55   .3429     .5088     .4378    2.2760     .9371   .7088
 0.60   .3332     .5278     .3231    2.3052   - .2324   .7257
 0.65   .3230     .5411     .2078    2.2995   + .4555   .7422
 0.70   .3123     .5486   + .0937    2.2601    1.1135   .7580
 0.75   .3011     .5505   - .0176    2.1888    1.7298   .7734
 0.80   .2897     .5469     .1247    2.0880    2.2938   .7881
 0.85   .2780     .5381     .2260    1.9604    2.7964   .8023
 0.90   .2661     .5245     .3203    1.8095    3.2304   .8159
 0.95   .2541     .5062     .4067    1.6387    3.5898   .8289
 1.00   .2420     .4839     .4839    1.4518    3.8715   .8413
 1.05   .2299     .4580     .5516    1.2529    4.0735   .8531
 1.10   .2179     .4290     .6091    1.0458    4.1958   .8643
 1.15   .2059     .3973     .6561     .8346    4.2403   .8749
 1.20   .1942     .3635     .6925     .6230    4.2103   .8849
 1.25   .1826     .3282     .7185     .4147    4.1107   .8944
 1.30   .1714     .2918     .7341     .2130    3.9475   .9032
 1.35   .1604     .2549     .7399   - .0209    3.7278   .9115
 1.40   .1497     .2180     .7364   + .1590    3.4595   .9192
 1.45   .1394     .1815     .7243     .3244    3.1510   .9265
 1.50   .1295     .1457     .7042     .4735    2.8109   .9332
 1.55   .1200     .1111     .6772     .6051    2.4481   .9394
 1.60   .1109     .0780     .6440     .7181    2.0712   .9452
 1.65   .1023     .0468     .6057     .8121    1.6886   .9505
 1.70   .0940   + .0175     .5632     .8870    1.3079   .9554
 1.75   .0863   - .0094     .5173     .9431     .9363   .9599
 1.80   .0790     .0341     .4692     .9809     .5801   .9641
 1.85   .0720     .0563     .4195    1.0014   + .2450   .9678
 1.90   .0656     .0760     .3693    1.0058   - .0646   .9713
 1.95   .0596     .0933     .3192     .9955     .3452   .9744
 2.00   .0540   -0.1080   -0.2700   +0.9718   -0.5939   .9772
 2.05   .0488     .1203     .2223     .9366     .8091   .9798
 2.10   .0440     .1302     .1765     .8915     .9899   .9821
 2.15   .0396     .1380     .1332     .8382    1.1362   .9842
 2.20   .0355     .1436     .0927     .7784    1.2488   .9861
 2.25   .0317     .1473     .0554     .7139    1.3291   .9878
 2.30   .0283     .1492   - .0214     .6460    1.3788   .9893
 2.35   .0252     .1495   + .0092     .5764    1.4004   .9906
 2.40   .0224     .1483     .0362     .5064    1.3965   .9918
 2.45   .0198     .1459     .0598     .4372    1.3701   .9929
 2.50   .0175     .1424     .0800     .3697    1.3242   .9938
 2.55   .0154     .1380     .0968     .3050    1.2619   .9946
 2.60   .0136     .1328     .1105     .2438    1.1865   .9953
 2.65   .0119     .1270     .1213     .1865    1.1007   .9960
 2.70   .0104     .1207     .1293     .1338    1.0076   .9965
 2.75   .0091     .1141     .1347     .0858     .9098   .9970
 2.80   .0079     .1073     .1379     .0429     .8097   .9974
 2.85   .0069     .1003     .1391   + .0049     .7095   .9978
 2.90   .0060     .0934     .1385   - .0281     .6110   .9981
 2.95   .0051     .0865     .1364     .0563     .5159   .9984
 3.00   .0044     .0798     .1330     .0798     .4255   .9987
 3.05   .0038     .0732     .1284     .0989     .3407   .9989
 3.10   .0033     .0669     .1231     .1140     .2624   .9990
 3.15   .0028     .0609     .1171     .1253     .1911   .9992
 3.20   .0024     .0552     .1106     .1332     .1271   .9993
 3.25   .0020     .0499     .1039     .1381     .0705   .9994
 3.30   .0017     .0449     .0969     .1404   - .0213   .9995
 3.35   .0015     .0402     .0899     .1403   + .0207   .9996
 3.40   .0012     .0359     .0829     .1384     .0561   .9997
 3.45   .0010     .0318     .0761     .1348     .0849   .9997
 3.50   .0009     .0283     .0694     .1300     .1078   .9998
 3.55   .0007     .0249     .0631     .1242     .1254   .9998
 3.60   .0006     .0219     .0570     .1175     .1380   .9998
 3.65   .0005     .0193     .0513     .1104     .1464   .9999
 3.70   .0004     .0168     .0460     .1030     .1510   .9999
 3.75   .0004     .0146     .0410     .0954     .1525   .9999
 3.80   .0003     .0127     .0365     .0878     .1512   .9999
 3.85   .0002     .0110     .0323     .0803     .1478   .9999
 3.90   .0002     .0095     .0284     .0730     .1426   .99995
 3.95   .0002     .0081     .0249     .0660     .1361   .99997
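
Entries of such a table can be recomputed as a check. The sketch below assumes that φ0 is the normal function exp(-z²/2)/√(2π), that φn denotes its n-th derivative (so that φn = (-1)ⁿ Heₙ(z) φ0(z), with Heₙ the Hermite polynomial generated by the usual three-term recursion), and that the last column is the integral of φ0 from -∞ to z. The function names are illustrative only.

```python
import math

def phi0(z):
    """Normal probability function exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def hermite(n, z):
    """Hermite polynomial He_n(z) from He_{k+1} = z*He_k - k*He_{k-1}."""
    prev, cur = 1.0, z
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, z * cur - k * prev
    return cur

def phi_derivative(n, z):
    """n-th derivative of phi0 at z."""
    return (-1) ** n * hermite(n, z) * phi0(z)

def table_row(z):
    """phi0, its 3rd to 6th derivatives, and the integral of phi0 up to z."""
    derivs = [phi_derivative(n, z) for n in (3, 4, 5, 6)]
    integral = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return [phi0(z)] + derivs + [integral]

row = table_row(1.0)
print([round(v, 4) for v in row])
```

For z = 1.00 the magnitudes agree with the tabulated row (.2420, .4839, .4839, 1.4518, 3.8715, .8413), with the fourth and fifth derivatives negative.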
 
 ADDENDA. 
 
 APPENDIX AND BIBLIOGRAPHICAL NOTES. 
 
 Chapter I. 
 
Page 3. The establishment of the relations between hypothetical
judgments and probabilities is probably first due to F. C. Lange. See also the
discussion in Sigwart's "Logic" (English translation, Macmillan Co., New
York, 1904). A defense of the "principle of insufficient reason" as opposed
to the view of von Kries is given by K. Stumpf ("Über den Begriff der
mathematischen Wahrscheinlichkeit"), Ber. bayr. Ak. (phil. Kl.), 1892. For a
further discussion of the philosophical aspect the reader is advised to consult
"Theorie und Methoden der Statistik" (Tubingen, 1913) by the Russian
statistician, A. Kaufmann.
 
 Chapter II. 
 
Page 21. An interesting account of the application of the theory of
probabilities to whist is given by Poole in "Philosophy of Whist Play" (New York
and London, 1883). Page 23. Example 6. This is a general case of the
so-called game of "Treize" or "Rencontre" first discussed by Montmort in his
"Essai sur les Jeux des Hazards" (1708). "Thirteen cards numbered 1, 2,
3, ... up to 13 are thrown promiscuously into a bag and then drawn out
singly; required the chance that once at least the number on a card shall
coincide with the number expressing the order in which it is drawn." This is
one of the stock problems in probability and has been discussed by nearly all
the leading classical writers on the subject.
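
The chance asked for in the game of Treize can be evaluated directly. The sketch below uses the classical inclusion-exclusion result that the probability of no coincidence among n cards is the sum of (-1)^k / k! for k = 0, ..., n; this derivation is not reproduced in the note above, and the function name is an illustrative choice.

```python
from fractions import Fraction
from math import factorial

def prob_at_least_one_match(n):
    """Chance that at least one of n numbered cards is drawn in its own place."""
    # Probability of no coincidence at all (inclusion-exclusion), kept exact.
    no_match = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    return 1 - no_match

p13 = prob_at_least_one_match(13)   # the thirteen cards of Treize
print(p13, float(p13))
```

For thirteen cards the value is very nearly 1 - 1/e, or about 0.632.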
 
 Chapter IV. 
 
 The close connection between probability and symbolic logic is admirably 
 discussed by the Italian mathematician, Peano, in various of his mathematical 
texts. Page 42. Example 19. See also the discussion by R. Henderson in
 "Mortality and Statistics" (New York, 1915). 
 
 Chapter V. 
 
38. The moral expectation has been discussed by Harald Westergaard
in "Tidsskrift for Matematik" (1878) and in "Smaaskrifter tilegnede C. F.
Krieger" (Copenhagen, 1889).
 
 Chapter VI. 
 
 A German translation with explanatory notes of Bayes's brochure has 
recently appeared in the series of "Ostwald's Klassiker."
 
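Page 74. The double integral in the numerator of (IV) is evidently of the
form:

$$\iint_{(A)} F(y_1, y_2)\, dy_1\, dy_2$$

where the contour of the field of integration (A) is defined by means of the
relations:

$$\alpha < y_1 y_2 < \beta, \qquad 0 < y_1 < 1 \quad \text{and} \quad 0 < y_2 < 1.$$

The field of integration is thus the area swept out by the hyperbola
$y_1 y_2 = \alpha$, the straight line $y_2 = 1$, the hyperbola $y_1 y_2 = \beta$ and the straight
line $y_1 = 1$.

Changing the variables by means of the transformation:

$$y_1 y_2 = y \quad \text{and} \quad 1 - y_1 = z(1 - y)$$

we get the following new double integral

$$\iint F[\psi(y, z), \varphi(y, z)]\, |J|\, dy\, dz \qquad (|J| \text{ taken as absolute value}),$$

where $J$ is the Jacobian or functional determinant defined by the formula:

$$J = \begin{vmatrix} \dfrac{\partial\varphi}{\partial y} & \dfrac{\partial\varphi}{\partial z} \\[2mm] \dfrac{\partial\psi}{\partial y} & \dfrac{\partial\psi}{\partial z} \end{vmatrix} = \frac{\partial\varphi}{\partial y}\frac{\partial\psi}{\partial z} - \frac{\partial\varphi}{\partial z}\frac{\partial\psi}{\partial y}.$$

For

$$y_2 = \frac{y}{1 - z(1 - y)} = \varphi(y, z), \qquad y_1 = 1 - z(1 - y) = \psi(y, z),$$

$$|J| = \left|\, \det \begin{pmatrix} \dfrac{1 - z}{[1 - z(1 - y)]^2} & \dfrac{y(1 - y)}{[1 - z(1 - y)]^2} \\[2mm] z & -(1 - y) \end{pmatrix} \right| = \frac{1 - y}{1 - z(1 - y)}.$$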
 
The transformation in a double integral implies in general three parts:
(1) the expression of $F(y_1, y_2)$ in terms of $y$, $z$; (2) the determination of the new
system of limits; (3) substitution of $dy_1\, dy_2$. The solution of the third part we
just gave above. The solution of the first two is purely algebraic. The
first part is a straightforward simple problem which should present no difficulty
whatsoever to the student and which in conjunction with (3) brings the
integrand to the form given in formula (V).

[Figure: contour of the field of integration, with bounding curves marked y = α and y = β.]

1 See Goursat: "Mathematical Analysis" (New York, 1904), pages 266-67.
 
The easiest way to determine the new system of limits is probably by
constructing the contour in the new field of integration. The hyperbolas
$y_1 y_2 = \alpha$ and $y_1 y_2 = \beta$ are in the new field of integration changed into the two
straight lines $y = \alpha$ and $y = \beta$ which determine the limits for the variable $y$.
A mere inspection of the expressions for $\varphi(y, z)$ and $\psi(y, z)$ shows that the two
straight lines $y_2 = 1$ and $y_1 = 1$ become in the new field $z = 1$ and $z = 0$
which are the limits for $z$.

The contour $(A_1)$ simply becomes a rectangle bounded by the straight
lines $z = 0$, $y = \beta$, $z = 1$ and $y = \alpha$. The complete transformation finally
brings the numerator to the form given in (V).
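
The change of variables can be checked numerically on the simplest integrand, F = 1, for which the double integral is merely the area of the field (A). The sketch below assumes the transformation y1 = 1 - z(1 - y), y2 = y / [1 - z(1 - y)] with |J| = (1 - y) / [1 - z(1 - y)]; the limits α = 0.2 and β = 0.7, the midpoint rule and the function names are illustrative assumptions.

```python
import math

def abs_jacobian(y, z):
    """|J| = (1 - y) / (1 - z(1 - y)) for the transformation described above."""
    return (1.0 - y) / (1.0 - z * (1.0 - y))

def area_new_variables(alpha, beta, n=400):
    """Midpoint-rule integral of |J| over the rectangle alpha<y<beta, 0<z<1."""
    hy = (beta - alpha) / n
    hz = 1.0 / n
    total = 0.0
    for i in range(n):
        y = alpha + (i + 0.5) * hy
        for j in range(n):
            z = (j + 0.5) * hz
            total += abs_jacobian(y, z)
    return total * hy * hz

def area_old_variables(alpha, beta):
    """Area of alpha < y1*y2 < beta inside the unit square, in closed form.

    The part of the unit square with y1*y2 < c has area c - c*log(c).
    """
    def below(c):
        return c * (1.0 - math.log(c))
    return below(beta) - below(alpha)

alpha, beta = 0.2, 0.7               # illustrative limits, not from the text
print(area_new_variables(alpha, beta), area_old_variables(alpha, beta))
```

Agreement of the two numbers indicates that the rectangle in (y, z) and the factor |J| together reproduce the original field of integration.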
 
Page 75. The question put by Mr. Bing is simply the determination of a
future event by means of Bayes's Rule. The limits $\alpha$ and $\beta$ become 0 and 1
respectively and the contour of the field of integration simply becomes the
area bounded by $y_1 y_2 = 0$, $y_2 = 1$, $y_1 y_2 = 1$ and $y_1 = 1$, i. e., the area enclosed
between the two axes, the line $y_2 = 1$, the hyperbola $y_1 y_2 = 1$ and the line
$y_1 = 1$. The transformed contour becomes a square with side equal to unity.
 
Chapter VII.
 
Page 83. The criticism by the English empiricists is to a certain extent
due to a misconception of the Bernoullian Theorem. "This theorem," Venn
says, "is generally expressed somewhat as follows: That in the long run all
events will tend to occur with a frequency proportional to their objective
probabilities." Any one giving careful attention to the deduction of the famous
theorem will, however, readily notice the fallacy of such a view. Not the
actual absolute frequencies of the events but the mathematical expectations of
such events are proportional to the a priori mathematical probability p. The
fallacy of Mr. Venn lies in his confusing an actual event with its mathematical
expectation. In other words, he makes the Bernoullian Theorem appear as
a regular hypothetical judgment whereas as a matter of fact it is a simple
probability judgment. If one is to take such an erroneous view of the
Bernoullian Theorem one may even be reconciled with another startling statement
by Venn that "If the chance (against the happening of a certain event) be
1,023 to 1 it undoubtedly will happen once in 1,024 trials."
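
The distinction between an actual event and its mathematical expectation is easily illustrated on Venn's own figures. The sketch below assumes 1,024 independent trials, each with probability 1/1024; the function names are illustrative.

```python
def prob_at_least_once(p, trials):
    """Probability that the event happens one or more times in the trials."""
    return 1.0 - (1.0 - p) ** trials

def prob_exactly_once(p, trials):
    """Binomial probability of exactly one occurrence in the trials."""
    return trials * p * (1.0 - p) ** (trials - 1)

p = 1.0 / 1024.0                     # chance of 1,023 to 1 against the event
trials = 1024
expected_occurrences = trials * p    # mathematical expectation of the count
at_least_once = prob_at_least_once(p, trials)
exactly_once = prob_exactly_once(p, trials)
print(expected_occurrences, at_least_once, exactly_once)
```

The expectation is exactly one occurrence, yet the probability that the event happens at all is only about 0.63, and that it happens exactly once about 0.37.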
 
For a clear presentment of the empirical methods and their relation to
mathematical probabilities and deductive methods see v. Bortkiewicz
"Kritische Betrachtungen zur theoretischen Statistik" (Jahrb. f. N.-Oe. u.
Stat. 3 Folge, Bd. 8, 10, 11) and "Die statistischen Generalisationen"
(Scientia, Vol. V). v. Bortkiewicz is but one of the brilliant school of Russian
statisticians who have made a thorough study of the philosophical aspects of
statistics. The induction method of J. S. Mill is carried much farther and
put on a far sounder basis than that originally given by Mill in the brochure
"Die Statistik als Wissenschaft" by A. A. Tschuproff as well as in his Russian
text "Researches on the Theory of Statistics." The main ideas of the Russian
writers are also found in Kaufmann's "Theorie und Methoden der Statistik"
(Tubingen, 1913).
 
 Chapter IX. 
Page 95. For a closer approximation of $n!$ see Forsyth, A. R., "On an
Approximate Expression for $x!$" (Brit. Ass. Rep., 1883). Page 107. In this
discussion it must be remembered that the variables are independent of each
other. The formula: $\varepsilon(k\alpha) = k\varepsilon(\alpha)$ is self evident, but may be proved as
follows:

$e(k\alpha) = ksp = ke(\alpha)$, $e[k\alpha - e(k\alpha)]^2 = k^2 e[\alpha - e(\alpha)]^2 = k^2\varepsilon^2(\alpha)$, or $\varepsilon(k\alpha) = k\varepsilon(\alpha)$.

Page 115. See also a similar discussion by Westergaard in "Mortalität und
Morbilität" (Jena, 1902), page 187.
 
Chapter XI.

The still unfinished series of monographs by Charlier are found in various
volumes of Meddelande från Lunds Astronomiska Observatorium (Lund,
Stockholm) and in Svenska Aktuarieföreningens Tidskrift (Stockholm).
 
Page 137. Since all statistical characteristics to greater or less extent are
affected with mean errors due to sampling it is of importance to be able to
determine such mean errors in simple algebraic terms. We shall for the present
confine ourselves to the mean and the dispersion. The mean error in the mean,
$M_B$, in a Bernoullian Series is given by the formula:

$$\varepsilon(M_B) = \frac{\sqrt{\varepsilon^2(m_1) + \varepsilon^2(m_2) + \cdots + \varepsilon^2(m_N)}}{N} = \frac{\sqrt{N s p_0 q_0}}{N} = \frac{\sigma}{\sqrt{N}}.$$

The mean error of the dispersion is somewhat difficult to obtain by elementary
methods since it involves the determination of the mean error of the mean error.
The mean error square of the mean error square may be gotten by a process
similar to that of Laplace in § 65-66 by the introduction of the parameter, i,
in the expression for $\alpha^2$ and $\alpha^4$ in $e[(\alpha - sp)^2 - spq]^2$. After several reductions
this latter expression may be brought to the form: $2(sp_0q_0)^2 = 2\sigma^4$ (approx.).
For the dispersion we have:

$$\varepsilon(\sigma) = \frac{\sigma}{\sqrt{2N}}.$$

This formula will be proven under the discussion of frequency curves.
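
As a small worked illustration, the two mean errors may be computed for given data. The sketch assumes the usual forms: dispersion σ = √(s p0 q0) for a Bernoullian series of s trials, mean error of the mean σ/√N and mean error of the dispersion σ/√(2N) for N sample sets. The numerical values and the function name are hypothetical.

```python
import math

def bernoulli_mean_errors(s, p0, n_samples):
    """Dispersion of a Bernoullian series and the mean errors of the
    mean and of the dispersion, under the forms assumed in the lead-in."""
    q0 = 1.0 - p0
    sigma = math.sqrt(s * p0 * q0)
    return {
        "sigma": sigma,
        "mean_error_of_mean": sigma / math.sqrt(n_samples),
        "mean_error_of_dispersion": sigma / math.sqrt(2.0 * n_samples),
    }

# Hypothetical example: 50 sample sets of 100 trials each with p0 = 1/2.
errors = bernoulli_mean_errors(s=100, p0=0.5, n_samples=50)
print(errors)
```

With these figures the dispersion is 5, the mean error of the mean about 0.707, and the mean error of the dispersion 0.5.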
 
 Chapter XIII. 
 
Page 184. The Danish engineer, Andræ, discovered a similar correlation
 formula about the same time as Bravais. 
 
 Chapter XIV. 
 
Page 196. Viewed from the standpoint of elementary errors the expression
for the frequency curve, $\varphi(x)$, may be derived in the following fashion:
Let us arrange the elementary errors in small equidistant groups or intervals
of magnitude, $\alpha$, and assume that all the elementary errors when situated in
the same interval are of equal size, an assumption which is always permissible
for small values of $\alpha$. This means that when $r$ is an integral positive number
all the errors located in the interval $r\alpha - \tfrac{1}{2}\alpha$ and $r\alpha + \tfrac{1}{2}\alpha$ must be of
equal size and equal to $r\alpha$. The relative frequency or the probability of such
errors is first of all proportional to the interval, $\alpha$, and depends in the second
instance upon a certain, so far unknown, function, $f(r\alpha)$, of the quantity
$r\alpha$. For a particular error source, say $Q_v$, we may therefore express the
probability of the occurrence of an error, $r\alpha$, as

$$\alpha f_v(r\alpha)$$

where $f_v$ is the unknown function for which we make no other assumptions
than those which follow immediately from the properties of mathematical
probabilities, i. e.

$$0 \le \alpha f_v(r\alpha) \le 1 \quad \text{and} \quad \sum_{r=-\infty}^{r=\infty} \alpha f_v(r\alpha) = 1 \quad \text{for } v = 1, 2, 3, \ldots, s.$$
 
Consider now for the moment the following expression

$$F_v(\omega) = \sum_{r=-\infty}^{r=\infty} \alpha f_v(r\alpha)\, e^{r\alpha\omega i} \quad \text{where } i = \sqrt{-1}.$$

The coefficient of $e^{r\alpha\omega i}$ in the sum is evidently the probability for the
occurrence of an error $r\alpha$ from the error source $Q_v$.

The probability of the occurrence of an error $r\alpha$ from another error source,
$Q_u$, may similarly be expressed as

$$F_u(\omega) = \sum_{r=-\infty}^{r=\infty} \alpha f_u(r\alpha)\, e^{r\alpha\omega i}$$

and so on for all the $s$ independent error sources, which we assumed to be
operative on our statistical object.
 
The probability of the resulting sum of the various combinations into
which the elementary errors from the $s$ sources may enter is found by forming
the product

$$\Phi(\omega) = \prod_{v=1}^{v=s} F_v(\omega) = F_1(\omega)\, F_2(\omega)\, F_3(\omega) \cdots F_s(\omega)$$

in accordance with the multiplication theorem of mathematical probabilities.
Writing the above product as

$$\Phi(\omega) = \alpha\,[\varphi(0) + \varphi(\alpha)\, e^{\alpha\omega i} + \varphi(2\alpha)\, e^{2\alpha\omega i} + \varphi(3\alpha)\, e^{3\alpha\omega i} + \cdots + \varphi(-\alpha)\, e^{-\alpha\omega i} + \varphi(-2\alpha)\, e^{-2\alpha\omega i} + \varphi(-3\alpha)\, e^{-3\alpha\omega i} + \cdots]$$

we notice that the coefficient, $\alpha\varphi(r\alpha)$, of $e^{r\alpha\omega i}$ is the probability that the
elementary errors from the $s$ error sources will enter into such a combination
that their sum will fall between $r\alpha - \tfrac{1}{2}\alpha$ and $r\alpha + \tfrac{1}{2}\alpha$.
 
Multiplying on both sides of the above equation by $e^{-r\alpha\omega i}$ (considering
$\alpha\omega$ as the independent variable) and integrating with respect to $\alpha\omega$ between
the limits $-\pi$ and $+\pi$ we find that

$$\alpha\varphi(r\alpha) = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \Phi(\omega)\, e^{-r\alpha\omega i}\, d(\alpha\omega).$$

In the above integral $\alpha\omega$ is the independent variable. If $\omega$ is chosen as
the independent variable, we have

$$\alpha\varphi(r\alpha) = \frac{\alpha}{2\pi} \int_{-\pi/\alpha}^{+\pi/\alpha} \Phi(\omega)\, e^{-r\alpha\omega i}\, d\omega.$$

If now we let $r\alpha = x$ and let $\alpha$ converge towards zero, in which case
$\alpha = dx$, we evidently find that

$$\varphi(x)\, dx = \frac{dx}{2\pi} \int_{-\infty}^{+\infty} \Phi(\omega)\, e^{-x\omega i}\, d\omega$$

is the infinitely small probability that the sum of the elementary errors
from the $s$ sources will fall in the infinitely small interval $x - \tfrac{1}{2}dx$ and
$x + \tfrac{1}{2}dx$.

It is evident that by introducing a new function $\psi(\omega)$, defined by the
relation $\Phi(\omega) = \sqrt{2\pi}\,\psi(\omega)$, the above equation reduces to (5b) on page 195 if
we let $\varphi(x) = f(x)$.
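
The argument can be followed on a small numerical case. The sketch below takes the interval α = 1 and two hypothetical error sources with made-up probabilities, forms the product of their generating functions, recovers each coefficient by the integral between -π and +π, and compares the result with a direct enumeration of all combinations. Because the integrand is a finite trigonometric sum, an equally spaced rule with enough points evaluates the integral exactly up to rounding.

```python
import cmath
import math

def generating_function(pmf, omega):
    """F_v(omega) = sum over r of p_v(r) * exp(i*r*omega), with alpha = 1."""
    return sum(p * cmath.exp(1j * r * omega) for r, p in pmf.items())

def recover_probability(pmfs, r, n_points=64):
    """(1/2pi) * integral from -pi to pi of Phi(omega) * exp(-i*r*omega),
    evaluated with an equally spaced rule of n_points nodes."""
    total = 0j
    for k in range(n_points):
        omega = -math.pi + 2.0 * math.pi * k / n_points
        phi = 1 + 0j
        for pmf in pmfs:                 # Phi is the product of the F_v
            phi *= generating_function(pmf, omega)
        total += phi * cmath.exp(-1j * r * omega)
    return (total / n_points).real

def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent errors by enumeration."""
    out = {}
    for ra, pa in pmf_a.items():
        for rb, pb in pmf_b.items():
            out[ra + rb] = out.get(ra + rb, 0.0) + pa * pb
    return out

# Two hypothetical error sources (error size r -> probability).
source_1 = {-1: 0.25, 0: 0.5, 1: 0.25}
source_2 = {-2: 0.1, 0: 0.6, 1: 0.3}

direct = convolve(source_1, source_2)
via_fourier = {r: recover_probability([source_1, source_2], r) for r in direct}
for r in sorted(direct):
    print(r, round(direct[r], 6), round(via_fourier[r], 6))
```

The two columns agree, which is the content of the inversion formula for a finite number of discrete error sources.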
 
Also by writing

$$e^{\frac{\lambda_1}{1!}(i\omega) + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \cdots} = \sqrt{2\pi}\,\psi(\omega)$$

we obtain the general form of the frequency curve as shown on the bottom of
page 196.
 
 Chapter XVI 
 
 Page 237. The footnote mentions McAlister as one of the earliest investi- 
 gators who used the geometric mean as the most probable value of the 
observations. I find, however, that McAlister's work was anteceded by that
of Thiele and Gram. Thiele as early as 1866 used the geometric mean as the
most probable value in a series of estimates of the distances of double stars,
in a Danish monograph entitled Undersøgelse af Omløbsbevægelsen i
Dobbeltstjernesystemet Gamma Virginis. (Investigation on the movements of rotation in
the double star system Gamma Virginis.)
 
 THE MATHEMATICAL THEORY OF PROBABILITIES 
 
 By ARNE FISHER 
 
 ERRATA 
 
 Page 232, line 2: 
 
 For ki = Tik^ read ki = Viki^ 
 
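Page 237, lines 16 and 17 to read:

$$\int_{-\infty}^{+\infty} x^r F(x)\, dx = (n\sqrt{2\pi})^{-1} N \int_{0}^{\infty} x^r e^{-\frac{1}{2}\left(\frac{\log x - m}{n}\right)^2} dx$$

$$= (n\sqrt{2\pi})^{-1} N \int_{-\infty}^{+\infty} e^{(r+1)z}\, e^{-\frac{1}{2}\left(\frac{z - m}{n}\right)^2} dz$$

on the assumption that z or log x is normally distributed.
Page 252, line 13: +48 to read -48.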
 
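Page 237. The integral in question may be evaluated as follows:
Let $t = (z - m) : n$, or $nt = z - m$ and $n\, dt = dz$.
Hence the integral may be written as

$$\frac{N}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{(r+1)(nt + m)}\, e^{-\frac{t^2}{2}}\, dt = \frac{N e^{(r+1)m}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{nt(r+1) - \frac{t^2}{2}}\, dt = \frac{N e^{(r+1)m}\, e^{\frac{n^2(r+1)^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}[t - n(r+1)]^2}\, dt.$$

If we now let $[t - n(r + 1)] = u$, we have $dt = du$, and the last expression
reduces to

$$\frac{N e^{(r+1)m}\, e^{\frac{n^2(r+1)^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{u^2}{2}}\, du = N e^{(r+1)m}\, e^{\frac{n^2(r+1)^2}{2}},$$

since the latter integral

$$\int_{-\infty}^{+\infty} e^{-\frac{u^2}{2}}\, du = \sqrt{2\pi}.$$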
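
The reduction may be verified numerically. The sketch below assumes the integral in the form (N / (n√(2π))) ∫ exp((r+1)z) exp(-½((z-m)/n)²) dz and compares a trapezoidal evaluation with the closed form N exp((r+1)m) exp(n²(r+1)²/2). The values r = 2, m = 0.5, n = 0.4, the step count and the integration width are illustrative choices.

```python
import math

def moment_integral(r, m, n, big_n=1.0, steps=20000, width=12.0):
    """Trapezoidal value of
    (N / (n*sqrt(2*pi))) * integral of exp((r+1)z - ((z-m)/n)^2 / 2) dz."""
    centre = m + n * n * (r + 1)          # where the integrand peaks
    lo = centre - width * n
    h = 2.0 * width * n / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp((r + 1) * z - 0.5 * ((z - m) / n) ** 2)
    return big_n * total * h / (n * math.sqrt(2.0 * math.pi))

def closed_form(r, m, n, big_n=1.0):
    """N * exp((r+1)m) * exp(n^2 (r+1)^2 / 2)."""
    return big_n * math.exp((r + 1) * m + 0.5 * (n * (r + 1)) ** 2)

numeric = moment_integral(r=2, m=0.5, n=0.4)
exact = closed_form(r=2, m=0.5, n=0.4)
print(numeric, exact)
```

For these values both expressions give exp(2.22), or roughly 9.21.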
 
 Now in the press Ready for distribution about August 1, 1922. 
 
 An Elementary Treatise on 
 Frequency Curves 
 
 And their Application to the Construction 
 of Mortality Tables 
 
By ARNE FISHER

English translation by E. A. VIGFUSSON
With an Introduction by Professor RAYMOND PEARL

Department of Biometry and Vital Statistics of

the Johns Hopkins University
 
 (Pp. 225 +XV) 
 
This book falls into two parts of which the first gives an
elementary presentation of the theory of frequency functions along similar
lines as those developed by Mr. Fisher in his book on Probabilities.
The second part, as pointed out in Mr. Vigfusson's preface,
constitutes an entirely new departure in the analysis of mortality
statistics. The author has set himself the difficult task to
construct a complete mortality table from mortuary records by sex,
attained age at death and causes of death, but without knowledge
of the exposed to risk at various ages. The accomplishment of this
problem has been made possible by means of a biological hypothesis
and a proper classification of the causes of death upon biological
principles. Once accepted the proposed hypothesis will make it
possible to study the laws of human mortality in directions which
hitherto have been regarded as impossible. Mr. Fisher has applied
his new method to more than 25 population or occupational groups
and gives in this book the detailed results of some of his
investigations in the way of 6 complete mortality tables for Michigan
Males (1909-191o), Massachusetts Males (1914-1916), American
Locomotive Engineers (1913-1917), American Coal Miners
(1913-1917), Japanese Assured Males (19U-1917) and White
Industrial Assured Males of the Metropolitan Life Insurance Co.
(1911-1910).
 
As a systematic treatise on frequency curves and their
application to mortality studies this book should prove of great practical
value not only to students of statistical methods, but to
actuaries, statisticians, health officers, biologists and students of
general science as well.
 
Comments of Specialists

"Orthodoxy and discovery are as incompatible intellectually as oil and water are
physically, a cosmic law often overlooked by our "safe and sane" scientific gentry.
This book is an outstanding feature that this law is still in operation. . . .
It may fairly be regarded as fundamentally the most significant advance in actuarial
theory since Halley. . . . It opens out wonderful possibilities of research
on the laws of mortality in directions which hitherto have been wholly impossible
of attack. The criterion by which the significance of a new technique in any branch
of science is evaluated, is just this: the degree to which it opens up new fields of
research. By this criterion Fisher's work stands in a high and secure position."
(Extract from Professor Pearl's Introduction.)

"Fisher's novel method has injected new blood in the old body of actuarial
science." (C. Burrau.)

"This new and novel idea meets in reality a very frequent need. It represents
a supplement to the former tools of the actuary and makes possible the utilization
of a statistical material, which according to the requirements of the older systems
was considered as being of no value."

(Extract from Forsikringstidende's report of discussion in the Norwegian
Actuarial Society, June, 1920.)

"Since particularly in industrial statistics, or in general statistical inquiries under
war conditions, it is easier to obtain accurate data of deaths at ages than of exposed
to risk the success of the method is encouraging. . . . The subject is one of
peculiar interest at the present time."

(Journal of the Royal Statistical Society, London, 1918.)

From the theoretical point of view the method of Fisher is interesting. His
proposal to decompose the mortality according to the different causes of death is
entirely in conformity with the spirit of modern science which aims to analyze the
phenomena by their differential parts. From the practical point of view the method
is readily applied provided one has a double entry table of the mortuary records by
age and cause of death.

(Bulletin de l'Association des Actuaires Suisses. The Journal of the Association
of Swiss Actuaries, 1010.)
 