THE MATHEMATICAL THEORY OF PROBABILITIES

THE MACMILLAN COMPANY
NEW YORK, BOSTON, CHICAGO, DALLAS, SAN FRANCISCO

MACMILLAN & CO., Limited
LONDON, BOMBAY, CALCUTTA, MELBOURNE

THE MACMILLAN CO. OF CANADA, Ltd.
TORONTO

THE MATHEMATICAL THEORY OF PROBABILITIES
AND ITS APPLICATION TO FREQUENCY CURVES AND STATISTICAL METHODS

ARNE FISHER

TRANSLATED FROM THE DANISH BY
CHARLOTTE DICKSON, B.A. (COLUMBIA)
MATHEMATICAL ASSISTANT IN THE DEPARTMENT OF DEVELOPMENT AND RESEARCH OF THE AMERICAN TELEPHONE AND TELEGRAPH COMPANY
AND
WILLIAM BONYNGE, B.A. (BELFAST)

WITH INTRODUCTORY NOTES BY
M. C. RORTY AND F. W. FRANKLAND, F.I.A., F.A.S., F.S.S.

VOLUME I
Mathematical Probabilities, Frequency Curves, Homograde and Heterograde Statistics

SECOND EDITION GREATLY ENLARGED

NEW YORK
THE MACMILLAN COMPANY
1922

Copyright, 1915 and 1922, By ARNE FISHER
Set up and electrotyped. Published November, 1915. Second Edition, greatly enlarged, May, 1922.

PRINTED IN THE UNITED STATES OF AMERICA

INTRODUCTORY NOTE TO THE SECOND EDITION.

Mr. Fisher has requested that an introduction be written to this, the second edition of his work on probabilities, which shall indicate some of the practical applications of the mathematical theory with which his treatise deals. The writer has only a limited knowledge of mathematical technique, yet it has so happened that in twenty-five years of active work as engineer, statistician and executive he has had frequent occasion to call upon the skill of trained mathematicians for the solution of practical problems involving frequency curves and probabilities. Among such mathematicians none has been more helpful, or quicker to perceive the possibility of making valuable applications of higher mathematics to business problems, than Mr. Fisher himself.
For this reason it is a duty as well as a privilege to outline, at his request, certain actual practical experiences with mathematical applications and to indicate such possible applications for the future.

The writer's initial experience with frequency curves and probabilities was in the years 1902 and 1903, when it became evident, in analyzing various problems in telephone traffic, that certain peak loads, which were superimposed upon the normal seasonal, weekly, and daily fluctuations, could be accounted for only by the laws of chance. Recourse was, therefore, had to the formulae then available for approximate summations of the terms of the binomial expansion, and from these a series of curves was drawn which indicated for any given normal hourly traffic (as indicated by studies of seasonal, weekly, and daily variations) the probability that any given short period load would be equalled or exceeded. Practical experience with these curves soon showed that, in spite of minor errors, they were close enough to the real facts to make them of primary importance in traffic studies of all kinds, and particularly in the development of mechanical switching devices. Their use for such purposes has now become a commonplace in telephone engineering.

As a by-product of the preceding application there have been other interesting uses of the same probability curves. Effective studies have been made of the decrease in the total stocks of small machine parts that could be made possible by standardizing and reducing the number of types of screws, bolts, nuts, etc.
The curves can also be applied directly to every line of business and every type of operation where prompt service must be given and where the demand arises from a large number of independent sources, and is, therefore, subject to peak loads determined by the laws of chance, which may be superimposed upon other "normal" peak loads varying with the days of the week, the hours of the day, etc.

Entirely separate applications of frequency curves are those necessary in actuarial work. These are relatively well known. But it is less generally known that one of the most important of business problems, that of depreciation, can be treated effectively only when approached on an actuarial basis with a full understanding of the frequency curves which govern the displacement, year by year, of the physical units involved.

A still further use of frequency curves and the theory of probabilities, which is of immediate practical importance, is in connection with sampling operations. The theory of sampling has already been well developed, but adequate efforts have not yet been made by mathematicians to reduce the processes of sampling to dependable simple rules that can be applied by business executives and statisticians untrained in higher mathematics. In census work, and in statistical and other reports made by business organizations, the waste of money, that could be avoided by an intelligent application of the theory of sampling, is very great. Not only can many reports and analyses be made much more cheaply and quickly by sampling processes, but they can also be made more accurately. Many important items of information can be determined only by trained specialists.
In such cases the only procedure, that does not involve prohibitive expense in large census operations, is to tie such items, by a sampling process, to other items which are susceptible of exact enumeration by relatively unskilled enumerators, and then to compute the totals for the special items from the relations of such items to the items which are completely enumerated.

All of the preceding are in the field of immediate practicalities. When we come to the future, one of the most promising uses of mathematics is in the development of logical processes. It is not going too far to say that all business, and most engineering operations are fundamentally based on probabilities. The business man is always dealing in degrees of uncertainty, and even the engineer has only occasionally a definite set of conditions upon which to base his computations. Where the problem is primarily a financial one, he must balance the cost of overbuilding against the cost of underbuilding; and, if he combines business judgment with engineering skill, he will multiply the amount of each possible loss by the probability of its occurring, and will ordinarily choose, among all possible plans, the plan which involves the minimum probable loss. Here it is not inappropriate to interject the idea that the most practical logic must always be in terms of probabilities, and that a logic which deals, or pretends to deal, in certainties only is not alone useless, but is also harmful and misleading, when difficult problems are to be approached. Such problems can rarely, if ever, be solved except through the cumulation toward a certainty of many small probabilities established from uncorrelated, or only partially correlated, viewpoints.
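
The rule stated above, of multiplying each possible loss by the probability of its occurring and choosing the plan with the minimum probable loss, may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `probable_loss` and `best_plan`, the two plan names, and every probability and dollar figure are hypothetical and do not come from the text.

```python
from typing import Dict, List, Tuple

# A plan is a list of (probability, loss) pairs, one per possible outcome.
# All names and numbers here are hypothetical, chosen only for illustration.
Plan = List[Tuple[float, float]]


def probable_loss(outcomes: Plan) -> float:
    """Sum of each possible loss multiplied by its probability."""
    return sum(p * loss for p, loss in outcomes)


def best_plan(plans: Dict[str, Plan]) -> str:
    """Return the name of the plan with the minimum probable loss."""
    return min(plans, key=lambda name: probable_loss(plans[name]))


# Assumed situation: demand turns out low with probability 0.6, high with 0.4.
plans = {
    # Cheap plant: nothing lost if demand is low, a large loss if it is high.
    "underbuild": [(0.6, 0.0), (0.4, 100_000.0)],
    # Large plant: idle capacity is wasted if demand is low, nothing lost if high.
    "overbuild": [(0.6, 30_000.0), (0.4, 0.0)],
}

for name, outcomes in plans.items():
    print(name, probable_loss(outcomes))
print("chosen plan:", best_plan(plans))
```

With these assumed figures the larger plant carries the smaller probable loss, although it is the plan that loses money in the more likely outcome; the choice rests on the product of probability and loss, not on either factor alone.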
A final suggestion which is to-day speculative, but may assume important practical aspects in the near future, is with respect to the applications of frequency curves and probabilities to physical and cosmic mathematics. In such mathematics we are forced to assume that all of our measures must arise out of the things measured. When we deal with physical velocities, it would seem that our only measures of velocity can arise out of the velocities themselves. Similar considerations hold true with respect to fundamental measures of physical extension. Under these circumstances we may talk in terms of infinite space and of infinite time, but we can hardly talk in terms of infinites when we are dealing with the dimensions of atomic structure and the velocities of material particles. In these cases it seems very highly probable that we are dealing with frequency distributions which we must measure and define in terms derived from such distributions themselves. With respect to such measures some of our frequency curves may have infinite "tails," but it is more probable that the frequency forms are such that they can be completely defined in finite terms. Along this same line, we may even risk a closing speculation that the relative proportions of organized matter and space in the stellar universe are determined through the operations of the laws of chance in establishing heterogeneities in what is otherwise a homogeneous void-filling medium.

M. C. RORTY.
New York City, March 2, 1922.

PREFACE TO THE SECOND EDITION.

At the time when the first edition of this little book was published in 1916, I expected to issue a second volume shortly after, dealing with frequency curves and frequency surfaces as well as the related problem of co-variation (correlation). The manuscript for this volume was completed and printing had already commenced on some of the chapters, when a series of misfortunes, not necessarily unexpected, overtook the work.
A major part of the manuscript while in transit to a friend in Denmark for review and corrections went down with a Danish vessel when torpedoed by an outlaw German submarine. A duplicate copy was for some reason or other withheld by the British military censor and not returned to the writer until long after the termination of the world war. My third and final copy of the manuscript, which I had submitted to an American friend for critical review, was also lost in transit. The veritable nemesis which seems to have followed my efforts is, however, only a verification of the all prevailing laws of chance, which every serious minded student must face with unperturbed attitude. In fact, the above misfortunes have, after all, only made me more determined to complete another collection of notes, which I eventually hope to put into proper shape for publication.

In the meantime the first edition has been out of print for more than two years, and when the publisher asked me to prepare a new edition I took advantage of this opportunity to add several chapters on frequency functions and their application to heterograde statistical series so as to give a complete treatment of statistical functions involving one variable. The book is, therefore, twice its original size and contains the major part of what I originally intended for a second volume.

The reader will readily notice that my treatment of the subject is based throughout upon the principles of the classical probability theory as founded by Bernoulli, De Moivre and above all by the great Laplace and his disciple, Poisson. I am of the opinion that these principles and their further extension by the Scandinavian statisticians and actuaries, Gram, Thiele, Westergaard, Charlier, Wicksell and Jorgensen, offer as yet the best and also the most powerful tools for the treatment of collected statistical data by means of mathematical methods. In the way of adumbration and
economy of thought the Laplacean methods stand unsurpassed in the whole realm of mathematical statistics. I have, therefore, in this volume limited my investigations to a systematic treatment along these lines. I hope, however, in the forthcoming second volume to treat the methods of Pearson, Edgeworth, Kapteyn, Bachelier and Knibbs and show their relation to Laplace's theory.

The reason why the Laplacean doctrine of frequency curves has been ignored until comparatively recent years and has remained more or less obscure is perhaps due to the fact that for more than a century it remained a theory pure and simple and was used but sparingly in practical calculations. Any statistical theory, in order to be of use in practical work, must be arranged in such a manner that it is readily adaptable to numerical computations. Advanced mathematical computation has not been given its due reward and proper attention in our ordinary academic instruction. A high grade mathematical computer is indeed a "rare bird," much more so in fact than a good mathematician. To arrange and plan the numerical work in connection with the theoretical formulae so that the detailed and painstaking work is reduced to a minimum, and at the same time afford the proper means for checking and counterchecking, is by no means an easy task and often requires as much ingenuity as the actual development of the theoretical formulae. While Gauss has always been acknowledged as one of the world's greatest computers and in addition to his extensive work in pure mathematics also did much practical work in surveying, physics, and in financial and actuarial investigations, Laplace during his entire career remained a pure mathematician and apparently failed to grasp the paramount attributes required by a successful computer.
His attempt to inject himself into public life, as for instance when he secured for himself an appointment as minister of the interior, must be regarded as a dismal failure as admitted in Napoleon's memorandum on his dismissal. The failure of Laplace to recognize fully the all-important phase of numerical computations in all observations on statistical mass phenomena is in my opinion the main reason why the Gaussian theory of observations and the allied subject on the theory of least squares has hitherto supplanted the admittedly superior theory of the great Frenchman. Gauss in addition to his theory furnished an essentially useful and elegant method for performing the necessary numerical calculations, while Laplace left this decidedly important aspect out of consideration altogether. It remained in reality to Charlier to furnish the Laplacean doctrine with a practical method for computing the various statistical parameters. And in the meantime the Gaussian methods reigned supreme while Laplace's great work was neglected.

The careful reader will readily notice that in the treatment of frequency curves I have allowed the semi-invariants, originally introduced in the theory of statistics by Thiele, to occupy a central position. In my opinion the semi-invariants represent a more powerful tool than the method of moments. I have also tried to rescue from oblivion the important and original memoir by the Danish actuary, Gram, and give to him and the French mathematician, Hermite, their due recognition as the earliest investigators of skew frequency distributions. Gram was perhaps the first investigator to make proper use of the orthogonal functional properties of the Laplacean normal frequency curve and its derivatives.
By means of an application of the orthogonal properties of the Hermite polynomials and their close relation to the theory of integral equations, the whole theory of frequency distribution can be presented in a decidedly compact form; and I deem no apology necessary for having introduced in my treatment of frequency curves some of the more elementary theorems of integral equations, that youngest branch of higher analysis, which at present occupies a central position in advanced mathematics.

The most recent investigations along those lines have been made by the Swedish astronomer, Charlier, and his disciples, Jorgensen and Wicksell. Unfortunately these investigations have hitherto not received adequate and systematic treatment in English and American texts on statistics, and it is my hope that the following pages may be of service in opening the eyes of English speaking statisticians to the practical utility of these methods.

The examples have all been selected so as to give a complete and detailed illustration of the application of the theory to essentially practical problems. I have, on the other hand, purposely refrained from giving the customary exercises, so-called, usually found in statistical texts, especially those in German and English. Although I have been a close student of and have read most of the published statistical text-books in about seven languages for the last ten years, I regret to state that I have found little or no practical value in such trick exercises, which as a rule have but slight relation to problems occurring in daily life.

Since the appearance of the first edition of this book in 1916 a number of excellent statistical texts have been issued.
Among these I may mention a new edition of Yule's well-known elementary text, a greatly enlarged edition of Bowley's Elements of Statistics, the new treatise by Caradog Jones, an enlarged German translation of Charlier's Grunddragen, a very lucid Swedish text by Wicksell, the scholarly and broadly planned Statistikens Teori i Grundrids (in Danish) by Westergaard, and last but not least, the thesis by Jorgensen, Frekvensflader og Korrelation.¹

Although an extended residence in the United States has perhaps improved my barbaric Dano-English, I fear that I must still apologize to the reader for my shortcomings in rhetoric and grammar. Most of the serious defects have, I hope, been overcome by the diligent efforts of my co-editor and translator, Miss C. Dickson, mathematical assistant in the department of Development and Research of the American Telephone and Telegraph Company. Miss Dickson's work has indeed been much beyond that of mere translation. Her knowledge of the mathematical theory of probabilities has enabled her to suggest to me several improvements in my Danish notes.

I am also under great obligations to a number of friends and colleagues who have assisted me in the preparation of this volume. I am especially indebted to Mr. E. C. Molina, the well-known probability expert of the American Telephone and Telegraph Company. Mr. Molina's extensive knowledge of the works of the old French masters, especially of those of Laplace, has been of the greatest value to me, and I can truthfully say that I have nowhere met a mathematician so thoroughly acquainted with the intricacies of the Théorie Analytique as Mr. Molina. My thanks are also due to Mr. F. L. Hoffman, the Statistician of the Prudential Insurance Company, for the interest he took in my work along those lines while I was employed as a computer in his department. To Messrs. M. C. Rorty and D. R.
Belcher of the American Telephone and Telegraph Company, I beg leave to express my best thanks for their kind advice and encouragement in the preparation of this volume.

It is indeed impossible to adequately express in a mere formal preface my obligations to Mr. Rorty in this matter. His introductory note I regard as one of the highest rewards I have received in this field of endeavor where one must usually be content with the appreciation of one's peers. In this connection it is of interest to note that Mr. Rorty is the pioneer investigator in the application of the mathematical theory of probabilities to telephone engineering, which has been further developed in recent years by Molina of America, Erlang and Johannsen of Denmark, Holm of Sweden, Odell and Grinsted of Great Britain. The pioneer work by Mr. Rorty in this eminently practical field antedates the earliest work by Erlang in Tidsskrift for Matematik by nearly five years.

Last, but not least, I wish to convey my sincerest thanks to my Scandinavian compatriots, Westergaard, Charlier, Jorgensen, Wicksell and Guldberg from whose works I have drawn so freely. To these gentlemen and to the works of the late Messrs. Gram and Thiele of Copenhagen I really owe anything of value which may be contained in this work.

Arne Fisher.
New York, April, 1922.

¹ As a pure probability text we may mention G. Castelnuovo's Calcolo delle Probabilità (Milano, 1919), as an exceptionally lucid and rigorous treatise. The recently issued Treatise on Probability by J. M. Keynes is briefly discussed in paragraph 138 of this book. A. F.

INTRODUCTORY NOTE TO THE FIRST EDITION.

I feel it a great honor to have been asked by my friend and colleague, Mr. Arne Fisher, of the Equitable Life Assurance Society of the United States, to write an introductory note to what appears to me the finest book as yet compiled in the English language on the subject of which it treats.
As an Examiner myself in Statistical Method for a British Colonial Government, it has been to me a heart-breaking experience, when implored by intending candidates for examination to recommend a text-book dealing with Mr. Fisher's subject matter, that it has heretofore been impossible for me to recommend one in the English language which covers the whole of the ground. Until comparatively recent years the case was even worse. While in French, in Italian, in German, in Danish, and in Dutch, scientific works on statistics were available galore, the dearth of such literature in the English language was little short of a national or racial scandal. With such works as those of Yule and Bowley, in recent years, there has been some possibility for the English-speaking student to acquire part of the knowledge needed. But it is hardly necessary to point out what a very large amount of new ground is covered by Mr. Fisher's new book as compared with such works as I have referred to.

Despite my professional connection with statistical and actuarial work of a technical character my own personal interest in Mr. Fisher's book is concentrated principally on the metaphysical basis of the Probability-theory, and it is with regard to this aspect of the subject alone that I feel qualified to comment on his achievement. With all the controversy that has gone on through many decades among metaphysicians and among writers on logic interested especially in the bases of the theories of probability and induction, between the pure empiricists of the type of J. S. Mill and John Venn (at all events in the earliest edition of his work) on the one hand, and the (partly) a priori theorists who base their doctrine on the foundation of Laplace on the other hand, it has been a source of intense satisfaction to me, as in the main a disciple of the latter group of theorists, to note the masterly way in which Mr.
Arne Fisher disentangles the issues which arise in the keen and sometimes almost embittered controversy between these two schools of thought. It has always seemed to the present writer as if the very foundations of Epistemology were involved in this controversy. The impossibility of deriving the corpus of human knowledge exclusively from empirical data by any logically valid process (an impossibility which led Immanuel Kant to the creation of his epoch-making philosophical system) is hardly anywhere made more evident than in what seems to the present writer the unsuccessful effort of thinkers like John Venn to derive from such purely empirical data the entire Theory of Probability. The logical fallacy of the process is analogous to that perpetrated by John Stuart Mill in endeavoring to base the Law of Causality on what he termed an "inductio per simplicem enumerationem." Probably there is nowhere a more trenchant and conclusive exposure of the unsoundness of this point of view, than in the Right Honorable Arthur James Balfour's monumental work "A Defense of Philosophic Doubt." It is therefore satisfactory to find that Mr. Fisher emphasizes, quite at the beginning of his treatise, that an a priori foundation for "Probability" judgments is indispensable.

Hardly less gratifying, from the metaphysical point of view, is Mr. Fisher's treatment of the celebrated quaestio vexata of Inverse Probabilities and his qualified vindication of Bayes' Rule against its modern detractors.

Aside altogether from metaphysics, it is particularly satisfactory to note the full and clear way in which the author treats the Lexian Theory of Dispersion and of the "Stability" of statistical series and the extension of this theory by recent Scandinavian and Russian investigators, a branch of the science which has till the appearance of this new work not been adequately covered in English text-books.
It may of course be a moot question whether the preference given by our author to Charlier's method of treating "Frequency Curves" over the method of Professor Karl Pearson is well advised. But whatever the experts' verdict may be on debatable questions like these, the scientific world is to be congratulated on Mr. Fisher's presentment of a new and sound point of view, and he emphatically is to be congratulated on the production of a text-book which for many years to come will be invaluable both to students and to his confreres who are engaged in extending the boundaries of this fascinating science.

F. W. Frankland,
Member of the Actuarial Society of America, Fellow of the Institute of Actuaries of Great Britain and Ireland, and Fellow of the Royal Statistical Society of London.
New York, October 1, 1915.

PREFACE TO THE FIRST EDITION.

"Probability" has long ago ceased to be a mere theory of games of chance and is everywhere, especially on the continent, regarded as one of the most important branches of applied mathematics. This is proven by the increasing number of standard text-books in French, German, Italian, Scandinavian and Russian which have appeared during the last ten years. During this time the research work in the theory of probabilities has received ... theory of skew curves of error. As recently as 1905 Charlier finally showed that the whole theory of errors or frequency curves may be brought back to the principles of Laplace. I have treated this subject by the methods of both Pearson and Charlier, although I have given the methods of the latter a pre... heterograde groups.

In treating the philosophical side of the subject I have naturally not gone into much detail.
However, I have tried to emphasize the two diametrically opposite standpoints, namely the principle of what von Kries has called the principle of "cogent reason," and the principle which Boole has aptly termed "the equal distribution of ignorance." These two principles are clearly illustrated in the case of the so-called inverse probabilities. As far as pure theory is concerned, the theory of "inverse probabilities" is rigorous enough. It is only when making practical applications of the rule of inverse probabilities (the so-called Bayes' Rule) that many writers have made a fatal mistake by tacitly assuming the principle of "insufficient reason" as the only true rule of computation. This leads to paradoxical results as illustrated by the practical problem from the region of actuarial science in Chapter VI in this book.

In a work of this character I have naturally made an extended use of the higher mathematical analysis. However, the reader who is not versed in these higher methods need not feel alarmed on this account, as the elementary chapters are arranged in such a way that the more difficult paragraphs may be left out. I have in fact divided the treatise into two separate parts. The first part embraces the mathematical probabilities proper and their applications to homograde statistical series. This part, I think, constitutes what is usually given as a course in vital statistics in many American colleges. I hardly deem it worth while to give a detailed discussion on the collection and arrangement of the statistical data as to various frequency distributions. The mere graphical and serial representation of frequency functions by means of histographs and frequency columns is so simple and evident that a detailed description seems superfluous. The fitting of the various curves to analytical formulas and the determination of the various parameters seem to me of much greater importance.
The theory of curve fitting which is treated in the second volume is founded upon a more advanced mathematical analysis and is for this reason out of reach to the average American student who desires to learn only the rudiments of modern statistical methods. Practical statisticians, on the other hand, will derive much benefit from these higher methods. It is a fact generally noted in mathematics that the practical application of a difficult theory is much simpler than that of a more elementary theory. This is amply proven by the appearance of an excellent little Scandinavian brochure by Charlier: "Grunddragen af den matematiske Statistikken." ("Rudiments of Mathematical Statistics.")

I have always attempted to adapt theory to actual practical problems and requirements rather than to give a purely mathematical abstract discussion. In fact it has been my aim to present a theory of probabilities as developed in recent years which would prove of value to the practical statistician, the actuary, the biologist, the engineer and the medical man, as well as to the student who studies mathematics for the sake of mathematics alone.

The nucleus of this work consisted of a number of notes written in Danish on various aspects of the theory of probabilities, collected from a great number of mathematical, philosophical and economic writings in various languages. At the suggestion of my former esteemed chief, Mr. H. W. Robertson, F.A.S., Assistant Actuary of the Equitable Life Assurance Society of the United States, I was encouraged to collect these fragmentary notes in systematic form. The rendering in English was done by myself personally with the assistance of Mr. W. Bonynge. With his assistance most of the idiomatic errors due to my barbaric Dano-English have been eliminated. The notes stand, however, in the main as a faithful reproduction of my original English copy.
Although the resulting "Dano-English" may have its great shortcomings as to rhetoric and grammar, I hope to have succeeded in expressing what I wanted to say in such a manner that my possible readers may follow me without difficulty.

I gladly take the opportunity of expressing my thanks to a number of friends and colleagues who in various ways have assisted me in the preparation of this work. My most grateful thanks are due to Mr. F. W. Frankland, Mr. H. W. Robertson and Mr. Wm. Bonynge not only for reading the manuscript and most of the proofs, but also for the friendly help and encouragement in the completion of this volume. The introductory note by Mr. Frankland, coming from the pen of a scholar who for the most of a life-time has worked with statistical-mathematical subjects and who has taken a special interest in the philosophical and metaphysical aspects of the probability theory, I regard as one of the strong points of the book. My debts to Messrs. Frankland and Robertson as well as to Dr. W. Strong, Associate Actuary of the Mutual Life Insurance Company, are indeed of such a nature that they cannot be expressed in a formal preface.

My thanks are also due to Mr. A. Pettigrew in correcting the first rough draught of the first three chapters at a time when my knowledge of English was most rudimentary, to Mr. M. Dawson, Consulting Actuary, and Mr. R. Henderson, Actuary of the Equitable Life, for reading a few chapters in manuscript and making certain critical suggestions, to Professors C. Grove and W. Fite, of Columbia University, for numerous technical hints in the working out of various mathematical formulas in Chapter VI, to Miss G. Morse, librarian of the Equitable Library, in the search of certain bibliographical material. Last but not least I wish to express my sincerest thanks to several of my Scandinavian compatriots for allowing me to quote and use their researches on various statistical subjects.
I want in this connection especially to mention Professor Charlier, of Lund, and Professors Westergaard and Johannsen, of Copenhagen.

To The Macmillan Company and The New Era Printing Company I beg leave to convey my sincere appreciation of their very courteous and accommodating attitude in the manufacture of this work. Their spirit has been far from commercial in this (from a pure business standpoint) somewhat doubtful undertaking.

Arne Fisher.
New York, October, 1915.

TABLE OF CONTENTS.

PART I. MATHEMATICAL PROBABILITIES AND HOMOGRADE STATISTICS.

Chapter I. Introduction: General Principles and Philosophical Aspects.
1. Methods of Attack
2. Law of Causality
3. Hypothetical Judgments
4. Hypothetical Disjunctive Judgments
5. General Definition of the Probability of an Event
6. Equally Likely Cases
7. Objective and Subjective Probabilities

Chapter II. Historical and Bibliographical Notes.
8. Pioneer Writers
9. Bernoulli, de Moivre and Bayes
10. Application to Statistical Data
11. Laplace and Modern Writers

Chapter III. The Mathematical Theory of Probabilities.
12. Definition of Mathematical Probability
13. Example 1
14. Example 2
15. Example 3
16. Example 5
17. Example 6

Chapter IV. The Addition and Multiplication Theorems in Probabilities.
18. Systematic Treatment by Laplace
19. Definition of Technical Terms
20. The Theorem of the Complete or Total Probability, or the Probability of "Either Or"
21. Theorem of the Compound Probability or the Probability of "As Well As"
22. Poincaré's Proof of the Addition and Multiplication Theorem
23. Relative Probabilities
24. Multiplication Theorem
25. Probability of Repetitions
26. Application of the Addition and Multiplication Theorems in Problems in Probabilities
27. Example 12
28. Example 13
29. Example 14
30. Example 15
31. Example 16
32. Example 17
33. Example 18. De Moivre's Problem
34. Example 19
35. Example 20. Tchebycheff's Problem

Chapter V. Mathematical Expectation.
36. Definition, Mean Values
37. The Petrograd (St. Petersburg) Problem
38. Various Explanations of the Paradox. The Moral Expectation

Chapter VI. Probability a Posteriori.
39. Bayes's Rule. A Posteriori Probabilities
40. Discovery and History of the Rule
41. Bayes's Rule (Case I)
42. Bayes's Rule (Case II)
43. Determination of the Probabilities of Future Events Based upon Actual Observations
44. Examples on the Application of Bayes's Rule
45. Criticism of Bayes's Rule
46. Theory versus Practice
47. Probabilities expressed by Integrals
48. Example 24
49. Example 25. Bing's Paradox
50. Conclusion

Chapter VII. The Law of Large Numbers.
51. A Priori and Empirical Probabilities
52. Extent and Usage of Both Methods
53. Average a Priori Probabilities
54. The Theory of Dispersion
55. Historical Development of the Law of Large Numbers

Chapter VIII. Introductory Formulas from the Infinitesimal Calculus.
56. Special Integrals
57. Wallis's Expression of π as an Infinite Product
58. De Moivre-Stirling's Formula

Chapter IX. Law of Large Numbers. Mathematical Deduction.
59. Repeated Trials
60. Most Probable Value
61. Simple Numerical Examples
62. The Most Probable Value in a Series of Repeated Trials
63. Approximate Calculation of the Maximum Term, $T_m$
64. Expected or Probable Value
65. Summation Method of Laplace. The Mean Error
66. Mean Error of Various Algebraic Expressions
67. Tchebycheff's Theorem
68. The Theorems of Poisson and Bernoulli proved by the Application of the Tchebycheffian Criterion
69. Bernoullian Scheme
70. Poisson's Scheme
71. Relation between Empirical Frequency Ratios and Mathematical Probabilities
72. Application of the Tchebycheffian Criterion

Chapter X. The Theory of Dispersion and the Criterions of Lexis and Charlier.
73. Bernoullian, Poisson and Lexis Series
74. The Mean and Dispersion
74a. Mean or Average Deviation
75. The Lexian Ratio and Charlier Coefficient of Disturbancy

Chapter XI. Application to Games of Chance and Statistical Problems.
76. Correlate between Theory and Practice
77. Homograde and Heterograde Series. Technical Terms
78. Computation of the Mean and the Dispersion in Practice
79. Westergaard's Experiments
80. Charlier's Experiments
81. Experiments by Bonynge and Fisher

Chapter XII. Continuation of the Application of the Theory of Probabilities to Homograde Statistical Series.
82. General Remarks
83. Analogy between Statistical Data and Mathematical Probabilities
84. Number of Comparison and Proportional Factors
85. Child Births in Sweden
86. Child Births in Denmark
87. Danish Marriage Series
88. Stillbirths
89. Coal Mine Fatalities
90. Reduced and Weighted Series in Statistics
91. Secular and Periodical Fluctuations
92. Cancer Statistics
93. Application of the Lexian Dispersion Theory in Actuarial Science. Conclusion

PART II. FREQUENCY CURVES AND HETEROGRADE STATISTICS.

Chapter XIII. The Theory of Errors and Frequency Curves and Its Application to Statistical Series. General Remarks.
94. General Remarks. The Hypotheses of Elementary Errors
95. Application to Statistical Series. Definitions
96. Compound Frequency Curves
97. Early Writers
98. Laplace and Gauss
99. Quetelet's Studies
100. Opperman, Gram, and Thiele
101. Modern Investigations

Chapter XIV. The Mathematical Theory of Frequency Curves.
102. Frequency Distributions
103. Parameters Considered as Symmetric Functions
104. Semi-Invariants of Thiele
105. The Fourier Integral Equation
106. Frequency Function as the Solution of an Integral Equation
107. The Normal or Laplacean Probability Function
108. Hermite's Polynomials
109. Orthogonal Functions
110. The Frequency Function Expressed as a Series
111. Derivation of Gram's Series
112. Absolute Frequencies
113. Coefficients Expressed by Semi-Invariants
114. Change of Origin and Unit

PART III. PRACTICAL APPLICATIONS OF THE THEORY.

Chapter XV. The Numerical Determination of the Parameters.
115. General Remarks
116. Remarks on Criticisms
117. Charlier's Computation Scheme
118. Comparison between Observed Data and Theoretical Values
119. Principle of Method of Least Squares
120. Gauss' Solution of Normal Equations
121. Arithmetical Application of Method

Chapter XVI. Logarithmically Transformed Frequency Functions.
122. Transformation of the Variate
123. The General Theory of Transformation
124. Logarithmic Transformation
125. The Mathematical Zero
126. Logarithmically Transformed Frequency Series
127. Parameters Determined by Least Squares
128. Application to Graduation of Mortality Tables
129. Formation of Observation Equations
130. Additional Examples

Chapter XVII. Frequency Curves and their Relation to the Bernoullian Series.
131. The Bernoullian Series
132. Poisson's Exponential
133. The Law of Small Numbers

Chapter XVIII. Poisson-Charlier Frequency Curves for Integral Variates.
134. Charlier's B Curve
135. Numerical Examples
136. Transformation of the Variate
137. Bernoullian Series expressed as B Curves
138. Remarks on Mr. Keynes' Criticisms

PART I
MATHEMATICAL PROBABILITIES AND HOMOGRADE STATISTICS

CHAPTER I.
INTRODUCTION: GENERAL PRINCIPLES AND PHILOSOPHICAL ASPECTS.

1. Methods of Attack.
The subject of the theory of probabilities may be attacked in two different ways, namely in a philosophical, and in a mathematical manner. At first the subject originated as isolated mathematical problems from games of chance. The pioneer writers on probability such as Cardano, Galileo, Pascal, Fermat, and Huyghens treated it in this way. The famous Bernoulli was, perhaps, the first to view the subject from the philosopher's point of view. Laplace wrote his well-known "Essai Philosophique des Probabilités," wherein he terms the whole science of probability the application of common sense. During the last thirty years numerous eminent philosophical scholars such as Mill, Venn, and Keynes of England, Bertrand and Poincaré of France, Sigwart, von Kries and Lange of Germany, Kroman of Denmark, and several Russian scholars have written on the philosophical aspect.

In the ordinary presentation of the elements of the theory of probability as found in most English text-books, the treatment is wholly mathematical. The student is given the definition of a mathematical probability and the elementary theorems are then proved. We shall, in the following chapter, depart from this rule and first view the subject, briefly, from a philosophical standpoint. What the student may thus lose in time we hope he may gain in obtaining a broader view of the fundamental principles underlying our science. At the same time, the reader who is unacquainted with the science of philosophy or pure logic, need not feel alarmed, since not even the most elementary knowledge of the principles of formal logic is required for the understanding of the following chapter.

2. Law of Causality.

In a great treatise on the Chinese civilization, Oscar Peschel, the German geographer and philosopher, makes the following remarks: "Since our intellectual awakening, since we have appeared on the arena of history as the creators and guardians of the treasures of culture, we have sought after only one thing, of the presence of which the Chinese had no idea, and for which they would give hardly a bowl of rice. This invisible thing we call causality. We have admired a vast number of Chinese inventions, but even if we seek through their huge treasures of philosophical writing we are not indebted to them for a single theory or a single glance into the relation between cause and effect."

The law of causality may be stated broadly as follows: Everything that happens, and everything that exists, necessarily happens or exists as the consequence of a previous state of things. This law cannot be proven. It must be taken, a priori, as an axiom; but once accepted as a truth it does away with the belief of a capricious ruling power, and even if the strongest disbeliever of the law may deny its truth in theory he invariably applies it in practice during his daily occupation in life.

All future human activity is more or less influenced by past and present conditions. Modern historical writings, as for instance the works of the brilliant Italian historian, Ferrero, always seek to connect past events with present social and economic conditions. Likewise great and constructive statesmen in trying to shape the destinies of nations always reckon with past and present events and conditions. We often hear the term, "a man with foresight," applied to leading financiers and statesmen. This does not mean that such men are gifted with a vision of the future, but simply that they, with a detailed and thorough knowledge of past and present events, associated with the particular undertaking in which they are interested, have drawn conclusions in regard to a future state of affairs.
For example, when the Canadian Pacific officials, in the early eighties, chose Vancouver as the western terminal for the transcontinental railroad, at a time when practically the whole site of the present metropolis of western Canada was only a vast timber tract, they realized that the conditions then prevailing on this particular spot (the excellent shipping facilities, the favorable location in regard to the Oriental trade, and the natural wealth of the surrounding country) would bring forth a great city, and their predictions came true.

Predictions with regard to the future must be taken seriously only when they are based upon a thorough knowledge of past and present events and conditions. Prophecies, taken in a purely biblical sense of the term and viewed from the law of causality, are mere guesses which may come true and may not. A prophet can hardly be called more than a successful guesser. Whether there have been persons gifted with a purely prophetic vision is a question which must be left to the theologians to wrangle over.

3. Hypothetical Judgments.

Any person with ordinary intellectual faculties may, however, predict certain future events with absolute certainty by a simple application of the principle of hypothetical judgment. The typical form of the hypothetical judgment is as follows: If a certain condition exists, or if a certain event takes place then another definite event will surely follow. Or if A exists B will invariably follow.

Mathematical theorems are examples of hypothetical judgments. Thus in the geometry of the plane we start with certain ideas (axioms) about the line and plane. From these axioms we then deduce the theorems by mere hypothetical judgments.
Thus in the Euclidian geometry we find the axiom of parallel lines, which assumes that through a point only one line can be drawn parallel to another given line, and from this assumption we then deduce the theorem that the sum of the angles in a triangle is 180°. But it must be borne in mind that this proof is valid only on the assumption of the actual existence of such lines. If we could prove directly by logical reasoning or by actual measurement, that the sum of the angles in any triangle is equal to 180°, then we would be able to prove the above theorem, the so-called "hole in geometry," independently of the axiom of parallel lines. A Russian mathematician, Lobatschewsky, on the other hand, assumed that through a single point an infinite number of parallels might be drawn to a previously given line, and from this assumption he built up a complete and valid geometry of his own. Still another mathematician, Riemann, assumed that no lines were parallel to each other, and from this produced a perfectly valid surface geometry of the sphere.

As examples of hypothetical judgment we have the two following well-known theorems from elementary geometry and algebra. If one of the angles of a triangle is divided into two parts, then the line of division intersects the opposite side. If a decadian number ending in 0 or 5 is divided by 5 there is no remainder from the division.

In natural science, hypothetical judgments are founded on certain occurrences (phenomena) which, without exception, have taken place in the same manner, as shown by repeated observations. The statement that a suspended body will fall when its support is removed is a hypothetical judgment derived from actual experience and observation.

4. Hypothetical Disjunctive Judgments.

In hypothetical judgments we are always able to associate cause and effect.
It happens frequently, however, that our knowledge of a certain complex of present conditions and actions is such that we are not able to tell beforehand the resulting consequences or effects of such conditions and actions, but are able to state only that either an event $E_1$ or an event $E_2$, etc., or an event $E_n$ will happen. This represents a hypothetical disjunctive judgment whose typical form is: If A exists either $E_1, E_2, E_3, \ldots$ or $E_n$ will happen.

If we take a die, i. e., a homogeneous cube whose faces are marked with the numbers from one to six, and make an ordinary throw, we are not able to tell beforehand which side will turn up. True, we have here again a previous state of things, but the conditions do not allow such a simple analysis as the cases we have hitherto considered under the purely hypothetical judgment. Here a multitude of causes influence the final result: the weight and centre of gravity of the die, the infinite number of possible movements of the hand which throws the die, the force of contact with which the die strikes the table, the friction, etc. All these causes are so complex that our minds are not afforded an opportunity to grasp and distinguish the impulses that determine the fall of the die. In other words we are not able to say, a priori, which face will appear. We only know for certain that either 1, 2, 3, 4, 5, or 6 will appear.

If a line is drawn through the vertex of a triangle, it either intersects the opposite side or it does not. If a number is divided by 5 the division either gives only an integral number or leaves a remainder. If an opening is made in the wall of a vessel partly filled with water, then either the water escapes or remains in the vessel.

All the above cases are examples of hypothetical disjunctive judgments. The four cases show, however, a common characteristic.
They all have a certain partial domain, where one of the mutually exclusive events is certain to happen, while the other partial domain will bring forth the other event, and the total area of action embraces both events. Taking the triangle, we notice that the lines may pass through all the points inside of an angle of 360°, but only the lines falling inside the internal vertical angle, φ, of the triangle will produce the event in question, namely the line intersecting the opposite side. There will be an outflow from the vessel only if the hole is made in that part of the wall which is touched by the fluid.

All problems do not allow of such simple analysis, however, as will be seen from the following example. Suppose we have an urn containing 1 white and 2 black balls and let a person draw one from the urn. The hypothetical disjunctive judgment immediately tells us that the ball will be either black or white, but the particular domain of each event cannot be limited to the fixed border lines of the former examples. Any one of the balls may occupy an infinite number of positions, and furthermore we may imagine an infinite number of movements of the hand which draws the ball, each movement being associated with a particular point of position of the ball in the urn. If we now assume each of the three balls to have occupied all possible positions in the urn, each point of position being associated with its proper movement of the hand, it is readily seen that a black ball will be encountered twice as often as a white ball in a particular point of position in the urn, and for this reason any particular movement of the hand which leads to this point of position grasps a black ball twice as often as a white ball.

5. General Definition of the Probability of an Event.

All the above examples have shown the following characteristics:

(1) A total general region or area of action in which all actions may take place, this total area being associated with all possible events.
(2) A limited special domain in which the associated actions produce a special event only.

If these areas and domains, as in the above cases, are of such a nature that they allow a purely quantitative determination, they may be treated by mathematical analysis. We define now, without entering further into its particular logical significance, the ratio of the second special and limited domain to the first total region or area as the probability of the happening of the event, E, associated with domain No. 2.

We must, however, hasten to remark that it is only in a comparatively few cases that we are able, a priori, to make such a segregation of domains of actions. This may be possible in purely abstract examples, as for instance in the example of the division of the decadian number by 5. But in all cases where organic life enters as a dominant factor we are unable to make such sharp distinctions. If we were asked to determine the probability of an x-year-old person being alive one year from now, we should be able to form the hypothetical disjunctive judgment: An x-year-old person will be either alive or dead one year from now. But a further segregation into special domains as was the case with the balls in the urn is not possible. Many extremely complex causes enter into such a determination: the health of the particular person, the surroundings, the daily life, the climate, the social conditions, etc. Our only recourse in such cases is to actual observation. By observing a large number of persons of the same age, x, we may, in a purely empirical way, determine the rate of death or survival. Such a determination of an unknown probability is called an empirical probability. An empirical probability is thus a probability, into the determination of which actual experience has entered as a dominant factor.

6. Equally Likely Cases.

The main difficulty, in the application of the above definition of probability, lies in the determination of the question whether all the events or cases taking place in the general area of action may be regarded as equally likely or not. Two diametrically opposite views have here been brought forward by writers on probabilities. One view is based upon the principle which in logic is known as the principle of "insufficient reason," while the other view is based upon the principle of "cogent reason." The classical writers on the theory of probability, such as Jacob Bernoulli and Laplace, base the theory on the principle of insufficient reason exclusively. Thus Bernoulli declares the six possible cases by the throw of a die to be equally likely, since "on account of the equal form of all the faces and on account of the homogeneous structure and equally arranged weight of the die, there is no reason to assume that any face should turn up in preference to any other." In one place Laplace says that the possible cases are "cases of which we are equally ignorant," and in another place, "we have no reason to believe any particular case should happen in preference to any other."

The opposite view, based on the principle of cogent reason, has been strongly endorsed in an admirable little treatise by the German scholar, Johannes von Kries.¹ Von Kries requires, first of all, as the main essential in a logical theory of probability, that "the arrangement of the equally likely cases must have a cogent reason and not be subject to arbitrary conditions." In several illustrative examples, von Kries shows how the principle of insufficient reason may lead to different and paradoxical results. The following example will illustrate the main points in von Kries's criticism. Suppose we be given the following problem: Determine the probability of the existence of human beings on the planet Mars.
By applying the first mentioned principle our reasoning would be as follows: We have no more reason to assume the actual existence of man on the planet than the complete absence. Hence the probability for the non-existence of a human being is equal to $\tfrac{1}{2}$. Next we ask for the probability of the presence or non-presence of another earthly mammal, say the elephant. The answer is the same, $\tfrac{1}{2}$. Now the probability for the absence of both man and elephant on the planet is $\tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$.² The probability for the absence of a third mammal, the horse, is also $\tfrac{1}{2}$, or the probability for the absence of man, elephant, and horse is equal to $(\tfrac{1}{2})^3 = \tfrac{1}{8}$. Proceeding in the same manner for all mammals we obtain a very small probability for the complete absence of all mammals on Mars, or a very large probability, almost equal to certainty, that the planet harbors at least one mammal known on our planet, an answer which certainly does not seem plausible. But we might as well have put the question from the start: what is the probability of the existence or absence of any one earthly mammal on Mars? The principle of insufficient reason when applied directly would here give the answer $\tfrac{1}{2}$, while when applied in an indirect manner the same method gave an answer very near to certainty.

¹ "Die Principien der Wahrscheinlichkeitsrechnung," Berlin, 1886.
² See the chapter on multiplication of probabilities.

An urn is known to contain white and black balls, but the number of the balls of the two different colors is unknown. What is the probability of drawing a white ball? The principle of insufficient reason gives us readily the answer: $\tfrac{1}{2}$, while the principle of cogent reason would give the same answer only if it were known a priori that there were equal numbers of balls of each color in the urn before the drawing took place.
Since this knowledge is not present a priori, we are not able to give any answer, and the problem is considered outside the domain of probabilities.

There is no doubt that the principle advocated by von Kries is the only logical one to apply, and a recent treatise on the theory of probability by Professor Bruhns of Leipzig¹ also gives the principle of cogent reason the most prominent place. On the other hand it must be admitted that if the principle was to be followed consistently in its very extreme it would of course exclude many problems now found in treatises on probability and limit the application of our theory considerably in scope. Still, however, we must agree with von Kries that it seems very foolhardy to assign cases of which we are absolutely in the dark, as being equally likely to occur. This very principle of insufficient reason is in very high degree responsible for the somewhat absurd answers to questions on the so-called "inverse probabilities," a name which in itself is a great misnomer. We shall later in the chapter on "a posteriori" probabilities discuss this question in detail. At present we shall only warn the student not to judge cases of which he has no knowledge whatsoever to be equally likely to occur. The old rule "experience is the best teacher" holds here, as everywhere else.

¹ "Kollektivmasslehre und Wahrscheinlichkeitsrechnung," Leipzig, 1903.

7. Objective and Subjective Probabilities.

In this connection it is interesting to note the lucid remarks by the Danish statistician, Westergaard. "By every well arranged game of chance, by lotteries, dice, etc.," Westergaard says, "everything is arranged in such a way that the causes influencing each draw or throw remain constant as far as possible. The balls are of the same size, of the same wood, and have the same density; they are carefully mixed and each ball is thus apparently subject to the influences of the same causes.
However, this is not so. Despite all our efforts the balls are different. It is impossible that they are of exactly mathematically spherical form. Each ball has its special deviation from the mathematical sphere, its special size and weight. No ball is absolutely similar to any one of the others. It is also impossible that they may be situated in the same manner in the bag. In short there is a multitude of apparently insignificant differences which determine that a certain definite ball and none of the other balls may be drawn from the bag. If such inequalities did not exist one of two things would happen. Either all balls would turn up simultaneously or else they would all remain in the bag. Many of these numerous causes are so small that they perhaps are invisible to the naked eye and completely escape all calculations, but by mutual action they may nevertheless produce a visible result."

It thus appears that a rigorous application of the principle of cogent reason seems impossible. However, a compromise between this principle and that of the principle of insufficient reason may be effected by the following definition of equally possible cases, viz.: Equally possible cases are such cases in which we, after an exhaustive analysis of the physical laws underlying the structure of the complex of causes influencing the special event, are led to assume that no particular case will occur in preference to any other.

True, this definition introduces a certain subjective element and may therefore be criticized by those readers who wish to make the whole theory of probabilities purely objective. Yet it seems to me preferable to the strict application of the principle of equal distribution of ignorance. Take again the question of the probability of the existence of human beings on the planet Mars. The principle of equal distribution of ignorance readily gives us without further ado the answer $\tfrac{1}{2}$.
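The contrast between the direct and the indirect application of the principle of insufficient reason, as criticized by von Kries in the Mars example of section 6, may be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function name `prob_at_least_one_present` and its parameters are hypothetical, each species is assumed to be absent with probability one half, and the absences are treated as independent so that the multiplication theorem applies.

```python
from fractions import Fraction


def prob_at_least_one_present(n_species: int,
                              p_absent: Fraction = Fraction(1, 2)) -> Fraction:
    """Indirect application of the principle of insufficient reason.

    Assumption (illustrative only): each of the n_species is absent with
    probability p_absent, independently of the others, so the probability
    that all are absent is p_absent ** n_species.
    """
    return 1 - p_absent ** n_species


# Direct application: "is any earthly mammal present or not?" gets 1/2.
direct_answer = Fraction(1, 2)

for n in (1, 2, 3, 10):
    print(n, prob_at_least_one_present(n), "versus direct answer", direct_answer)
```

With only ten species the indirect answer already stands at 1023/1024, against the direct answer of one half to what is essentially the same question.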
Modern astrophysical researches have, however, verified physical conditions on the planet which make the presence of organic life quite possible, and according to such an eminent authority as Mr. Lowell, perhaps absolutely certain. Yet these physical investigations are as yet not sufficiently complete, and not in such a form that they may be subjected to a purely quantitative analysis as far as the theory of probabilities is concerned. Viewed from the standpoint of the principle of cogent reason any attempt to determine the numerical value of the above probability must therefore be put aside as futile. This result, negative as it is, seems, however, preferable to the absolute guess of $\tfrac{1}{2}$ as the probability.

CHAPTER II.
HISTORICAL AND BIBLIOGRAPHICAL NOTES.

8. Pioneer Writers.

The first attempt to define the measure of a probability of a future event is credited to the Greek philosopher, Aristotle. Aristotle calls an event probable when the majority, or at least the majority of the most intellectual persons, deem it likely to happen. This definition, although not allowing a purely quantitative measurement, makes use of a subjective judgment.

The first really mathematical treatment of chance, however, is given by the two Italian mathematicians, Cardano and Galileo, who both solved several problems relating to the game of dice. Cardano, aside from his mathematical occupation, was also a professional gambler and had evidently noticed that in all kinds of gambling houses cheating was often resorted to. In order that the gamester might be fortified against such cheating practices, Cardano wrote a little treatise on gambling wherein he discussed several mathematical questions connected with the different games of dice as played in the Italian gambling houses at that time.
Galileo, although not a professional gambler, was often consulted by a certain Italian nobleman on several problems relating to the game of dice, and fortunately the great scholar has left some of his investigations in a short memoir. In the same manner the two great French mathematicians, Pascal and Fermat, were often asked by a professional gamester, the chevalier de Méré, to apply their mathematical skill to the solution of different gambling problems. It was this kind of investigation which probably led Pascal to the discovery of the arithmetical triangle, and the first rudiments of the combinatorial analysis, which had its origin in probability problems, and which later evolved into an independent branch of mathematical analysis.

One of the earliest works from the illustrious Dutch physicist, Huyghens, is a small pamphlet entitled "de Ratiociniis in Ludo Aleae," printed in Leyden in the year 1657. Huyghens' tract is the first attempt of a systematic treatment of the subject. The famous Leibnitz also wrote on chance. His first reference to a mathematical probability is perhaps in a letter to the philosopher, Wolff, wherein he discusses the summation of the infinite series $1 - 1 + 1 - 1 + \cdots$. Besides he solved several problems.

9. Bernoulli, de Moivre and Bayes.

The first extensive treatise on the theory as a whole is from the hand of the famous Jacob Bernoulli. Bernoulli's book, "Ars Conjectandi," marks a revolution in the whole theory of chance. The author treats the subject from the mathematical as well as from a philosophical point of view, and shows the manifold applications of the new science to practical problems. Among other important theorems we here find the famous proposition which has become known as the Bernoulli Theorem in the mathematical theory of probabilities.
Bernoulli's work has recently been translated from the Latin into German,¹ and a student who is interested in the whole theory of probability should not fail to read this masterly work.

The English mathematicians were the next to carry on the investigations. Abraham de Moivre, a French Huguenot, and one of the most remarkable mathematicians of his time, wrote the first English treatise on probabilities.² This book was certainly a worthy product of the masterful mind of its author, and may, even today, be read with useful results, although the method of demonstration often appears lengthy to the student who is accustomed to the powerful tools of modern analysis. The high esteem in which the work by de Moivre is held by modern writers, is proven by the fact that E. Czuber, the eminent Austrian mathematician and actuary, so recently as two years ago translated the book into German. A certain problem (see Chap. IV) still goes under the name of "The Problem of de Moivre" in the modern literature on probability. A contemporary of de Moivre, Stirling, contributed also to the new branch of mathematics, and his name also is immortalized in the theory of probability by the formula which bears his name, and by which we are able to express large factorials to a very accurate degree of approximation.

¹ Ars Conjectandi, Ostwald's Klassiker No. 108, Leipzig, 1901.
² de Moivre: "The Doctrine of Chances," London, 1718.

The third important English contributor is the Oxford clergyman, T. Bayes. Bayes' treatise, which was published after his death by Price, in Philosophical Transactions for 1764, deals with the determination of the a posteriori probabilities, and marks a very important stepping stone in our whole theory. Unfortunately the rule known as Bayes' Rule has been applied very carelessly, and that mostly by some of Bayes' own countrymen; so the whole theory of Bayes has been repudiated by certain modern writers.
A recent contribution by the Danish philosophical writer, Dr. Kroman, seems, however, to have cleared up all doubts on the subject, and to have given Bayes his proper credit.

10. Application to Statistical Data. — In the eighteenth century some of the most celebrated mathematicians investigated problems in the theory of probability. The birth of life assurance gave the whole theory an important application to social problems, and the increasing desire for the collection of all kinds of statistical data by governmental bodies all over Europe gave the mathematicians some highly interesting material to which to apply their theories. No wonder, therefore, that we in this period find the names of some of the most illustrious mathematicians of that time, such as Daniel Bernoulli, Euler, Nicolas and John Bernoulli, Simpson, D'Alembert and Buffon, closely connected with the solution of problems in the theory of mathematical probabilities. We shall not attempt to give an account of the different works of these scientists, but shall only dwell briefly on the labors of Bernoulli and D'Alembert.

In a memoir in the St. Petersburg Academy, Daniel Bernoulli is the first to discuss the so-called St. Petersburg Problem, one of the most hotly debated in the whole realm of our science. We may here mention that this problem is today one of the main pillars in the economic treatment of value. Bernoulli introduced in the discussion of the above mentioned problem the idea of the "moral expectation," which under slightly different names appears in nearly all standard writings on economics.

D'Alembert is especially remembered for the critical attitude he took towards the whole theory. Although one of the most brilliant thinkers of his age, the versatile Frenchman made some great blunders in his attempt to criticize the theories of chance.
Buffon's name is remembered because of the needle problem, and he may properly be called the father of the so-called "geometrical" or "local" probabilities.

11. Laplace and Modern Writers. — We now come to that resplendent genius in the investigation of the mathematical theory of chance, the immortal Laplace, who in his great work, "Théorie Analytique des Probabilités," gave the final mathematical treatment of the subject. This massive volume leaves nothing to be desired and is still today, more than one hundred years after its first publication, a most valuable mine of information and compares favorably with much more modern treatises. But like all mines, it requires to be mined and is by no means easy reading for a beginner. An elementary extract, "Essai Philosophique des Probabilités," containing the more elementary parts of Laplace's greater work and stripped of all mathematical formulas, has recently appeared in an English translation.

Among later French works, Cournot's "Exposition de la Théorie des Chances et des Probabilités" (1843) treated the principal questions in the application of the theory to practical problems in sociology. In 1837 Poisson published his "Recherches sur les Probabilités," in which he for the first time proved the famous theorem which bears his name. Poisson and his Belgian contemporary, Quetelet, made extensive use of the theory in the treatment of statistical data.

Among the most recent French works, we mention especially Bertrand's "Calcul des Probabilités" (Paris, 1888), Poincaré's "Calcul des Probabilités" (Paris, 1896), and Borel's "Calcul des Probabilités" (Paris, 1901). We especially recommend Poincaré's brilliant little treatise to every student who masters the French language, as this book makes no departure from the lively and elucidating manner in which this able mathematical writer treated the numerous subjects on which he wrote during his long and brilliant career as a mathematician.
Of Russian writers, the mathematician, Tchebycheff, has given some extensive general theorems relating to the law of large numbers. Unfortunately Tchebycheff's writings are for the most part scattered in French, German, Scandinavian and Russian journals, and thus are not easily accessible to the ordinary reader. A Russian artillery officer, Sabudski, has recently published a treatise on ballistics in German, wherein he extends the views formulated by Tchebycheff.

Of Scandinavian writers we mention T. N. Thiele, who probably was the first to publish a systematic treatise on skew curves.¹ An abridged edition of this very original work has recently been translated into English.² The Dane, Westergaard, is the author of the most extensive and thorough treatise on vital statistics which we possess at the present time. Westergaard's work has recently been translated into German,³ and is strongly recommended to the student of vital statistics on account of his clear and attractive style of presenting this important subject. The Swedish mathematicians Charlier and Gyldén have published a series of memoirs in different Scandinavian journals and scientific transactions. We may also, in this category, mention the numerous small articles by the eminent Danish actuary, Dr. Gram.

While the German mathematicians in general are the most fertile writers on almost every branch of pure and applied mathematics, they have not shown much activity in the theory of mathematical probability except in the past ten years. But during that time there have appeared at least a dozen standard works in German. Among these, the lucid and terse treatise by E. Czuber, the Austrian actuary and mathematician, is especially attractive to the beginner on account of the systematic treatment of the whole subject.⁴ A very original treatment is offered by H. Bruhns in his "Kollektivmasslehre und Wahrscheinlichkeitsrechnung" (Leipzig, 1903).
Among the German works, we may also mention the book by Dr. Norman Herz in "Sammlung Schubert," and an excellent little work by Hack in the small pocket edition of "Sammlung Göschen." The theory of skew curves and correlation is presented by Lipps and Bruhns in extensive treatises.

¹ "Almindelig Iagttagelseslære," Copenhagen, 1884.
² "Theory of Observations," London, 1903.
³ "Mortalität und Morbilität," Jena, 1902.
⁴ E. Czuber, "Wahrscheinlichkeitsrechnung," Leipzig, 1908.

[...]

14. Example 2. — D'Alembert, the great French mathematician and natural philosopher and one of the ablest thinkers of his time, assigned 2/3 as the probability of throwing head at least once in two successive throws with a homogeneous coin. D'Alembert reasons as follows: If head appears first the game is finished and a second throw is not necessary. He therefore gives as equally possible cases (we denote head by H and tail by T): H, TH, TT, and determines thus the probability as 2/3.

Where then is the error of D'Alembert? At first glance the chain of reasoning seems perfect. There are altogether three possible cases of which two are in favor of the event. But are the three cases equally likely? To throw head in a single throw is evidently not the same as to throw head in two successive throws. D'Alembert has left out of consideration the fact that a double throw is allowed. The following analysis shows all the equally possible cases which may occur: HH, HT, TH, TT. Three of those cases favor the event. Hence we have:

$$P(E) = p = \frac{3}{4}.$$

We shall return to this problem at a later stage under the discussion of the law of large numbers.

The examples quoted have already shown that the enumeration of the equally likely cases requires a sharp distinction between the different combinations and arrangements of elements. In other words, the solution of the problems requires a knowledge of permutations and combinations.
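The enumeration in Example 2 can be reproduced mechanically. The following is a minimal sketch in Python, not part of the original text; the function name and the dictionary of D'Alembert's three cases are illustrative assumptions. It lists the four equally possible sequences of two throws, and also shows that D'Alembert's own three cases give the same answer once each is weighted by its true probability.

```python
from fractions import Fraction
from itertools import product


def prob_at_least_one_head(throws: int = 2) -> Fraction:
    """Count the equally possible H/T sequences that contain a head."""
    outcomes = list(product("HT", repeat=throws))  # HH, HT, TH, TT for two throws
    favorable = [o for o in outcomes if "H" in o]
    return Fraction(len(favorable), len(outcomes))


# D'Alembert's three cases are not equally likely: H stops the game
# after one throw, so it carries weight 1/2, the other two 1/4 each.
dalembert_cases = {"H": Fraction(1, 2), "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
weighted = sum(p for case, p in dalembert_cases.items() if "H" in case)

print(prob_at_least_one_head(2))  # 3/4
print(weighted)                   # 3/4
```

Counting the three cases as equally likely is what produces 2/3; weighting them properly restores 3/4.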
We assume here that the reader is already acquainted with the elements and formulas from the combinatorial analysis and shall therefore proceed with some more illustrations. In the following, when employing the binomial coefficients, we shall use the notation $\binom{n}{k}$ instead of ${}^{n}C_{k}$.

15. Example 3. — An urn contains a white and b black balls. A person draws k balls. What is the probability of drawing α white and β black balls? (α + β = k, α ≤ a, β ≤ b.)

k balls may be drawn from the urn in as many ways as it is possible to select k elements from a + b elements, which may be done in

$$\binom{a+b}{k} = \binom{a+b}{\alpha+\beta}$$

ways. Furthermore there are $\binom{a}{\alpha}$ groups of α white and $\binom{b}{\beta}$ groups of β black balls. Since each combination of any one group of the first groups with any one group of the second groups is favorable for the event, we have as favorable cases $\binom{a}{\alpha}\binom{b}{\beta}$. Hence

$$p = \binom{a}{\alpha}\binom{b}{\beta} \div \binom{a+b}{\alpha+\beta}.$$

Example 4. — A special case of the above problem is the following question which often appears in the well known game of whist. What are the respective chances that 0, 1, 2, 3, 4 aces are held by a specified player?

There are altogether 52 cards in the game equally distributed among 4 players. Of these cards 4 are aces and 48 are non-aces. Hence we have the following values for a, b, k, α and β:

a = 4, b = 48, k = 13, α = 0, 1, 2, 3, 4, β = 13, 12, 11, 10, 9.

Substituting in the above formula we get:

$$p_0 = \binom{4}{0}\binom{48}{13} \div \binom{52}{13} = \frac{82251}{270725}$$

$$p_1 = \binom{4}{1}\binom{48}{12} \div \binom{52}{13} = \frac{118807}{270725}$$

$$p_2 = \binom{4}{2}\binom{48}{11} \div \binom{52}{13} = \frac{57798}{270725}$$

$$p_3 = \binom{4}{3}\binom{48}{10} \div \binom{52}{13} = \frac{11154}{270725}$$

$$p_4 = \binom{4}{4}\binom{48}{9} \div \binom{52}{13} = \frac{715}{270725}$$

A hypothetical disjunctive judgment immediately tells us that in a game of whist a specified player must either hold 0, 1, 2, 3 or 4 aces. Any such judgment is certain to come true. Hence by adding the 5 above computed probabilities we obtain a check for the accuracy of our calculations.
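The formula of Example 3 and the whist figures of Example 4 are easy to check by machine. The sketch below is an illustration only; the function name `urn_probability` is an assumption, not notation from the text. It uses exact fractions so that the check by addition comes out as exactly one.

```python
from fractions import Fraction
from math import comb


def urn_probability(a: int, b: int, alpha: int, beta: int) -> Fraction:
    """Chance of alpha white and beta black balls when alpha + beta balls
    are drawn from an urn holding a white and b black balls."""
    favorable = comb(a, alpha) * comb(b, beta)
    possible = comb(a + b, alpha + beta)
    return Fraction(favorable, possible)


# Whist: 4 aces, 48 non-aces, a hand of 13 cards.
ace_chances = [urn_probability(4, 48, k, 13 - k) for k in range(5)]

for k, p in enumerate(ace_chances):
    # Express each chance over the common denominator 270725 used in the text.
    print(k, "aces:", p * 270725, "/ 270725")

print("sum =", sum(ace_chances))
```

Because `Fraction` reduces automatically, the numerators are recovered here by multiplying by the common denominator 270725.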
The actual addition of the numerical values of $p_0$, $p_1$, $p_2$, $p_3$ and $p_4$ gives us unity, which is the mathematical symbol for certainty.

Gauss, the renowned German mathematician and astronomer, was an eager whist player. During his forty-eight years of residence in the university town of Göttingen almost every evening he played a rubber of whist with some friends among the university professors. He kept a careful record of the distribution of the aces in each game. After his death these records were found among his papers, headed "Aces in Whist." The actual records agree with the results computed above.

16. Example 5. — An urn contains n similar balls. A part of or all the balls are drawn. What is the probability of drawing an even number of balls?

One ball may be drawn in as many ways as there are balls, two balls in as many ways as we may select two elements out of n elements, and so on. Hence we have for the total number of equally possible cases:

$$t = \binom{n}{1} + \binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n}.$$

We have now:

$$(1 + 1)^n = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}$$

and

$$(1 - 1)^n = 1 - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n}.$$

The number of favorable cases is given by the expansion:

$$f = \binom{n}{2} + \binom{n}{4} + \cdots$$

The expression for t is the sum of the binomial coefficients less unity. Hence we have:

$$t = (1 + 1)^n - 1 = 2^n - 1.$$

If we add the two expansions of $(1 + 1)^n$ and $(1 - 1)^n$ and then subtract 2 we get the expansion for 2f. Hence we have:

$$2f = (1 + 1)^n + (1 - 1)^n - 2, \quad \therefore \quad f = 2^{n-1} - 1.$$

Thus we shall have as the probability of drawing an even number of balls:

$$p = \frac{2^{n-1} - 1}{2^n - 1},$$

while for an uneven number:

$$q = \frac{2^{n-1}}{2^n - 1}.$$

We notice that the probability of drawing an uneven number of balls is larger than the probability of drawing an even number. This apparently strange result is easily explained without the aid of algebra from the fact that when the urn contains one ball only, we cannot draw an even number. Hence we have p = 0, q = 1. With two balls we may draw an uneven number in two ways and an even number in one way, thus p = 1/3 and q = 2/3.
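The result of Example 5 can be confirmed by listing every possible draw for small n. This is a brute-force sketch with assumed function names, treating every non-empty selection of balls as one equally possible case, exactly as the example does.

```python
from fractions import Fraction
from itertools import combinations


def even_draw_by_enumeration(n: int) -> Fraction:
    """List every non-empty selection of the n balls and count the even-sized ones."""
    sizes = [len(c) for k in range(1, n + 1) for c in combinations(range(n), k)]
    even = sum(1 for s in sizes if s % 2 == 0)
    return Fraction(even, len(sizes))


def even_draw_by_formula(n: int) -> Fraction:
    """The closed form p = (2^(n-1) - 1) / (2^n - 1) derived in Example 5."""
    return Fraction(2 ** (n - 1) - 1, 2 ** n - 1)


for n in range(1, 9):
    assert even_draw_by_enumeration(n) == even_draw_by_formula(n)

print(even_draw_by_formula(2))  # 1/3
```

The enumeration grows as 2^n, so it serves only as a check for small urns; the closed form is what one would use in practice.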
The greater weight of q remains when n is finite; only when n = ∞ is p = q = 1/2.

17. Example 6. — A box contains n balls marked 1, 2, 3, ··· n. A person draws n balls in succession and none of the balls thus drawn is put back in the urn. Each drawing is consecutively marked 1, 2, 3, ··· n on n cards. What is the probability that no ball marked α (α = 1, 2, 3, ··· n) appears simultaneously with a drawing card marked α?

The number of equally possible cases is simply the number of permutations of n elements, which is equal to n!. The number of favorable cases is given by the total number of derangements or relative permutations of n elements, i.e., such permutations wherein the numbers from 1 to n do not appear in their natural places. The formula for such relative permutations was first given by Euler in a memoir of the St. Petersburg Academy entitled "Quaestio Curiosa ex Doctrina Combinationis." Euler makes use of a recursion formula. A German mathematician, Lampe, has, however, derived the formula in a simpler manner in "Grunert's Archives" for 1884.

Lampe denotes by the symbol φ(1) the number of permutations wherein 1 does not appear in its natural place. By letting 1 remain fixed in the first place we obtain (n − 1)! permutations of the other remaining elements, or:

$$\varphi(1)_n = n! - (n - 1)!$$

permutations where 1 is out of place. Of these permutations there are, however, a number wherein 2 appears in its natural place. If we let 2 remain fixed in this place we shall have:

$$\varphi(1)_{n-1} = (n - 1)! - (n - 2)!$$

permutations wherein 2 is in its place but 1 out of place, there remains thus:

[...]

$$\cdots + p_3 + \cdots + p_n.$$

This theorem is also known as the Addition Theorem of probabilities. Instead of "total probability" the German scholar, Reuschle, has suggested the expressive name of the "either or" probability. The term is well selected when we remember that the event, E, will happen when either $E_1$, or $E_2$, or $E_3$ ··· or $E_n$ happens.
Example 7. — What is the probability to throw 8 with two dice in a single throw?

The total number of ways is t = 6² = 36. The event in question, E, is composed of the three subsidiary events favoring the combination of 8:

$E_1$: 6, 2
$E_2$: 5, 3
$E_3$: 4, 4.

Now

$$P(E_1) = \frac{2}{36} = \frac{1}{18}, \quad P(E_2) = \frac{2}{36} = \frac{1}{18}, \quad P(E_3) = \frac{1}{36}.$$

Hence

$$P(E) = \frac{1}{18} + \frac{1}{18} + \frac{1}{36} = \frac{5}{36}.$$

21. Theorem of the Compound Probability or the Probability of "As Well As." — An event E may happen when every one of the subsidiary events $E_1, E_2, E_3, \cdots E_n$ has occurred previously. It is immaterial if the n subsidiary events have happened simultaneously or in succession. But it makes a difference if the events $E_1, E_2, E_3, \cdots E_n$ are independent, or dependent on each other.

1. Independent Events. — The probability, P(E) = p, for the simultaneous or consecutive appearance of several independent events $E_1, E_2, \cdots E_n$ is equal to the product $p_1 \cdot p_2 \cdot p_3 \cdots p_n$ of the individual probabilities of the n events.

Proof: Let the number of possible cases entering into the complex that brings forth the event E be t. Each of the $t_1$ possible cases corresponding to the event $E_1$ may occur simultaneously with each one of the $t_2$ cases corresponding to the event $E_2$. Thus we have altogether $t_1 \times t_2$ cases falling on $E_1$ and $E_2$ at the same time. Continuing in the same way of reasoning it is readily seen that the total number of equally possible cases resulting from the simultaneous occurrence of the events $E_1, E_2, E_3, \cdots E_n$ is equal to $t = t_1 \times t_2 \times t_3 \times \cdots t_n$. By applying the same reasoning to the favorable cases we get as their total number:

$$f = f_1 \times f_2 \times f_3 \times \cdots f_n.$$

Hence the final probability for the simultaneous or consecutive appearance of the n minor events is:

$$P(E) = \frac{f}{t} = \frac{f_1}{t_1} \times \frac{f_2}{t_2} \times \frac{f_3}{t_3} \times \cdots \frac{f_n}{t_n} = p_1 \times p_2 \times p_3 \times \cdots p_n.$$

Example 8. — A card is drawn from a whist deck, another card is drawn from a pinochle deck. What is the probability that they both are aces?
A whist deck contains 52 cards of which four are aces, a pinochle deck 48 cards with 8 aces. Denoting the probabilities of getting an ace from the whist and pinochle decks by $P(E_1)$ and $P(E_2)$ respectively we have:

$$P(E) = P(E_1)P(E_2) = \frac{4}{52} \times \frac{8}{48} = \frac{1}{78}.$$

2. Dependent Events. — The n events $E_1, E_2, E_3, \cdots E_n$ are not independent of each other, but are related in such a way that the appearance of $E_1$ influences $E_2$, that event influences in turn $E_3$, $E_3$ the event $E_4$, and so on. The same reasoning holds as above, and

$$P(E) = p = p_1 \times p_2 \times p_3 \times \cdots p_n.$$

But $p_2$ means here the probability for the happening of $E_2$ after the actual occurrence of $E_1$, $p_3$ the probability for the happening of $E_3$ after $E_1$ and $E_2$ have previously happened, and so on for all n events.

Example 9. — A card is drawn from a whist deck and replaced by a joker, and then a second card is drawn. What is the probability that both cards are aces?

Denoting the two subsidiary events by $E_1$ and $E_2$ we have:

$$P(E) = P(E_1)P(E_2) = \frac{4}{52} \times \frac{3}{52} = \frac{3}{13 \times 52} = \frac{3}{676}.$$

The two above theorems are known as the multiplication theorems in probabilities. Reuschle has also suggested the name "the as well as probability."

22. Poincaré's Proof of the Addition and Multiplication Theorem. — The French mathematician and physicist, H. Poincaré, has derived the above theorems in a new and elegant manner in his excellent little treatise: "Leçons sur le Calcul des Probabilités," Paris, 1896. Poincaré's proof is briefly as follows: Let $E_1$ and $E_2$ be two arbitrary events.

Both $E_1$ and $E_2$ may happen in α different ways.
$E_1$ may happen but not $E_2$ in β different ways.
$E_2$ may happen but not $E_1$ in γ different ways.
Neither $E_1$ nor $E_2$ will happen in δ different ways.

We assume the total α + β + γ + δ cases to be equally likely to occur.
The probability for the occurrence of $E_1$ is

$$p_1 = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of $E_2$ is

$$p_2 = \frac{\alpha + \gamma}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of at least one of the events $E_1$ and $E_2$ is

$$p_3 = \frac{\alpha + \beta + \gamma}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of both $E_1$ and $E_2$ is

$$p_4 = \frac{\alpha}{\alpha + \beta + \gamma + \delta}.$$

The probability for the occurrence of $E_1$ when $E_2$ has already occurred is

$$p_5 = \frac{\alpha}{\alpha + \gamma}.$$

The probability for the occurrence of $E_2$ when $E_1$ has already occurred is

$$p_6 = \frac{\alpha}{\alpha + \beta}.$$

The probability for the occurrence of $E_1$ when $E_2$ has not already occurred is

$$p_7 = \frac{\beta}{\beta + \delta}.$$

The probability for the occurrence of $E_2$ when $E_1$ has not already occurred is

$$p_8 = \frac{\gamma}{\gamma + \delta}.$$

We have now the following identical relations:

$$p_1 + p_2 = p_3 + p_4, \quad p_3 = p_1 + p_2 - p_4,$$

i.e., the probability that of two arbitrary events at least one will happen is equal to the probability that the first will happen plus the probability that the second will happen less the probability that both will happen.

The particular problem which we may happen to investigate may possibly be of such a nature that the two events $E_1$ and $E_2$ cannot happen at the same time; in that case $p_4 = 0$, and we get:

$$p_3 = p_1 + p_2.$$

In this equation we immediately recognize the addition theorem for two mutually exclusive events.

By substitution of the proper values we have furthermore:

$$p_4 = p_2 \cdot p_5 \quad \text{or} \quad p_4 = p_1 \cdot p_6.$$

These equations contain the theorems proved under § 21, of the probability for two mutually dependent events.

23. Relative Probabilities. — We shall now finally give an alternative demonstration of the same two theorems. It will, of course, be of benefit to the student to see the subject from as many view points as possible; moreover, the following remarks will contain some very useful hints for the solution of more complicated problems by the application of so-called "relative probabilities" and a few elementary theorems from the calculus of logic.
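Before passing to relative probabilities, the identities of Poincaré's proof in § 22 can be checked numerically. The sketch below is illustrative only: the counts 3, 5, 4 and 8 for α, β, γ and δ are arbitrary assumed values, and the labels p1 to p6 follow the order in which the probabilities are listed in § 22.

```python
from fractions import Fraction


def poincare_probabilities(alpha: int, beta: int, gamma: int, delta: int) -> dict:
    """Probabilities from the four case counts: both events (alpha),
    E1 only (beta), E2 only (gamma), neither (delta)."""
    total = alpha + beta + gamma + delta
    return {
        "p1": Fraction(alpha + beta, total),          # E1 occurs
        "p2": Fraction(alpha + gamma, total),         # E2 occurs
        "p3": Fraction(alpha + beta + gamma, total),  # at least one occurs
        "p4": Fraction(alpha, total),                 # both occur
        "p5": Fraction(alpha, alpha + gamma),         # E1 when E2 has occurred
        "p6": Fraction(alpha, alpha + beta),          # E2 when E1 has occurred
    }


p = poincare_probabilities(3, 5, 4, 8)  # assumed illustrative counts

# Addition theorem: p3 = p1 + p2 - p4.
assert p["p3"] == p["p1"] + p["p2"] - p["p4"]
# Multiplication theorem for dependent events: p4 = p2 * p5 = p1 * p6.
assert p["p4"] == p["p2"] * p["p5"] == p["p1"] * p["p6"]
print(p["p3"], p["p4"])
```

Any positive counts may be substituted; setting α = 0 gives the special case of mutually exclusive events, for which p3 = p1 + p2.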
The following paragraphs are mainly based upon a treatise in the Proceedings of the Royal Academy of Saxony, by the German mathematician and actuary, F. Hausdorff.

In our fundamental definition of a mathematical probability for the happening of an event E, expressed in symbols by P(E), as the ratio of the favorable to the equally possible cases resulting from a general complex of causes, we were able to compute the so-called ordinary or absolute probabilities. But if we, from among the favorable cases and possible cases, select only such as bring forward a certain different event, say F, then we obtain the "relative probability" for the happening of E under the assumption that the subsidiary event, F, has occurred previously. For this relative probability we shall employ the symbol $P_F(E)$, which reads "the relative probability of E, posito F."

The following problem illustrates the meaning of relative probabilities. If an honor card is drawn from an ordinary deck of cards, what is the probability that it is a king? Denoting the subsidiary event of drawing an honor card by F, and the main event of drawing a king by E, we may write the above mentioned probability in the symbolic form: $P_F(E)$. If on the other hand we knew a priori that a king was drawn, we may also ask for the probability of having drawn an honor card. Since any king also is an honor card, we may write in symbols: $P_E(F) = 1$.

Before entering upon the immediate determination of relative probabilities we shall first define a few symbols from the calculus of logic. We denote first of all the occurrence of an event E by E, the non-occurrence of the same event by $\bar{E}$. Similarly we have for the occurrence and non-occurrence of other events, F, G, H, ··· and $\bar{F}$, $\bar{G}$, $\bar{H}$, ···. E + F means that at least one of the two events E and F will happen. E × F or simply E · F means the occurrence of both E and F. From the above definition it follows immediately that

$$\overline{E + F} = \bar{E} \cdot \bar{F} \quad \text{and} \quad E = E \cdot F + E \cdot \bar{F}.$$
This last relation simply states that E will happen when either E and F happen simultaneously or when E and the non-appearance of F happen at the same time. If furthermore $F_1, F_2, F_3, \cdots F_n$ constitute the members of a complete disjunction, i.e., mutually exclusive events, we have in

[...]

28. Example 13. — An urn contains a white, b black and c red balls. A single ball is drawn α + β + γ times in succession, and the ball thus drawn is replaced before the next drawing takes place. To determine the probability that (1) there are first α white, then β black and finally γ red balls, (2) the drawn balls appear in three closed groups of α white, β black and γ red balls, but the order of these groups is arbitrary, (3) that white, black and red balls appear in the same number as above, but in any order whatsoever.

1. Denoting the three subsidiary events for drawing α white, β black and γ red balls by $F_1$, $F_2$ and $F_3$, and the main event for drawing the balls in the prescribed order by E, we may write the probability for the occurrence of the main event in the following symbolic form involving relative probabilities:

$$P(E) = P(F_1)\,P_{F_1}(F_2)\,P_{F_1 F_2}(F_3).$$

Substituting the algebraic values for $P(F_1)$, $P(F_2)$ and $P(F_3)$ in the expression for P(E), and then applying Hausdorff's rule (§ 24) we get:

$$P(E) = p_1 = \frac{a^{\alpha}}{(a+b+c)^{\alpha}} \times \frac{b^{\beta}}{(a+b+c)^{\beta}} \times \frac{c^{\gamma}}{(a+b+c)^{\gamma}} = \frac{a^{\alpha} b^{\beta} c^{\gamma}}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

2. In the second part of the problem the order of the three different groups is immaterial. The three subsidiary events $F_1$, $F_2$ and $F_3$ may therefore be arranged in any order whatsoever. The total number of arrangements is 3! = 6. The probability of the happening of any one of these arrangements separately is the same as the probability computed under (1). By applying the addition theorem we get therefore as the probability of the occurrence of this event:

$$p_2 = \frac{6\,a^{\alpha} b^{\beta} c^{\gamma}}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

3.
The third part is more easily solved by a direct application of the definition of a mathematical probability. The order of the balls drawn is here immaterial. Of each individual combination of α white, β black and γ red balls it is possible to form (α + β + γ)!/(α! β! γ!) different permutations, which gives

$$\frac{(\alpha + \beta + \gamma)!}{\alpha!\,\beta!\,\gamma!}\; a^{\alpha} b^{\beta} c^{\gamma}$$

as the total number of favorable cases. The number of equally possible cases is here $(a + b + c)^{\alpha+\beta+\gamma}$. Hence we have:

$$p_3 = \frac{(\alpha + \beta + \gamma)!}{\alpha!\,\beta!\,\gamma!} \times \frac{a^{\alpha} b^{\beta} c^{\gamma}}{(a+b+c)^{\alpha+\beta+\gamma}}.$$

29. Example 14. — In an urn are n balls among which are α white and β black. What is the probability in three successive drawings to draw (1) first two white and then one black ball, (2) two white and one black ball in any order whatsoever? (α + β ≤ n.)

The probability to draw first one white, then another white and finally a black ball is:

$$p_1 = \frac{\alpha}{n} \times \frac{\alpha - 1}{n - 1} \times \frac{\beta}{n - 2} = \frac{\alpha(\alpha - 1)\beta}{n(n - 1)(n - 2)}.$$

The probability for any of the other arrangements is the same, or we have for (2)

$$p_2 = 3p_1 = \frac{3\alpha(\alpha - 1)\beta}{n(n - 1)(n - 2)}.$$

30. Example 15. — What is the chance to throw a doublet of 6 at least once in n consecutive throws with two dice? (Pascal's Problem.)

Chevalier de Méré, a French nobleman and a great friend of all games of chance, went more deeply into the complex of causes in different games than most of the ordinary gamblers of his time. Although not a proficient mathematician he understood sufficient, nevertheless, to give some very interesting problems for which he got the ideas from the gambling resorts he frequented. De Méré was a friend of the great French mathematician and philosopher, Blaise Pascal, and went to him whenever he wanted information on some apparently obscure point in the different games in which he participated. The chevalier had from patient observation noticed that he could profitably bet to throw a six at least once in four throws with a single die.
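The chevalier's observation about a single die, and the question of Example 15 for two dice, both rest on the same quantity: the chance that an event of given single-trial probability happens at least once in n trials. The following is a minimal sketch with an assumed function name; it uses exact fractions for the probabilities and floating point only for the break-even number of throws.

```python
from fractions import Fraction
import math


def at_least_once(p_single: Fraction, throws: int) -> Fraction:
    """Chance that an event of probability p_single happens at least once
    in the given number of independent throws: 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** throws


one_die_4 = at_least_once(Fraction(1, 6), 4)       # a six in four throws
two_dice_24 = at_least_once(Fraction(1, 36), 24)   # a double six in 24 throws
two_dice_25 = at_least_once(Fraction(1, 36), 25)   # a double six in 25 throws

# Number of throws at which an even bet on a double six breaks even.
break_even = math.log(2) / (math.log(36) - math.log(35))

print(one_die_4)            # 671/1296
print(float(two_dice_24))   # a little under 1/2
print(float(two_dice_25))   # a little over 1/2
print(break_even)           # between 24 and 25
```

The first figure exceeds one half, which agrees with the chevalier's experience at the tables; the two-dice figures fall on either side of one half at 24 and 25 throws.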
He reasoned now that the number of throws to throw a doublet at least once with two dice ought to be proportional to the corresponding equal number of possible cases with a single die. For one die there are 6 possible cases, for two 36. Thus de Méré thought he could safely bet to throw a doublet of 6 in 24 throws with two dice. An actual trial by several games of dice proved extremely disastrous to the finances of the nobleman, who then went to Pascal for an explanation. Pascal solved the problem by a direct application of the definition of a mathematical probability. We shall, however, solve it by an application of the multiplication theorem.

The probability to get a doublet of 6 in a single throw is 1/36. The probability of not getting a double six is therefore 1 − 1/36 = 35/36. The probability of the happening of this event n consecutive times is $(35/36)^n$. Thus the probability of getting a double six at least once in n throws with two dice becomes:

$$p = 1 - \left(\frac{35}{36}\right)^n.$$

Solving this equation for n we shall have:

$$n = \frac{\log (1 - p)}{\log 35 - \log 36};$$

for p = 1/2 we shall have:

$$n = \frac{\log 2}{\log 36 - \log 35} = 24.6 \cdots.$$

First for 25 throws we may bet safely one to one, while for 24 throws such a bet was unfavorable. This shows the fallacy of de Méré's reasoning.

31. Example 16. — An urn, A, contains a balls of which α are white; another similar urn, B, contains b balls of which β are white. A single ball is drawn from one of the two urns. What is the probability that the ball is white?

The beginner may easily make the following error in the solution of this problem. The probability to get a white ball from A is α/a, from B, β/b. Thus the total probability to get a white ball is: α/a + β/b. This result is, however, wrong, for we may, by selecting proper values for a, b, α and β, obtain a total probability which in numerical value is greater than unity.
Thus if a = 7, b = 7, α = 5, β = 4, we get as the total probability:

$$p = \frac{5}{7} + \frac{4}{7} = \frac{9}{7}.$$

This result is evidently wrong, since a mathematical probability is never an improper fraction. The error lies in the fact that we have regarded the two events of drawing a ball from either urn as independent and mutually exclusive.

A simple application of the symbolic rule for relative probabilities will give us the result immediately. The main event, E, is composed of the two following subsidiary events: (1) to get a white ball from A, or (2) to get a white ball from B. We shall symbolically denote these two events by A · W and B · W respectively. Thus we have:

$$P(E) = P(A \cdot W) + P(B \cdot W) = P(A)P_A(W) + P(B)P_B(W).$$

Now the probability to obtain urn A is $P(A) = p_1 = \frac{1}{2}$, also to get B: $P(B) = p_2 = \frac{1}{2}$. The probability to get a white ball from A when this particular urn is previously selected is expressed by the relative probability:

$$P_A(W) = p_3 = \frac{\alpha}{a}.$$

Similarly for B:

$$P_B(W) = p_4 = \frac{\beta}{b}.$$

Substituting these different values in the expression for P(E) we get finally:

$$P(E) = \frac{1}{2}\left(\frac{\alpha}{a} + \frac{\beta}{b}\right).$$

For the particular numerical example we have:

$$P(E) = \frac{1}{2}\left(\frac{5}{7} + \frac{4}{7}\right) = \frac{9}{14}.$$

32. Example 17. — The probability of the happening of a certain event, E, is p, while the probability for the non-occurrence of the same event is

[...]

$$\cdots p_3 \cdots (1 - p_{\alpha})(1 - p_{\beta}) \cdots (1 - p_n).$$

We shall now denote the sum of all products in (1) containing φ factors p by the symbol $S_{\varphi}$. It is readily seen that φ will have all positive integral values from r to n inclusive. We may therefore write the total compound probability in the following form:

$$P(E_{[r]}) = A_0 S_r + A_1 S_{r+1} + A_2 S_{r+2} + \cdots + A_{n-r} S_n. \qquad (2)$$

The student must bear in mind that the different S are merely symbols for different sums of all the products of r, r + 1, r + 2, ··· n factors p respectively. Our problem is now to determine the unknown coefficients A. It is easily seen that the coefficient $A_0 = 1$, since all different products containing r factors p appear only once.
The other coefficients of the form A do not depend on the values of p, however. They remain therefore unaltered if we equate all of the various p's and let them equal p. Expression (1) then simply becomes

$$\binom{n}{r} p^r (1 - p)^{n-r}.$$

We must form all possible rth powers of n similar factors, which can be done in $\binom{n}{r}$ ways. The expression (2) on the other hand becomes:

$$A_0 \binom{n}{r} p^r + A_1 \binom{n}{r+1} p^{r+1} + \cdots + A_{n-r}\, p^n.$$

Any $S_{\varphi}$ is by definition the sum of all products containing φ factors p, and we may form $\binom{n}{\varphi}$ such products from n elements p. But we saw above that

[...] when every exponent is replaced by an index number (i.e., $S^{\varphi}$ replaced by $S_{\varphi}$) and the expansion broken off at the term $S_n$. The student must of course constantly bear in mind the symbolic meaning of $S^{\varphi}$.

The second part of the problem is easily solved by the symbolic method. Denoting this particular event by $E_r$, we have the following identity:

$$P(E_r) - P(E_{r+1}) = P(E_{[r]}) \quad \text{or} \quad P(E_r) - P(E_{[r]}) = P(E_{r+1}).$$

The following relations are self-evident:

$$P(E_0) = 1; \quad P(E_1) = P(E_0) - P(E_{[0]}) = 1 - \frac{1}{1 + S} = \frac{S}{1 + S},$$

also:

$$P(E_2) = P(E_1) - P(E_{[1]}) = \frac{S}{1 + S} - \frac{S}{(1 + S)^2} = \frac{S^2}{(1 + S)^2}.$$

The complete induction gives us finally:

$$P(E_r) = \frac{S^r}{(1 + S)^r}.$$

Assuming the rule is true for r, we may easily prove it is true for r + 1 also. We have in fact:

$$\frac{S^r}{(1 + S)^r} - \frac{S^r}{(1 + S)^{r+1}} = \frac{S^{r+1}}{(1 + S)^{r+1}}.$$

35. Example 20. Tchebycheff's Problem. — The following solution of a very interesting problem is due to the eminent Russian mathematician, Tchebycheff, one of the foremost of modern analysts.

A proper fraction is chosen at random. What is the probability it is in its lowest terms?

Stated in a slightly different wording the same question may also be put as follows: If A/B is a proper fraction, what is the probability that A and B are prime to each other? If $p_2, p_3, p_5, \cdots p_m$ denote respectively the probabilities that each of the primes 2, 3, 5, ··· m is not a common factor of numerator and denominator of A/B, then the probability that no prime number is a common factor is:

$$P = p_2 \cdot p_3 \cdot p_5 \cdots p_m \cdots \text{ad inf.} \qquad \text{(I)}$$

This follows from the multiplication theorem and from the fact that the sequence of prime numbers is infinite.

Tchebycheff now first finds the probability $q_m = 1 - p_m$ that the fraction A/B does contain the prime m as factor of both A and B. By dividing any integral number by the prime m we obtain besides the quotient a certain remainder that must be one of the following numbers, viz.:

0, 1, 2, 3, 4, ··· (m − 1).
Each of the above remainders may be regarded as a possible event. The probability to obtain 0 as a remainder is accordingly 1/m. The probability that m is contained as a factor of A is therefore 1/m. This same quantity is also the probability that m is a factor of B. The probability that both A and B are divisible by m is therefore:

$$q_m = 1 - p_m = \frac{1}{m} \cdot \frac{1}{m} = \frac{1}{m^2}, \quad \text{or} \quad p_m = 1 - \frac{1}{m^2}.$$

Hence we have for the various primes:

$$p_2 = 1 - \frac{1}{2^2}, \quad p_3 = 1 - \frac{1}{3^2}, \quad p_5 = 1 - \frac{1}{5^2}, \quad \cdots.$$

Formula (I) then takes the form:

$$P = \left(1 - \frac{1}{2^2}\right)\left(1 - \frac{1}{3^2}\right)\left(1 - \frac{1}{5^2}\right) \cdots \text{ad inf.}$$

Forming the reciprocal 1/P we get:

$$\frac{1}{P} = \frac{1}{1 - \frac{1}{2^2}} \cdot \frac{1}{1 - \frac{1}{3^2}} \cdot \frac{1}{1 - \frac{1}{5^2}} \cdots \text{ad inf.}$$

Now each factor on the right hand side is the sum of a geometrical progression, as:

$$\frac{1}{P} = \left(1 + \frac{1}{2^2} + \frac{1}{2^4} + \cdots\right)\left(1 + \frac{1}{3^2} + \frac{1}{3^4} + \cdots\right)\left(1 + \frac{1}{5^2} + \frac{1}{5^4} + \cdots\right) \cdots \text{ad inf.}$$

Multiplying out we shall have:

$$\frac{1}{P} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \cdots \text{ad inf.}$$

The above infinite series is, however, merely the well known Eulerian expression for π²/6, hence:

$$P = \frac{6}{\pi^2}.$$

Suppose furthermore we were assured that none of the three primes 2, 3, 5 was a common factor of both A and B. What would then be the probability that the fraction might be reduced by division by one or more of the other primes? Denoting by the symbol $P_{(7)}$ the probability that none of the primes from 7 and upwards is a common factor, we get:

$$P = \frac{6}{\pi^2} = \left(1 - \frac{1}{2^2}\right)\left(1 - \frac{1}{3^2}\right)\left(1 - \frac{1}{5^2}\right) P_{(7)},$$

or:

$$P_{(7)} = \frac{6}{\pi^2} \div \left[\left(1 - \frac{1}{4}\right)\left(1 - \frac{1}{9}\right)\left(1 - \frac{1}{25}\right)\right] = \frac{75}{8\pi^2} \approx \frac{19}{20}.$$

The probability of the divisibility of both numerator and denominator of a fraction chosen at random by a prime larger than 5 is thus approximately:

$$1 - P_{(7)} = \frac{1}{20}.$$

The summation of the infinite series of the reciprocals of the squares of the natural numbers baffled for a long time the skill of some of the most eminent mathematicians. Jacob Bernoulli, the renowned classical writer on probabilities, proved its convergency but failed to find its sum. The final summation was first performed by Euler.

CHAPTER V.

MATHEMATICAL EXPECTATION.

36. Definition, Mean Values.
It is a common belief among many people that gambling and all kinds of betting have their source in reckless desire. This is often argued by moral reformers, but cannot be said to be the true cause. Whenever by ordinary gambling or by a bet, actual value is exposed to a complete or partial loss, this exposure is not due to the fact that the gamester is reckless, but because there is hope of an actual gain. "Hope," says Spinoza in his treatise on ethics, "is the indeterminate joy caused by the conception of a future state of affairs of whose outcome we are in doubt." Actual mathematical calculation cannot be attempted on the basis of this definition any more than it could be attempted to determine a mathematical probability from the definition of Aristotle. "We disregard therefore the psychological element of desire, which is associated with hope or expectation as well as the anxiousness or dread associated with the related psychological element of non-desire" (Cantor).

The so-called mathematical expectation is the product of an expected gain in actual value and the mathematical probability of obtaining such a gain. The danger of loss may in this case be regarded as a negative gain. Thus if a person, A, may expect the gain, $G$, from the event, $E$, whose probability of happening is equal to $p$, then

$$e = p \cdot G$$

is his mathematical expectation. The quantity expressed by the symbol, $e$, is here the amount it is safe to hazard for the expected gain, $G$. We may also regard the quantity, $e$, as a mean value or average value. Among a large number of $n$ cases only $np$ will bring the gain, $G$, the others not. Thus the total gain is $npG$, and the average gain per case is:

$$npG \div n = pG.$$

Suppose we have $n$ mutually exclusive events, $E_1, E_2, \ldots, E_n$, forming a complete disjunction. For their respective probabilities we have then the following equation:

$$p_1 + p_2 + p_3 + \cdots + p_n = 1.$$
If the actual occurrence of a certain one of these events, say, $E_\alpha$, brings a gain of $G_\alpha$, then the total value of the mathematical expectation of the $n$ events is:

$$e = p_1 \cdot G_1 + p_2 \cdot G_2 + \cdots + p_n \cdot G_n = \sum p_\alpha \cdot G_\alpha.$$

Since $\sum p_\alpha = 1$ this result may be written:

$$e \times (p_1 + p_2 + \cdots + p_n) = G_1 \cdot p_1 + G_2 \cdot p_2 + G_3 \cdot p_3 + \cdots + G_n \cdot p_n,$$

hence $e$ may be regarded as the mean value of the different quantities $G_\alpha$ with the weights $p_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$).

Although we shall discuss the theory of mean values in a following chapter, a few preliminary remarks might not be out of place here. A variable quantity $X$ is related to a series of events $E_1, E_2, E_3, \ldots, E_n$ (it being assumed that these events form a complete disjunction) in such a manner that when $E_\alpha$ happens $X$ takes on the value $x_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$). If furthermore $p_1, p_2, p_3, \ldots$ denote the respective probabilities of the occurrence of $E_1, E_2, E_3, \ldots$, then

$$M(X) = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n$$

is called the mean value or simply the mean of $X$.

The above definition may be illustrated by the following concrete urn-scheme. An urn contains $N$ balls of which $a_1$ balls are marked $x_1$, $a_2$ balls marked $x_2$, ... and finally $a_n$ balls marked $x_n$, where $a_1 + a_2 + a_3 + \cdots + a_n = N$. Each drawing from the urn produces a certain number $X$, which may assume $n$ different values $x_1, x_2, x_3, \ldots, x_n$, each with the respective probabilities:

$$p_1 = \frac{a_1}{N}, \quad p_2 = \frac{a_2}{N}, \quad \ldots, \quad p_n = \frac{a_n}{N}.$$

The arithmetic mean of all the numbers written on the balls is:

$$\frac{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n}{N},$$

which agrees with the mean as defined above.

37. The Petrograd (St. Petersburg) Problem.

In this connection it is worth noting a celebrated problem, which on account of its paradoxical nature has become a veritable stumbling block, and has been discussed by some of the most eminent writers on probabilities. The problem was first suggested by Daniel Bernoulli in a communication to the Petrograd, or as it was then called, St.
Petersburg, Academy in 1738.

The Petrograd problem may shortly be stated as follows: Two persons A and B are interested in a game of tossing a coin under the following conditions. An ordinary coin is tossed until head turns up, which is the deciding event. If head turns up the first time A pays one dollar to B, if head appears first at the second toss B is to receive two dollars, if first at the third time four dollars, and so on. What is the mathematical expectation of B? Or in other words, how much must B pay to A before the game starts in order that the game may be considered fair?

The mathematical expectation of B in the first trial is $\frac{1}{2} \times 1 = \frac{1}{2}$. The mathematical expectation for head in the second throw is $\left(\frac{1}{2}\right)^2 \times 2 = \frac{1}{2}$. Or in general the mathematical probability that head appears for the first time in the $n$th toss is $\left(\frac{1}{2}\right)^n$, and the co-ordinated expectation is $2^{n-1} \div 2^n = \frac{1}{2}$. Thus the total expectation is expressed by the following series:

$$\frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots + \frac{1}{2}.$$

With $n = \infty$ as its limiting value it thus appears that B could afford to pay an infinite amount of money for his expected gain.

38. Various Explanations of the Paradox. The Moral Expectation.

This evidently paradoxical result has called forth a number of explanations of various forms by some eminent mathematicians. One of the commentators was D'Alembert. It was to be expected that the famous encyclopaedist, who, as we have seen, did not view the theory of probabilities in too kindly a manner, would not hesitate to attack. He returns repeatedly to this problem in the "Opuscules" (1761) and in "Doutes et questions" (Amsterdam, 1770). D'Alembert distinguishes between two forms of possibilities, viz., metaphysical and physical possibilities. An event is by him called a metaphysical possibility when it is not absurd. When the event is not too "uncommon" in the ordinary course of happenings it is a physical possibility.
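Before weighing D'Alembert's objection, it may help to put numbers on the game of § 37. The sketch below is illustrative only: the function names are not from the text, and cutting the game off after a fixed number of tosses is an assumption introduced here so that the sum is finite. It shows that every additional toss allowed adds exactly one half to B's expectation, and it evaluates the chance that the first head is delayed to a very late toss.

```python
from fractions import Fraction


def first_head_probability(n):
    """Probability that head appears for the first time at toss n (fair coin)."""
    return Fraction(1, 2) ** n


def payoff(n):
    """Dollars paid to B when head first appears at toss n: 1, 2, 4, ..."""
    return 2 ** (n - 1)


def truncated_expectation(max_tosses):
    """B's expectation if the game is cut off after max_tosses tosses."""
    return sum(first_head_probability(n) * payoff(n)
               for n in range(1, max_tosses + 1))


if __name__ == "__main__":
    for cap in (1, 2, 10, 100):
        print(cap, truncated_expectation(cap))
    # Chance that the first head is delayed to toss 1000:
    print(float(first_head_probability(1000)))
```

A game cut off after 10 tosses is worth 5 dollars to B and one cut off after 100 tosses is worth 50, so the expectation grows without bound as the cut-off is removed, although the probabilities of the late tosses are exceedingly small.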
That head would appear for the first time after 1,000 throws is metaphysically possible but quite impossible physically. This contention is rather bold. "What would," as Czuber remarks, "D'Alembert have said to an actual reported case in 'Grunert's Archiv' where in a game of whist each of the four players held 13 cards of one suit." The numerical probability of such an event as expressed by mathematical probabilities is $(635013559600)^{-1}$. D'Alembert's definitions, including the half metaphorical term "ordinary course," are rather vague. And what numerical value of the mathematical probability constitutes the physical impossibility? D'Alembert gives three arbitrary solutions for the probability of getting head in the $n$th throw, namely:

$$\frac{1}{2^n(1 + \beta n^2)}, \qquad \frac{1}{2^{n + \alpha n}}, \qquad \frac{1}{2^n + \dfrac{2^n B}{(K - n)^{\gamma/2}}},$$

where $\alpha$, $\beta$, $B$, $K$ are constants and $\gamma$ an uneven number.

Daniel Bernoulli himself gives a solution wherein he introduces the term "moral expectation." If a person possesses a sum of money equal to $x$ then according to Bernoulli

$$dy = \frac{k\,dx}{x}$$

is the moral expectation of $x$, $k$ being a constant quantity. Integrating we get:

$$y = k \int_a^b \frac{dx}{x} = k(\log b - \log a) = k \log \frac{b}{a},$$

which is the moral expectation of an increase $b - a$ of an original value $a$. If now $x$ denotes the sum owned by B we may replace the mathematical expectations by their corresponding moral expectations, that is to say replace $2^{n-1}/2^n$ by $(1/2^n) \log\left((x + 2^{n-1})/x\right)$, and we then have:

$$k\left(\frac{1}{2} \log \frac{x + 1}{x} + \frac{1}{4} \log \frac{x + 2}{x} + \cdots + \frac{1}{2^n} \log \frac{x + 2^{n-1}}{x} + \cdots\right),$$

which is a convergent series. In this connection, it may be mentioned that the Bernoullian hypothesis has found quite an extensive use in the modern theory of utility.

De Morgan in his splendid little treatise "On Probabilities" takes the view that the solution as first given is by no means an anomaly. He quotes an actual experiment in coin tossing by Buffon.
Out of 2,048 trials 1,061 gave head at the first toss, 494 at the second, 232 at the third, 137 at the fourth, 56 at the fifth, 29 at the sixth, 25 at the seventh, 8 at the eighth and 6 at the ninth. Computing the various mathematical expectations, we find that the maximum value is found in the 25 sets with head in the seventh toss, which gives a gain of 25 × 64 = 1,600. The rarest occurrence, the 6 sets with head in the ninth throw, gives a gain of 6 × 256 = 1,536, which is the next highest gain in all the nine sets. De Morgan furthermore contends that if Buffon had tried a thousand times as many games, the results would not only have given more, but more per game, arguing "that a larger net would have caught not only more fish but more varieties of fish; and in two millions of sets, we might have seen cases in which head did not appear till the twentieth throw." Furthermore, "the player might continue until he had realized not only any given sum, but any given sum per game." Therefore according to De Morgan the mathematical expectation of a player in a single game must be infinite.

CHAPTER VI. PROBABILITY A POSTERIORI

39. Bayes's Rule. A Posteriori Probabilities.

The problems hitherto considered have all had certain points in common. Before entering upon the calculations of the mathematical probability of the happening of the event in question, we knew beforehand a certain complex of causes which operated in the general domain of action. We also were able to separate this general complex of productive causes into two distinctive minor domains of complexes, of which one would bring forth the event, $E$, while the other domain would act towards the production of the opposite event, $\bar{E}$. Furthermore we also were able to measure the respective quantitative magnitudes of the two domains, and then, by a simple algebraic operation, determine the probability as a proper fraction.
The addition and multiplication theorems did not introduce any new principles, but only gave us a set of systematic rules which facilitated and shortened the calculations of the relations between the different absolute probabilities. The above method of determination of a mathematical probability is known as an a priori determination, and such probabilities are termed a priori probabilities.

The problems treated in the preceding chapters have, nearly all, been related to different games of chance, or purely abstract mathematical quantities. The inorganic nature of this kind of problem has made it possible for us to treat them in a relatively simple manner. In many of the problems which we shall consider hereafter, organic elements enter as a dominant factor and make the analysis much more complicated and difficult. All social and biological investigations, which are of much larger benefit and practical value than the problems in games of chance, often lead to a completely different category of probability problems, which are known as "a posteriori probabilities."

In problems where organic life enters into the calculations, the complex of productive causes is so varied and manifold that our minds are not able to pigeonhole the different productive causes, placing them in their proper domains of action. But we know that such causes do exist and are the origin of the event. If now, by a series of observations, we have noticed the actual occurrence of the event, $E$ (or the occurrence of the opposite event $\bar{E}$), the problem of the determination of an a posteriori probability is to find the probability that the event $E$ originated from a certain complex, say $F$. We must then, first of all, form a complete hypothetical judgment of the form: $E$ either happens from the complexes $F_1$, or $F_2$, or $F_3$, ... or $F_n$.
But we must not forget that, in general, the different complexes $F_\alpha$ ($\alpha = 1, 2, \ldots, n$) of the disjunction are not known a priori. We must, therefore, determine the respective probabilities for the actual existence of such disjunctive complexes $F_\alpha$. These probabilities of existence for the complexes of causes are in general different for each member, a fact which has often been overlooked by many investigators and writers on a posteriori probabilities, and which has given rise to meaningless and paradoxical results.

40. Discovery and History of the Rule.

The first discoverer of the rule for the computation of a posteriori probabilities by a purely deductive process was the English clergyman, T. Bayes. Bayes's treatise was first published after the death of the author by his friend, Dr. Price, in Philosophical Transactions for 1763. The treatise by the English clergyman was, for a long time, almost forgotten, even by the author's own countrymen; and later English writers have lost sight of the true "Bayes's Rule" and substituted a false, or to be more accurate, a special case of the exact rule, in the different algebraic texts, under the discussion of the so-called "inverse probabilities," a name which is due probably to de Morgan, and which in itself is a great misnomer. This point, presently, we shall discuss in detail.

The careless application of the exact rule has recently led to a certain distrust of the whole theory of "a posteriori probabilities." Scandinavian mathematicians were probably the first to criticize the theory. In 1879, Mr. J. Bing, a Danish actuary, took a very critical attitude towards the mathematical principles underlying Bayes's Rule, in a scholarly article in the mathematical journal Tidsskrift for Matematik.
Bing's article caused a sharp, and often heated, discussion among the older and younger Danish mathematicians at that time; but his views seem to have gained the upper hand, and even so great an authority on the whole subject as the late Dr. T. Thiele, in his well-known work, "Theory of Observations" (London, 1903), refers to Bing's article as "a crushing proof of the fallacies underlying the determination of a posteriori probabilities by a purely deductive method." As recently as 1908, the Danish writer on philosophy, Dr. Kroman, has taken up cudgels in defense of Bayes in a contribution in the Transactions of the Royal Danish Academy of Science, which has done much towards the removal of many obscure and erroneous views of the older authors. Among English writers, Professor Chrystal, in a lecture delivered before the Actuarial Society of Edinburgh, has also given a sharp criticism of the rule, although he does not go so deeply into the real nature of the problem as either Bing or Kroman.

Despite Chrystal's advice to "bury the laws of inverse probabilities decently out of sight, and not embalm them in text books and examination papers," the old view still holds sway in recent professional examination papers. It is therefore absolutely necessary for the student preparing for professional examinations to be acquainted with the theory. In the following paragraphs we shall, therefore, give the mathematical theory of Bayes's Rule with several examples illustrating its application to actual problems, together with a criticism of the rule.

41. Bayes's Rule (Case I).

(The different complexes of causes producing the observed event, $E$, possess different a priori probabilities of existence.)

Let $E$ denote a certain state or condition, which can appear under only one of the mutually exclusive complexes of causes: $F_1, F_2, \ldots$ and not otherwise.
Let the probability for the actual existence of $F_1$ be $\kappa_1$, and if $F_1$ really exists then let $\omega_1$ be the "productive probability" for bringing forth the observed event, $E$ ($E$ being of a different nature from $F$), which can only occur after the previous existence of one of the mutually exclusive complexes, $F$. Let, in the same manner, $F_2$ have an "existence probability" of $\kappa_2$ and a "productive probability" of $\omega_2$, $F_3$ an existence probability of $\kappa_3$ and a productive probability of $\omega_3$, etc. If now, by actual observation, we have noted that the event $E$ has occurred exactly $m$ times in $n$ trials, then the probability that the complex $F_1$ was the origin of $E$ is:

$$Q_1 = \frac{\kappa_1 \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}.$$

Similarly that complex $F_2$ was the origin:

$$Q_2 = \frac{\kappa_2 \omega_2^m (1 - \omega_2)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}},$$

and so on for the other complexes.

Proof. Let the number of equally possible cases in the general domain of action, which leads to one of the complexes $F_\alpha$, be $t$. Furthermore, of these $t$ cases let $f_1$ be favorable for the existence of complex $F_1$, $f_2$ for $F_2$, $f_3$ for $F_3$, etc. Then the probabilities for the existence of the different complexes $F_\alpha$ ($\alpha = 1, 2, 3, \ldots, n$) are:

$$\kappa_1 = \frac{f_1}{t}, \quad \kappa_2 = \frac{f_2}{t}, \quad \kappa_3 = \frac{f_3}{t}, \quad \ldots$$

respectively.

Of the $f_1$ favorable cases for complex $F_1$, $\lambda_1$ are also favorable for the occurrence of $E$.
Of the $f_2$ favorable cases for complex $F_2$, $\lambda_2$ are also favorable for the occurrence of $E$.
Of the $f_3$ favorable cases for complex $F_3$, $\lambda_3$ are also favorable for the occurrence of $E$.

The probability of the happening of $E$ under the assumption that $F_1$ exists, i.e., the relative probability $P_{F_1}(E)$, is:

$$\omega_1 = \frac{\lambda_1}{f_1},$$

or in general:

$$\omega_\alpha = \frac{\lambda_\alpha}{f_\alpha} \qquad (\alpha = 1, 2, 3, \ldots).$$

The total number of equally likely cases for the simultaneous occurrence of the event $E$ with either one of the favorable cases for $F_1, F_2, F_3, \ldots$ is:

$$\lambda_1 + \lambda_2 + \lambda_3 + \cdots = \sum \lambda_\alpha.$$

The number of favorable cases for the simultaneous occurrence of $F_1$ and $E$ is $\lambda_1$, for the simultaneous occurrence of $F_2$ and $E$, $\lambda_2$, etc.
Hence we have as measures for their corresponding probabilities:

$$Q_1 = \frac{\lambda_1}{\sum \lambda_\alpha}, \quad Q_2 = \frac{\lambda_2}{\sum \lambda_\alpha}, \quad \ldots$$

But $\lambda_1 = \omega_1 \cdot f_1$, $\lambda_2 = \omega_2 \cdot f_2$, etc., and $f_1 = \kappa_1 \cdot t$, $f_2 = \kappa_2 \cdot t$, etc. Hence

$$\lambda_1 = \omega_1 \cdot \kappa_1 \cdot t, \quad \lambda_2 = \omega_2 \cdot \kappa_2 \cdot t, \quad \ldots, \text{ etc.}$$

Substituting these values in the above expression for $Q_1, Q_2, \ldots$ we get:

$$Q_1 = \frac{\kappa_1 \omega_1}{\sum \kappa_\alpha \omega_\alpha}, \quad Q_2 = \frac{\kappa_2 \omega_2}{\sum \kappa_\alpha \omega_\alpha}, \quad \ldots$$

as the respective probabilities that the observed event originated from the complexes $F_1, F_2, F_3, \ldots$. Such probabilities are called a posteriori probabilities.

Let us now for a moment investigate the above expression for $Q_1, Q_2, \ldots$. The numerator in the expression for $Q_1$ is $\kappa_1 \cdot \omega_1$. But $\kappa_1$ is simply the a priori probability for the existence of $F_1$, while $\omega_1$ is the a priori productive probability of bringing forth the event observed from complex $F_1$. The product $\kappa_1 \cdot \omega_1$ is therefore the probability that $F_1$ exists and that the event $E$ then originates from it. In the denominator we have the expression $\sum \kappa_\alpha \omega_\alpha$ ($\alpha = 1, 2, \ldots, n$), which is the total probability of getting $E$ from any of the complexes $F_\alpha$.

From example 17 (Chapter IV) we know that the probability of getting $E$ exactly $m$ times from $F_1$ in $n$ total trials is:

$$B_1 = \binom{n}{m} \kappa_1 \cdot \omega_1^m (1 - \omega_1)^{n-m},$$

and the probability of getting $E$ from any one of the complexes, $F$, $m$ times out of $n$ is:

$$\sum B_\alpha = \binom{n}{m} \sum \kappa_\alpha \cdot \omega_\alpha^m (1 - \omega_\alpha)^{n-m} \qquad (\alpha = 1, 2, 3, \ldots).$$

If, by actual observation, we know the event $E$ to have happened exactly $m$ times out of $n$, then the a posteriori probability that $F_1$ was the origin is:

$$Q_1 = \frac{\kappa_1 \cdot \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \cdot \omega_\alpha^m (1 - \omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \ldots). \qquad \text{(I)}$$

The binomial coefficients $\binom{n}{m}$ in numerator and denominator cancel each other of course.

It will be noticed that, in the above proof, it is not assumed that the a posteriori probability is proportional to the a priori probability, an assumption usually made in the ordinary texts on algebra.

42. Bayes's Rule (Case II).

(Special Case. The a priori probabilities of existence of the different complexes are equal.)
Sometimes the different complexes $F$ may be of such special characters that their a priori probabilities of existence are equal, i.e.,

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \cdots = \kappa_n.$$

In this case equation (I) simply reduces to:

$$Q_1 = \frac{\omega_1^m (1 - \omega_1)^{n-m}}{\sum \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}. \qquad \text{(II)}$$

Equation (I) gives, however, the most general expression for Bayes's Rule, which may be stated as follows:

If a definite observed event, $E$, can originate from a certain series of mutually exclusive complexes, $F$, and if the actual occurrence of the event has been observed, then the probability that it originated from a specified complex or a specified group of complexes is also the "a posteriori" probability or probability of existence of the specified complex or group of complexes.

43. Determination of the Probabilities of Future Events Based Upon Actual Observations.

It happens frequently that our knowledge of the general domain of action is so incomplete that we are not able to determine, a priori, the probability of the occurrence of a certain expected event. As we have already stated in the introduction to a posteriori probabilities, this is nearly always the case with problems wherein organic life enters as a determining factor or momentum. But the same state of affairs may also occur in the category of problems relating to games of chance, which we have hitherto considered. Suppose we had an urn which was known to contain white and black balls only, but the actual ratio in which the balls of the two different colors were mixed was unknown. With this knowledge beforehand, we should not be able to determine the probability for the drawing of a white ball.
If, on the other hand, we knew, from actual experience by repeated observations, the results of former drawings from the same urn when the conditions in the general domain of action remained unchanged during each separate drawing, then these results might be used in the determination of the probability of a specified event by future drawings.

Our problem may be stated in its most general form as follows: Let $F_\alpha$ denote a certain state or condition in the general domain of action, which state or condition can appear only in one or the other of the mutually exclusive forms: $F_1, F_2, F_3, \ldots$, and not otherwise. Let the probabilities of existence of $F_1, F_2, F_3, \ldots$ be $\kappa_1, \kappa_2, \kappa_3, \ldots$ respectively, and when one of the complexes $F_1, F_2, F_3, \ldots$ exists (occurs) let $\omega_1, \omega_2, \omega_3, \ldots$ be the respective productive probabilities of bringing forth a specified event, $E$. If now, by actual observation, we know the event, $E$, to have happened exactly $m$ times out of $n$ total trials (the conditions in the general domain of action being the same at each individual trial), what is then the probability that the event, $E$, will happen in the $(n + 1)$th trial also?

By Bayes's Rule we determined the "a posteriori" probabilities or the probabilities of existence of the complexes $F_1, F_2, \ldots$ as:

$$Q_1 = \frac{\kappa_1 \omega_1^m (1 - \omega_1)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}, \quad Q_2 = \frac{\kappa_2 \omega_2^m (1 - \omega_2)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}, \quad \ldots \qquad (\alpha = 1, 2, 3, \ldots).$$

In the $(n + 1)$th trial $E$ may happen from any one of the mutually exclusive complexes $F_1, F_2, F_3, \ldots$, whose respective probabilities in producing the event, $E$, are $\omega_1, \omega_2, \omega_3, \ldots$. The addition theorem then gives us as the total probability of the occurrence of $E$ in the $(n + 1)$th trial:

$$R = \sum Q_\alpha \omega_\alpha = Q_1 \cdot \omega_1 + Q_2 \cdot \omega_2 + Q_3 \cdot \omega_3 + \cdots = \frac{\sum \kappa_\alpha \omega_\alpha^{m+1} (1 - \omega_\alpha)^{n-m}}{\sum \kappa_\alpha \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}. \qquad \text{(III)}$$

If the a priori probabilities of existence are of equal magnitude (Case II) the factors $\kappa$ in the above expression cancel each other in numerator and denominator and we have

$$R = \frac{\sum \omega_\alpha^{m+1} (1 - \omega_\alpha)^{n-m}}{\sum \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}. \qquad \text{(IV)}$$

44. Examples on the Application of Bayes's Rule.

Example 21. An urn contains two balls, white or black or both kinds. What is the probability of getting a white ball in the first drawing, and if this event has happened and the ball is replaced, what is then the probability of getting white in the following drawing?

Three conditions are here possible in the urn. There may be 0, 1, or 2 white balls. Each hypothetical condition has a probability of existence equal to $\frac{1}{3}$, and the productive probabilities for white are 0, $\frac{1}{2}$ and 1 respectively. The total probability of getting white is therefore:

$$P_1 = \tfrac{1}{3} \cdot 0 + \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot 1 = \tfrac{1}{2}.$$

If we now draw a white ball then the probabilities that it came from the complexes $F_1, F_2, F_3$, respectively, are:

$$0 \div \tfrac{1}{2}, \qquad \tfrac{1}{6} \div \tfrac{1}{2}, \qquad \tfrac{1}{3} \div \tfrac{1}{2}.$$

These are also the new existence probabilities of the three complexes. The probability for white in the second drawing is therefore

$$\left(0 \div \tfrac{1}{2}\right) \cdot 0 + \left(\tfrac{1}{6} \div \tfrac{1}{2}\right) \cdot \tfrac{1}{2} + \left(\tfrac{1}{3} \div \tfrac{1}{2}\right) \cdot 1 = \tfrac{5}{6}.$$

This solution of the problem is, however, not a unique solution, because it is an arbitrary solution. It is arbitrary in this respect, that we have without further consideration given all three complexes the same probability of existence, $\frac{1}{3}$. We shall discuss this part of the question under the chapter on the criticism of Bayes's Rule.

Example 22. An urn contains five balls of which a part is known to be white and the rest black. A ball is drawn four times in succession and replaced after each drawing. By three of such drawings a white ball was obtained and by one drawing a black ball. What is the probability that we will get a white ball in the fifth drawing?
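The problem just stated, like Example 21, can be set up numerically. The following is a minimal sketch of formulas (I) and (III) in exact rational arithmetic; the function names are illustrative and not from the text, and the equal a priori probabilities given to the hypotheses are the same assumption the worked solutions adopt, not something the data supply.

```python
from fractions import Fraction


def posterior(priors, omegas, m, n):
    """A posteriori probabilities Q of formula (I) after m occurrences in n trials."""
    weights = [k * w ** m * (1 - w) ** (n - m) for k, w in zip(priors, omegas)]
    total = sum(weights)
    return [wt / total for wt in weights]


def next_trial_probability(priors, omegas, m, n):
    """Probability R of formula (III) that the event happens in trial n + 1."""
    return sum(q * w for q, w in zip(posterior(priors, omegas, m, n), omegas))


if __name__ == "__main__":
    # Example 21: two balls; 0, 1 or 2 white, each hypothesis given prior 1/3.
    ex21 = next_trial_probability(
        [Fraction(1, 3)] * 3,
        [Fraction(0), Fraction(1, 2), Fraction(1)], 1, 1)
    # Example 22: five balls; 4, 3, 2 or 1 white, each hypothesis given prior 1/4.
    ex22 = next_trial_probability(
        [Fraction(1, 4)] * 4,
        [Fraction(k, 5) for k in (4, 3, 2, 1)], 3, 4)
    print(ex21, ex22)
```

Run as written, this prints 5/6 for Example 21 and 47/73 for Example 22. Passing a different list of a priori probabilities changes both answers, which is the dependence on the hypothesis that the criticism of the rule turns on.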
In regard to the contents of the urn the following four hypotheses are possible:

$F_1$: 4 white, 1 black balls,
$F_2$: 3 white, 2 black balls,
$F_3$: 2 white, 3 black balls,
$F_4$: 1 white, 4 black balls.

Since we do not know anything about the ratio of distribution of the different colored balls, we may by a direct application of the principle of insufficient reason regard the four complexes as equally probable, or:

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac{1}{4}.$$

If either $F_1$, $F_2$, $F_3$ or $F_4$ exists, the respective productive probabilities are:

$$\omega_1 = \tfrac{4}{5}, \quad \omega_2 = \tfrac{3}{5}, \quad \omega_3 = \tfrac{2}{5}, \quad \omega_4 = \tfrac{1}{5}.$$

By a direct substitution in the formula:

$$R = \frac{\sum \omega_\alpha^{m+1} (1 - \omega_\alpha)^{n-m}}{\sum \omega_\alpha^m (1 - \omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, 4)$$

for $n = 4$ and $m = 3$ we get:

$$R = \frac{\left(\tfrac{4}{5}\right)^4 \tfrac{1}{5} + \left(\tfrac{3}{5}\right)^4 \tfrac{2}{5} + \left(\tfrac{2}{5}\right)^4 \tfrac{3}{5} + \left(\tfrac{1}{5}\right)^4 \tfrac{4}{5}}{\left(\tfrac{4}{5}\right)^3 \tfrac{1}{5} + \left(\tfrac{3}{5}\right)^3 \tfrac{2}{5} + \left(\tfrac{2}{5}\right)^3 \tfrac{3}{5} + \left(\tfrac{1}{5}\right)^3 \tfrac{4}{5}} = \frac{47}{73}.$$

45. Criticism of Bayes's Rule.

In most English treatises on the theory of chance the "a posteriori" determination of a mathematical probability is discussed under the so-called "inverse probabilities." This somewhat misleading name was probably first introduced by the eminent English mathematician and actuary, Augustus de Morgan. In the opening of the discussion of a posteriori probabilities in the third chapter of his treatise, "An Essay on Probabilities," de Morgan says: "In the preceding chapter, we have calculated the chances of an event, knowing the circumstances under which it is to happen or fail. We are now to place ourselves in an inverted position, we know the event, and ask what is the probability which results from the event in favor of any set of circumstances under which the same might have happened." Is this now an inverse process? By the a priori or, as de Morgan prefers to call them, the direct probabilities, we started from a definitely known condition and determined the probability for a future event, $E$, or what is the same, the probability of a specified future state of affairs. Here we start knowing the present condition and try to determine a past condition.
The process apparently appears to be the inverse of the former, although they both are the same. We possess a definite knowledge of a certain condition and try to determine the probability of the existence of a specified state of affairs, in general different from the first condition, but whether this state of affairs occurred in the past or is to occur in the future has no bearing on our problem. In other words, time does not enter as a determining factor. And even if we were willing to admit the two processes of the determination of the different probabilities to be inverse, the probabilities themselves cannot be said to be inverse. Nevertheless, this misleading name appears over and over again in examination papers in England and in America as a thoroughly embalmed corpse which ought to have been buried long ago.

What is really needed is a change of customary nomenclature in the whole theory of probability. Instead of direct and inverse, a priori and a posteriori probabilities, it would be more proper to speak about "prospective" and "retrospective" probabilities in the application of Bayes's Rule. All probabilities are in reality determined by an empirical process. That there is a certain probability of throwing a six with a die we only know after we have formed a definite conception of a die. The only probabilities which we perhaps rightly may name a priori are the arbitrary probabilities in purely mathematical problems where we assume an ideal state of affairs. "There is," to quote the Danish writer on logic, Dr. Kroman, "really more reason to doubt the a priori than the a posteriori probabilities, and it would be more natural and also more exact in the application of Bayes's Rule to speak about the actual or original and the new or gained probability."
The discussion above has really no direct bearing on Bayes's Rule but was introduced in order to give the student a clearer understanding of the main principles underlying the whole determination of a posteriori probabilities by means of actual experimental observations, and also to remove some obscure points. From his ordinary mathematical training every student of mathematics has an almost intuitive understanding of an inverse process. Naturally when he encounters again and again the customary heading "inverse probabilities" in text-books he obtains from the very start, almost before he starts to read this particular chapter, an inverse idea of the subject instead of the idea he really ought to have. Nowhere in continental texts on the theory of probabilities will the reader be able to find the words direct and inverse applied in the same sense as in English texts since the introduction of these terms by de Morgan. We shall advise readers who have become accustomed to the old terms to pay no serious attention to them.

46. Theory Versus Practice.

In § 41 we reduced Bayes's Rule to its most general form:

$$Q_\alpha = \frac{\kappa_\alpha \cdot \omega_\alpha^m (1 - \omega_\alpha)^{n-m}}{\sum \kappa_\alpha \cdot \omega_\alpha^m (1 - \omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \ldots).$$

This is an exact expression for the rule, but it is at the same time almost impossible to employ it in practice. Only in a few exceptional cases do we know, a priori, the different values of the often numerous probabilities of existence $\kappa_\alpha$ of the complexes $F_\alpha$, and in order to apply the rule with exact results we require here sufficient facts and information about the different complexes of causes from which the observed event, $E$, originated. Bayes deduced the rule from special examples resulting from drawings of balls of different colors from an urn where the different complexes of causes were materially existent. The probability of a cause or a certain complex of causes did not here mean the probability of existence of such a complex but the probability
that the observed event originated from this particular complex. In order to elucidate this statement we give the following simple example:

Example 23. A bag contains 4 coins, of which one is coined with two heads, the other three having both head and tail. A coin is drawn at random and tossed four times in succession and each time head turns up. What is the probability that it was the coin with two heads?

The two complexes $F_1$ and $F_2$, which may produce the event, $E$, are: $F_1$, the coin with two heads, and $F_2$, an ordinary coin. The probability of existence of $F_1$ is the probability of drawing the single coin with two heads, which is equal to $\frac{1}{4}$; the probability of existence for the other complex, $F_2$, is equal to $\frac{3}{4}$. The respective productive probabilities are 1 and $\frac{1}{2}$. Thus $\kappa_1 = \frac{1}{4}$, $\kappa_2 = \frac{3}{4}$, $\omega_1 = 1$ and $\omega_2 = \frac{1}{2}$. Substituting these values in formula (I) ($n = 4$, $m = 4$), we get:

$$Q = \left(\tfrac{1}{4} \times 1^4\right) \div \left(\tfrac{1}{4} \times 1^4 + \tfrac{3}{4} \times \left(\tfrac{1}{2}\right)^4\right) = \tfrac{1}{4} \div \tfrac{19}{64} = \tfrac{16}{19}.$$

But in most cases we do not know anything about the material existence of the complexes of causes from which the event, $E$, originated. On the contrary, we are forced to form a hypothesis about their actual existence. To start with a simple case we take example 21 of § 44. We assumed here three equally possible conditions in the urn before the drawings, namely the presence of 0, 1, or 2 white balls. From this assumption we found the probability of getting a white ball in the second drawing, after we had previously drawn a white ball and then put it back in the urn before the second drawing, to be equal to $\frac{5}{6}$. As we already remarked, this solution is not unique because it is an arbitrary solution. It is arbitrary to assign, without any consideration whatsoever, $\frac{1}{3}$ as the probability of existence to each of the three conditions. Let us suppose that each of the two balls bore the numbers 1 and 2 respectively.
We may then form the following equally likely conditions: $b_1b_2$, $b_1w_2$, $w_1b_2$, $w_1w_2$, each condition having an a priori probability of existence equal to 1/4 and a productive probability for the drawing of a white ball equal to 0, 1/2, 1/2 and 1 respectively. Thus:

$$\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tfrac14 \quad\text{and}\quad \omega_1 = 0,\ \ \omega_2 = \omega_3 = \tfrac12,\ \ \omega_4 = 1.$$

The respective a posteriori probabilities, that is the new or gained probabilities of the four hypothetical conditions, become now by the application of Bayes's Rule (Formula II):

$$Q_1 = 0, \qquad Q_2 = \tfrac12 \div 2, \qquad Q_3 = \tfrac12 \div 2, \qquad Q_4 = 1 \div 2.$$

Hence the probability for white in the second drawing is (Formula IV: $R = \sum_\alpha \omega_\alpha^{m+1}(1-\omega_\alpha)^{n-m} \big/ \sum_\alpha \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}$):

$$R = (0 \div 2) + \left(\tfrac14 \div 2\right) + \left(\tfrac14 \div 2\right) + (1 \div 2) = \tfrac34.$$

In the first solution we got 5/6 for the same probability. Which answer is now the true one? Neither one! The true answer to the problem is that it is not given in such a form that the last question, the probability of getting a white ball in the second drawing, may be settled without any doubt. The answer must be conditional. Following the first hypothesis we got 5/6, while the second hypothesis gives 3/4 as the answer.

We next proceed to example 22, which is almost identical in form to the first one, the only difference being a greater variety of hypothetical conditions. We started here with the following four hypotheses: $F_1$: 4 white, 1 black ball; $F_2$: 3 white, 2 black; $F_3$: 2 white, 3 black; and $F_4$: 1 white and 4 black balls, assigning 1/4 as the hypothetical existence probability. By marking the 5 balls similarly as in the last example, with the numbers from 1 to 5, we may form the complexes:

$F_1$: 4 white and 1 black ball in $\binom{5}{4} = 5$ ways,
$F_2$: 3 white and 2 black balls in $\binom{5}{3} = 10$ ways,
$F_3$: 2 white and 3 black balls in $\binom{5}{2} = 10$ ways,
$F_4$: 1 white and 4 black balls in $\binom{5}{1} = 5$ ways.

This gives us a total of 5 + 10 + 10 + 5 = 30 different complexes. Assuming all of these complexes equally likely
to occur, we get the following probabilities of existence and productive probabilities:

$$\kappa_1 = \kappa_2 = \kappa_3 = \cdots = \kappa_{30} = \tfrac{1}{30},$$

$\omega_1 = \omega_2 = \omega_3 = \omega_4 = \omega_5 = 4/5$ (productive prob. for $F_1$),
$\omega_6 = \omega_7 = \omega_8 = \cdots = \omega_{15} = 3/5$ (productive prob. for $F_2$),
$\omega_{16} = \omega_{17} = \cdots = \omega_{25} = 2/5$ (productive prob. for $F_3$),
$\omega_{26} = \omega_{27} = \omega_{28} = \omega_{29} = \omega_{30} = 1/5$ (productive prob. for $F_4$).

The total probability of getting a white ball in the second drawing is now, by Formula IV,

$$R = \frac{\sum_\alpha \omega_\alpha^{m+1}(1-\omega_\alpha)^{n-m}}{\sum_\alpha \omega_\alpha^{m}(1-\omega_\alpha)^{n-m}} \qquad (\alpha = 1, 2, 3, \ldots, 30).$$

Actual substitution of the above values of $\omega$ in this formula gives us the final result as: R = [value illegible in the source].

47. Probabilities Expressed by Integrals. By making an extended use of the infinitesimal calculus Mr. Bing and Dr. Kroman in their memoirs arrived at much more ambiguous results through an application of the rule of Bayes. Starting with the fundamental rule as given in equation (I) in § 41, we may at times encounter somewhat simpler conditions inside the domain of causes. The total complex of actions may embrace a large number of smaller sub-complexes construed in such a way that the change from one complex to another may be regarded as a continuous process, so that the productive probabilities are increased by an infinitely small quantity from a certain lower limit, a, to an upper limit, b. Denoting such continuously increasing probabilities by v and the corresponding small probabilities of existence by u dv, we have as the total probability of obtaining E from any one of the minor complexes with a productive probability between $\alpha$ and $\beta$ ($a \le \alpha$, $\beta \le b$):

$$P = \int_\alpha^\beta u\,v\,dv.$$

The probability that when E has happened it originated from one of those minor complexes, or the probability of existence of some one of those complexes, is:

$$Q = \frac{\int_\alpha^\beta u\,v\,dv}{\int_a^b u\,v\,dv}.$$

The situation may be still more simplified by the following considerations.
In the continuous total complex between the limits a and b we have altogether situated $(b - a)/dv$ individual minor complexes. If we assume all of these complexes to possess the same probability of existence, we must have:

$$u\,dv = \frac{dv}{b - a}.$$

The two formulas then take on the form:

$$P = \frac{1}{b-a}\int_\alpha^\beta v\,dv \qquad\text{and}\qquad Q = \frac{\int_\alpha^\beta v\,dv}{\int_a^b v\,dv}.$$

A still more specialized form is obtained by letting a = 0 and b = 1, which gives:

$$P = \int_\alpha^\beta v\,dv \qquad\text{and}\qquad Q = \frac{\int_\alpha^\beta v\,dv}{\int_0^1 v\,dv}.$$

The above formulas may perhaps be made more intelligible to the reader by a geometrical illustration. Let the various productive probabilities, v, be plotted along the X axis in a Cartesian coordinate system in the interval from a to b (a < b). To any one of these probabilities, say $v_r$, there corresponds a certain probability of existence, $u_r$, represented by a Y ordinate. In the same manner the next following productive probability, $v_{r+1}$, will have a probability of existence represented by an ordinate $u_{r+1}$. It is now possible to represent the various u's by means of areas instead of line ordinates. Thus the probability of existence, $u_r$, is in the figure represented by the small shaded rectangle, with a base equal to $\Delta v_r$ and an altitude of $u_r$, the total area being equal to $\Delta v_r\,u_r$. That this is so follows from the well-known elementary theorem from geometry that areas of rectangles with equal bases are directly proportional to their altitudes. The sum of the different u's is thus in the figure represented as the sum of the areas of the various small rectangles in the staircase-shaped histograph. Now according to our assumption v is a continuous function in the interval from a to b. We may, therefore, divide this interval, b - a, into n smaller equal intervals. Let

$$v_{r+1} - v_r = \Delta v_r = \frac{b - a}{n}$$

be one of these smaller divisions. By choosing n sufficiently large, $(b - a)/n$ or $\Delta v$ becomes a very small quantity, and by letting n approach infinity as a limiting value we have

$$\lim_{n\to\infty} u\,\frac{b-a}{n} = \lim u\,\Delta v = u\,dv.$$
In this case the histograph is replaced by a continuous curve and u dv is the probability of existence that the productive probability is enclosed between v and v + dv. [1] The probability to get E from any one of the complexes is evidently given by the total area of the small rectangles, or in the continuous case by means of the integral:

$$\int_a^b u\,v\,dv.$$

[1] A more rigorous analysis would be as follows: We plot along the abscissa axis intervals of the length $\epsilon$ so that the middle of the interval has a distance from the origin equal to an integral multiple of $\epsilon$. If now $\epsilon$ is chosen sufficiently small, we may regard the probability of existence, u, for values of the variable v between $r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ as a constant, and the probability that v falls between the limits $r\epsilon - \tfrac12\epsilon$ and $r\epsilon + \tfrac12\epsilon$ may hence be expressed as $\epsilon u_r$. When $\epsilon$ approaches 0 as a limiting value this expression becomes u dv. See the similar discussion under frequency curves.

In the same way the probability that E originated from any of the complexes between $\alpha$ and $\beta$ is:

$$\frac{\int_\alpha^\beta u\,v\,dv}{\int_a^b u\,v\,dv}.$$

The special case a = 0 and b = 1 needs no further commentary.

We are now in a position to consider the examples of Bing and Kroman. Any student familiar with multiple integration will find no difficulty in the following analysis. For the benefit of readers to whom the evaluation of the various integrals may seem somewhat difficult, we may refer to the addenda at the close of this treatise or to any standard treatise on the calculus as, for instance, Williamson's "Integral Calculus."

48. Example 24. An urn contains a very large number of similarly shaped balls. In 10 successive drawings (with replacements) we have obtained 7 with the number 1, 2 with the number 2, and one having the number 3. What is the probability to obtain a ball with another number in the following drawing? We must here distinguish between 4 kinds of balls, namely balls marked 1, 2, 3, or "other balls."
A general scheme of distribution of the balls in the urn may be given as follows:

nx balls marked with the number 1,
ny balls marked with the number 2,
nz balls marked with the number 3, and
nt = n(1 - x - y - z) other balls.

Here x, y, z and t represent the respective productive probabilities. If we now let all such probabilities assume all possible values between 0 and 1 with intervals of 1/n, we obtain the possible conditions in the total complex of actions. Each of these conditions has a probability of existence, u, and the productive probabilities x, y, z, and 1 - x - y - z. The original probability for 7 ones, 2 twos and 1 three in 10 drawings is:

$$\frac{10!}{7!\,2!\,1!}\;u\,x^7 y^2 z.$$

Now when n is a very large number, the sum over all conditions passes into the integral

$$\frac{10!}{7!\,2!\,1!}\int_0^1\!\!\int_0^p\!\!\int_0^q u\,x^7 y^2 z\;dx\,dy\,dz,$$

where p = 1 - x and q = 1 - x - y. If now the above event has happened, then the probability to get a differently marked ball in the 11th drawing is:

$$R = \frac{\int_0^1\!\int_0^p\!\int_0^q u\,x^7 y^2 z\,(1 - x - y - z)\;dx\,dy\,dz}{\int_0^1\!\int_0^p\!\int_0^q u\,x^7 y^2 z\;dx\,dy\,dz}.$$

It is, however, quite impossible to evaluate the above integral without knowing the form of the function u; but unfortunately our information at hand tells us absolutely nothing in regard to this. Perhaps the balls bear the numbers 1, 2 and 3 only, or perhaps there is an equal distribution up to 10,000 or any other number. Our information is really so insufficient that it is quite hopeless to attempt a calculation of the a posteriori probability.

Many adherents of the inverse probability method venture, however, boldly forth with the following solution based upon the perfectly arbitrary hypothesis that all the u's are of equal magnitude. This gives the special integral:

$$R = \frac{\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-y} x^7 y^2 z\,(1 - x - y - z)\;dx\,dy\,dz}{\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-y} x^7 y^2 z\;dx\,dy\,dz},$$

where once more it must be remembered that $x + y + z \le 1$. In this case the limits of x are 0 and 1, those of y are 0 and 1 - x, and those of z are 0 and 1 - x - y.
This is a well-known form of the triple integral which may be evaluated by means of Dirichlet's Theorem:

$$\int_0^1\!\!\int_0^{1-x}\!\!\int_0^{1-x-y} x^{l-1}\,y^{m-1}\,z^{n-1}\;dx\,dy\,dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)}{\Gamma(1 + l + m + n)}.$$

(See Williamson's Calculus.) Remembering the well-known relation between gamma functions and factorials, viz. $\Gamma(n + 1) = n!$, we find by a mere substitution in the integral the value of the probability in question to be 1 : 14.

Another and equally plausible result is obtained by a slightly different wording of the problem. Ten successive drawings have resulted in balls marked 1, 2, or 3. What is the probability to obtain a ball not bearing such a number in the 11th drawing? This probability is given by the formula

$$\frac{\int_0^1 v^{10}(1 - v)\,dv}{\int_0^1 v^{10}\,dv} = 1 : 12,$$

quite a different result from the one given above.

49. Example 25. Bing's Paradox. A still more astonishing paradox is produced by Bing when he gives an application of Bayes's Rule to a problem from mortality statistics. A mortality table gives the ratio of the number of persons surviving a certain period to the number living at the beginning of this period, all persons being of the same age. By recording the deaths during the specified period (say one year) it has been ascertained that of s persons, say forty years of age at the beginning of the period, m have died during the period. The observed ratio is then $(s - m)/s$. If s is a very large number this ratio may (as we shall have occasion to prove at a later stage) be taken as an approximation of the true ratio of probability of survival during the period. If s is not sufficiently large the believers in the inverse theory ought to be able to evaluate this ratio by an application of Bayes's Rule, by means of an analysis similar to the one as follows:

Let y be the general symbol for the probability of a forty-year-old person being alive one year from hence.
Each of such persons will in general be subject to different conditions, and the general symbol, y, will therefore have to be understood as the symbol for all the possible productive probability values changing from 0 to 1 by a continuous process. Assuming s a very large number, each condition will have a probability of existence equal to u dy. We may now ask: What is the probability that the rate of survival of a group of s persons aged 40 is situated between the limits $\alpha$ and $\beta$? The answer according to Bayes's Rule is:

$$\frac{\int_\alpha^\beta y^{s-m}(1 - y)^{m}\,u\,dy}{\int_0^1 y^{s-m}(1 - y)^{m}\,u\,dy}. \tag{I}$$

Let us furthermore divide the whole year into two equal parts and let $y_1$ be the probability of surviving the first half year, $y_2$ the probability of surviving the second half, and $u_1\,dy_1$, $u_2\,dy_2$ the corresponding probabilities of existence. Then the respective a posteriori probabilities for $y_1$ and $y_2$ are:

$$\frac{y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1} \qquad\text{and}\qquad \frac{y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2}{\int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2} \qquad (m_1 + m_2 = m).$$

($m_1$ and $m_2$ represent the number of deaths in the respective half years.) The probability that both $y_1$ and $y_2$ are true is then according to the multiplication theorem:

$$\frac{y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \cdot y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2},$$

where $y = y_1 \cdot y_2$. The probability that the probability of survival for a full year, y, is situated between the limits $\alpha$ and $\beta$ is therefore:

$$\frac{\iint y_1^{s-m_1}(1 - y_1)^{m_1}\,y_2^{s-m}(1 - y_2)^{m_2}\,u_1 u_2\;dy_1\,dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,u_1\,dy_1 \int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,u_2\,dy_2} \tag{II}$$
where the limits in the double integral in the numerator are determined by the relation: $\alpha < y_1 y_2 < \beta$.

Choosing the principle of insufficient reason as the basis of our calculations, merely assuming that all possible events are, in the absence of any grounds for inference, equally likely, the various quantities expressed by the general symbol, u, become equal and constant and cancel each other in numerator and denominator, which brings the a posteriori probabilities expressed by (I) and (II) to the forms:

$$\frac{\int_\alpha^\beta y^{s-m}(1 - y)^{m}\,dy}{\int_0^1 y^{s-m}(1 - y)^{m}\,dy} \tag{III}$$

and

$$\frac{\iint y_1^{s-m_1}(1 - y_1)^{m_1}\,y_2^{s-m}(1 - y_2)^{m_2}\;dy_1\,dy_2}{\int_0^1 y_1^{s-m_1}(1 - y_1)^{m_1}\,dy_1 \int_0^1 y_2^{s-m}(1 - y_2)^{m_2}\,dy_2} \tag{IV}$$

where the limits in the numerator in the latter expression are determined by the relation: $\alpha < y_1 y_2 < \beta$. Letting $y_1 = y/y_2$ and then $1 - y_2 = z(1 - y)$, this latter expression may after a simple substitution be brought to the form:

$$[\text{formula illegible in the source}] \tag{V}$$

(See appendix.)

Mr. Bing now puts the further question: What is the probability that a new person forty years of age, entering the original large group of s persons, will survive one year, when we assume $m_1 = m_2 = 0$? Formula (III) gives the answer:

$$\frac{\int_0^1 y^{s+1}\,dy}{\int_0^1 y^{s}\,dy} = \frac{s+1}{s+2}.$$

Formula (V), on the other hand, gives us

$$\frac{\int_0^1 y_1^{s+1}\,dy_1 \int_0^1 y_2^{s+1}\,dy_2}{\int_0^1 y_1^{s}\,dy_1 \int_0^1 y_2^{s}\,dy_2} = \left(\frac{s+1}{s+2}\right)^{2}.$$

As the above analysis is perfectly general, we might equally well have applied it to each of the semi-annual periods, which would give us an a posteriori probability of survival equal to $\left(\frac{s+1}{s+2}\right)^{2}$ for each half year, or a compound probability of $\left(\frac{s+1}{s+2}\right)^{4}$ for the whole year. Extending this process it is easily seen that by dividing the year into n parts, we shall have $\left(\frac{s+1}{s+2}\right)^{n}$ as the final probability a posteriori that a forty-year-old person will reach the age of forty-one.
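The behavior of the expression $\left(\frac{s+1}{s+2}\right)^{n}$ under repeated subdivision of the year can be tabulated directly. The following is only a minimal sketch in Python: the function name `survival_after_split` and the group size s = 1000 are hypothetical choices made for illustration and do not come from the text.

```python
from fractions import Fraction


def survival_after_split(s: int, n: int) -> Fraction:
    """Return ((s + 1) / (s + 2)) ** n as an exact fraction.

    s is the number of observed persons (no deaths among them) and n is the
    number of equal parts into which the year is divided.  Both names are
    illustrative assumptions, not notation fixed by the text beyond s and n.
    """
    if s < 0 or n < 1:
        raise ValueError("s must be non-negative and n must be at least 1")
    return Fraction(s + 1, s + 2) ** n


if __name__ == "__main__":
    s = 1000  # hypothetical group size, chosen only for illustration
    for n in (1, 2, 4, 12, 365, 10_000):
        # The value shrinks steadily as the year is cut into more parts.
        print(f"n = {n:>6}: {float(survival_after_split(s, n)):.6f}")
```

Exact fractions are used so that the case n = 1 reproduces $(s+1)/(s+2)$ without rounding; the conversion to floating point is made only for display.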
By letting n increase indefinitely the above quantity approaches 0 as its limiting value and we obtain thus the paradox of Bing: If, among a large group of s equally old persons, we have observed no deaths during a full calendar year, then another person of the same age outside the group is sure to die inside the calendar year.

This is evidently a very strange result, and yet, working on the basis of the principle of insufficient reason, the mathematical deductions and formulas exhibit no errors. Mr. Bing disposes of the whole matter by simply denying the validity and existence of a posteriori probabilities. Dr. Kroman on the other hand defends Bayes's Rule. "Mathematics," Kroman says, "is, as Huxley has justly remarked, an exceedingly fine mill stone, but one must not expect to get wheat flour after having put oats in the quern." According to the Danish scholar the paradox is due to the use of a wrong formula. We ought to have used the general formula (II) instead of formula (V), which is a special case. In the general formula we encounter the functions u, denoting the probability of existence of the various productive probabilities y. As we do not know anything about this function u it is hopeless to attempt a calculation. This brings the criticism down to the fundamental question whether we shall build the theory of probabilities on the principle of "cogent reason" or the principle of "insufficient reason."

50. Conclusion. Contradictory results of a similar kind to the ones given above have led several eminent mathematicians to a complete denunciation of the laws underlying a posteriori probabilities. Professor Chrystal, especially, becomes extremely severe in his criticism in the previously mentioned address before the Actuarial Society of Edinburgh. He advises "practical people like the actuaries, much though they may justly respect Laplace, not to air his weaknesses in their annual examinations.
The indiscretions of a great man should be quietly allowed to be forgotten." Although one may heartily agree with Professor Chrystal's candid attack on the belief in authority, too often prevailing among mathematical students, I think, aside from the fact that the rule was originally given by Bayes, that the great French savant has been accused unjustly, as the following remarks perhaps may tend to show.

In our statement of Bayes's Rule, we followed an exact mathematical method, and the final formula (I) is theoretically as correct as any previously demonstrated in this work. The customary definition of a mathematical probability as the ratio of equally favorable to coordinated possible cases is not done away with in this new kind of probabilities; the former are found in the numerator and the latter in the denominator; and if we take care that each of the particular formulas, with its definite requirements, is applied to its particular case, we do not go beyond pure mathematics or logic. But are we able to get complete and exact information about these requirements? In the example of the tossing of a coin with two heads, this information was at hand. Here we were able to enumerate exactly the different mutually exclusive causes from which the observed event originated. We were also able to determine the exact quantitative measures for the probabilities, $\kappa$, that these complexes existed as well as the different productive probabilities, $\omega$. Here the most rigid requirements could be satisfied, and the rule gave therefore a true answer.

In the other examples we encountered a different state of affairs. Here we were not able to enumerate directly the different complexes of causes from which the event originated, but were forced to form different and arbitrary hypotheses about the complexes of origin, F, and each hypothesis gave, in general, a different result.
Furthermore, we assumed a priori that the different probabilities of the actual existence of the complexes were all equal in magnitude, and it was, therefore, the special formula (II) we employed in the determination of the a posteriori probabilities. In this formula, the different $\kappa$'s do not enter at all as a determining factor; only the productive probabilities, $\omega$, are considered. The assumption that all the $\kappa$'s are equal in magnitude is based upon the principle of insufficient reason, or as Boole calls it, "the equal distribution of ignorance." The principle of equal distribution of ignorance makes, in the case of continuously varying productive probabilities, v, the function, u, of the probabilities of existence of the various complexes equal to a constant quantity. In other words, the curve in Fig. 1 is replaced by a straight line of the form u = k.

Now, as a matter of fact, we possess in most cases some partial knowledge of the complexes of action producing the event in question. This partial knowledge, although far from complete enough to make a rigorous use of formula (I), is nevertheless sufficient to justify us in discarding completely any general hypothesis assuming such simple conditions as above. Such partial knowledge is, for instance, found in the Paradox of Bing. Here the rather absurd hypothesis was made that the possible values of the probability of surviving a certain period were equally probable. In other words, it is equally probable that there will die 0, 1, 2, ..., or s persons in the particular period. "Common sense, however, tells us that it is far more probable that, for instance, 90 per cent. of a large number of forty-year-old persons will survive the period than no one or every one will die in the same period" (Kroman). The indiscreet use of formula (II) therefore naturally leads to paradoxical results.
On the other hand, the fallacy of the happy-go-lucky computers, employing the special case (II) of Bayes's Rule, as well as the critics of Laplace, lies in their failure to make a proper distinction between "equal distribution of ignorance" and "partial cogent reason," which latter expression properly may be termed "an unequal distribution of ignorance." If, despite the actual presence of such unequal distribution of ignorance, we still insist on using the special formula (II), which is only to be used in the case of an equal distribution of ignorance, it is no wonder we encounter ambiguous answers. Not the rule itself, its discoverer, or Laplace, but the indiscreet computer is the one to blame. Messrs. Bing, Venn and Chrystal, in their various criticisms, have filled the quern with some rather "wild oats" and expected to get wheat flour; and that one of those critics in his disappointment in not getting the expected flour should blame Laplace, is hardly just.

So much for the principle of "equal distribution of ignorance." It may be of interest to see how matters turn out when we, like von Kries, insist upon the principle of "cogent reason" as the true basis of our computations. The reader will quite readily see that a rigorous application of the Rule of Bayes in its most general form as given by formula (I) really tacitly assumes this very principle. In formula (I), we require not alone an exact enumeration of the various complexes from which the observed event may originate, but also an exact and complete information about the structure of such complexes in order to evaluate their various probabilities of existence. If such information is present, we can meet even the most stringent requirements of the general formula, and we will get a correct answer. But in the vast majority of cases, not to say all cases, such information is not at hand, and any attempt to make a computation by means of Bayes's Rule must be regarded as hopeless.
We may, however, again remark that very seldom we are in complete ignorance of the conditions of the complexes, which is the same thing as saying that we are not in a position to employ the principle of equal distribution of ignorance in a rigorous manner. From other experiments on the same kind of event, or from other sources, we may have attained some partial information, even if insufficient to employ the principle of cogent reason. Is such information now to be completely ignored in an attempt to give a reasonable, although approximate answer? It is but natural that the mathematician should attempt to obtain as much of such information as possible and use it in the evaluation of the various probabilities of existence. Thus for instance, if, in the Paradox of Bing, we had observed that the probability of survival for a forty-year-old person never had been below .75 and never above .95, it would be but reasonable to substitute those limits in their proper integrals in order to attain an approximate answer.

To illustrate this somewhat subjective determination of an a posteriori probability, we take another example from the memoirs of Bing and Kroman.

Example 26. A merchant receives a cargo of 100,000 pieces of fruit. If every single fruit is untainted, the value of the cargo may be put at 10,000 Kroner. On the other hand, any part of the cargo more or less tainted is considered worthless. The merchant has never before received a similar cargo and does not know how the fruit has been affected by travel. As samples, he has selected 30 pieces picked at random from the cargo and all samples proved to be fresh. He asks a mathematician what value he can put on the cargo. If the mathematician uses the special formula (II), assuming an equal distribution of ignorance, therefore assuming that it is equally probable that for example none, 5,000 or all the individual pieces of fruit were untainted, the answer is:

$$10{,}000 \times \frac{\int_0^1 v^{31}\,dv}{\int_0^1 v^{30}\,dv} = 9687.5 \text{ Kroner}.$$
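The arithmetic behind the figure of 9687.5 Kroner can be retraced in a few lines. The sketch below is written in Python under the assumption of an equal distribution of ignorance stated in the example; the helper names `monomial_integral` and `expected_cargo_value` are hypothetical and serve only this illustration.

```python
from fractions import Fraction


def monomial_integral(power: int, lower: Fraction, upper: Fraction) -> Fraction:
    """Exact value of the integral of v**power over [lower, upper]."""
    return (upper ** (power + 1) - lower ** (power + 1)) / (power + 1)


def expected_cargo_value(full_value: int, fresh_samples: int) -> Fraction:
    """Cargo value when every value of v between 0 and 1 is held equally probable.

    The ratio of integrals mirrors the special formula (II): v**(k + 1) in the
    numerator and v**k in the denominator, where k is the number of samples
    that all proved fresh.
    """
    zero, one = Fraction(0), Fraction(1)
    numerator = monomial_integral(fresh_samples + 1, zero, one)
    denominator = monomial_integral(fresh_samples, zero, one)
    return full_value * numerator / denominator


print(float(expected_cargo_value(10_000, 30)))  # 9687.5
```

The ratio of the two integrals reduces to 31/32, which is why the result falls only slightly below the full value of 10,000 Kroner.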
If we use the true rule, the a posteriori probability of the wholesomeness of the cargo is given by the integral:

$$\frac{\int_0^1 v^{31}\,u\,dv}{\int_0^1 v^{30}\,u\,dv},$$

where v is the general expression for a possible probability of wholesomeness between 0 and 1 and u dv the corresponding probability of existence. Now if the mathematician has no complete information as to this particular function, u, it would be foolish of him to attempt a calculation, since the hypothesis of an equal probability of existence for all possible values of v evidently gives an arbitrary and perhaps a very erroneous result. On the other hand, the computer may possibly have access to some partial information. Perhaps the merchant has received fruit of a similar kind or heard about cargoes of this particular kind of fruit received by other dealers. If now the merchant were able to inform the computer that in a great number of similar cases the probability of wholesomeness had been between 0.9 and 1 with an approximately even distribution, while it never had been below 0.9, then nothing would hinder the mathematician from presenting the following computation:

$$\frac{\int_{0.9}^{1} v^{31}\,dv}{\int_{0.9}^{1} v^{30}\,dv} = 0.9726$$

and telling the merchant that on the basis of the information given 9,726 Kroner would be a fair price for the cargo.

This is really the point of view taken by the English mathematician, Professor Karl Pearson, one of the ablest writers on mathematical statistics of the present time, when he says: "I start, as most writers on mathematics have done, with 'the equal distribution of ignorance' or I assume the truth of Bayes's Theorem. I hold this theorem not as rigidly demonstrated, but I think with Edgeworth that the hypothesis of the equal distribution of ignorance is, within the limits of practical life, justified by our experience of statistical ratios, which are unknown, i. e., such ratios do not tend to cluster markedly round any particular point."
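The computation with the limits 0.9 and 1 given earlier in this example can be checked in the same manner. The following Python sketch assumes, as the merchant's information suggests, an even distribution of v between the two limits; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def monomial_integral(power: int, lower: Fraction, upper: Fraction) -> Fraction:
    """Exact value of the integral of v**power over [lower, upper]."""
    return (upper ** (power + 1) - lower ** (power + 1)) / (power + 1)


def wholesomeness_probability(fresh_samples: int,
                              lower: Fraction,
                              upper: Fraction) -> Fraction:
    """A posteriori probability of wholesomeness for an even distribution of v
    between lower and upper, after fresh_samples samples all proved fresh."""
    numerator = monomial_integral(fresh_samples + 1, lower, upper)
    denominator = monomial_integral(fresh_samples, lower, upper)
    return numerator / denominator


# Limits 0.9 and 1 as reported by the merchant in the example.
p = wholesomeness_probability(30, Fraction(9, 10), Fraction(1))
print(round(float(p), 4))          # 0.9726
print(round(10_000 * float(p)))    # 9726
```

Setting the lower limit to 0 in the same function returns 31/32, the value obtained under the equal distribution of ignorance, so the two computations differ only in the limits of integration.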
To sum up the above remarks: Theoretically Bayes's Rule is true. If we are able to enumerate and determine the probabilities of existence of the complexes of origin it will also give true results in practice. If we are justified in assuming the principle of "insufficient reason" or "equal distribution of ignorance" as the basis for our calculations, formula (II) may be employed with exact results after a rigid enumeration of the complexes. If the principle of "cogent reason" is required as the basis, an exact computation is in general hopeless, and we can only after having obtained partial subjective information give an approximate answer.

With these remarks we shall conclude the elementary discussion of the merely theoretical part of the subject. The following chapters require in most cases a knowledge of the infinitesimal calculus, and many of the questions discussed above will appear in a new and instructive light by this treatment.

CHAPTER VII.

THE LAW OF LARGE NUMBERS.

51. A Priori and Empirical Probabilities. In the previous chapters we limited ourselves to the discussion of such mathematical probabilities where we, a priori, on account of our knowledge of the various domains or complexes of actions, were able to enumerate the respective favorable and unfavorable possibilities associated with the occurrence or non-occurrence of the event in question. "The real importance of the theory of probability in regard to mass phenomena consists, however, in determining the mathematical relations of the various probabilities not in a deductive, but in an empirical manner, without an a priori exhaustive knowledge of the mutual relations and actions between cause and effect, by means of statistical enumeration of the frequency of the observed event.
The conception of a probability finds its justification in the close relation between the mathematical probabilities and relative frequencies as determined in a purely empirical way. This relation is established by means of the famous Law of Large Numbers" (A. A. Tschuprow).

To return to our original definition of a mathematical probability as the ratio of the favorable to the coordinated equally possible cases, we first notice that this definition is wholly arbitrary like many mathematical definitions. The contention of Stuart Mill that every definition contains an axiom is rather far stretched. In mathematics a definition does not necessarily need to be metaphysical. A striking example is offered in mechanics by the definitions of force as given by Lagrange and Kirchhoff. What is force? "Force," Lagrange says, "is a cause which tends to produce motion." Kirchhoff on the other hand tells us that force is the product of mass and acceleration. Lagrange's definition is wholly metaphysical. Whenever a definition is to be of use in a purely exact science such as mathematics, it must teach us how to measure the particular phenomena which we are investigating. Thus, to quote Poincaré, "it is not necessary that the definition tells us what force really is, whether it is a cause or the effect of motion."

An analogous case is offered in the criticism of a mathematical probability as defined by Laplace, and the attempts to place the whole theory of probabilities on a purely empirical basis by Stuart Mill, Venn and Chrystal. These writers contend "that probability is not an attribute of any particular event happening on any particular occasion. Unless an event can happen, or be conceived to happen a great many times, there is no sense in speaking of its probability."
The whole attack is directed against the definition of a mathematical probability in a single trial, which definition, evidently, by the empiricists is regarded as having no sense. The word "sense" must evidently be considered as having a purely metaphysical meaning. In the same manner Kirchhoff's definition might be dismissed as having no sense, since it would seem as difficult to conceive force as a purely mathematical product of two factors, mass and acceleration, as it is to conceive the definition of a mathematical probability as a ratio.

The metaphysical trend of thought of the above writers is shown in their various definitions of the probability of an event. Mill defines it merely as the relative frequency of happenings inside a large number of trials, and Venn gives a similar definition, while Chrystal gives the following: "If, on taking any very large number N out of a series of cases in which an event, E, is in question, E happens on pN occasions, the probability of the event, E, is said to be p." Let us, for a moment, look more closely into these statements. Any definition, if it bears its name rightly, must mean the same to all persons. Now, as a matter of fact, the vagueness in a half metaphorical term like "any very large number" illustrates its weakness. The question immediately confronts us, "what is a very large number?" Is it 100, 1,000 or perhaps 1,000,000? A fixed universal standard for the value of N seems out of the question and the definition, although perhaps readily grasped in a "general way," can hardly be said to be happily chosen.

Another, and perfectly rigorous definition, is the following one given by the Danish astronomer and actuary, T. N. Thiele.
Thiele tells us that "common usage" has assigned the word probability as the name "for the limiting value of the relative frequency of an event, when the number of observations (trials), under which the event happens, approach infinity as a limit." A similar definition is later on given by the American actuary R. Henderson, who says: "The numerical measure which has been universally adopted for the probability of an event under given circumstances is the ultimate value, as the number of cases is indefinitely increased, of the ratio of the number of times the event happens under those circumstances to the total possible number of times."

There is nothing ambiguous or vague in these definitions. Infinity, taken in a purely quantitative sense, has a perfectly uniform meaning in mathematics. The new definition differs, however, radically from our customary definition of a mathematical a priori probability. We cannot, therefore, agree with Mr. Henderson when he continues: "the measure there given has been universally adopted and this holds true in spite of the fact that the rule has been stated in ways which on their face differ widely from that above given. The one most commonly given is that if an event can happen in a ways and fail in b ways all of which are equally likely, the probability of the event is the ratio of a to the sum of a and b. It is readily seen that if we read into this statement the meaning of the words "equally likely," this measure, so far as it goes, reduces to a particular case of that given above."

In order to investigate this statement somewhat more closely, let us try to measure the probability of throwing head with an ordinary coin by both our old definition of a mathematical probability and the definition by Mr. Henderson of what we shall term an empirical probability.
Denoting the first kind of probability by $P(E)$ and the second by $P'(E)$ we have in ordinary symbols

$$P(E) = \tfrac{1}{2}, \qquad P'(E) = \lim_{\nu = \infty} F(E, \nu),$$

where the symbol $F(E, \nu)$ denotes the relative frequency of the event, E, in $\nu$ total trials. No a priori knowledge will tell us offhand if $P'(E)$ will approach $\tfrac{1}{2}$ as its ultimate value. The two methods are radically different. By the first method the determination of the numerical measure of a probability depends simply on our ability to judge and segregate the equally possible cases into cases favorable and unfavorable to the event E. By the second method the determination of the probability depends, not alone on the segregation and consequent enumeration of the favorable from the total cases, but chiefly on the extent of our observations or trials on the event in question.

52. Extent and Usage of Both Methods. Before entering into a more detailed discussion of the actual quantitative comparison of the two methods, it might be of use to compare their various extent of usage. In this respect the empirical method is vastly superior to the a priori. A rigorous application of the a priori method, as far as concrete problems [...] a purely deductive process, as illustrated by Bayes's Rule in the earlier chapters, leads to paradoxical results. Our a priori knowledge of the complexes of causes governing death or survival is so incomplete that even a qualitative, not to speak of a quantitative, judgment is out of the question. The empirical method shows us at least a way to obtain a measure for the probability of the event in question.
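The contrast between the a priori value 1/2 and the empirical relative frequency F(E, ν) can be illustrated by a small simulation. The following is only a sketch under stated assumptions: a pseudo-random generator stands in for the coin, the postulated probability of head is taken as 1/2, and the function name `relative_frequency` is hypothetical. It shows nothing more than the relative frequency for a few finite values of ν; it is in no way a determination of the limit itself.

```python
import random


def relative_frequency(num_trials, p_head=0.5, seed=0):
    """Relative frequency of heads in num_trials simulated tosses.

    Assumption: each toss is an independent pseudo-random draw whose
    postulated probability of head is p_head.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(num_trials) if rng.random() < p_head)
    return heads / num_trials


if __name__ == "__main__":
    # F(E, nu) for a few finite numbers of trials nu
    for nu in (10, 100, 1_000, 10_000, 100_000):
        print(nu, relative_frequency(nu))
```

For any finite ν the printed ratio is only an approximation of the postulated value, which is precisely the difficulty taken up in the text.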
By observing during a period of a year an infinite number of forty-year-old persons of whom, after an exhaustive qualitative investigation, we are led to believe that their present conditions as far as health, social occupation, environments, etc., are concerned are equally similar, we may by an enumeration of those who died during the year obtain the desired ratio as defined by $P'(E)$. Of course, observation of an infinite number is practically impossible. An approximate ratio may be formed by taking a finite, but a large, number of cases under observation. But how large a number? This very question leads straightforward to another problem, namely the quantitative determination of the range of variance between the approximate ratio and the ideal ultimate ratio as defined by the relation

$$P'(E) = \lim F(E, \nu).$$

Since it is impossible to make an infinite number of observations we cannot find the exact value of the ran [...] is the unknown quantity, we have also found a means of determining the value of $p$ in known quantities. Our next question is: What is the probability that the absolute value of the difference between $p$ and the relative frequency of the event as expressed by the ratio of $\alpha$ to $s$ does not exceed a previously assigned quantity? Or the probability that

$$\left| \frac{\alpha}{s} - p \right| \le \lambda\,?$$

Now, as the reader will see later, we shall prove that

$$\lim_{\nu = \infty} F(E, \nu) = P(E) = p.$$

It must, however, be remembered that this result is reached by a mathematical deduction, based upon the postulate of mathematical probabilities, and not in the manner suggested in the above statement by Mr. Henderson. It is only after having established such purely quantitative relations that we are entitled to extend the laws of mathematical probabilities as deduced in the earlier chapters to other problems than the simple problems of games of chance.

53. Average a Priori Probabilities.
In the previous paragraphs of this chapter, another important matter is to be noted, namely the assumption that the complex of causes producing the event in question remains constant during the repeated trials (observations), or, stated in other words, that the mathematical a priori probability remains constant. Under this limitation the extension of the laws of mathematical probabilities would have but a very limited practical application. In all statistical mass phenomena such an ideal state of affairs is rather a very rare exception. If we consider an ordinary mortality investigation we know with absolute certainty that no two persons are identically alike as far as health, occupation, environment and numerous other things are concerned. Thus the postulated mathematical probability for death or survival during a whole calendar year will in general be different for each person. We may, however, conceive an average probability of survival for a full year defined by the relation

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s} = \frac{\sum p}{s},$$

where $p_1, p_2, p_3, \cdots$ are the postulated probabilities of each individual under observation. Our task is now to find:

1. An algebraic relation between the average probability as defined above, the absolute frequency $\alpha$ and the total number of observations (trials) $s$,
2. The same relation when $s$ approaches $\infty$ as its ultimate value,
3. The probability of the existence of the inequality $\left| \frac{\alpha}{s} - p_0 \right| \le \lambda$, where $\alpha$ denotes the absolute frequency of the occurrence of the event, $s$ the total number of observations (trials) and $\lambda$ an arbitrary constant.

54. The Theory of Dispersion. As we mentioned before, the empirical ratio $\alpha/s$ represents only an approximation of the ideal ultimate value of $\lim F(E, \nu)$.
If we now make a series of observations (trials) on the occurrence of a certain event E, such that instead of a single set of $s$ individual observations we take $N$ such sets, we shall have $N$ relative frequency ratios:

$$\frac{\alpha_1}{s},\ \frac{\alpha_2}{s},\ \frac{\alpha_3}{s},\ \cdots,\ \frac{\alpha_N}{s}.$$

Since the ratios are approximations only of the ultimate ratio they will in general exhibit discrepancies as to their numerical values and may be regarded as $N$ different empirical approximations. The question now arises how these various empirical ratios group themselves around the value of $\lim F(E, \nu)$. The distribution of the empirical ratios around the ultimate ratio is by Lexis called "dispersion."

55. Historical Development of the Law of Large Numbers. The first mathematician to investigate the problems we have roughly outlined in the previous paragraphs was the renowned Jacob Bernoulli in the classic "Ars Conjectandi," which rightly may be classified as one of the most important contributions on the subject. Bernoulli's researches culminate in the theorem which bears his name and forms the corner-stone of modern mathematical statistics. That Bernoulli fully realized the great practical importance of these investigations is proven by the heading of the fourth part of his book, which runs as follows: "Artis Conjectandi Pars Quarta, tradens usum et applicationem praecedentis doctrinae in civilibus et oeconomicis." It is also here that we first encounter the terms "a priori" and "a posteriori" probabilities. Bernoulli's researches were limited to such cases where the a priori probabilities remained constant during the series or the whole sets of series of observations. Poisson, a French mathematician, later treated in a series of memoirs the more general case where the a priori probabilities varied with each individual trial. He also introduced the technical term "Law of Large Numbers" ("Loi des Grands Nombres").
Finally Lexis, through the publication in 1877 of his brochure "Zur Theorie der Massenerscheinungen der menschlichen Gesellschaft," treated the dispersion theory and forged the closing link of the chain connecting the theory of a priori probabilities and empirical frequency ratios. Of late years the Russian mathematician Tchebycheff, the Scandinavian statisticians Westergaard and Charlier, and the Italian scholar Pizetti have contributed several important papers. It is on the basis of these papers that the following mathematical treatment is founded. In certain cases, however, we shall not attempt to enter too deeply into the theory of certain definite integrals, which is essential for a rigorous mathematical analysis, but which also requires an extensive mathematical knowledge which many of my readers, perhaps, do not possess. Readers interested in the analysis of the various integrals may be referred to the original works of Czuber and Charlier.

CHAPTER VIII. INTRODUCTORY FORMULAS FROM THE INFINITESIMAL CALCULUS.

56. Special Integrals. In the following chapters we shall attempt to investigate the theory of probabilities from the standpoint of the calculus. Although a knowledge of the elements of this branch of mathematics is presupposed to be possessed by the student, we shall for the sake of convenience briefly review and demonstrate a few formulas from the higher analysis of which we shall make frequent use in the following paragraphs. All such formulas have been given in the elementary instruction of the calculus, and only such readers who do not have this particular branch of mathematics fresh in memory from their school days need pay any serious attention to the first few paragraphs.

57. Wallis's Expression for $\pi$ as an Infinite Product. We wish first of all to determine the value of the definite integral:

$$J_n = \int_0^{\pi/2} \sin^n x \, dx, \qquad (1)$$

under the assumption that $n$ is a positive integral number.
This integral is geometrically equal to the area between the $x$ axis, the axis of $y$, the ordinate corresponding to the abscissa $\tfrac{1}{2}\pi$ and the graph of the function $y = \sin^n x$. Letting $u' = D_x u = \sin x$, $v = \sin^{n-1} x$, we get by partial integration:

$$J_n = \Big[ -\cos x \, \sin^{n-1} x \Big]_0^{\pi/2} + \int_0^{\pi/2} \cos x \,(n-1) \sin^{n-2} x \cos x \, dx. \qquad (2)$$

If we substitute the upper and lower limits in the first term on the right hand side of the above expression for $J_n$, this term reduces to 0, assuming $n > 1$. Thus we have:

$$J_n = (n-1) \int_0^{\pi/2} \sin^{n-2} x \, \cos^2 x \, dx.$$

Putting $\cos^2 x = 1 - \sin^2 x$, we get:

$$J_n = (n-1) \int_0^{\pi/2} \sin^{n-2} x \, dx - (n-1) \int_0^{\pi/2} \sin^n x \, dx. \qquad (3)$$

The last integral is, however, equal to $J_n$, and the first integral is, following the notation from (1), equal to $J_{n-2}$. We shall therefore have:

$$J_n + (n-1) J_n = (n-1) J_{n-2}, \quad \text{or} \quad n J_n = (n-1) J_{n-2}. \qquad (4)$$

Replacing $n$ by $n-1, n-2, n-3, \cdots$ successively we get:

$$n J_n = (n-1) J_{n-2},$$
$$(n-1) J_{n-1} = (n-2) J_{n-3},$$
$$(n-2) J_{n-2} = (n-3) J_{n-4},$$
$$\cdots$$

According as $n$ is even or uneven we shall have one of the following equations at the bottom of the recursion formula:

$$J_0 = \int_0^{\pi/2} \sin^0 x \, dx = \int_0^{\pi/2} dx = \tfrac{1}{2}\pi, \quad \text{or} \quad J_1 = \int_0^{\pi/2} \sin x \, dx = \Big[ -\cos x \Big]_0^{\pi/2} = 1. \qquad (5)$$

If, for even values of $n$, we let $n = 2m$, and, for uneven values, $n = 2m - 1$, we get finally the following recursion formulas:

$$2m \, J_{2m} = (2m-1) J_{2m-2}, \qquad (2m-1) J_{2m-1} = (2m-2) J_{2m-3},$$
$$(2m-2) J_{2m-2} = (2m-3) J_{2m-4}, \qquad (2m-3) J_{2m-3} = (2m-4) J_{2m-5},$$
$$\cdots$$
$$2 J_2 = 1 \cdot \tfrac{1}{2}\pi, \qquad 3 J_3 = 2 \times 1.$$

Successive multiplication of the above equations gives us finally:

$$J_{2m} = \frac{(2m-1)(2m-3)\cdots 1}{2m(2m-2)\cdots 2} \cdot \frac{\pi}{2}, \qquad J_{2m-1} = \frac{(2m-2)(2m-4)\cdots 2}{(2m-1)(2m-3)\cdots 3}.$$

We may now draw some very interesting conclusions from the above equations. Both integrals represent geometrically areas bounded by the graphs of the functions $y = \sin^{2m} x$ and $y = \sin^{2m-1} x$ respectively. The difference of the ordinates of these graphs, namely

$$(\sin x - 1) \sin^{2m-1} x,$$

is evidently decreasing in absolute value with increasing values of the positive integer $m$, since $\sin x$ lies between 0 and $+1$ and $\sin^{2m-1} x$ approaches the value 0 except for certain values of $x$. The larger we select $m$ the less is the difference of the two areas, and their ratio will therefore approach 1, or the expression

$$\frac{(2m-2)(2m-4)\cdots 2}{(2m-1)(2m-3)\cdots 3} \div \frac{(2m-1)(2m-3)\cdots 3}{2m(2m-2)\cdots 2}$$

will approach $\frac{\pi}{2}$. Hence:

$$\frac{\pi}{2} = \lim_{m = \infty} \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2m-2)^2 \cdot 2m}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2m-3)^2 (2m-1)^2}.$$

Multiplying numerator and denominator with $2^2 \cdot 4^2 \cdot 6^2 \cdots (2m-2)^2$ we get:

$$\frac{\pi}{2} = \lim_{m = \infty} \frac{2^{4m-4} \cdot 2m \, [(m-1)!]^4}{[(2m-1)!]^2}, \quad \text{or} \quad \sqrt{\frac{\pi}{2}} = \lim_{m = \infty} \frac{2^{2m} (m!)^2}{(2m)! \, \sqrt{2m}}.$$

This is the formula originally discovered by the English mathematician John Wallis (1616-1703), by means of which $\pi$ may be expressed as an infinite product.

58. De Moivre-Stirling's Formula. We are now in a position to give a demonstration of Stirling's formula for the approximate value of $n!$ for large values of $n$. A. de Moivre seems to have been the first to attempt this approximation. In the first edition of his "Doctrine of Chances" (1718) he reaches a result which must be regarded as final, except for the determination of an unknown constant factor. Stirling succeeded in completing this last step in his remarkable "Methodus Differentialis" (1730). In the second edition of "Doctrine of Chances" (1738) de Moivre gives the complete formula with full credit to Stirling. He mentions as his belief that Stirling in his final calculation possibly has made use of the formula of Wallis. The demonstration by the older English authors is rather lengthy, and much shorter methods have been devised by later writers. Most authors make use of the Eulerian integral of the second order, by which any factorial may be expressed by a gamma function:

$$\Gamma(n+1) = \int_0^{\infty} x^n e^{-x} \, dx = n!.$$

Another method makes use of the well-known Euler's Summation Formula from the calculus of finite differences.
This method is of special interest to actuarial students, who frequently use the Eulerian formula in the computation of various life contingencies. For the benefit of those interested in this particular method we may refer to the treatises of Seliwanoff and Markhoff, two Russian mathematicians. The Italian mathematician Cesaro has, however, derived the formula in a much simpler manner. Cesaro starts with the inequalities: [...]

[...] to the power 6, to p to the first power or 14,437,500 + 0,187,500 + 193,359,375 + 429,087,500 + 044,531,250 + 585,937,500 2,170,782,336

62. The Most Probable Value in a Series of Repeated Trials. In the examples just given we determined the probability for the happening of the most probable event in a series of $s$ observations by a direct expansion of the binomial $(p + q)^s$. This may be done whenever $s$ is a comparatively small number. But when $s$ takes on large values, this method becomes impracticable, not to say impossible. Suppose that $s = 1{,}400$; then the actual straightforward expansion of $(p + q)^{1400}$ would require a tremendous work of calculation which no practical computer would be willing to undertake. We must therefore in some way or other seek a method of approximation by which this labor of calculation may be avoided, and try to find an approximate formula by which we are able to express the maximum term in a simple manner, involving little computation and at the same time yielding results close enough for practical as well as theoretical purposes. Jacob Bernoulli in his famous treatise "Ars Conjectandi" was the first mathematician to solve this problem. Bernoulli also gave an expression for the probability that the departure from the most probable value should not exceed previously fixed limits. The method, however, was very laborious, and the final form was first reached by Laplace in "Théorie des Probabilités." We saw before in Chapter IV that the general term
in the binomial expansion $(p + q)^s$ represented the probability that an event, E, will happen $\alpha$ times and fail $\beta$ times in $s$ trials, where $p$ and $q$ were the respective probabilities for success and failure in a single trial. The exponent $\alpha$ may here take all positive integral values in the interval $(0, s)$, including both limits. The question now arises, which particular value of $\alpha$, say $\alpha_n$, will make the above quantity a maximum term in the expansion of the binomial? If $\alpha_n$ really is this particular value, then the term (II) below must be not less than either of its neighbors (I) and (III):

$$\frac{s!}{(\alpha_n + 1)! \, (\beta_n - 1)!} \, p^{\alpha_n + 1} q^{\beta_n - 1}, \qquad (I)$$
$$\frac{s!}{\alpha_n! \, \beta_n!} \, p^{\alpha_n} q^{\beta_n}, \qquad (II)$$
$$\frac{s!}{(\alpha_n - 1)! \, (\beta_n + 1)!} \, p^{\alpha_n - 1} q^{\beta_n + 1}. \qquad (III)$$

Dividing (II) by (III) and (II) by (I) we obtain the following inequalities:

$$\frac{\beta_n + 1}{\alpha_n} \cdot \frac{p}{q} \ge 1 \quad \text{and} \quad \frac{\alpha_n + 1}{\beta_n} \cdot \frac{q}{p} \ge 1,$$

which also may be written $(\beta_n + 1) p \ge q \alpha_n$ and $(\alpha_n + 1) q \ge \beta_n p$. The following reductions are self evident, remembering that $\beta_n = s - \alpha_n$ and $q = 1 - p$:

$$(s - \alpha_n + 1) p \ge \alpha_n (1 - p) \quad \text{or} \quad sp + p \ge \alpha_n,$$

and

$$(\alpha_n + 1) q \ge (s - \alpha_n) p \quad \text{or} \quad \alpha_n q + \alpha_n p \ge sp - q \quad \text{or} \quad \alpha_n \ge sp - q.$$

From which we see that $\alpha_n$ satisfies the following relation:

$$ps - q \le \alpha_n \le ps + p.$$

Since $p + q = 1$, we notice that $\alpha_n$ is enclosed between two limits whose difference in absolute magnitude equals unity. The whole interval in which $\alpha_n$ is situated being equal to unity, and since $\alpha_n$ must be an integral number, this particular $\alpha_n$ is determined uniquely as an integral positive number when both $ps - q$ and $ps + p$ are fractional quantities. If $ps - q$ is an integral number, $ps + p$ will also be integral, and $\alpha_n$ would have to be a fractional number in order to lie strictly between the two limits. Since by the nature of the problem $\alpha_n$ can take positive integral values only, both limits then satisfy the relation, and the binomial expansion of $(p + q)^s$ must have two equal terms which are greater than any of the rest. Dividing both sides of the inequality by $s$, we shall have

$$p - \frac{q}{s} \le \frac{\alpha_n}{s} \le p + \frac{p}{s}, \quad \text{and likewise} \quad q - \frac{p}{s} \le \frac{\beta_n}{s} \le q + \frac{q}{s}.$$

Since both $p$ and $q$ are proper fractions, both $p/s$ and $q/s$ are less than $1/s$.
We may therefore safely assume that the highest possible difference between the two quotients $\alpha_n/s$ and $\beta_n/s$ and the probabilities $p$ and $q$ will never exceed $1/s$. Now if $s$ is a very large number this quantity may be neglected, and we may therefore write $ps = \alpha_n$ and $qs = \beta_n$. Substituting these values in our original expression for the general term of the binomial expansion we get as the maximum term:

$$T_n = \frac{s!}{(sp)! \, (sq)!} \, p^{sp} q^{sq}.$$

63. Approximate Calculation of the Maximum Term, $T_n$. When the trials are repeated a large number of times the straightforward calculation of the maximum term becomes very laborious. The only table facilitating an exact computation is in a work "Tabularum ad Faciliorem et Breviorem Probabilitatis Computationem Utilium Enneas," by the Danish mathematician C. F. Degen. This table, which was published in 1824, gives the logarithms to twelve places for all values of $n!$ from $n = 1$ to $n = 1{,}200$. Degen's table is, however, not easily obtained, and even if it were, it would be of little or no value for factorials above $1{,}200!$. Our only resort is therefore to find an approximate expression for the above value of $n!$. This is most conveniently done by making use of Stirling's formula for factorials of high orders. We have

$$s! = s^{s + 1/2} e^{-s} \sqrt{2\pi}, \qquad (sp)! = (sp)^{sp + 1/2} e^{-sp} \sqrt{2\pi}, \qquad (sq)! = (sq)^{sq + 1/2} e^{-sq} \sqrt{2\pi}.$$

Substituting the above values in the expression $s! / \{(sp)! \, (sq)!\}$ we get

$$\frac{s!}{(sp)! \, (sq)!} = \frac{1}{p^{sp + 1/2} \, q^{sq + 1/2} \sqrt{2\pi s}}.$$

Hence we have

$$T_n = \frac{p^{sp} q^{sq}}{p^{sp + 1/2} \, q^{sq + 1/2} \sqrt{2\pi s}},$$

which reduces to

$$T_n = \frac{1}{\sqrt{2\pi s p q}}$$

as an approximate value for the maximum term.

Tchebycheff's Theorems. Despite all that has been said about the most probable value, its use is somewhat limited, and it might well, without harm, be left out of the whole theory of probabilities. Just because an event is the most probable it does by no means follow that it is a very probable event. In fact the expression $(\sqrt{2\pi s p q})^{-1}$, which for large values of $s$ converges towards zero, shows that the most probable event in reality is a very improbable event.
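The smallness of the maximum term, and the closeness of the approximation $1/\sqrt{2\pi s p q}$, may be checked numerically. The sketch below is an illustration only; the function names are hypothetical, the exact term is evaluated through logarithms of factorials (`math.lgamma`) instead of a table of logarithms, and the most probable frequency is taken as the largest integer not exceeding $ps + p$, which lies inside the limits $ps - q$ and $ps + p$.

```python
import math


def max_term_exact(s, p):
    """Largest term of (p + q)^s, evaluated through logarithms.

    Assumption: the most probable frequency is floor(p*s + p), an
    integer lying between p*s - q and p*s + p.
    """
    q = 1.0 - p
    alpha = min(math.floor(p * s + p), s)
    beta = s - alpha
    log_term = (math.lgamma(s + 1) - math.lgamma(alpha + 1)
                - math.lgamma(beta + 1)
                + alpha * math.log(p) + beta * math.log(q))
    return math.exp(log_term)


def max_term_stirling(s, p):
    """Approximate maximum term 1 / sqrt(2 * pi * s * p * q)."""
    q = 1.0 - p
    return 1.0 / math.sqrt(2.0 * math.pi * s * p * q)


if __name__ == "__main__":
    for s in (14, 140, 1400, 14000):
        print(s, max_term_exact(s, 0.5), max_term_stirling(s, 0.5))
```

Both columns decrease steadily as s grows, in agreement with the remark that the most probable event is itself an improbable one.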
This statement may seem a little paradoxical; but it is easily understood by realizing that the most probable event is only a probability for a possible combination among a large number of equally possible combinations of a different order. Instead of finding the most probable event it is more important in practical calculations to determine the average number or mean value of the absolute frequencies of successes. In Chapter V we pointed out the close relation between a mathematical expectation and the mean value of a variable. This relation is used by the Russian mathematician Tchebycheff as the basis of some very general and far-reaching theorems in probabilities, by means of which the Law of Large Numbers may be established in an elegant and elementary manner.

64. Expected or Probable Value. In Chapter V we defined the product of a certain sum, $s$, and the probability of winning such a sum as the mathematical expectation of $s$. It is, however, not necessary to associate the happening of the event with a monetary gain or loss; in fact it often serves to confuse the reader, and we may generalize the definition as follows. If a variable $\alpha_i$ may assume any of the values $\alpha_1, \alpha_2, \alpha_3, \cdots, \alpha_s$, each with a respective probability of existence $\varphi(\alpha_i)$ $(i = 1, 2, \cdots, s)$ and such that $\sum \varphi(\alpha_i) = 1$, then we define

$$\sum \alpha_i \, \varphi(\alpha_i) = e(\alpha_i)$$

as the expected value of $\alpha_i$. Some writers use also the term probable value instead of expected value. In other words, the expected value of a variable quantity, $\alpha$, which may assume any one of the values $\alpha_1, \alpha_2, \cdots, \alpha_s$, is the sum of the products of each individual value of the variable and the corresponding probability of existence of such value.

Suppose we now have two opposite and complementary events $E$ and $\bar{E}$ for which the probabilities of happening in a single trial are equal to $p$ and $q = 1 - p$ respectively.
When the trials are repeated $s$ times the probabilities of $E$ happening $s$ times and $\bar{E}$ no times, of $E$ happening $s - 1$ times and $\bar{E}$ once, of $E$ $s - 2$ and $\bar{E}$ 2 times, and so on, may be expressed by the individual terms of the expansion $(p + q)^s$, where the general term expressing the occurrence of $E$ $\alpha$ times and of $\bar{E}$ $(s - \alpha)$ times is

$$\varphi(\alpha) = \frac{s!}{\alpha! \, (s - \alpha)!} \, p^{\alpha} q^{s - \alpha},$$

which is also the probability of the existence of the frequency number $\alpha$. The variable in the binomial expansion is $\alpha$, which may assume all values from 0 to $s$ inclusive. We now first of all proceed to find the expected value, or the mathematical expectation, of the following quantities: $\alpha$, $[\alpha - e(\alpha)]$ and $[\alpha - e(\alpha)]^2$. We shall presently show the reason for the selection of the above expressions, which perhaps may appear, at present, somewhat puzzling to the student. In mathematical symbols the expected values of the above quantities are expressed as follows:

$$e(\alpha) = \sum \alpha \, \varphi(\alpha), \qquad e[\alpha - e(\alpha)] = \sum [\alpha - e(\alpha)] \, \varphi(\alpha), \qquad e[\alpha - e(\alpha)]^2 = \sum [\alpha - e(\alpha)]^2 \varphi(\alpha),$$

and the summation is to take place from $\alpha = 0$ to $\alpha = s$.

65. Summation Method of Laplace. The Mean Error. The analytical difficulty lies in the summation of the expressions as given above. Laplace was the first to give a compact expression for the different sums in a simple and elegant manner. By the introduction of the parameter $t$ Laplace writes: [...]

[...] (4)

Let also $a = \lambda \, \epsilon(\xi)$. We then have by a mere substitution in the above inequality:

$$P_T > 1 - \frac{1}{\lambda^2}. \qquad (5)$$

This constitutes the first of Tchebycheff's criterions, which says: The probability that the absolute value of the difference $|\alpha - e(\alpha)|$ does not exceed the mean error by a certain multiple, $\lambda$, $(\lambda > 1)$ is greater than $1 - (1/\lambda^2)$.

Now we made no restrictions as to the variable, $\xi$, which may be composed of the sum of several independent variables, $\alpha, \beta, \gamma, \cdots$.
We saw before that

$$\epsilon^2(\alpha + \beta + \gamma + \cdots) = \epsilon^2(\alpha) + \epsilon^2(\beta) + \epsilon^2(\gamma) + \cdots$$

Tchebycheff's criterion may therefore be extended as follows: The Tchebycheffian probability, $P_T$, that the difference $|\alpha + \beta + \gamma + \cdots - e(\alpha) - e(\beta) - e(\gamma) - \cdots|$ will not exceed the mean error $\epsilon$ by a certain multiple, $\lambda > 1$, is greater than $1 - (1/\lambda^2)$.

68. The Theorems of Poisson and Bernoulli proved by the application of the Tchebycheffian Criterion. Bernoulli in his researches limited himself to the solution of the problem in which the probabilities for the observed event remained constant during the total number of observations or trials. Poisson has treated the more general case, wherein the individual probability for the happening of the event in a single trial varies during the total $s$ trials. This may probably best be illustrated by an urn schema. Suppose we have $s$ urns $U_1, U_2, \cdots, U_s$ with white and black balls in various numbers. Let the probabilities for drawing a white ball from the urns $U_1, U_2, \cdots, U_s$ in a single trial be $p_1, p_2, \cdots, p_s$ respectively, and $q_1, q_2, \cdots, q_s$ the chances for drawing a black ball in a single trial. If a ball is drawn from each urn, what is the probability of drawing $\alpha$ white and $s - \alpha$ black balls in $s$ trials? It is easily seen that the Bernoullian Theorem is a special case, arising when the contents of the $s$ urns and the respective probabilities for drawing a white ball in a single trial are the same for all urns.

69. Bernoullian Scheme. We shall now show how the Tchebycheffian criterions may be used in answering the question given above. First of all we shall start with the simpler case of the Bernoullian urn-schema. Here the probability for drawing a white or a black ball from each of the $s$ urns in a single trial is $p$ and $q$ respectively. The square of the mean error in a single trial is $pq$.
From the formulas in § 66 it then follows:

$$\epsilon^2 = \epsilon_1^2 + \epsilon_2^2 + \cdots = pq + pq + pq + \cdots \ (s \text{ times}) = spq, \quad \text{or} \quad \epsilon = \sqrt{spq}.$$

While the above expression gives us the mean error of the absolute frequency of the variable $\alpha$, the mean error of the relative frequency of $\alpha$ to the total number of trials, $s$, is given as

$$\frac{\sqrt{spq}}{s} = \sqrt{\frac{pq}{s}}.$$

We now ask: What is the total probability that the absolute deviation of the relative frequency $\alpha/s$ from its expected value $sp/s = p$ never becomes larger than $\lambda$ times the mean error $\epsilon = \sqrt{spq}/s$? Letting $\lambda = \sqrt{s}/t$ and using the symbol $P_T$ for this particular probability, we have according to Tchebycheff's criterion:

$$P_T > 1 - \frac{1}{\lambda^2}, \quad \text{or} \quad P_T > 1 - \frac{t^2}{s}.$$

Since the mean error is equal to $\sqrt{pq/s}$ we have:

$$\lambda \epsilon = \frac{\sqrt{s}}{t} \sqrt{\frac{pq}{s}} = \frac{\sqrt{pq}}{t}.$$

The answer to our question above follows now a fortiori as follows: The total probability that the absolute deviation of the relative frequency from the postulated a priori probability, $p$, never exceeds the quantity $\sqrt{pq}/t$ is greater than $1 - (t^2/s)$.

By taking $t$ large enough we may reduce $\sqrt{pq}/t$ (where $pq$ is a fraction whose maximum value never can exceed $1 \div 4$) below any previously assigned quantity, $\delta$, however small. If, for instance, we choose the value .0001 for $\delta$, we may rest assured that $\sqrt{pq}/t$ will be less than $\delta$ when we take $t$ larger than 5000. But no matter how large $t$ is, so long as it remains a finite number, by letting $s = \infty$ as a limiting value, $t^2/s$ will simultaneously approach 0 as a limiting value. From the deductions thus derived we are now able to draw the following conclusions:

1) By letting $s = \infty$ as a limiting value, the probability, $P_T$, that the absolute difference between the relative frequency $\alpha/s$ and the postulated a priori probability, $p$, never becomes greater than $\sqrt{pq}/t$ approaches 1 or certainty as a limit.

2) By choosing the quantity $t$, which is less than $\lim \sqrt{s}$, sufficiently great, we may bring $\sqrt{pq}/t$ below any previously assigned quantity, $\delta$, or make the difference between $p$ and $\alpha/s$ as small as we please.
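The inequality $P_T > 1 - t^2/s$ can be compared with what a simulated Bernoullian series actually yields. The following is a rough sketch under stated assumptions: pseudo-random draws stand in for the urn, the function name `tchebycheff_check` and its parameters are hypothetical, and the observed proportion is itself only an empirical estimate from a finite number of series.

```python
import math
import random


def tchebycheff_check(s, p, t, num_series=2000, seed=1):
    """Compare the Tchebycheff bound with a simulated proportion.

    Returns (observed, bound): observed is the share of simulated
    series of s trials whose relative frequency deviates from p by
    at most sqrt(p*q)/t, and bound is 1 - t^2/s.
    Assumption: t < sqrt(s), so that lambda = sqrt(s)/t exceeds 1.
    """
    rng = random.Random(seed)
    q = 1.0 - p
    limit = math.sqrt(p * q) / t
    inside = 0
    for _ in range(num_series):
        alpha = sum(1 for _ in range(s) if rng.random() < p)
        if abs(alpha / s - p) <= limit:
            inside += 1
    return inside / num_series, 1.0 - t * t / s


if __name__ == "__main__":
    observed, bound = tchebycheff_check(s=400, p=0.5, t=10)
    print("observed proportion:", observed)
    print("Tchebycheff lower bound:", bound)
```

With s = 400 and t = 10 the multiple is λ = 2 and the bound is 0.75; the simulated proportion should lie well above it, since the criterion holds for any distribution whatever and is therefore far from sharp in the binomial case.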
From these conclusions we obtain a fortiori the following:

$$\lim_{s = \infty} \frac{\alpha}{s} = p.$$

This constitutes the essential features of the Bernoullian Theorem.

70. Poisson's Scheme. Let $p_1$ denote the postulated probability for success in the first trial, $p_2$ in the second, $p_3$ in the third, etc., and let furthermore $q_1, q_2, q_3, \cdots$ be the respective probabilities for the corresponding failures. If the trial (observation) is repeated $s$ times we obtain the following values for the probable or expected value of the frequency for successes $e(\alpha)$ and the mean error $\epsilon$:

$$e(\alpha) = p_1 + p_2 + p_3 + \cdots + p_s = \sum p_i, \qquad (1)$$
$$\epsilon = \sqrt{p_1 q_1 + p_2 q_2 + p_3 q_3 + \cdots + p_s q_s} = \sqrt{\sum p_i q_i} \quad (i = 1, 2, 3, \cdots, s). \qquad (2)$$

If by $p_0$ and $q_0$ we denote the arithmetic mean or the average value of the $s$ $p$'s and $s$ $q$'s, such that

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s}, \qquad (3)$$
$$q_0 = \frac{q_1 + q_2 + q_3 + \cdots + q_s}{s}, \qquad (4)$$

and assume that $p_0$ and $q_0$ denote the constant probabilities during each of the $s$ trials (observations), we should according to the Bernoullian Theorem have:

$$e(\alpha_B) = s p_0, \qquad (5)$$
$$\epsilon(\alpha_B) = \sqrt{s p_0 q_0}, \qquad (6)$$

where $\alpha_B$ stands for the absolute frequency in a Bernoullian series. An actual comparison of (1) and (5) and (3) shows that

$$e(\alpha_P) = e(\alpha_B), \qquad (7)$$

where $\alpha_P$ is the symbol for the absolute frequency in a Poisson series. In other words: If the $s$ trials had been performed with constant probability for success equal to $p_0$ instead of with varying probabilities $p_1, p_2, \cdots, p_s$, the expected or probable value would be the same for the Bernoullian and Poisson scheme. With regard to the mean error we find, however, after a little calculation,

$$\epsilon_P^2(\alpha) = \epsilon_B^2(\alpha) - \sum (p_i - p_0)^2. \qquad (8)$$

The expression for the mean error in Poisson's Theorem is of the following form:

$$\epsilon_P = \sqrt{p_1 q_1 + p_2 q_2 + p_3 q_3 + \cdots + p_s q_s} = \sqrt{\sum p_i q_i} \quad (i = 1, 2, 3, \cdots, s).$$
Now $p_i q_i$ may be transformed as follows. Writing

$$p_i = p_0 + (p_i - p_0), \qquad q_i = q_0 - (p_i - p_0),$$

and multiplying we obtain:

$$p_i q_i = p_0 q_0 - (p_i - p_0)(p_0 - q_0) - (p_i - p_0)^2,$$

and summing up for all values of $i$ from $i = 1$ to $i = s$, where the middle term disappears since $\sum (p_i - p_0) = 0$, we have:

$$\epsilon_P^2 = s p_0 q_0 - \sum (p_i - p_0)^2 = \epsilon_B^2 - \sum (p_i - p_0)^2.$$

As $(p_i - p_0)^2$ can never be a negative quantity, it is readily seen that the mean error in a Poisson scheme is less than the mean error in the corresponding Bernoullian series whenever the individual probabilities are not all equal. Writing $\epsilon$ as follows:

$$\epsilon = \sqrt{p_1 q_1 + p_2 q_2 + \cdots + p_s q_s},$$

so that the mean error of the relative frequency is $\epsilon / s$, and letting $\lambda = \sqrt{s}/t$, we have according to Tchebycheff's Theorem the following rule: The probability $P_T$ that the relative frequency remains inside the limits

$$\frac{p_1 + p_2 + \cdots + p_s}{s} \pm \frac{\lambda \epsilon}{s} = \frac{p_1 + p_2 + \cdots + p_s}{s} \pm \frac{1}{t} \sqrt{\frac{p_1 + p_2 + \cdots + p_s}{s} - \frac{p_1^2 + p_2^2 + \cdots + p_s^2}{s}}$$

is greater than $1 - (1/\lambda^2)$ or $1 - (t^2/s)$.

By taking $t$ sufficiently large and by letting $s$ approach infinity as a limiting value, the last term in the above expression, namely $\lambda$ times the mean error of the relative frequency, becomes smaller than any previously assigned quantity, $\delta$, however small, while $P_T$ at the same time will approach 1 as a limit. From this it now follows: When an infinite number of trials is made on an event, following the scheme of Poisson, then

$$\lim_{s = \infty} \frac{\alpha}{s} = \frac{p_1 + p_2 + \cdots + p_s}{s} = p_0.$$

The essential part of Poisson's Theorem is contained in this equation. When $p = p_1 = p_2 = \cdots = p_s$ we have a Bernoullian series and obtain

$$\lim_{s = \infty} \frac{\alpha}{s} = p,$$

which result we already derived above in a direct way.

71. Relation between Empirical Frequency Ratios and Mathematical Probabilities. In the above limit, $\alpha$ indicates the total number of lucky events while $s$ is the total number of trials; the quotient $\alpha \div s$ then is nothing more than the empirical probability as defined in the preceding paragraphs.
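The relation between the mean errors of the Poisson and Bernoullian schemes found in § 70 may be verified by plain arithmetic for any chosen set of urn probabilities. The sketch below is an illustration only; the function name `poisson_vs_bernoulli` and the sample probabilities are hypothetical and are not taken from the text.

```python
def poisson_vs_bernoulli(probs):
    """Squared mean errors of the absolute frequency in both schemes.

    probs holds the postulated probabilities p_1, ..., p_s of success
    in the individual trials. Returns a tuple
    (poisson_sq, bernoulli_sq, spread) where
      poisson_sq   = sum of p_i * q_i,
      bernoulli_sq = s * p_0 * q_0, with p_0 the arithmetic mean,
      spread       = sum of (p_i - p_0)^2.
    """
    s = len(probs)
    p0 = sum(probs) / s
    q0 = 1.0 - p0
    poisson_sq = sum(p * (1.0 - p) for p in probs)
    bernoulli_sq = s * p0 * q0
    spread = sum((p - p0) ** 2 for p in probs)
    return poisson_sq, bernoulli_sq, spread


if __name__ == "__main__":
    # Four hypothetical urns with unequal proportions of white balls
    poisson_sq, bernoulli_sq, spread = poisson_vs_bernoulli([0.1, 0.3, 0.5, 0.9])
    print("Poisson scheme, squared mean error:   ", poisson_sq)
    print("Bernoullian scheme, squared mean error:", bernoulli_sq)
    print("Bernoullian value less the spread:     ", bernoulli_sq - spread)
```

When all the probabilities are equal the spread vanishes and the two squared mean errors coincide, which is the Bernoullian special case.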
Both the Bernoullian and Poisson Theorems show that this empirical probability approaches the postulated a priori probability, $p$ (or the average probability $p_0$), as a limiting value. In this way we have succeeded in extending the theory of probability to other problems than the conventional kind involved in the games of chance or drawings of balls from urns. We do not need to limit our investigations to problems where we are able to determine a priori the probability for the happening of an event in a single trial, but limit ourselves to postulating the existence of such an a priori probability.

A large number of trials or observations is made on a certain event $E$. This event is now observed to have occurred $\alpha$ times during the $s$ total trials. To illustrate: An urn contains red and white balls, the total number of balls being unknown. A single ball is drawn and its color noted. This ball is replaced and the contents of the urn are mixed. A second drawing is made and the color of the drawn ball noted before the ball is put back in the urn. Let this process be repeated $s$ times, where $s$ is a large number, and furthermore let $\alpha$ be the number of red balls which appeared during the $s$ trials. The quotient $\alpha/s$ we now call the empirical or a posteriori probability for the observed event, in this particular case the a posteriori probability for the drawing of a red ball. When $s = \infty$ the Bernoullian Theorem tells us that the empirical probability found in this manner and the postulated a priori probability, whose numerical value, however, was unknown before the drawings took place, are identical as far as numerical magnitude is concerned.
As we already observed in the introductory remarks to this chapter, it is impossible to perform a certain experiment an infinite number of times, and it is therefore out of the question to determine the limiting and ideal value of the a posteriori probability. We must satisfy ourselves with an approximation by performing a finite number of trials, or let $s$ be a finite number. The quotient $\alpha/s$ is then the empirical approximate a posteriori probability. We know also that, although this quotient is only an approximation of the postulated a priori probability, the difference between the approximate empirical probability ratio, $\alpha/s$, and the a priori probability, $p$, becomes smaller as the number of trials is increased.

But how small is the difference? Or how many times shall we repeat the trials (observations) so that, for practical purposes, we may disregard this difference? It does not suffice to be satisfied with the fact that the difference becomes proportionately smaller the greater we make the number of trials and merely insist that in order to avoid large errors it is only necessary to operate with very large numbers. Immediately the question arises: What constitutes a large number? Is 100 a large number, or is 1,000, 10,000, 100,000 or even a million an answer to this question? As long as this question remains unanswered, it helps but little to invoke the "law of large numbers," a tendency which unfortunately is too manifest in many statistical researches by amateur statisticians. As long as a definition, let alone a numerical determination, of the range of "small numbers" is lacking, little stress ought to be laid on such remarks couched in the metaphorical terms of "small" and "large" numbers.

72. Application of the Tchebycheffian Criterion.
It is readily seen that even a rough quantitative determination of the difference between the approximate a posteriori probability and the postulated a priori probability based upon the mere vague statement of "large numbers" is utterly impossible, and it remains to be seen, therefore, if the theory of probability offers us a criterion that might serve as a preliminary test for the above difference. To restate our problem: If $p$ is the postulated a priori probability and $\alpha/s$ is the empirical probability (a posteriori) or relative frequency of the event, $E$, what is the probability that the difference $|(\alpha/s) - p|$ does not exceed a previously assigned quantity?

In the mean error and the associated theorem of Tchebycheff we have a simple and easily applied criterion to test this probability. Tchebycheff's rule states that the probability, $P_T$, of a deviation of a variable from its probable value, not larger than $\lambda$ times its mean error, is greater than $1 - (1/\lambda^2)$.

For $\lambda = 3$: $P_T > 1 - \frac{1}{9} = 0.888$
For $\lambda = 4$: $P_T > 1 - \frac{1}{16} = 0.937$
For $\lambda = 5$: $P_T > 1 - \frac{1}{25} = 0.96$.

This shows that a deviation from the expected or probable value of the variable equal to 4 or 5 times the mean error possesses a very small probability and such deviations are extremely rare.

Let us for example assume that the observed rate of mortality in a certain population group is equal to .0200. Let furthermore the number exposed to risk equal 10,000. The mean error is

$$\left(\frac{.02 \times .98}{10{,}000}\right)^{1/2} = .0014.$$

If the number of lives exposed to risk was one million instead of 10,000, the mean error would be

$$\left(\frac{.02 \times .98}{1{,}000{,}000}\right)^{1/2} = .00014.$$

A deviation four times this latter quantity is equal to .00056, and according to Tchebycheff's criterion the probability for the non-occurrence of a deviation above .00056 is greater than .937; that is, with a probability greater than .937 the rate of mortality will be neither higher than .0206 nor lower than .0194.
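The arithmetic of the mortality example can be retraced with a short Python sketch. The function names below are illustrative only and do not come from the text; the sketch merely evaluates the mean error of a relative frequency, $\sqrt{pq/s}$, and Tchebycheff's lower bound $1 - 1/\lambda^2$, under the assumption that the observed rate is used in place of the unknown a priori probability.

```python
import math


def chebyshev_bound(lam):
    """Tchebycheff's lower bound for the probability of a deviation
    not larger than lam times the mean error."""
    return 1 - 1 / lam**2


def mean_error_of_rate(p, s):
    """Mean error of the relative frequency alpha/s in s Bernoullian trials."""
    return math.sqrt(p * (1 - p) / s)


def chebyshev_interval(p, s, lam):
    """Limits p -/+ lam * (mean error), with the bound on the probability
    that the relative frequency stays inside them."""
    e = mean_error_of_rate(p, s)
    return p - lam * e, p + lam * e, chebyshev_bound(lam)


if __name__ == "__main__":
    # Rate .02 as in the text; exposed to risk: 10,000, one million, four million.
    for s in (10_000, 1_000_000, 4_000_000):
        lo, hi, bound = chebyshev_interval(0.02, s, 4)
        print(f"s={s:>9,}  limits=({lo:.5f}, {hi:.5f})  P_T > {bound:.4f}")
```

Note that the bound depends only on $\lambda$; increasing $s$ narrows the limits but leaves the guaranteed probability unchanged.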
For an observation series of 4,000,000 homogeneous elements we might by a similar procedure expect to find a rate of mortality between 0.02 + 0.00028 and 0.02 - 0.00028. Thus we notice that the mean error of the relative frequency numbers decreases as the number of observations increases.

CHAPTER X.

THE THEORY OF DISPERSION AND THE CRITERIA OF LEXIS AND CHARLIER.

73. Bernoullian, Poisson and Lexis Series.

In the previous chapter we limited our discussion to single sets consisting of $s$ individual trials and found in the mean error and the criterion of Tchebycheff a measure for the uncertainty with which the relative frequency ratio $\alpha/s$ as well as the absolute frequency $\alpha$ were affected. How will matters now turn out if, instead of a single set, we make $N$ sets of trials? As already mentioned in paragraph 54, in general in $N$ such sets we shall obtain $N$ different values of $\alpha$, denoting the absolute frequency of the event, represented by the sequence $\alpha_1, \alpha_2, \alpha_3, \dots \alpha_N$. Our object is now to investigate whether the distribution of the above values of $\alpha$ around a certain norm is subject to some simple mathematical law and if possible to find a measure for such distributions.

In this connection it is of great importance whether the postulated a priori probabilities remain constant or not during the $N$ sample sets. Three cases are of special importance to us.¹

1. The probability of the happening of the event remains constant during all the $N$ sets. The series as given by the absolute frequencies in each set is known as a Bernoullian Series.

2. The same probability varies from trial to trial inside each of the $N$ sample sets, the variations being the same from set to set. The series as given by the absolute frequencies is in this case known as a Poisson Series.

3. The probability remains constant in any one particular set but varies from set to set. The absolute frequency series as produced in this way is called a Lexis Series.
The above definition of these three series may, perhaps, be made clearer by a concrete urn scheme.

¹ The terminology is due to Charlier.

A. Bernoullian Series. $s$ balls are drawn one at a time from an urn, containing black and white balls in constant proportion during all drawings. Such drawings constitute a sample set. Let us in this particular set have obtained say $\alpha_1$ white and $\beta_1$ black balls, where $\alpha_1 + \beta_1 = s$. We make $N$ sets of drawings under the same conditions, keeping a record of white balls drawn in each set. The number sequence thus obtained, $\alpha_1, \alpha_2, \alpha_3, \dots \alpha_N$, is a Bernoullian Series.

B. Poisson Series. $s$ individual urns contain white and black balls, the proportion of white to black varying from urn to urn. A single ball is drawn from each urn and its color noted. In this way we get $\alpha_1$ white and $\beta_1$ black balls constituting a set. The balls thus drawn are replaced in their respective urns and a second set of $s$ drawings is performed as before, resulting in $\alpha_2$ white and $\beta_2$ black balls. The number sequence, $\alpha_1, \alpha_2, \alpha_3, \dots \alpha_N$, of white balls in $N$ sets represents a Poisson Series.

C. Lexis Series. $s$ balls are drawn one at a time under the same conditions as set No. 1 in the Bernoullian series. The $\alpha_1$ white and $\beta_1$ black thus drawn constitute the first set. In the second and following sets the composition of the urn is changed from set to set. The number sequence representing the number of white balls in the $N$ respective sets, $\alpha_1, \alpha_2, \alpha_3, \dots \alpha_N$, is a Lexian Series. The scheme of drawings is the same as in the Bernoullian Series except that the proportion of white to black balls varies from set to set.

74. The Mean and Dispersion.
Since we have no a priori reasons for choosing any one particular value of the various $\alpha$'s of the above sequences in preference to any other, we might give equal weight to each set and take the arithmetic mean of the $N$ values of $\alpha$, as defined by the formula:

$$M = \frac{\alpha_1 + \alpha_2 + \alpha_3 + \dots + \alpha_N}{N}. \qquad (I)$$

It will be unnecessary to enter into a detailed discussion of the mean, which is a quantity used on numerous occasions in everyday life. We shall, however, define another important function known as the dispersion (standard deviation). The dispersion is denoted by the Greek letter $\sigma$ and is defined by the formula

$$\sigma^2 = \frac{\sum (\alpha_\nu - M)^2}{N}. \qquad (II)$$

We shall now attempt to find the expected value of the mean and the dispersion in the three series. First of all take the Bernoullian Series. Let the constant probability for success in a single trial be $p_0$. We have then for the various expected values or mathematical expectations of $\alpha$:

Set No. 1: $e(\alpha_1) = sp_0$
Set No. 2: $e(\alpha_2) = sp_0$
. . .
Set No. N: $e(\alpha_N) = sp_0$

or:

$$\frac{e(\alpha_1) + e(\alpha_2) + \dots + e(\alpha_N)}{N} = \frac{\sum e(\alpha_\nu)}{N} = \frac{Nsp_0}{N} = sp_0,$$

which shows that the mean in a Bernoullian Series of $N$ sample sets is equal to the expected value of the absolute frequency in a single set.

In regard to the dispersion we have for the various sets:

Set No. 1: $e(\alpha_1 - M)^2 = \epsilon^2(\alpha_1) = sp_0q_0$
Set No. 2: $e(\alpha_2 - M)^2 = \epsilon^2(\alpha_2) = sp_0q_0$
. . .
Set No. N: $e(\alpha_N - M)^2 = \epsilon^2(\alpha_N) = sp_0q_0$

Summing up and forming the mean we obtain for the expected value of the dispersion in a Bernoullian Series, which we shall denote by the symbol $\sigma_B$:

$$\sigma_B^2 = \frac{\sum \epsilon^2(\alpha_\nu)}{N} = \frac{Nsp_0q_0}{N} = sp_0q_0.$$

This result shows that the dispersion in a Bernoullian Series is equal to the mean error, $\epsilon$, in a single set.

We now proceed to the Poisson Series.
Let $p_1$ be the mathematical probability of the happening of the event in the first trial, $p_2$ be the probability in the second trial and so on for all trials, and let us furthermore denote the means of the $p$'s and $q$'s by:

$$p_0 = \frac{p_1 + p_2 + p_3 + \dots + p_s}{s}, \qquad q_0 = \frac{q_1 + q_2 + q_3 + \dots + q_s}{s}.$$

Applying a similar analysis as above we have:

Set No. 1: $e(\alpha_1) = p_1 + p_2 + \dots + p_s = sp_0$
Set No. 2: $e(\alpha_2) = p_1 + p_2 + \dots + p_s = sp_0$
. . .
Set No. N: $e(\alpha_N) = p_1 + p_2 + \dots + p_s = sp_0$

The actual summation of the above values of $e(\alpha)$ gives us the following value of the mean in a Poisson Series: $M_P = sp_0$. Let us for a moment assume that all the drawings had been performed with a constant probability, $p_0$. According to the Bernoullian scheme we should then have: $M_B = sp_0$. An actual comparison shows that $M_B = M_P$. This shows that the same mean result is obtained if we draw $s$ balls from the urns $U_1, U_2, \dots U_s$ with their corresponding probabilities $p_1, p_2, \dots p_s$ for drawing a white ball, as would be obtained if we drew all the $s$ balls from a single urn where the composition is such that the ratio of the number of white to that of black balls is as $p_0 : q_0$, where $p_0$ and $q_0$ are defined as above.

Let us now see how matters turn out in regard to the dispersion. We have for the $N$ sets:

Set No. 1: $e(\alpha_1 - M)^2 = p_1q_1 + p_2q_2 + \dots = \sum p_\nu q_\nu = \epsilon^2(\alpha_1)$
Set No. 2: $e(\alpha_2 - M)^2 = p_1q_1 + p_2q_2 + \dots = \sum p_\nu q_\nu = \epsilon^2(\alpha_2)$
. . .
Set No. N: $e(\alpha_N - M)^2 = p_1q_1 + p_2q_2 + \dots = \sum p_\nu q_\nu = \epsilon^2(\alpha_N)$

In § 70 we showed, however, that $\sum p_\nu q_\nu$ could be expressed as follows:

$$\epsilon_P^2(\alpha) = sp_0q_0 - \sum (p_\nu - p_0)^2 = \epsilon_B^2(\alpha) - \sum (p_\nu - p_0)^2.$$

A simple straightforward calculation gives us now for the dispersion, $\sigma_P$,

$$\sigma_P^2 = \sigma_B^2 - \sum (p_\nu - p_0)^2.$$

In the corresponding Bernoullian Series with constant probability, $p_0$, the square of the dispersion is equal to $sp_0q_0$, which shows that the dispersion in a Poisson Series is less than the corresponding dispersion of the Bernoullian Series.
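The relation between the Poisson and the Bernoullian dispersion can be checked numerically. The following Python sketch is an illustration only: the function name and the three probabilities are hypothetical and are not taken from the text. Exact fractions are used so that the identity $\sigma_P^2 = \sigma_B^2 - \sum (p_\nu - p_0)^2$ can be compared without rounding.

```python
from fractions import Fraction


def poisson_vs_bernoulli(ps):
    """For trial probabilities ps (one per urn) return
    (p0, squared Poisson dispersion, squared Bernoullian dispersion,
    sum of squared deviations of the p's from their mean p0)."""
    s = len(ps)
    p0 = sum(ps) / s
    var_poisson = sum(p * (1 - p) for p in ps)      # sum of p_v * q_v
    var_bernoulli = s * p0 * (1 - p0)               # s * p0 * q0
    spread = sum((p - p0) ** 2 for p in ps)         # sum of (p_v - p0)^2
    return p0, var_poisson, var_bernoulli, spread


if __name__ == "__main__":
    # Hypothetical urns with probabilities 1/4, 1/2, 3/4 of a white ball.
    ps = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
    p0, vp, vb, spread = poisson_vs_bernoulli(ps)
    print("p0 =", p0)
    print("Poisson    sigma^2 =", vp)
    print("Bernoulli  sigma^2 =", vb)
    print("difference         =", vb - vp, "= sum (p - p0)^2 =", spread)
```

For these three urns the mean is the same under both schemes, while the Poisson value falls short of the Bernoullian value by exactly the sum of the squared deviations of the probabilities from their average.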
We finally come to the mean and the dispersion in the Lexian Series, which we shall denote by $M_L$ and $\sigma_L$ respectively. Let us furthermore define the two quantities $p_0$ and $q_0$ as follows:

$$p_0 = \frac{p_1 + p_2 + \dots + p_N}{N}, \qquad q_0 = \frac{q_1 + q_2 + \dots + q_N}{N}.$$

A computation along similar lines as above gives us first for the mean, $M_L$:

Set No. 1: $e(\alpha_1) = sp_1$
Set No. 2: $e(\alpha_2) = sp_2$
. . .
Set No. N: $e(\alpha_N) = sp_N$

Thus we have:

$$M_L = \frac{\sum e(\alpha_\nu)}{N} = \frac{\sum sp_\nu}{N} = \frac{s[p_1 + p_2 + \dots + p_N]}{N} = sp_0.$$

For the dispersion we have the following expectations:

Set No. 1: $e(sp_0 - \alpha_1)^2$
Set No. 2: $e(sp_0 - \alpha_2)^2$
. . .
Set No. N: $e(sp_0 - \alpha_N)^2$

The expected value in the $\nu$th set is $e(sp_0 - \alpha_\nu)^2 = \sum (sp_0 - \alpha_\nu)^2$ [...] probability binomial: $(p_\nu + q_\nu)^s = 1$. An analysis along similar lines as in § 65 gives us now:

$$e(sp_0 - \alpha_\nu)^2 = s^2p_0^2 - 2s^2p_0p_\nu + s^2p_\nu^2 + sp_\nu q_\nu = sp_\nu q_\nu + s^2(p_\nu - p_0)^2$$

as the expected value of the square of the difference between the mean and the absolute frequency in the $\nu$th set. For all $N$ sets we then have

$$\sigma_L^2 = \frac{\sum sp_\nu q_\nu}{N} + \frac{s^2 \sum (p_\nu - p_0)^2}{N}$$

[...] position and may be used as a standard of comparison with other series. This is the method adopted by Lexis in investigating certain statistical series, and we shall return to it in the following chapter. Lexis determines first in a direct manner the dispersion as defined by formula (II) from the statistical data as given by the number sequence $\alpha$. This process is known as the direct process (by Lexis called a physical process) and gives a certain dispersion, $\sigma$. After this the dispersion is computed by an indirect (combinatorial) process under the assumption that the series follows the Bernoullian distribution. The ratio $\sigma : \sigma_B$, which Charlier calls the Lexian Ratio and denotes by the symbol $L$, may now give us an idea about the real nature of the statistical series as represented by the number sequence.

When $L = 1$, the series is by Lexis called a normal series.
When $L > 1$, the series is called hypernormal.
When $L < 1$, the series is a subnormal series.
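As a minimal sketch of the two processes, the Python functions below compute the direct dispersion from formula (II) and compare it with a Bernoullian dispersion. Two points are assumptions of this sketch and not statements of the text: the constant probability $p_0$ is estimated as $M/s$ from the observed mean, and a tolerance `tol` is introduced because an observed ratio will practically never equal 1 exactly. The function names and the sample data are hypothetical.

```python
import math


def lexis_ratio(alphas, s):
    """Lexian Ratio L = sigma / sigma_B for absolute frequencies `alphas`
    observed in sets of s trials each.

    sigma follows formula (II) (division by N); sigma_B assumes a
    Bernoullian series with p0 estimated as M / s (an assumption)."""
    n = len(alphas)
    mean = sum(alphas) / n
    sigma = math.sqrt(sum((a - mean) ** 2 for a in alphas) / n)
    p0 = mean / s
    sigma_b = math.sqrt(s * p0 * (1 - p0))
    return sigma / sigma_b


def classify(ratio, tol=0.05):
    """Name the series after Lexis; `tol` is an illustrative tolerance."""
    if abs(ratio - 1) <= tol:
        return "normal"
    return "hypernormal" if ratio > 1 else "subnormal"


if __name__ == "__main__":
    # Hypothetical data: two sets of 100 trials with 40 and 60 successes.
    L = lexis_ratio([40, 60], s=100)
    print(L, classify(L))
```

In practice the ratio should be judged against its own mean error rather than a fixed tolerance; the fixed value here only keeps the sketch short.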
It is easily seen from the respective formulas that the Poisson Series are subnormal series whereas the Lexian Series are hypernormal. The great majority of statistical series are, as we shall have occasion to see in the following chapter, of a hypernormal kind and correspond thus to the Lexian Series.

In § 74 we found the dispersion in the Lexis series as

$$\sigma_L^2 = \sigma_B^2 + (s^2 - s)\,\sigma_p^2,$$

where $\sigma_p^2 = \sum (p_\nu - p_0)^2 / N$ is the square of the dispersion of the probabilities $p_\nu$. Neglecting $s$ in comparison with $s^2$ and remembering that $M_B = sp_0$, we have as an approximation:

$$\rho = \frac{\sigma_p}{p_0} = \frac{\sqrt{\sigma_L^2 - \sigma_B^2}}{M_B}.$$

Charlier calls the quantity $100\rho$ the coefficient of disturbancy of the statistical series. It is readily seen that the Charlier coefficient is zero in normal series. For hypernormal series it is a positive real quantity whereas for subnormal series $\rho$ is imaginary.

CHAPTER XI.

APPLICATION TO GAMES OF CHANCE AND STATISTICAL PROBLEMS.

76. Correlate between Theory and Practice.

In the theoretical analysis just completed we treated the fundamental elementary functions in the theory of probabilities: the probability function, the expected or probable value of a variable quantity, the mean error, the dispersion and the coefficient of disturbancy. The formulas thus derived were founded upon certain hypothetical axioms, which formed the basis of a mathematical a priori probability as defined by Laplace. As far as the purely abstract mathematical analysis is concerned it matters but little if the hypotheses are physically true or not, that is to say, if they agree with physical facts in the universe as it is known to us. A mathematical analysis may be made on the basis of widely divergent hypotheses, a fact which is clearly shown in the Euclidean and Non-Euclidean geometries. It is, however, quite a different matter when we wish to apply our theory to actual phenomena (physical observed events), as it is evident that a correlation between hypothesis and actual facts follows by no means a priori.
It is, of course, true that the different hypotheses in the theory of probabilities are derived to greater or less extent from outside sense data. Such sense data, however, give us only the effect and no clue whatsoever to the relation between cause and effect. In the application of our theory every hypothesis, or rather the results derived from such hypothesis, must be verified by actual experience. Before such a verification is made, we advise the reader to be sceptical and not trust too much in the authority of others but follow the sound advice of Chrystal: "In mathematics let no man over-persuade you. Another man's authority is not your reason." We can so much more encourage an attitude of scepticism in view of the fact that even among the leading mathematicians of the present time there exists no uniform opinion as to the truth of the axioms underlying the theory of probabilities.

77. Homograde and Heterograde Series. Technical Terms.

Whenever a common characteristic or attribute of several groups of observed individual objects or events allows a purely quantitative determination, it may be made the subject of a mathematical analysis and in such cases we are often able to make excellent use of the theory of probabilities. Such quantitative measurements may be divided into various domains of classification. Traces of such classification are found in almost every treatise on mathematical statistics, but a uniform system of nomenclature is unfortunately lacking among the various statisticians, and anyone reading the modern literature on mathematical statistics often notices various inconsistencies of the different authors. Mr. G. Udny Yule in his excellent treatise "Theory of Statistics" classifies the statistical series into "statistics of attributes" and "statistics of variables." Apart from the fact that Mr.
Yule's statistics of variables also is a statistics of attributes, although of different grades, the author apparently ignores the criterion of Lexis and the associated criterion of Charlier. The German writers use the terms "stetige und unstetige Kollektivgegenstand" (continuous and discontinuous collective objects), which were originally introduced by Fechner. Other writers, such as Johannsen of Denmark and Davenport of America, use still other terms. After having made a comparison of the various systems of classification I have in the following decided to adhere to the system of Charlier wherein the observed statistical series are classified as homograde and heterograde.

If the individuals all possess the same character or attribute in the same grade (intensity), or if we disregard the different grades of the attributes, such individuals are called homograde, and the statistical series thus formed is a homograde series. If on the other hand we take into consideration the different varying grades of the attributes observed or measured and form the series accordingly we obtain a heterograde series.

As examples of homograde series we may mention the observed recorded series of coin tossing, card drawings in reference to a specified event, number of births or deaths in a population group, etc. A coin when tossed will either show head or tail; a person will either be dead or alive. There are no intermediate degrees as for instance that of a half dead person. In all such series the dividing line between the occurrence of the event (attribute) $E$ and the occurrence of the opposite event $\bar{E}$ is distinct and suggests itself a priori, and there is no doubt as to the classification of the observed event.
The original record of observation of a homograde series, also known as the primary list, is simply a record of the presence or non-presence of a specified attribute of the individuals belonging to the group under observation and is of the following form:

Primary List of Homograde Individuals.

Symbol for the      Attribute.
Individual.         Present (E).    Non-present (Ē).
I_1                 1
I_2                                 1
I_3                 1
I_4                                 1
I_5                 1

In this scheme the individuals $I_1$, $I_3$ and $I_5$ possess the attribute $E$ while the individuals $I_2$ and $I_4$ do not have this attribute.

In observing the presence of a specified attribute in a group of individual objects we meet, however, frequently series of quite another nature than the simple homograde series. When investigating the different measures of heights of persons inside a certain population group no simple dichotomous (i.e., cutting in two) division in two opposite and mutually exclusive groups suggests itself a priori. It is of course true that we might divide the total population under observation into two subsidiary groups of tall individuals and short individuals. But the question then immediately arises, What constitutes a short or a tall person? The answer must necessarily be arbitrary. Persons above the height of 170 cm. may be classed as tall while persons falling short of such measure may be classed as short persons, and we might in this way form a primary homograde table of the form as given above. There is no logical reason, however, to choose the quantity 170 cm. as the dividing line and comparatively little value would result from such a classification. It is evident that all persons belonging to groups of talls or shorts are not identical as to the particular attribute in question. The height is merely a characteristic which varies with each individual and no two individuals have mathematically speaking the same height.
If we take into consideration the different grades of height among the individuals and arrange the primary table accordingly we obtain a heterograde series of observations. The general form of the primary table of such series is:

Primary List of Heterograde Individuals.

Symbol for the      Grade of
Individual.         Attribute.
I_1                 x_1
I_2                 x_2
I_3                 x_3
I_4                 x_4
I_5                 x_5
. . .
I_n                 x_n

Here the quantities $x_1, x_2, \dots x_n$ give the measures (in kilogram, liter, meter, etc.) of the characteristic in question.¹ As examples of heterograde series we may mention the lengths, volumes or weights of animals, plants or inorganic objects; astronomical observations as to the brightness of celestial objects; meteorological records of rainfall, temperature or barometer heights; the frequency of deaths among policyholders as to attained age in an assurance company; duration of sickness or disablement, etc.

¹ It is to be noted that in the homograde series the primary list is given by abstract numbers while the heterograde series consists of concrete numbers.

The investigation of heterograde series is a problem of which we shall treat later under the theory of errors or frequency curves. The homograde series may, however, be explained fully by means of the Bernoullian, Poisson and Lexian Series as founded on the mathematical theory of probabilities in the previous chapters.

78. Computation of the Mean and the Dispersion in Practice.

It would be superfluous to enter into a detailed demonstration of the practical calculation of either the mean or the dispersion were it not for the fact that this calculation is performed with a lot of unnecessary and useless labor by the untrained student and even by many professional statisticians. By the ordinary school method the number zero is chosen as the starting point and all the variables are expressed in their absolute magnitudes, i.e., their distance from 0.
In this way one often encounters multiplication and addition of large numbers. The Danish biologist and statistician, W. Johannsen, has illustrated the futility of this method in the following example taken from his treatise "Forelæsninger over Læren om Arvelighed" (Copenhagen, 1905).¹ Dr. Petersen, the director of the Danish Biological Station, counted the tail fin rays of 703 flounders (Pleuronectes) caught in the neighborhood of the Skaw. The observations follow:

Number of rays:    47 48 49 50 51 52  53  54  55 56 57 58 59 60 61
No. of flounders:   5  2 13 23 58 96 134 127 111 74 37 16  4  2  1

¹ German edition "Elemente der exakten Erblichkeitslehre" (Jena, 1913), page 11.

The ordinary way of computing the mean would be as follows:

$$[5 \times 47 + 2 \times 48 + 13 \times 49 + \dots + 1 \times 61] \div 703,$$

where 703 is the total number of individuals under observation. In Chapter X we gave the following formula for the mean:

$$M = \frac{m_1 + m_2 + m_3 + \dots + m_N}{N}. \qquad (1)$$

This formula may evidently be written as follows:

$$M = \frac{(m_1 - M_0) + (m_2 - M_0) + (m_3 - M_0) + \dots + (m_N - M_0)}{N} + M_0 = \frac{\sum (m_\nu - M_0)}{N} + M_0 = b + M_0. \qquad (2)$$

In this expression $M_0$, which Charlier calls the provisional mean, is an arbitrarily chosen number. To show how the introduction of this quantity actually shortens the calculation of the mean we return to the above quoted series of observations of tail fin rays of flounders.

Number of Rays (x) in 703 Flounders According to Observations of Dr. Petersen.
N = ΣF(x) = 703, M_0 = 53.

 x.    Frequency F(x).    x - M_0.    (x - M_0)F(x).
 47          5              -6            - 30
 48          2              -5            - 10
 49         13              -4            - 52
 50         23              -3            - 69
 51         58              -2            -116
 52         96              -1            - 96
 53        134               0               0
 54        127              +1            +127
 55        111              +2            +222
 56         74              +3            +222
 57         37              +4            +148
 58         16              +5            + 80
 59          4              +6            + 24
 60          2              +7            + 14
 61          1              +8            +  8
 Sum       703                        -373  +845

We have now:

b = (845 - 373) ÷ 703 = 0.67, M = M_0 + b = 53.67.

The method is quite simple and needs hardly any explanation.
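The shortcut of the provisional mean may be retraced in a few lines of Python. This is a sketch only; the function name is illustrative, and the frequencies are those of the table (they total 703). The result is compared with the ordinary "school method" to confirm that both agree.

```python
# Tail fin rays of 703 flounders (Dr. Petersen's observations).
rays = list(range(47, 62))
freq = [5, 2, 13, 23, 58, 96, 134, 127, 111, 74, 37, 16, 4, 2, 1]


def mean_via_provisional(values, freqs, m0):
    """Return (M, b): the mean and the correction b = sum((x - M0) F(x)) / N,
    so that M = M0 + b for an arbitrarily chosen provisional mean M0."""
    n = sum(freqs)
    b = sum((x - m0) * f for x, f in zip(values, freqs)) / n
    return m0 + b, b


if __name__ == "__main__":
    mean, b = mean_via_provisional(rays, freq, m0=53)
    direct = sum(x * f for x, f in zip(rays, freq)) / sum(freq)
    print(f"b = {b:.2f}, M = {mean:.2f}, school method = {direct:.2f}")
```

Any other choice of `m0` gives the same mean; the choice near the bulk of the observations merely keeps the intermediate products small, which is what matters for hand computation.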
From a cursory examination of the material we notice that the mean is situated in the neighborhood of the series consisting of 53 rays. We choose therefore the provisional mean, $M_0$, as 53. We next form the algebraic differences $x - M_0$. These differences are then multiplied by $F(x)$. The algebraic sum of these products divided by $N = \sum F(x)$ gives us the value of $b$, which quantity added to $M_0$ gives the value of the mean, $M$.

To show a slightly modified form of the method we take the following observations of coal-mine accidents in Belgium, covering the period 1901-1910, from "Annales des Mines de Belgique." These data I have reduced to a stationary population group of 140,000 mine workers. In other words the quantity $s$ as defined in § 83 is equal to 140,000.

Number (m) of Persons Killed in Coal Mine Accidents in Belgium, 1901-1910.
s = 140,000, N = 10, M_0 = 140.

 Year.     m.     m - M_0.    (m - M_0)².
 1901     164      +24          576
 1902     150      +10          100
 1903     160      +20          400
 1904     130      -10          100
 1905     127      -13          169
 1906     133      - 7           49
 1907     144      + 4           16
 1908     150      +10          100
 1909     133      - 7           49
 1910     133      - 7           49
 Sum             -44  +68      1608

Hence

b = (68 - 44) ÷ 10 = 2.4, M = 140 + 2.4 = 142.4.

In this example probably it would have been easier to have formed the sum $\sum m_\nu$ directly and then obtained the mean by division by 10. The actual formation of the algebraic sums of $m_\nu - M_0$, however, greatly facilitates the calculation of the dispersion, $\sigma$, to which we now shall turn our attention. The formula for the dispersion

$$\sigma^2 = \frac{\sum (m_\nu - M)^2}{N} \qquad (\nu = 1, 2, 3, \dots N) \qquad (3)$$

may evidently be written as follows:

$$\sigma^2 = \frac{(m_1 - M_0)^2 + (m_2 - M_0)^2 + \dots + (m_N - M_0)^2}{N} - b^2 = \frac{\sum (m_\nu - M_0)^2}{N} - b^2, \qquad (4)$$

where $b$ as usual means $M - M_0$, $M_0$ being the provisional mean. For Belgian coal mine accidents we thus obtain from the above data:

σ² = (1608 ÷ 10) - 5.76 = 155.04.
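Formula (4) can be checked against the Belgian figures with the following Python sketch. The function name is illustrative; the yearly numbers are taken so as to agree with the printed differences $m - M_0$ (so that the figure for 1903 is 160).

```python
import math

# Persons killed in Belgian coal mine accidents, 1901-1910,
# reduced to 140,000 mine workers.
killed = [164, 150, 160, 130, 127, 133, 144, 150, 133, 133]


def mean_and_dispersion(values, m0):
    """Mean, squared dispersion and dispersion by way of the provisional
    mean m0, following formulas (2) and (4): divide by N, subtract b^2."""
    n = len(values)
    d = [v - m0 for v in values]          # differences m - M0
    b = sum(d) / n                        # b = M - M0
    var = sum(x * x for x in d) / n - b * b
    return m0 + b, var, math.sqrt(var)


if __name__ == "__main__":
    mean, var, sigma = mean_and_dispersion(killed, m0=140)
    print(f"M = {mean:.1f}, sigma^2 = {var:.2f}, sigma = {sigma:.2f}")
```

As with the mean, the squared dispersion does not depend on which provisional mean is chosen, because the term $b^2$ exactly compensates for measuring the deviations from $M_0$ instead of $M$.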
Where the number of observed individuals is very large an arrangement as that given above for the Belgian statistics becomes too bulky and it is therefore customary to group the observations in classes as for instance in the example of Dr. Johannsen. The dispersion is then computed according to the following elegant method due to Charlier, from whose brochure "Grunddragen af den matematiska Statistiken" ("Rudiments of Mathematical Statistics") I take the following example:

Number of Boys (m) per 500 Children Born in 24 Provinces of Sweden during Each Month in 1883 and 1890.
s = 500, N = 576, M_0 = 257, w = 5.

 Class        Class        Frequency
 limits m.    number x.    F(x).       xF(x).    x²F(x).    (x+1)²F(x).
 200-204       -11            1         - 11      + 121         100
 205-209       -10
 210-214       - 9
 215-219       - 8            1         -  8      +  64          49
 220-224       - 7            2         - 14      +  98          72
 225-229       - 6            5         - 30      + 180         125
 230-234       - 5           13         - 65      + 325         208
 235-239       - 4           18         - 72      + 288         162
 240-244       - 3           47         -141      + 423         188
 245-249       - 2           60         -120      + 240          60
 250-254       - 1           81         - 81      +  81
 255-259         0          108                                 108
 260-264       + 1           91         + 91      +  91         364
 265-269       + 2           60         +120      + 240         540
 270-274       + 3           44         +132      + 396         704
 275-279       + 4           22         + 88      + 352         550
 280-284       + 5           16         + 80      + 400         576
 285-289       + 6            6         + 36      + 216         294
 290-294       + 7
 295-299       + 8
 300-304       + 9            1         +  9      +  81         100
 Sum                        576         + 14      +3596       +4200

The class width interval in the above scheme was chosen as 5. The observed frequencies are given in column 3. We thus find that the greatest frequency of 108 falls in the class interval 255-259. Choosing this class interval as the origin we designate the other class intervals with their proper positive and negative numbers as shown in column 2. The provisional mean, $M_0$, is taken as the center of class 0, or $M_0 = 257$. In this way the class interval $w = 5$ is taken as the unit. The whole calculation is very simple. We first of all form the product $x \times F(x)$.
The sum of these products divided by 576 = N gives the distance, $b$, from the provisional mean to the arithmetic mean, expressed in units of the class interval, $w$. We have thus:

b = w × 14 ÷ 576 = +0.0243w = +0.122, or M = 257 + b = 257.12.

The formula for the dispersion takes the form

$$\sigma^2 = w^2\left[\frac{\sum x^2F(x)}{N} - b^2\right],$$

where $b$ is expressed in units of the class interval. The table gives us $\sum x^2F(x) = 3596$, or

σ² = w²[3596 ÷ 576 - (0.024)²] = w² × 6.242, σ = w × 2.498 = 12.49.

Charlier now checks the results by means of the following relation:

$$\sum (x + 1)^2F(x) = \sum x^2F(x) + 2\sum xF(x) + \sum F(x).$$

For the above example we have:

 Σx²F(x)  = + 3,596
 2ΣxF(x)  = +    28
 ΣF(x)    = +   576
 Sum      = + 4,200 = Σ(x + 1)²F(x),

which proves the accuracy of the calculation.

The full elegance of the Charlier self checking scheme is shown at a later stage under the calculation of the parameters of frequency curves. In the meantime the student may test the advantage of the provisional mean by trying to compute the mean and the dispersion by the conventional school method. A direct computation by this method would in the last example take about a whole day's labor.

Before we proceed to apply the formulas previously demonstrated, we wish to call the attention of the reader to the following important properties of the mean and the dispersion:

1. The algebraic sum of the deviations from the mean, i.e., $\sum (m_\nu - M)$, is zero. This follows immediately from formula (2) of § 78. We have:

$$M = \frac{\sum (m_\nu - M_0)}{N} + M_0 = b + M_0,$$

where $M_0$, the provisional mean, is an arbitrarily chosen number and $b = \sum (m_\nu - M_0) \div N$. If $M_0 = M$ we have evidently $b = 0$, which proves the statement.

2. The dispersion (standard deviation) is the least possible root-mean-square deviation, i.e., the root-mean-square deviation is a minimum when the deviations are measured from the mean. We have (see formula (4)):

$$\sigma^2 = \frac{\sum (m_\nu - M)^2}{N} = \frac{\sum (m_\nu - M_0)^2}{N} - b^2,$$

from which the proposition follows a fortiori.

79. Westergaard's Experiments.

The Danish statistician, Harald Westergaard, in his "Statistikens Teori i Grundrids" gives the following results of 10,000 observations divided into 100 equal sample sets of drawings of balls from a bag containing an equal number of red and white balls (the ball was returned to the bag after each drawing):

White:     33 34 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
Frequency:  0  1  1  2  2  2  3  3  4  5  6  5 11  9  5 10  4  8

White:     55 56 57 58 59 60 61 62 63
Frequency:  3  5  4  4  1  1  1.

The elements as resulting from Westergaard's drawings clearly represent a Bernoullian Series where the number of comparison $s$ is equal to 100. Arranging the data in classes, taking 3 as the class interval, the computation of the mean and the dispersion is easily performed by means of the Charlier self checking scheme.

Bernoullian Series. Number of White Balls in 100 Drawings (Westergaard).
s = 100, N = 100, M_0 = 49, w = 3.

 m.        x.     F(x).    xF(x).    x²F(x).    (x+1)²F(x).
 33-35     -5       1       - 5        25           16
 36-38     -4
 39-41     -3       5       -15        45           20
 42-44     -2       8       -16        32            8
 45-47     -1      15       -15        15
 48-50      0      25                               25
 51-53     +1      19       +19        19           76
 54-56     +2      16       +32        64          144
 57-59     +3       8       +24        72          128
 60-62     +4       2       + 8        32           50
 63-65     +5       1       + 5        25           36
 Sum              100     (-51 +88)   329          503

Control Check.

 Σx²F(x)  = 329
 2ΣxF(x)  =  74
 ΣF(x)    = 100
 Sum      = 503 = Σ(x + 1)²F(x)

b = w(88 - 51) : 100 = w × 0.37 = 1.11, or M = M_0 + b = 50.11,
σ² = w²[329 : 100 - b²] = w²(3.29 - 0.137) = 28.377, or σ = 5.33.

Giving due allowance for the respective mean errors of the mean and the deviation we have finally:

M = 50.11 ± 0.536, σ = 5.33 ± 0.378.

We shall now compare these values with the corresponding theoretical values of the Bernoullian series. The a priori probabilities of drawing red and white are in this example $p = q = \frac{1}{2}$.
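The grouped computation for Westergaard's drawings, together with Charlier's control relation, can be sketched in Python as follows. The function names are illustrative. One caveat belongs with the sketch: in a program the four sums are formed from the same data, so the control identity holds automatically; it is included only to mirror the hand check, where it guards against slips of arithmetic in the separate columns.

```python
import math

# Westergaard's 100 sample sets, grouped in classes of width w = 3
# around the provisional mean M0 = 49 (class numbers x = -5 ... +5).
xs = list(range(-5, 6))
Fs = [1, 0, 5, 8, 15, 25, 19, 16, 8, 2, 1]


def charlier_sums(xs, fs):
    """Return (sum F, sum xF, sum x^2 F, sum (x+1)^2 F)."""
    s0 = sum(fs)
    s1 = sum(x * f for x, f in zip(xs, fs))
    s2 = sum(x * x * f for x, f in zip(xs, fs))
    chk = sum((x + 1) ** 2 * f for x, f in zip(xs, fs))
    return s0, s1, s2, chk


def grouped_mean_dispersion(xs, fs, m0, w):
    """Mean and dispersion from grouped data in class units of width w."""
    s0, s1, s2, chk = charlier_sums(xs, fs)
    if s2 + 2 * s1 + s0 != chk:           # Charlier's control relation
        raise ValueError("control check failed")
    b = s1 / s0                           # in units of the class interval
    var = w * w * (s2 / s0 - b * b)
    return m0 + w * b, math.sqrt(var)


if __name__ == "__main__":
    print(charlier_sums(xs, Fs))
    mean, sigma = grouped_mean_dispersion(xs, Fs, m0=49, w=3)
    print(f"M = {mean:.2f}, sigma = {sigma:.2f}")
```

The same two functions apply unchanged to any table arranged in class numbers, for instance the Swedish birth statistics with `m0=257` and `w=5`.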
Hence we have as the theoretical values for the mean and the dispersion: $M_B = 100 \times \tfrac{1}{2} = 50$, [...]

[...] From this deck, containing 11 spades, 13 clubs, 13 diamonds and 15 hearts, a card was again drawn. The drawings were in this manner continued until all the spades were replaced by hearts. The same operation was applied to the clubs, which were replaced by diamonds. After 27 drawings the deck contained only red cards. Altogether 100 sample sets of 27 drawings were made with the following results:

Poisson Series. Number (m) of Black Cards in Sample Sets of 27.
s = 27, N = 100, M₀ = 7, w = 1.

     m.     x.    F(x).    xF(x).    x²F(x).    (x+1)²F(x).
      3     -4       2      -  8      + 32          18
      4     -3       6      - 18      + 54          24
      5     -2      14      - 28      + 56          14
      6     -1      14      - 14      + 14           0
      7      0      22         0         0          22
      8     +1      17      + 17      + 17          68
      9     +2      14      + 28      + 56         126
     10     +3       8      + 24      + 72         128
     11     +4       1      +  4      + 16          25
     12     +5       1      +  5      + 25          36
     13     +6       1      +  6      + 36          49
    Sum:           100      + 16      +378         510

Control Check.

    Σx²F(x)  = +378
    2ΣxF(x)  = + 32
    ΣF(x)    = +100
    Sum      = +510

The calculation of the mean and the dispersion with their respective mean errors yields the following result:

$$b = +0.16, \quad M = 7.16 \pm 0.211, \quad \sigma^2 = 3.78 - (0.16)^2 = 3.754, \quad \sigma = 1.937 \pm 0.149.$$

The theoretical Poisson values according to the formulas of § 67 are:

$$M_P = 6.75, \qquad \sigma_P = 2.111.$$

If we now take the arithmetic mean of the various probabilities of drawing a black card we find that $p_0 = \tfrac{1}{4}$. If all the drawings had been performed with a constant probability we should according to the Bernoullian scheme have:

$$M_B = 27 \times \tfrac{1}{4} = 6.75, \qquad \sigma_B = \sqrt{27 \times \tfrac{1}{4} \times \tfrac{3}{4}} = 2.25.$$

These results verify the formulas as obtained under the discussion of the Poisson Series ($M_P = M_B$, $\sigma_P < \sigma_B$).

Lexian Series. In testing the Lexian Series Charlier first took 10 samples of 10 individual drawings in each sample from an ordinary whist deck. The number of black cards thus drawn was recorded. After this, 10 samples of the same magnitude were taken from a deck containing 25 black and 27 red cards; and then 10 samples from a deck with 24 black and 28 red cards.
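Returning for a moment to the Poisson series: the theoretical values $M_P = 6.75$, $\sigma_P = 2.111$ and $\sigma_B = 2.25$ quoted above can be retraced with a short Python sketch. It rests on an assumption that is not spelled out in the surviving text, namely that the probability of drawing a black card falls in equal steps from 26/52 at the first drawing to 0/52 at the twenty-seventh, and it uses the standard result that a sum of independent trials with varying probabilities $p_k$ has mean $\sum p_k$ and variance $\sum p_k q_k$. All variable names are illustrative.

```python
import math
from fractions import Fraction

# Assumed probabilities of a black card in the 27 successive drawings:
# 26 black cards out of 52 at the start, one fewer at each later drawing.
probabilities = [Fraction(black, 52) for black in range(26, -1, -1)]

s = len(probabilities)                  # number of comparison, 27
p0 = sum(probabilities) / s             # mean probability

mean_poisson = sum(probabilities)       # sum of p_k
var_poisson = sum(p * (1 - p) for p in probabilities)   # sum of p_k * q_k
var_bernoulli = s * p0 * (1 - p0)       # constant probability p0 throughout

sigma_poisson = math.sqrt(var_poisson)
sigma_bernoulli = math.sqrt(var_bernoulli)

print(float(p0), float(mean_poisson), sigma_poisson, sigma_bernoulli)
```

Exact fractions are used so that the mean probability comes out as exactly one quarter; the two dispersions are converted to floating point only at the last step.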
Of the total 270 samples (until the deck contains only red cards) Charlier gives the first 100, which gave the following result:

Lexian Series. Number (m) of Black Cards in 10 Drawings.
s = 10, N = 100, M₀ = 4.

     m.     x.    F(x).    xF(x).    x²F(x).    (x+1)²F(x).
      1     -3       4      - 12      +  36        + 16
      2     -2       9      - 18      +  36        +  9
      3     -1      19      - 19      +  19           0
      4      0      21         0          0        + 21
      5     +1      23      + 23      +  23        + 92
      6     +2      10      + 20      +  40        + 90
      7     +3      12      + 36      + 108        +192
      8     +4       2      +  8      +  32        + 50
    Sum:           100      + 38      + 294        +470

Control Check.

    Σx²F(x)  = +294
    2ΣxF(x)  = + 76
    ΣF(x)    = +100
    Sum      = +470

The final computations (with mean errors) give:

$$b = +0.38, \quad M = 4.38 \pm 0.167, \quad \sigma^2 = 294 : 100 - b^2 = +2.796,$$

[...]

See Nunn, "Exercises in Algebra" (London, 1914), pages 432-33.

[...] which is as important for the statistical analysis as is the gathering of structural materials for the erection of a large building.

Mathematical statistics is thus the tool we must use in the final analysis of the statistical data. It is a very effective and powerful tool when used properly by the investigator. At the same time it is not an automatic calculating machine in which we need only put the material and read off the result on a dial. A person without any knowledge whatsoever about the nature of logarithms may in a few hours be taught how to use a logarithmic table in practical computations, but it would be foolish to view the formulas and criteria from probabilities when applied to statistical data in the same light as a table of logarithms in calculating work. Such formulas and criteria must be used with caution and discretion and only by those who have taken the trouble to make a thorough study of probabilities and master their real meaning and their relation to mass phenomena. If put in the hands of mere amateurs the formulas become as dangerous a toy as a razor to a child.
It is not our intention to give in this work a description of the technique of the collection of the material, which depends to a large extent on local social conditions and for which it is difficult to give a set of fixed rules. In the following we shall treat the mathematical methods of statistics exclusively, and furthermore make the theory of probabilities the basis of our investigations.

83. Analogy between Statistical Data and Mathematical Probabilities.

Let us for the moment imagine a closed community with a stationary population from year to year and let us denote the size of such a population by $s$. Let us furthermore suppose we were given a series of numbers:

$$m_1,\ m_2,\ m_3,\ \dots,\ m_N,$$

denoting the number of children born in various years in this community. The ratios

$$\frac{m_1}{s},\ \frac{m_2}{s},\ \frac{m_3}{s},\ \dots,\ \frac{m_N}{s}$$

may then be looked upon as probabilities of a childbirth in various years. As Charlier justly remarks, "such an identification of a statistical ratio with a mathematical probability is at first sight a mere analogy which possibly may have very little in common with the observed statistical phenomena, but a closer scrutiny shows the great importance for statistics of such a view." If such ratios could be regarded as mathematical probabilities wherein the various $m$'s were identical to favorable cases in $s$ total trials, the mean and the dispersion could be determined a priori from the Bernoullian Theorem. The founders of mathematical statistics regarded the identification of an ordinary statistical series with a Bernoullian Series almost as axiomatic. This view is found even among some leading writers of the present time. Among others we apparently find this traditional view held by the eminent English actuary, G. King, in his classic "Text Book."
In Chapter II of this well-known standard actuarial treatise a probability is defined as follows: "If an event may happen in $\alpha$ ways and fail in $\beta$ ways, all these ways being equally likely, the probability of the happening of the event is $\alpha \div (\alpha + \beta)$." With this definition as a basis King then deduces the elementary formulas of the addition and multiplication theorems. He then continues: "Passing now to the mortality table, if there be $l_x$ persons living at age $x$, and if of these $l_{x+n}$ survive to age $x + n$, then the probability that a life aged $x$ will survive $n$ years is $l_{x+n} \div l_x = {}_np_x$." And again "the probability that a life aged $x$ and a life aged $y$ will both survive $n$ years is ${}_np_x \times {}_np_y$."¹

From the above it would appear that the author unreservedly assumes a one-to-one correspondence between the $l_{x+n}$ survivors and "favorable ways" as known from ordinary games of chance and a similar correspondence between the original $l_x$ persons and "equally possible cases." A simple consideration will show that there exists no a priori reason for such a unique correspondence between ordinary empirical death rates and mathematical probabilities.

¹ Mr. H. Moir in his "Primer of Insurance" tried to avoid the difficulty by giving a wholly new definition of "equally likely events." According to Moir "events may be said to be 'equally likely' when they recur with regularity in the long run." Apart from the half metaphorical term "in the long run" Mr. Moir fails to state what he means by the expression "with regularity." If the statement is to be understood as regular repetitions of a certain event in various sample sets, it is evident that we may obtain a regular recurrence of the observed absolute frequencies in a Poisson Series, where, as we know, the events are not equally likely. (A. F.)

None of the original $l_x$ persons can be considered as being "equally likely" as in the sense of games of chance.
Numerous factors such as heredity, environment, climatic and economic conditions, etc., play here a vital part in the various complexes embracing the original $l_x$ persons.

The belief in an absolute identity of mathematical probabilities and statistical frequency ratios seems to have originated from Gauss. The great German mathematician, or rather the dogmatic faith in his authority as a mathematician, proved thus for a number of years a veritable stumbling block to a fruitful development of mathematical statistics. Gauss and his followers maintained that all statistical mass phenomena could be made to conform with the law of errors as exhibited by the so-called Gaussian Normal Error Curve. If certain statistical series exhibited discrepancies they claimed that such deviations arose from the limited number of observations. The deviations would become less marked if the number of observed values was enlarged and would eventually disappear as the number of observations approached infinity as its ultimate value. The Gaussian dogma held sway despite the fact that the Danish actuary, Oppermann, and the French mathematicians, Bienaymé and Cournot, had pointed out that several statistical series, despite all efforts to the contrary, offered a persistent defiance to the Gaussian law. The first real attack on the dogma laid down so authoritatively by Gauss was delivered by the French actuary, Dormoy, in certain investigations relating to the French census. It was, however, only after the appearance of the already mentioned brochure by Lexis, "Die Massenerscheinungen, etc.," that a correct idea was gained about the real nature of statistical series.

The Lexian theory was expounded in the previous chapters of this work, and we are therefore ready to enter upon the investigations of a few selected mass observations from the domain of vital statistics.

84. Number of Comparison and Proportional Factors.
In the mathematical treatment of the Lexian theory of dispersion we tacitly assumed that the total number of individual trials in a sample set or the number of comparison, $s$, remained constant from set to set. In the observations on games of chance it remained in our power to arrange the actual experiments in such a manner that $s$ would be constant. In actual social statistical series such simple conditions do not exist. In comparing the number of births in a country with the total population it is readily noticed that the population does not remain constant but varies from year to year. For this reason the various numbers $m$ denoting the births are not directly comparable with one another. We may, however, easily form a new series of the form:

$$\frac{s}{s_1}m_1,\ \frac{s}{s_2}m_2,\ \frac{s}{s_3}m_3,\ \dots,\ \frac{s}{s_N}m_N,$$

wherein the various numbers $m_1, m_2, m_3, \dots$, corresponding to the numbers of comparison $s_1, s_2, s_3, \dots$, are reduced to a constant number of comparison $s$. This series is by Charlier called a reduced statistical series. Such a reduction requires, in many cases, a certain correction. However, when the general ratios $s \div s_k$ $(k = 1, 2, 3, \dots, N)$ are close to unity the reduced series may be treated as a directly observed series. In most of the following examples taken from Scandinavian statistical tabular works the proportional factor $s \div s_k$ is close to unity as shown in the table below. For Sweden I have, following Charlier, assumed a stationary population $s = 5,000,000$. The corresponding Danish $s$ I have taken as 2,500,000.

Proportional Factors for a Hypothetical Stationary Population in Sweden and Denmark Equal to 5,000,000 and 2,500,000 Respectively.

              Sweden                             Denmark
    Year    Inhabitants    s : s_k     Year    Inhabitants    s : s_k
    1876     4,429,713     1.1288      1888     2,143,000     1.1666
    1877     4,484,542     1.1150      1889     2,161,000     1.1569
    1878     4,531,863     1.1033      1890     2,179,000     1.1473
    1879     4,578,901     1.0919      1891     2,195,000     1.1390
    1880     4,565,668     1.0952      1892     2,210,000     1.1312
    1881     4,572,245     1.0936      1893     2,226,000     1.1230
    1882     4,579,115     1.0919      1894     2,248,000     1.1121
    1883     4,603,595     1.0861      1895     2,276,000     1.0984
    1884     4,644,448     1.0765      1896     2,306,000     1.0841
    1885     4,682,769     1.0677      1897     2,338,000     1.0694
    1886     4,717,189     1.0600      1898     2,371,000     1.0544
    1887     4,734,901     1.0560      1899     2,403,000     1.0404
    1888     4,748,257     1.0530      1900     2,432,000     1.0280
    1889     4,774,409     1.0472      1901     2,462,000     1.0154
    1890     4,784,981     1.0449      1902     2,491,000     1.0036
    1891     4,802,751     1.0410      1903     2,519,000     0.9925
    1892     4,806,865     1.0402      1904     2,546,000     0.9819
    1893     4,824,150     1.0365      1905     2,574,000     0.9713
    1894     4,873,183     1.0261      1906     2,603,000     0.9604
    1895     4,919,260     1.0165      1907     2,635,000     0.9488
    1896     4,962,568     1.0076      1908     2,668,000     0.9370
    1897     5,009,632     0.9981      1909     2,702,000     0.9252
    1898     5,062,918     0.9875      1910     2,737,000     0.9134
    1899     5,097,402     0.9809      1911     2,800,000     0.8929
    1900     5,136,441     0.9734      1912     2,830,000     0.8834

The above figures are taken from "Sveriges officielle statistik" and "Statistisk Aarbog for Danmark" for 1913 (Précis de Statistique, 1913).

85. Child Births in Sweden.

From Charlier's "Grunddragen" I select the following example showing the number of children born in Sweden in the period from 1881-1900 as reduced to a stationary population of 5,000,000.

Number of Children Born in Sweden as to Calendar Year (Charlier).
s = 5,000,000, N = 20, M₀ = 140,000.

    Year         m.        m - M₀.       (m - M₀)².
    1881      145,230      + 5,230       27,352,900
    1882      146,640      + 6,640       44,089,600
    1883      144,320      + 4,320       18,662,400
    1884      149,360      + 9,360       87,609,600
    1885      146,600      + 6,600       43,560,000
    1886      148,270      + 8,270       68,392,900
    1887      148,020      + 8,020       64,320,400
    1888      143,680      + 3,680       13,542,400
    1889      138,300      - 1,700        2,890,000
    1890      139,600      -   400          160,000
    1891      141,070      + 1,070        1,144,900
    1892      134,830      - 5,170       26,728,900
    1893      136,540      - 3,460       11,971,600
    1894      134,840      - 5,160       26,625,600
    1895      136,820      - 3,180       10,112,400
    1896      135,330      - 4,670       21,808,900
    1897      132,750      - 7,250       52,562,500
    1898      134,820      - 5,180       26,832,400
    1899      131,320      - 8,680       75,342,400
    1900      134,460      - 5,540       30,691,600
    Sum:             + 53,190 - 50,390   654,401,400

From which we obtain:

$$b = (+53,190 - 50,390) : 20 = 140, \qquad M = M_0 + b = 140,140,$$

$$\sigma^2 = 654,401,400 : 20 - b^2 = 32,700,470, \quad \text{or} \quad \sigma = 5,718.$$

The empirical probability of a birth ($p_0$) is $p_0 = M : s = 0.02803$, so that $q_0 = 1 - p_0 = 0.97197$ and the Bernoullian dispersion

$$\sigma_B = \sqrt{s\,p_0\,q_0} = 369.0.$$

The actual observed dispersion (5,718) is thus much greater than the Bernoullian. The birth series is considerably hypernormal. The Lexian ratio has the value

$$L = 5,718 : 369.0 = 15.50,$$

while the Charlier coefficient of disturbancy is $100\rho = 4.07$. Both the values of $L$ and $\rho$ show that the birth series by no means can be compared with the ordinary games of chance but is subject to outward perturbing influences.

86. Child Births in Denmark.

The following example shows the corresponding birth series for Denmark in the 25-year period from 1888-1912 as reduced to a stationary population of 2,500,000. The computation of the various parameters follows:

$$b = (39,713 - 30,287) : 25 = +377, \qquad M = M_0 + b = 73,377,$$

$$\sigma^2 = 281,208,156 : 25 - b^2 = 11,106,197.2,$$

$$\sigma_B^2 = s\,p_0\,q_0 = 71,223 \qquad (p_0 = M : s = 0.0293508),$$

$$L = \frac{\sigma}{\sigma_B} = 12.5, \qquad 100\rho = 100\sqrt{\sigma^2 - \sigma_B^2} : M = 4.52.$$

Number of Children Born in Denmark as to Calendar Year.
s = 2,500,000, N = 25, M₀ = 73,000.

    Year         m.        m - M₀.       (m - M₀)².
    1888       78,659      + 5,659       32,024,281
    1889       77,956      + 4,956       24,561,936
    1890       76,154      + 3,154        9,947,716
    1891       77,377      + 4,377       19,158,129
    1892       74,059      + 1,059        1,121,481
    1893       76,965      + 3,965       15,721,225
    1894       75,956      + 2,956        8,740,636
    1895       75,649      + 2,649        7,017,201
    1896       76,183      + 3,183       10,131,489
    1897       74,404      + 1,404        1,971,216
    1898       75,570      + 2,570        6,604,900
    1899       74,236      + 1,236        1,527,696
    1900       74,146      + 1,146        1,313,316
    1901       74,341      + 1,341        1,798,281
    1902       73,058      +    58            3,364
    1903       71,802      - 1,198        1,435,204
    1904       72,359      -   641          410,881
    1905       70,981      - 2,019        4,076,361
    1906       71,280      - 1,720        2,958,400
    1907       70,516      - 2,484        6,170,256
    1908       71,433      - 1,567        2,455,489
    1909       70,597      - 2,403        5,774,409
    1910       68,777      - 4,223       17,833,729
    1911       66,016      - 6,984       48,776,256
    1912       65,952      - 7,048       49,674,304
    Sum:             - 30,287 + 39,713   281,208,156

Practically the same deductions hold true for this Danish series as for the Swedish series. We meet again a hypernormal series subject to perturbing influences. The closeness of the two values of the Charlier coefficient of disturbancy indicates that the number of births in Sweden and Denmark apparently are subject to the same outward disturbing influences.

87. Danish Marriage Series.

The following table shows the number of marriages in Denmark from 1888-1912.

Number of Marriages in Denmark.
s = 2,500,000, N = 25, M₀ = 18,000.

    Year         m.        m - M₀.       (m - M₀)².
    1888       17,605      -   395          156,025
    1889       17,622      -   378          142,884
    1890       17,181      -   819          670,761
    1891       17,017      -   983          966,289
    1892       17,012      -   988          976,144
    1893       17,676      -   324          104,976
    1894       17,445      -   555          308,025
    1895       17,736      -   264           69,696
    1896       18,239      +   239           57,121
    1897       18,676      +   676          456,976
    1898       18,870      +   870          756,900
    1899       18,661      +   661          436,921
    1900       19,015      + 1,015        1,030,225
    1901       17,870      -   130           16,900
    1902       17,712      -   288           82,944
    1903       17,791      -   209           43,681
    1904       17,895      -   105           11,025
    1905       17,947      -    53            2,809
    1906       18,592      +   592          350,464
    1907       19,072      + 1,072        1,149,184
    1908       18,750      +   750          562,500
    1909       18,453      +   453          205,209
    1910       18,255      +   255           65,025
    1911       17,749      -   251           63,001
    1912       18,034      +    34            1,156
    Sum:              - 5,742 + 6,617     8,686,841

Hence we have:

$$b = (6,617 - 5,742) : 25 = 35, \qquad M = M_0 + b = 18,035.$$

[...]

$$\sigma^2 = \frac{1}{s_1 + s_2 + \dots + s_N}\sum s_k\left(\frac{s}{s_k}m_k - s\,p_0\right)^2 = \frac{\sum \dfrac{s^2}{s_k}\,(m_k - s_k p_0)^2}{\sum s_k}.$$

In finding the theoretical dispersion, assuming a Bernoullian distribution for which $p_0$ may be used as an approximation of the mathematical a priori probability, we ask the reader to examine the general term of the expression for $\sigma^2$, viz.:

$$\frac{s^2}{s_k}\,(m_k - s_k p_0)^2 : \sum s_k.$$

If the individual trials follow the Bernoullian Law the expected value of the factor $(m_k - s_k p_0)^2$ takes the form:

$$\varepsilon\left[(m_k - s_k p_0)^2\right] = \sum (m_k - s_k p_0)^2\,\varphi(m_k) = s_k\,p_0\,q_0.$$

This brings the general term for $\sigma^2$ to the form:

$$\frac{s^2}{s_k}\cdot\frac{s_k\,p_0\,q_0}{\sum s_k} = \frac{s}{\sum s_k}\,s\,p_0\,q_0.$$

Thus the expected value of $\sigma^2$ according to the Bernoullian distribution may be written as follows:

$$\sigma_B^2 = \sum \frac{s}{\sum s_k}\,s\,p_0\,q_0 = \frac{Ns}{\sum s_k}\,s\,p_0\,q_0, \quad \text{or} \quad \sigma_B^2 = f^2\,s\,p_0\,q_0,$$

where as before $f^2 = Ns : \sum s_k$ and $f = \sqrt{Ns / \sum s_k}$.

These formulas give us the means of computing the Lexian Ratio and the Charlier coefficient of disturbancy in the ordinary way. Some of the computations require, however, a great amount of arithmetical work and the goal is reached more easily by making use of the mean deviation (in § 74a). We found there the following relation: [...]

In the weighted series it is readily seen that the value of $\vartheta$ will be of the form:

$$\vartheta = \frac{\sum s_k\left|\dfrac{s}{s_k}m_k - s\,p_0\right|}{\sum s_k} = \frac{s\sum |m_k - s_k p_0|}{\sum s_k}.$$

If the series may be assumed to follow a Bernoullian distribution we have $\sigma = 1.2533\,\vartheta$.

From the above formulas it is readily noticed that we may find the mean and the dispersion directly from the observed series without a preliminary reduction to a common number of comparison $s$. This is in fact the method used in the above example of coal-mine accidents in various states. We have:

$$p_0 = \sum m_k : \sum s_k = 2,167 : 726,659 = 0.002982,$$

$$\vartheta = \frac{s\sum |m_k - s_k p_0|}{\sum s_k} = 0.9757, \qquad \sigma = 1.2533 \times \vartheta = 1.223,$$

$$\sigma_B^2 = f^2\,s\,p_0\,q_0 = \frac{20,000}{726,659} \times 1,000 \times 0.997 \times 0.003 = 0.0817,$$

$$100\rho = \frac{100\sqrt{\sigma^2 - \sigma_B^2}}{M} = 40 \text{ approx.}$$

The large value of the Charlier coefficient of disturbancy clearly shows that conditions in coal mines by no means are uniform in the whole union but vary greatly according to the locality. An actual computation shows in fact that in a few states such as Michigan and Iowa we find an imaginary coefficient of disturbancy whereas states such as Ohio and West Virginia exhibit marked hypernormal series with a large coefficient of disturbancy. The establishment of this fact is of some importance in connection with accident assurance. Many statisticians seem to be of the opinion that a standard accident table computed from the data of the whole union ought to serve as the basis for assurance premiums. Such a table would assume uniform conditions all over the union. The enormously high value of $\rho$ as computed above shows the fallacy of such a view.

91. Secular and Periodical Fluctuations.

In the last paragraphs we have just learned how to detect the presence of disturbing influences in a statistical series. A value of the Lexian ratio differing from unity or a value of the Charlier coefficient of disturbancy differing from zero indicates the presence of fluctuations in the chances for the event or phenomena under investigation. After having established the presence of such fluctuations it is the duty of the statistician to trace the sources of the disturbing influences. This is in general done by means of the theory of correlation, which will be discussed in the second volume of this work. It is, however, possible to classify the disturbances under two categories which by Charlier are termed secular and periodical variations.¹

The periodical fluctuations are in general difficult to discuss on account of the variations in the period of the disturbing forces.
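The two diagnostics named in this section, the Lexian ratio $L = \sigma : \sigma_B$ and the Charlier coefficient of disturbancy $100\rho = 100\sqrt{\sigma^2 - \sigma_B^2} : M$, can be sketched in a few lines of Python. This is an illustrative helper only: the function name `dispersion_diagnostics` is an assumption made here, and the input figures are the rounded Swedish birth values quoted in § 85 ($M = 140,140$, $\sigma = 5,718$, $s = 5,000,000$).

```python
import math


def dispersion_diagnostics(mean, sigma, s):
    """Return (sigma_B, Lexian ratio, 100 * rho) for a reduced series.

    mean  : arithmetic mean M of the reduced series
    sigma : observed dispersion of the series
    s     : common number of comparison
    """
    p0 = mean / s                              # empirical probability
    sigma_b = math.sqrt(s * p0 * (1.0 - p0))   # Bernoullian dispersion
    lexian_ratio = sigma / sigma_b
    excess = sigma ** 2 - sigma_b ** 2
    # In a subnormal series the excess is negative and rho is imaginary;
    # None marks that case instead of a number.
    hundred_rho = 100.0 * math.sqrt(excess) / mean if excess >= 0 else None
    return sigma_b, lexian_ratio, hundred_rho


sigma_b, lexian_ratio, hundred_rho = dispersion_diagnostics(140_140, 5_718, 5_000_000)
print(round(sigma_b, 1), round(lexian_ratio, 2), round(hundred_rho, 2))
```

A ratio well above unity together with a coefficient well above zero marks a hypernormal series; returning `None` for the subnormal case is simply one way of flagging what the text calls an imaginary coefficient of disturbancy.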
In many cases we are in absolute ignorance about the length of such a period and therefore unable to subject the series to a mathematical analysis. If the length of the period is known it is indeed not difficult to determine the periodical disturbances. This is often the case in series giving the occurrence of a certain disease in various months. In statistics giving the frequency of malaria in a community the observed cases are nearly all limited to the warmer months and infrequent in the winter months.

¹ Lexis uses the terms "evolutionary" ("symptomatic") and "periodical" ("oscillating") for such fluctuations.

In the secular fluctuations due to certain outward influences working continually in the same direction it is quite easy to calculate the rate of such variations. Let $\beta$ denote the increase (decrease) of the original probabilities $(p_1, p_2, p_3, \dots, p_N)$ from set to set in the given statistical series so that

$$p_2 - p_1 = \beta, \quad p_3 - p_2 = \beta, \quad \dots, \quad p_N - p_{N-1} = \beta.$$

We then have:

$$p_k = p_1 + (k - 1)\beta. \qquad (1)$$

The mean probability has the value:

$$p_0 = \frac{p_1 + p_2 + p_3 + \dots + p_N}{N} = \frac{p_1 + (p_1 + \beta) + (p_1 + 2\beta) + \dots + (p_1 + (N-1)\beta)}{N} = p_1 + \frac{N-1}{2}\,\beta. \qquad (2)$$

Eliminating $p_1$ from (1) and (2) we have:

$$p_k - p_0 = \left(k - \frac{N+1}{2}\right)\beta.$$

If the observed and reduced numbers $m_1, m_2, m_3, \dots, m_N$ may be regarded as approximately coinciding with $sp_1, sp_2, sp_3, \dots, sp_N$ we may write (2) as follows: [...]

In order to obtain an expression for [...]

such a procedure is possible, theoretically at least, we should, however, in most cases find it a very tedious and laborious task in actual practice. It, therefore, remains to be seen whether it is possible to transform these symmetrical functions of the power sums of the observations into some other symmetrical functions, which are more flexible and workable in practical computations and which can be expressed in terms of the various values of $s$.

104. Semi-Invariants of Thiele.

It is the great achievement of Thiele to have been the first mathematician to realize this possibility and make such a transformation by introducing into the theory of frequency curves a peculiar system of symmetrical functions which he called semi-invariants and denoted by the symbols $\lambda_1, \lambda_2, \lambda_3, \dots$ Starting with the power sums, $s_i$, Thiele defines these by the following identity

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \dots \qquad (1)$$

which is supposed identical in respect to $\omega$. Since $s_i = \sum o^i$ the right hand side of the equation may also be written as

$$e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \dots = \sum e^{o_i\omega}.$$

Differentiating (1) with respect to $\omega$ we have

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots}\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \dots\right]$$

$$= \left[s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \dots\right]\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \dots\right]$$

$$= s_1 + \frac{s_2\omega}{1!} + \frac{s_3\omega^2}{2!} + \frac{s_4\omega^3}{3!} + \dots$$

Multiplying out and equating the various coefficients of equal powers of $\omega$ we finally have:

$$s_1 = \lambda_1 s_0$$
$$s_2 = \lambda_1 s_1 + \lambda_2 s_0$$
$$s_3 = \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0$$
$$s_4 = \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0$$

where the coefficients follow the law of the binomial theorem. Solving for $\lambda$ we have

$$\lambda_1 = s_1 : s_0$$
$$\lambda_2 = (s_2 s_0 - s_1^2) : s_0^2$$
$$\lambda_3 = (s_3 s_0^2 - 3 s_2 s_1 s_0 + 2 s_1^3) : s_0^3$$
$$\lambda_4 = (s_4 s_0^3 - 4 s_3 s_1 s_0^2 - 3 s_2^2 s_0^2 + 12 s_2 s_1^2 s_0 - 6 s_1^4) : s_0^4$$

The semi-invariants $\lambda$ in respect to an arbitrary origin and unit are defined by the relation

$$s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \dots$$

where $o_1, o_2, o_3, \dots$ are the individual observations.
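The relations between the power sums $s_i$ and the semi-invariants $\lambda_i$ lend themselves to a direct recursive computation. The sketch below is an illustration rather than Thiele's own procedure: it assumes that the binomial pattern of the four relations written out above continues as $s_n = \sum_{j=1}^{n}\binom{n-1}{j-1}\lambda_j s_{n-j}$, and the function names `power_sums` and `semi_invariants` are chosen here for convenience.

```python
from fractions import Fraction
from math import comb


def power_sums(observations, order):
    """Return [s_0, s_1, ..., s_order], where s_k is the sum of o**k."""
    return [sum(Fraction(o) ** k for o in observations)
            for k in range(order + 1)]


def semi_invariants(observations, order=4):
    """Return [lambda_1, ..., lambda_order] from the power sums.

    Each relation s_n = sum_j C(n-1, j-1) * lambda_j * s_(n-j) is solved
    for its last unknown, lambda_n.
    """
    s = power_sums(observations, order)
    lam = [None]  # index 0 is unused so that lam[j] is lambda_j
    for n in range(1, order + 1):
        remainder = s[n]
        for j in range(1, n):
            remainder -= comb(n - 1, j - 1) * lam[j] * s[n - j]
        lam.append(remainder / s[0])
    return lam[1:]


lambdas = semi_invariants([1, 2, 3, 4])
print(lambdas)
```

For the four observations 1, 2, 3, 4 the first semi-invariant is the mean and the second is the mean-square deviation about it, which gives a quick sanity check on the recursion; exact fractions are used so that the comparison with the closed expressions involves no rounding.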
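Let us now change to another coordinate system with another unit and origin defined by the following linear transformation

$$o_i' = a\,o_i + c.$$

The semi-invariants in this new system are given by the relation

$$s_0\,e^{\frac{\lambda_1'\omega}{1!} + \frac{\lambda_2'\omega^2}{2!} + \frac{\lambda_3'\omega^3}{3!} + \dots} = e^{o_1'\omega} + e^{o_2'\omega} + \dots = e^{(a o_1 + c)\omega} + e^{(a o_2 + c)\omega} + \dots$$

Since the various values of $\lambda'$ do not depend upon the quantity $\omega$ we may without changing the value of the semi-invariants replace $\omega$ by $\omega : a$ in the above equations, which give

$$s_0\,e^{\frac{\lambda_1'\omega}{a\,1!} + \frac{\lambda_2'\omega^2}{a^2\,2!} + \frac{\lambda_3'\omega^3}{a^3\,3!} + \dots} = e^{o_1\omega + \frac{c\omega}{a}} + e^{o_2\omega + \frac{c\omega}{a}} + \dots = e^{\frac{c\omega}{a}}\,s_0\,e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots}$$

Taking the logarithms on both sides of the equation we have

$$\frac{\lambda_1'\omega}{a\,1!} + \frac{\lambda_2'\omega^2}{a^2\,2!} + \frac{\lambda_3'\omega^3}{a^3\,3!} + \dots = \frac{c\omega}{a} + \frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots$$

Differentiating successively with respect to $\omega$ we have

$$\frac{\lambda_1'}{a} + \frac{\lambda_2'\omega}{a^2\,1!} + \frac{\lambda_3'\omega^2}{a^3\,2!} + \dots = \frac{c}{a} + \lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \dots$$

$$\frac{\lambda_2'}{a^2} + \frac{\lambda_3'\omega}{a^3\,1!} + \frac{\lambda_4'\omega^2}{a^4\,2!} + \dots = \lambda_2 + \frac{\lambda_3\omega}{1!} + \frac{\lambda_4\omega^2}{2!} + \dots$$

$$\frac{\lambda_3'}{a^3} + \frac{\lambda_4'\omega}{a^4\,1!} + \dots = \lambda_3 + \frac{\lambda_4\omega}{1!} + \dots$$

Letting $\omega = 0$ we therefore have

$$\frac{\lambda_1'}{a} = \frac{c}{a} + \lambda_1, \quad \text{or} \quad \lambda_1' = a\lambda_1 + c,$$

$$\frac{\lambda_2'}{a^2} = \lambda_2, \quad \text{or} \quad \lambda_2' = a^2\lambda_2,$$

$$\frac{\lambda_3'}{a^3} = \lambda_3, \quad \text{or} \quad \lambda_3' = a^3\lambda_3,$$

from which we deduce the following relations

$$\lambda_1(ax + c) = a\,\lambda_1(x) + c,$$
$$\lambda_r(ax + c) = a^r\,\lambda_r(x) \quad \text{for } r > 1.$$

We shall for the present leave the semi-invariants and only ask the reader to bear in mind the above relations between $\lambda$ and $s$, of which we shall later on make use in determining the constants in the frequency curve $\varphi(x)$.

Before discussing the generation of the total frequency curve it will, however, be necessary to demonstrate some auxiliary mathematical formulae from the theory of definite integrals and integral equations which will be of use in the following discussion as mathematical tools with which to attack the collected statistical data or the numerical observations.

105. The Fourier Integral Equation.

One of these tools is found in the celebrated integral theorem of Fourier, which was the first integral equation to be successfully treated.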
We shall in the following demonstration adhere to the elegant and simple solution by M. Charlier. Charlier in his proof supposes that a function, $F(\omega)$, is defined through the following convergent series:

$$F(\omega) = \alpha\left[f(0) + f(\alpha)e^{\alpha\omega i} + f(2\alpha)e^{2\alpha\omega i} + \dots + f(-\alpha)e^{-\alpha\omega i} + f(-2\alpha)e^{-2\alpha\omega i} + \dots\right]$$

or

$$F(\omega) = \alpha\sum_{m=-\infty}^{m=+\infty} f(\alpha m)\,e^{\alpha m\omega i}, \qquad (2)$$

where $i = \sqrt{-1}$. We then see by a well known theorem of Cauchy that the integral

$$I(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (3)$$

is finite and convergent.¹ If we now let $m\alpha = x$ and let $\alpha = 0$ as a limiting value, $\alpha$ becomes equal to $dx$ and $f(\alpha m) = f(x)$. Consequently we may write

$$\lim_{\alpha = 0} F(\omega) = I(\omega).$$

¹ See Goursat: Mathematical Analysis (English Translation, New York), page 364.

Multiplying (2) by $e^{-r\alpha\omega i}\,d\omega$ and integrating between the limits $-\pi/\alpha$ and $+\pi/\alpha$ we get on the left an expression of the form

$$\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega$$

and on the right a sum of definite integrals of which, however, all but the term containing $f(r\alpha)$ as a factor will vanish. This particular term reduces to

$$\alpha\int_{-\pi/\alpha}^{+\pi/\alpha} f(r\alpha)\,d\omega \quad \text{or} \quad 2\pi f(r\alpha).$$

Hence we have

$$f(r\alpha) = \frac{1}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega. \qquad (4a)$$

By letting $\alpha$ converge toward zero and by the substitution $r\alpha = x$ this equation reduces to

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} I(\omega)\,e^{-x\omega i}\,d\omega. \qquad (4b)$$

We then have, if we introduce a new function $\varphi(\omega)$ defined by the simple relation $\sqrt{2\pi}\,\varphi(\omega) = \lim_{\alpha = 0} F(\omega)$, or

$$\varphi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx, \qquad (5a)$$

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \varphi(\omega)\,e^{-x\omega i}\,d\omega. \qquad (5b)$$

Charlier has suggested the name conjugated Fourier function of $f(x)$ for the expression $\varphi(\omega)$. The equations (5a) and (5b) are known as integral equations of the first kind. The expression $e^{x\omega i}$ (or $e^{-x\omega i}$) is known as the nucleus of the equation. If in (5b) we know the value of [...]

[...] For continuous variates, $x$, the above sums are transformed into definite integrals of the form [...]

This definite integral was first evaluated by Laplace by means of the following elegant analysis. Using the well known Eulerean relation for complex quantities the above integral may be written as

$$\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos\left[(\lambda_1 - x)\omega\right]d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\sin\left[(\lambda_1 - x)\omega\right]d\omega.$$

The imaginary member vanishes because the factor $e^{-\lambda_2\omega^2/2}$ is an even function and $\sin[(\lambda_1 - x)\omega]$ an uneven function, and the area from $-\infty$ to $0$ will therefore equal the area from $0$ to $+\infty$ but be opposite in sign, which reduces the total area from $-\infty$ to $+\infty$, or the integral in question, to zero. In regard to the first term, similar conditions hold except that $\cos[(\lambda_1 - x)\omega]$ is an even function and the integral may hence be written as

$$I = 2\int_{0}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega, \quad \text{where } r = \lambda_1 - x.$$

Regarding the parameter $r$ as a variable and differentiating $I$ in respect to this variable we have

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\int_{0}^{\infty}\left(-\lambda_2\omega\,e^{-\frac{\lambda_2\omega^2}{2}}\right)\sin(r\omega)\,d\omega.$$

From this we have by partial integration:

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\left[e^{-\frac{\lambda_2\omega^2}{2}}\sin(r\omega)\right]_{0}^{\infty} - \frac{2r}{\lambda_2}\int_{0}^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega = -\frac{rI}{\lambda_2}, \quad \text{or} \quad \frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}.$$

From which we find

$$\log I = -\frac{r^2}{2\lambda_2} + \log A,$$

where $\log A$ is a constant. Hence we have:

$$I = A\,e^{-\frac{r^2}{2\lambda_2}}.$$

In order to determine $A$ we let $r = 0$ and we have

$$A = 2\int_{0}^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\,d\omega = \sqrt{\frac{2\pi}{\lambda_2}}.$$

This finally gives the expression for $\varphi_0(x)$ in the following form:

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{(x - \lambda_1)^2}{2\lambda_2}}$$

as a preliminary approximation for the frequency curve $\varphi(x)$. The first mathematical deduction of this approximate expression for a frequency curve is found in the monumental work by Laplace on Probabilities, and the function $\varphi_0(x)$ entering in the expression $\varphi_0(x)\,dx$, which gives the probability that the variate will fall between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$, is therefore known as the Laplacean probability function or sometimes as the Normal Frequency Curve of Laplace.
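Laplace's evaluation of the integral $I = 2\int_0^\infty e^{-\lambda_2\omega^2/2}\cos(r\omega)\,d\omega = \sqrt{2\pi/\lambda_2}\;e^{-r^2/(2\lambda_2)}$ can be checked numerically. The sketch below is only a verification aid under stated assumptions: it uses composite Simpson integration, truncates the infinite range at an upper limit chosen here (twelve units of $1/\sqrt{\lambda_2}$, beyond which the integrand is negligible), and the function names are hypothetical.

```python
import math


def laplace_integral(r, lam2, steps=20000):
    """Approximate 2 * integral_0^inf exp(-lam2*w^2/2) * cos(r*w) dw.

    Composite Simpson rule on [0, upper]; `steps` must be even.
    """
    upper = 12.0 / math.sqrt(lam2)  # truncation point, an assumption
    h = upper / steps

    def integrand(w):
        return math.exp(-0.5 * lam2 * w * w) * math.cos(r * w)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 2.0 * total * h / 3.0


def closed_form(r, lam2):
    """Laplace's result: sqrt(2*pi/lam2) * exp(-r^2 / (2*lam2))."""
    return math.sqrt(2.0 * math.pi / lam2) * math.exp(-r * r / (2.0 * lam2))


for r in (0.0, 0.7, 2.0):
    print(r, laplace_integral(r, 2.5), closed_form(r, 2.5))
```

Dividing either value by $2\pi$ gives the ordinate of the normal curve with mean $\lambda_1$ and dispersion $\sqrt{\lambda_2}$ at the point $x = \lambda_1 - r$.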
The same curve was, as we have niention(>d, also deduced independently by Gauss in connection with his studies on the distribution of accidental errors in precision measurements. Laplace's probability function, (po{x), possesses some remark- able properties which it might be well worth while to consider. Introducing a slightly different system of notation by writing Xi = M and \/\o = a, (p„{x) reduces to the following form: J - (x - My:2(7'' which is the form introduced by Pearson. The frequency curve, (Po{x), is here expressed in reference to a Cartesian (•f)()rdinate system with origin at the zero point of the natural number system and whose unit of measurement is also equivalent to the natural number unit. It is, however, not neces- sary to use this system in preference to any other system. In fact, we may choose arbitrarily any other origin and any other unit 108] hermite's polynomials. 199 standard without altering the properties of the curve. Suppose, therefore, that we take M as the origin and a as the unit of the system. The frequency function then reduces to where z = (x — M) : a Since the integral of (Poiz) from — oo to + «> equals unity the following equation must necessarily hold. + 00 -t-' This latter result may, however, be deduced independently of the fact that of them is identically zero in the plane (3) every pair of th(Mn, (p^{z) and Hm{z) satisfy the relation 4-00 J (Pniz)Hm(z)dz = (n 5 w) 109 ] ORTHOGONAL FUNCTIONS. 201 We have the self-evident relation -j-OO 4-0O -1-00 J Hm(z)(pniz)dz = J H^(z)H„{z)(Poiz)dz = J HJz)(pJz)dz — CO — 00 — oo Since this relation holds for all values of m and n it is only neces- sary to prove the proposition for n > m. For if it holds for n > m it will according to the above relation also hold for n < m. By partial integration we have: — -<-00 -^00 +0O J Hm(z) partial integrations J H^{z)iPn{z)dz = (— 1)" J Hn^''\z) . 
If we know that (p{z) can be developed into a series of this form, which after multiplication by any continuous function can be integrated term for term, then we are able to give a formal determiiiatipn of the coefficients c. This foi-mal determination of any one of the c's, say c„ consists in multiplying the above series by Hi{z) and integrating each term from — 00 to + «> . All the terms except the one containing the product Hi{z)ip,{z) vanish and we have for c,. j and which vanishes together with its derivatives for 2 = ± <» can be developed into an infinite series of the form: — ^{z) =y\cie-''--''H,{z) where Hiiz) is the Hermite polynomial of order z." 111. Derivation of Gram's Series.^It is, however, not our intention to follow up this treatment which is outside the scope of an elementary treatise like this and shall in its place give an approximate representation of the frequency function, (fiz), by a method, which in many respects is similar to that introduced by the Danish actuary Gram in his epoch-making work " Udvik- lingsraekker," which (contains the first known systematic develop- ment of a skew frecjuency function. Clram's problem in a some- what modified form may l)ri(^fly be stated as follows: — Being given an arbitrary relative frequency function, to + oo . In all the above expansions of a frequency series we have used the expression o" is technically known as the dispersion or standard deviation of the series of variates, we can make the second coeffi- cient Co vanish. In respect to the coefficients C3 and c\ we have now -fco -(-00 -f-co (- C3 = 73^ [J z^F{z)dz - 3 J 2F(2)rf2l: J F{z)dz I— nn _. QD -J — CD which reduces to — \^ — , while 4-00 4-°° +°° +°° C4=-^|4^ J z'Fiz)dz—6J z'F{z)dz+^j F{z)dz\: j F{z)dz, '— — 00 00 — 00 —I — CO which reduces to — Fl! 
[...]

While the coefficients of higher order may be determined with equal ease it will in general be found that the majority of moderately skew frequency distributions can be expressed by means of the first 4 parameters or coefficients.

113. Coefficients expressed by Semi-Invariants.

We shall now show how the same results for the values of the coefficients may be obtained from the definition of the semi-invariants. Since we have proven that a frequency function, $F(z)$, may be expressed by the series

$$F(z) = \sum c_i \varphi_i(z)$$

[...]

[...] published First Course in Statistics claims the method of least squares "which is the traditional way of approaching all such problems, is shown to be impracticable in a large number of cases, either because the resulting equations cannot be solved, or, when they are capable of solution, because the labour involved would be colossal." This objection falls, however, to the ground in the case of the expansion of a frequency function in serial form because the unknown parameters, with the exception of the origin (the mean) and the unit (the dispersion) of the co-ordinate system, all appear as coefficients in true linear equations and hence are eminently adaptable to the treatment by least squares. The attitude of these writers is probably due to the fact that they work exclusively with the Pearsonian type of frequency curves where the function, $F(z)$, is given as a closed expression rather than as an expansion in serial form. In nearly all Pearson's curve types there appear not more than four constants, which in a measure accounts for the often successful application of the method of moments, although several of the examples presented by Mr. Jones in his book can scarcely be said to be recommendative to Pearson's theory. On the other hand, it is a great drawback not being able to have more than four constants at our disposal.
Personally I have encountered a large number of statistical series where the Pearsonian theory fails. This same fact is also noted by Jørgensen who on page [...] of his Frekvensflader og Korrelation states that "jeg kender flere Iagttagelsesrækker, hvor Pearson's Teori svigter totalt" (I know of several series of observations where Pearson's theory fails completely).

In the purely theoretical development it matters but little whether we use moments or least squares in the expansion of a frequency function in a series; a fact which is readily seen from our previous demonstrations. In the purely practical work we have, however, this fact to consider, that the method of moments works exclusively with areas expressed as definite integrals, which are often difficult to determine in extremely skew distributions. And it is only by successive approximations that we in this manner reach a plausible result. Moreover, unless the observations are very numerous, it is almost hopeless to compute the moments of higher order than the fourth, because of the very large errors arising from random sampling. Charlier in one of his monographs asserts that it is generally useless to compute moments of higher order than the second when the number of individual observations in the statistical series is less than 1000. Thiele gives the following brief rules: For the first and second semi-invariants rely exclusively on the observed data. For semi-invariants of higher order than 6 rely exclusively on theoretical considerations. For intermediate semi-invariants (between the 2d and 6th) rely partly upon theory and partly upon the observations. Caradog Jones, on the other hand, lustily ventures forth with moments of the fourth order, based upon 241, and in some instances even as low as 180 individual observations. It is, therefore, no wonder that some of his results exhibit a somewhat poor "fit" with the original data.
Another criticism which may be lodged against the method of moments as used by some adherents of the Pearsonian school, rather than by Pearson himself, is that it works with unweighted observations, and the values of the extremities of the frequency curves are given the same weights as the more numerous observations in the immediate neighborhood of the mean.

A second objection, raised among others by Elderton, is that the expansion in serial form sometimes gives rise to negative frequencies at the extreme tail ends of the curve, due of course to the fact that we have used a limited number of terms of the series. From purely practical considerations this objection counts little, because the observations at the extremities are very few in number. It matters, for instance, but little in ordinary calculations of assurance premiums whether the upper limit of a mortality table is at 90 or at 100, and when Pearson from his curves actually has attempted to put an upper limit to the duration of human life, he has, to borrow an expression from the Danish biologist, Johannsen, begun to handle biology as mathematics and not with mathematics. In this connection it may also be noted that the Pearson Type I curve gives imaginary values beyond certain limits. When now certain followers of the Pearsonian school have considered this as an advantage and tried to interpret the limits as possible values of repeated or presumptive observations, it seems that such disciples have stretched their point a bit too far. It is not possible to see why negative results should be less plausible than imaginary results. Every student of ordinary algebra knows that the "imaginary" quantities are just as valid as the so-called "real" quantities, and it is probably the choice of this unhappy and ill-chosen nomenclature which has given rise to the above extravagant claims of some of the followers of Pearson.
Finally some English and American actuaries have objected to the arbitrary choice of the parameters in the Gram or Charlier expansions. Unless I have completely misunderstood Mr. Elderton this is one of his chief criticisms against Charlier's method. With my best intentions I cannot agree to this and will even go so far as to say that Mr. Elderton's criticism really speaks in favor of the methods put forth by the Scandinavian scholars. As we have repeatedly emphasized in the preceding paragraphs, the arbitrary choice of $c_1$ and $c_2$ amounts mathematically to the choice of an arbitrary origin and unit in the Cartesian co-ordinate system to which surely no mathematician will make objections. Neither can objections be raised from the point of view of common sense. We might as well object to the meter as a unit of measure in preference to the yard, or to reckoning the solar time from the Greenwich meridian instead of the meridian of Paris.

The failure of the method of moments to compute with any degree of accuracy moments of higher order in the case of the majority of ordinary observations is probably the reason why some actuaries, especially in America, have maintained that the Gram or Charlier A type of frequency curves is not powerful enough to represent more than moderately skew frequency distributions. In spite of the incontrovertible fact that the most recent researches in the theory of integral equations have demonstrated beyond doubt that any frequency curve can be developed in convergent series by Hermite polynomials in conjunction with the normal Laplacean frequency curve, an American actuary, Mr. Merwyn Davis, has taken the "bull by the horns" so to speak and boldly gone on record with the positive statement that "the Charlier series fails completely in cases of appreciable skewness."
With all due respect for this young matador who has so boldly entered the ring to challenge the work of some of the most eminent mathematicians in the realm of integral equations I feel, however, that if Mr. Davis has actually succeeded in "throwing the bull" it is only in the sense as implied in the colloquial slang of his native America. In fact, we shall presently in some of our examples take up the challenge of Mr. Davis and show that the series he so curtly rejects can, by means of a simple transformation, be used on decidedly skew frequency distributions with even greater success than the Pearsonian curve types.

With these preliminary remarks we shall now proceed to give several examples of the application of the Gram or Laplacean-Charlier frequency series, employing either the method of moments or the method of least squares in the numerical determination of the constants, although preference will be given to the latter method in cases of appreciable skewness or excess. It is, however, not our intention to go into details of the method of least squares and its relation to error laws, except in its connection with the problem of maximum and minimum. Any number of standard treatises are now available on the subject, however, to which we may refer interested readers.*

117. Charlier's Scheme of Computations.

The general formulae for the semi-invariants were given on page (192). In practical work it is, however, of importance to proceed along systematic lines and to furnish an automatic check for the correctness of the computations. Several systems facilitating such work have been proposed by various writers but the most simple and elegant is probably the one proposed by M. Charlier and which is shown in detail with the necessary control checks on the following pages. Charlier employs moments, while we in the following demonstration shall prefer the use of the semi-invariants.
* A particularly attractive presentation in English is found in David Brunt's Combination of Observations (Cambridge, 1918).

If we define the power sums of the relative frequencies

$$m_r = \int_{-\infty}^{+\infty} x^r F(x)\,dx : \int_{-\infty}^{+\infty} F(x)\,dx \quad \text{for } r = 0, 1, 2, 3, \ldots$$

we find that the expressions for the semi-invariants as given on page (192) may be written as follows:

$$\lambda_1 = m_1$$
$$\lambda_2 = m_2 - m_1^2$$
$$\lambda_3 = m_3 - 3m_2m_1 + 2m_1^3$$
$$\lambda_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4$$

The advantage of the Charlier scheme for the computation of the semi-invariants lies in the fact that it furnishes an automatic check of the final results. If we expand the expression $(x+1)^4F(x)$ we have:

$$x^4F(x) + 4x^3F(x) + 6x^2F(x) + 4xF(x) + F(x)$$

or

$$\sum (x+1)^4F(x) = s_4 + 4s_3 + 6s_2 + 4s_1 + s_0,$$

which serves as an independent control check of the computations. Moreover, another check is furnished by the relation

$$m_4 = \lambda_4 + 4m_1\lambda_3 + 6m_1^2\lambda_2 + 3\lambda_2^2 + m_1^4.$$

In order to illustrate the scheme we choose the following age distribution of 1130 pensioned functionaries in a large American Public Utility corporation.

Ages     No. of Pensioners     Ages      No. of Pensioners
35-39            1             65-69           286
40-44            6             70-74           248
45-49           17             75-79           128
50-54           48             80-84            38
55-59          118             85-89            13
60-64          224             over 90           3

Choosing the age of 67 as a provisional origin the Charlier scheme is shown in detail on next page.
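The scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are hypothetical, and the open class "over 90" is split into 2 pensioners at 90-94 and 1 at 95 and over, which is how the worked scheme that follows appears to treat it. The class 65-69 is the provisional origin, so x runs from -6 to +6.

```python
# Pensioner counts per 5-year class: 35-39, 40-44, ..., 85-89, 90-94, 95 and over.
# Assumption: the last printed class ("over 90", 3 persons) is split 2 + 1.
counts = [1, 6, 17, 48, 118, 224, 286, 248, 128, 38, 13, 2, 1]
xs = list(range(-6, 7))  # class index, provisional origin at 65-69


def power_sum(r, shift=0):
    """Return s_r = sum of (x + shift)^r * F(x) over all classes."""
    return sum((x + shift) ** r * f for x, f in zip(xs, counts))


s = [power_sum(r) for r in range(5)]  # s0 .. s4

# Charlier's first control: sum (x+1)^4 F(x) = s4 + 4 s3 + 6 s2 + 4 s1 + s0
control_lhs = power_sum(4, shift=1)
control_rhs = s[4] + 4 * s[3] + 6 * s[2] + 4 * s[1] + s[0]

# Relative power sums m_r = s_r : s_0 and the first four semi-invariants.
m = [sr / s[0] for sr in s]
lam1 = m[1]
lam2 = m[2] - m[1] ** 2
lam3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
lam4 = (m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2
        + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4)

# Second control: rebuild m4 from the semi-invariants.
m4_back = (lam4 + 4 * m[1] * lam3 + 6 * m[1] ** 2 * lam2
           + 3 * lam2 ** 2 + m[1] ** 4)

sigma = lam2 ** 0.5
c3 = lam3 / sigma ** 3  # coefficients in units of the dispersion
c4 = lam4 / sigma ** 4

print("power sums:", s)
print("control:", control_lhs, control_rhs)
print("semi-invariants:", lam1, lam2, lam3, lam4)
```

Both controls are exact identities, so any disagreement points to an arithmetical slip in the columns rather than to the data. Because the sketch carries full floating-point precision, its later decimals need not match hand-rounded figures exactly.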
The computation below gives the numerical values of the frequency function which now may be written as follows:

$$F(x) = 1130\,[\varphi_0(x) + .0258\,\varphi_3(x) + .0158\,\varphi_4(x)]$$

where

$$\varphi_0(x) = \frac{1}{1.624\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{x + .0195}{1.624}\right]^2}$$

Ages       x    F(x)   xF(x)  x^2F(x)  x^3F(x)  x^4F(x)  (x+1)^4F(x)
35-39     -6      1       6      36      216    1,296       625
40-44     -5      6      30     150      750    3,750     1,536
45-49     -4     17      68     272    1,088    4,352     1,377
50-54     -3     48     144     432    1,296    3,888       768
55-59     -2    118     236     472      944    1,888       118
60-64     -1    224     224     224      224      224         0
65-69      0    286                                          286
Sum (x <= 0)    700     708   1,586    4,518   15,398     4,710

70-74     +1    248     248     248      248      248     3,968
75-79     +2    128     256     512    1,024    2,048    10,368
80-84     +3     38     114     342    1,026    3,078     9,728
85-89     +4     13      52     208      832    3,328     8,125
90-94     +5      2      10      50      250    1,250     2,592
over 95   +6      1       6      36      216    1,296     2,401
Sum (x > 0)     430     686   1,396    3,596   11,248    37,182

s_r           1,130     -22   2,982     -922   26,646    41,892
m_r          1.0000  -.0195  2.6378   -.8156  23.5699

λ1 = m1 = -.0195        m2 = 2.6378              s4 = 26,646
λ1^2 = m1^2 = .0004     -m1^2 = -.0004           4s3 = -3,688
λ1^3 = m1^3 = .0000     λ2 = 2.6374 = σ^2        6s2 = 17,892
λ1^4 = m1^4 = .0000     √λ2 = 1.6240 = σ         4s1 = -88
                        4.2831 = σ^3             s0 = 1,130
                        6.9558 = σ^4             Sum = 41,892

m2m1 = -.0513, m3m1 = .0159, m2^2 = 6.9580, m2m1^2 = .0010

m3 = -.8156             m4 = 23.5699             λ4 = 2.6450
-3m2m1 = .1539          -4m3m1 = -.0636          4m1λ3 = .0516
2m1^3 = .0000           -3m2^2 = -20.8740        6m1^2λ2 = .0060
λ3 = -.6617             12m2m1^2 = .0127         3λ2^2 = 20.8677
                        -6m1^4 = .0000           m1^4 = .0000
                        λ4 = 2.6450              23.5703 = m4

c3 = λ3 : σ^3 = -.1545       c4 = λ4 : σ^4 = .3803
-c3 : 3! = .0258             c4 : 4! = +.0158

118. Comparison Between Observed Data and Theoretical Values.

The next step is now to work out the numerical values of F(x) for various values of x and compare such values with the ones originally observed. This process is shown in detail in the following scheme:

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Obs.
X I— Xi {x—\^)■.a^p^^{^) ifsiz) (p^iz) -7 -G.9S -4.:«)0 .OOOl +.0058 +.0170 +.0001 +.000:5 .000.') -6 r.<)s :5.()S'j .000.-) .0170 .0470 .001.') .ooos .oo-js ■_' 1 -.5 4 OS :5.0()7 .()():)() .0710 .12()7 .OOIS .OOiO .0074 .5 -4 3 OS 2.4.-)l .019S .14.-)S +.0()()2 .0(«S +.0000 .024.-) 17 17 -3 2.98 1.835 .0741 +.0.500 -.4345 +.0013 -.00()S 0()S() 48 48 -2 1.98 1.219 .1897 -.3.502 -.7036 -.0090 .01 1 1 . KiOO 118 118 -1 -0.98 -0.()03 .3320 -.5287 +.31()0 -.013() -.00.-)0 3140 219 224 +0.02 +0.012 .3989 +.0143 1.1963 +.0004 +.0189 .4182 291 286 + 1 1.02 0.628.3273 ..53.59 +.2.584 .0138 +.0041 .34.52 241 248 +2 2.02 1.244 .1835 +.3325 -.7157 +.0086 -.0113 .1808 126 128 +3 3.02 1.8()0 .0707 -.0605 -.4094 -.0015 -.()()()5 .0()27 44 38 +4 4.02 2.475.0186 .1443 +.0703 .0037 +.001 1 .0212 15 13 +5 5.02 3.091 .0034 .(UiSO .1241 .0018 .0020 .003() 3 2 +6 6.02 3.707 .0004 .0165 .045() .0004 .0007 .0007 1 1 +7 +7.02 +4.322 .0001 -.0050 .0162 -.0001 +.0003 .0003 Column (1) gives the values of the variate x reckoned from the provisional origin, or the centre of the age interval 65-69. (2) is X less the first semi-invariant, whereby the origin is shifted to the mean or a. Column (3) represents the final linear trans- formation : z= {x — Ai) :^> = 0.9576). We now propose to express the observed function F{x) or (p{z) by a Gram-Charlier series of the form: F{x) = ractical curve fitting to the most diverse kinds of statistical data I have had occasion to use both the Pearsonian and the Gram — Charlier tyi)e of curves, and while I fully recognize the theoretical elegance and apparent simplicity of the Pearson system, I feel nevertheless that from the point of view of the practical computer the older system as devised by the Scandinavian investigators is to be preferred in comparison with the methods advocated by the followers of the distinguished founder of the English Biometric school. *Mr. 
Carver's able and interesting analysis by means of finite difference equations is, however, to a great extent anteceded by the much earlier Danish memoirs of Opperman and Gram where the finite difference equation methods are discussed.

CHAPTER XVI

LOGARITHMICALLY TRANSFORMED FREQUENCY FUNCTIONS

122. Transformation of the Variate.

While it is always possible to express all frequency curves by an expansion in Hermite polynomials, the numerical labor when carried on by the method of least squares often involves a large amount of arithmetical work if we wish to retain more than four or five terms of the series. Other methods lessening the arithmetical work and making the actual calculations comparatively simple have been offered by several authors and notably by Thiele, who in his works discusses several such methods. Among those we may mention the method of the so-called free functions and orthogonal substitution, the methods of correlates and the adjustment by elements. The chapters on these methods in Thiele's work are among some of the most important, but also some of the most difficult in the whole theory of observations and have not always been understood and appreciated by the mathematicians, chiefly on account of Thiele's peculiar style of writing. A close study of the Danish scholar's investigations is, however, well worth while, and Thiele's work along these lines may still in the future become as epoch-making in the theory of probability as some of the researches of the great Laplace. The theory of infinite determinants as used by M. Fredholm in the solution of integral equations is another powerful tool which offers great advantages in the way of rapid calculation. All these methods require, however, that the student must be thoroughly familiar with the difficult theory upon which such methods rest, and they have for this reason been omitted in an elementary work such as the present treatise.
We wish, however, to mention another method which in the majority of cases will make it possible to employ the Gram or Laplacean-Charlier curves in cases with extreme skewness or excess. We have here reference to the method of logarithmic transformation of the variate, $x$.

123. The General Theory of Transformation.

One of the simplest transformations is the previously mentioned linear transformation of the form $z = f(x) = ax + b$, by which we can make two constants, $c_1$ and $c_2$, vanish. Other transformations suggest themselves, however, such as $f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$, $f(x) = \log x$ and so forth. For this reason I propose to give a brief development of the general method of transformations of the statistical variates, mainly following the methods of Charlier and Jørgensen.

Stated in its most general form our problem is: If a frequency curve of a certain variate is given by $F(x)$ what will be the frequency curve of a certain function of $x$, say $f(x)$?

The equation of the frequency curve is $y = F(x)$, which means that $F(x)dx$ is the probability that $x$ falls in the interval between $x - \frac{1}{2}dx$ and $x + \frac{1}{2}dx$. The probability that a new variate $z$ after the transformation $z = f(x)$, or $x(z) = x$, falls in the interval $z - \frac{1}{2}dz$ and $z + \frac{1}{2}dz$ is therefore simply

$$F[x(z)]\,x'(z)\,dz = F(x)\,dx,$$

which gives in symbolic form the equation of the transformed frequency curve. The frequency for $z = f(x)$ is of course the same as for $x$. The ordinates of the frequency curve, or rather the areas between corresponding ordinates, are therefore not changed, but the abscissa axis is replaced by $f(x)$. Equidistant intervals of $x$ will therefore not as a rule, except in the linear transformation, correspond to equidistant intervals of $f(x)$. If, for instance, the frequency curve $F(x)$ is the Laplacean normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 : 2\sigma^2}$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have evidently

$$F(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z : 2\sigma^2}\, \frac{1}{2\sqrt{z}}$$

124.
Logarithmic Transformation.

Of the various transformations the logarithmic is of special importance. It happens that even if the variate $x$ forms an extremely skew frequency distribution its logarithms will be nearly normally distributed. This fact was already noted by the eminent German psychologist, Fechner, and also mentioned by Bruhns in his Kollektivmasslehre. But neither Fechner nor Bruhns have given a satisfactory theoretical explanation of the transformation and have limited themselves to use it as a practical rule of thumb. Thiele discusses the method under his adjustment by elements, but in a rather brief manner. The first satisfactory theory of logarithmic transformation seems to have been given first by Jørgensen and later on by Wicksell.*

* The law of errors, leading to the geometric mean as the most probable value of the variate as discovered by Sir Donald McAlister in 1879 may, however, be considered as a forerunner of Jørgensen's work.

Jørgensen first begins with the transformation of the normal Laplacean frequency curve. Letting $z = \log x$ and bearing in mind that the frequency of $x$ equals that of $\log x$ we have

$$z = f(x) = \log x, \quad \text{or} \quad x = x(z) = e^z \quad \text{and} \quad dx = e^z dz$$

The continuous power sums or moments of the $r$th order around an arbitrary origin take on the form

$$M_r = \frac{N}{n\sqrt{2\pi}} \int_{-\infty}^{+\infty} x^r e^{-\frac{1}{2}\left[\frac{z - m}{n}\right]^2} dx = \frac{N}{n\sqrt{2\pi}} \int_{0}^{\infty} x^r e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2} dx$$

The change in the lower limit in the second integral from $-\infty$ to zero arises simply from the fact that the logarithm of zero equals minus infinity and the point $-\infty$ is thus by the transformation moved up to zero. By a straightforward transformation (see appendix) we may write the above integral as

$$M_r = N e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-t^2 : 2}\,dt = N e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}$$

Changing from moments to semi-invariants by means of the well-known relations

$$\lambda_0 = M_0$$
$$\lambda_1 = M_1 : M_0$$
$$\lambda_2 = (M_2M_0 - M_1^2) : M_0^2$$
$$\lambda_3 = (M_3M_0^2 - 3M_2M_1M_0 + 2M_1^3) : M_0^3$$

we have

$$\lambda_1 = e^{m + 1.5n^2}$$
$$\lambda_2 = e^{2m + 3n^2}(e^{n^2} - 1)$$

These equations give the semi-invariants expressed in terms of $m$ and $n$. On the other hand if we know the semi-invariants from statistical data or are able to determine these semi-invariants by a priori reasoning we may find the parameters $m$ and $n$.

125. The Mathematical Zero.

A point which we must bear in mind is that the above semi-invariants on account of the transformation are calculated around a zero point which corresponds to a fixed lower limit of the observations. Very often the observations themselves indicate such a lower limit beyond which the frequencies of the variate vanish. In the case of persons engaged in factory work there is in most countries a well-defined legal age limit below which it is illegal to employ persons for work. Another example is offered in the number of alpha particles radiated from certain radioactive metals. Since the number of particles radiated in a certain interval of time must either be zero or a whole positive number it is evident that $-1$ must be the lower limit because we can have no negative radiations. Analogous limits exist in the age limit for divorces and in the amount of moneys assessed in the way of income tax.

The lower limit allows, however, of a more exact mathematical determination by means of the following simple considerations. It is evident that this lower limit must fall below the mean value of the frequency curve. Let us suppose that it is located at a certain point, $a$, at a distance of $\eta$ units from the mean, i.e. $\lambda_1(x) - \eta = a$; and let us furthermore as a beginning place the origin at $\lambda_1(x)$, in which case $\lambda_1$ of course equals zero.
By shifting the origin to $a$, which implies a translation of $\eta$ units in negative direction, the original variate $x$ is transformed into $x + \eta$, and $\lambda_1$ will now equal $\eta$ while the semi-invariants of higher order remain the same as before the transformation because of the well known relation

$$\lambda_r(x - \eta) = \lambda_r(x) \quad \text{for } r > 1$$

We may therefore write the previously given relations between the $\lambda$'s and $m$ and $n$ as follows:

$$\lambda_2 = \eta^2(e^{n^2} - 1) \quad \text{or} \quad e^{n^2} = 1 + \lambda_2 : \eta^2,$$

and, from the expression of $\lambda_3$ in terms of the moments, $\lambda_3 = \eta^3(e^{3n^2} - 3e^{n^2} + 2)$, which reduces to

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 - \lambda_2^3 = 0.$$

The solution of this cubic equation which has one real and two imaginary roots gives us the value of $\eta$ or $\lambda_1 - a$ and thus determines the mathematical zero or lower limit. We have in fact:

$$n^2 = \log(1 + \lambda_2 : \eta^2) \quad \text{and} \quad m = \log \eta - 1.5n^2,$$

while [...]

126. Logarithmically Transformed Frequency Series.

We have already shown that the generalized frequency curve could be written as $F(x) = c_0\varphi_0(x)$ [...] and all its derivatives are supposed to vanish for $x = 0$ and $x = \infty$ the first term to the right becomes zero and [...] By successive integrations we then obtain the following recursion formula [...] Or finally [...] Expanding $e^{x\omega}$ in a power series we have [...] The general term in this expansion is of the form

$$\frac{(-\omega)^s\,\omega^r}{n\sqrt{2\pi}\; r!} \int_0^{\infty} x^r e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2} dx$$

which according to the formulas given on page (237) reduces to:

$$(-\omega)^s\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}\, \omega^r : r!$$

Hence we may write

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = (-\omega)^s \sum_{r=0}^{\infty} e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}\, \omega^r : r!$$

Consequently the relation between the semi-invariants and the frequency function

$$F(x) = k_0\Phi_0(x) - \frac{k_1}{1!}\Phi_1(x) + \frac{k_2}{2!}\Phi_2(x) - \frac{k_3}{3!}\Phi_3(x) + \ldots$$

can be expressed by the following recursion formula [...]

The constants $k$ are here expressed in terms of the unadjusted moments or power sums, $s$. It is readily seen that the Sheppard corrections for adjusted moments, $M$, also apply in this case.
We are, therefore, able to write down the values of the $k$'s from the above recursion formula in the following manner [...]

$$M_4 = k_4e^{m + .5n^2} + 4k_3e^{2m + 2n^2} + 6k_2e^{3m + 4.5n^2} + 4k_1e^{4m + 8n^2}\ [...]$$

It is easy to see that it is not possible to determine the generating function's parameters $m$ and $n$ from the observations. These parameters, like $M$ and $\sigma$ in the case of the Laplacean normal probability curve, must be chosen arbitrarily. If $m$ and $n$ are selected so as to make $k_1$ and $k_2$ vanish we have [...]

$$M_2 = k_0e^{3m + 4.5n^2}$$

the solution of which gives

$$e^{n^2} = M_2M_0 : M_1^2\ [...]$$

while [...]

This theory requires the computation of a set of tables of the generating function

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course use the ordinary tables for the normal curve $\varphi_0(z)$ [...]

[...] $+\, 2\lambda_1^3 = 12.4981$

In order to determine the mathematical zero or the origin we have to solve the following cubic:

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3, \quad \text{or} \quad 12.489\eta^3 - 152.511\eta^2 = 362.47$$

the positive root of which is equal to 12.39. The zero point is therefore found to be situated 12.39 5-year units from the mean or at age $67.95 + 5(12.39)$, i.e. very nearly at age 130, which we henceforth shall select as the origin of the co-ordinate system of the first component. We have furthermore

$$12.39 = e^{m + 1.5n^2}, \quad \text{and} \quad 7.1296 = e^{2m + 3n^2}(e^{n^2} - 1) = (12.39)^2(e^{n^2} - 1),$$

the solution of which gives

$$n^2 = 0.04436, \quad n = 0.2106, \quad m = 2.4504,$$

all on the basis of a 5-year interval as unit. If we wish to change to a single calendar year unit we must add the natural logarithm of 5, or 1.6094, to the above value of $m$, which gives us $m = 4.0598$, while $n$ remains the same. The above computations furnish us with the necessary material for the logarithmic transformation of the variate $x$ which now may be written as

$$z = [\log(130 - x) - 4.0598] : 0.2106,$$

where $x$ is the original variate or the age at death.
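The transformation just obtained can be sketched as follows. The function names are hypothetical, natural logarithms are assumed, and the derivative functions are taken to be the third and fourth derivatives of the normal curve. That last point is an assumption, though it agrees with the values tabulated for the pensioner example near z = 0.012 (φ0 = .3989, φ3 = .0143, φ4 = 1.1963).

```python
import math

ZERO_AGE = 130.0  # mathematical zero (origin) found above, in years
M_PARAM = 4.0598  # m on the basis of a one-year unit
N_PARAM = 0.2106  # n


def z_of_age(age):
    """Logarithmically transformed variate z = [log(130 - x) - m] : n."""
    return (math.log(ZERO_AGE - age) - M_PARAM) / N_PARAM


def phi0(z):
    """Normal (Laplacean) curve with unit dispersion."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi3(z):
    """Third derivative of phi0 (assumed definition)."""
    return (3.0 * z - z ** 3) * phi0(z)


def phi4(z):
    """Fourth derivative of phi0 (assumed definition)."""
    return (z ** 4 - 6.0 * z * z + 3.0) * phi0(z)


for age in (43, 60, 72, 90):
    z = z_of_age(age)
    print(age, round(z, 3), round(phi0(z), 4))
```

Since the variate enters as 130 - x, high ages map to negative z and low ages to positive z. The skew right-hand side of the curve of deaths is thereby compressed toward the normal form.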
Having thus accomplished the logarithmic transformation we may henceforth write the generating function as 1 lr log(130-x)-4.05 9812 -i ^ (^\ — ^L 0.2106 J /^\ -.-^^-^ .2106V 27r ^ V27r We express now F{x) by the following equation. Fix) =ko'io{x) -\-k:i^:i(x) -\-h^i{x) -{- .... or in terms of the transformed z: 3 1.928 .0621 .0859 .3412 452 18 3 47S 4 1.873 .0690 .0656 .3965 508 14 4 526 45 1.822 .0757 .0442 .4474 557 9 4 570 6 1.762 .0845 -.0156 ..3060 622 + 3 5 630 7 1.704 .0934 + .01.34 ..3596 687 - 3 6 690 8 1.W7 .1028 .0487 .6082 7.38 10 6 7.54 9 1.589 .1129 .0853 .6419 832 18 6 820 50 1.529 .1239 .12.55 .()893 913 27 7 893 1 1.471 .1352 .1599 .7132 994 34 967 2 1.409 .1479 .2114 .7349 1,089 45 1.051 3 1.348 .1609 .2565 .7430 1,185 54 1,138 4 1.286 .1745 .3022 .7307 1,288 63 1,231 55 1.224 .1886 .3467 .7062 1,391 74 1,324 6 1.160 .2035 .3907 .6642 1.501 83 1,425 7 1.095 .2190 .4320 .6037 1,612 92 6 1,526 8 1.030 .2347 .4688 .5257 1,730 99 5 1,636 9 0.963 .2509 .5008 .4180 1,847 106 4 1,745 60 0.896 .2672 .5257 .2911 1,965 112 3 1,856 1 0.828 .2832 .5426 .1831 2,083 115 2 1,970 2 0.758 .2994 .5489 -.03.30 2,201 lit) + 2,085 3 0.689 .3146 ..3474 + .1187 2,318 116 - 1 2,201 4 0.617 .3298 .5329 .2839 2,428 113 3 2,312 248 TRANSFORMED FREQUENCY CURVES [129 (1) (2) (3) (4) (5) Agtv r (^oU) (fai-) ^4(Z) 65 0.543 .3443 .5056 .4537 () 0.470 .3572 .4666 .6156 7 0.396 .3689 .4152 .7686 8 0.319 .3792 .3505 .9098 9 0.243 .3873 .2768 1 .02()2 70 0.164 .3937 .1918 1.117t) 1 .084 .3975 .0999 1.17.")7 2 +0.000 .3989 + .0119 1.1968 3 -0.080 .3976 -.0952 1.1777 4 0.164 .3937 .1918 1.1176 75 0.249 .3868 .2829 1.0180 () 0.348 .3755 .3762 .8592 7 0.425 .3645 .4368 .7043 8 0.516 .3493 .4912 .5146 9 0.608 .3316 .5303 .3069 80 0.702 .3118 .5502 + .0892 1 0.798 .2902 .5473 -.1204 2 0.896 .2672 .5257 .3130 3 0.996 .2436 .48,59 .4380 4 1.098 .2185 .4302 .5899 85 1.203 .1934 .3614 .6943 6 1.309 .1694 .2854 .7358 7 1.418 .1460 .2048 .7340 8 1.529 .1240 .1255 .(5893 <) 1.644 .1034 -.0505 
.6106 !K) 1.7(>2 .0845 + .01.56 .5060 I 1 .882 .0679 .0693 .3874 2 2.004 .0536 .1090 .2663 3 2.132 .0397 .1380 .1485 4 2.260 .0310 .1478 -.0483 95 2.393 .0227 .1477 + .0325 () 2..-)30 .0163 .1399 .0905 7 2.()73 .0100 .1207 .1295 8 2.S21 .0074 .1044 .1386 9 2.9»).S .0050 .0842 .1353 100 -3.124 .0028 .0640 .1203 (6) (7) (8) (9) A-o(3) k3{4) M5) Fi(z) 2,532 107 5 2,420 2,627 99 6 2,522 2,716 88 8 2,620 2,789 74 9 2,706 2.848 59 10 2,779 2,900 41 11 2,848 2,929 21 12 2,896 2,937 - 3 12 2,922 2,929 + 20 12 2,937 2,900 41 11 2,930 2,848 60 10 2,898 2,767 80 9 2,848 2,686 93 7 2,772 2,569 104 5 2,668 2,444 112 3 2,553 2,296 117 - 1 2,412 2,134 116 + 1 2,251 1,965 112 3 2,080 1,788 103 4 1,895 1,612 91 9 1J09 1,420 77 7 1,504 1,244 60 7 1,311 1,075 43 7 1,122 913 27 7 947 758 +11 6 775 622 - 3 5 624 500 15 4 489 394 23 3 374 292 29 1 264 228 31+0 197 167 31 - 135 120 30 1 89 76 26 1 49 54 22 1 29 37 18 1 18 21 14 1 6 Since the original observations of d^ are given in 5-year age intervals it becomes necessary to sum the numerical values of ipi,{z) and its derivatives by quinquennial age groupings so as to form the required observation equations. We find thus for instance in the age interval 55-59 the following observation equation (the summation to take place from x = 55 to x = 59) hl40 5,464 90-94 .2729 .4.571 -1.3925 1,757 95-99 .0609 .5714 .6125 278 100- .0068 .1714 .3890 16 From the above table we notice that we have 18 observation equations from which to determine the three unknown para- meters ko, h and ki. The number of equations being greater than the number of unknowns we make use of the method of least squares. While a direct application of this principle of course is feasible, it will, however, be found easier to start with an approximate solution for kn, ki and A-) and then apply the method of least squares. 
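The procedure described above (start from approximate values of the k's, then adjust them by least squares) can be sketched as follows. Everything numerical here is hypothetical: the six observation equations and the "true" coefficients are made up for illustration and are not the entries of Table I. The helper names are likewise assumptions. The sketch forms the normal equations for the corrections to the starting values and solves them by elimination.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(rows, rhs):
    """Solve the normal equations (A^T A) k = A^T d."""
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atd = [sum(r[i] * d for r, d in zip(rows, rhs)) for i in range(p)]
    return solve_linear(ata, atd)


def predict(rows, k):
    """Left-hand sides k0*a + k3*b + k4*c of the observation equations."""
    return [sum(c * kj for c, kj in zip(r, k)) for r in rows]


# Hypothetical grouped sums of phi0, phi3, phi4 (one row per age interval).
rows = [
    [0.27, 0.46, -1.39],
    [1.20, 0.90, -2.10],
    [1.85, 0.35, 2.60],
    [1.95, -0.70, 5.10],
    [1.30, -2.30, 1.90],
    [0.45, -1.60, -2.20],
]
k_true = [7310.0, -335.0, -48.0]   # made-up coefficients generating the data
observed = predict(rows, k_true)   # made-up "observations"

k_start = [7300.0, -340.0, -50.0]  # approximate starting solution
residuals = [d - f for d, f in zip(observed, predict(rows, k_start))]
delta = least_squares(rows, residuals)  # least-squares corrections
k_fit = [k + dk for k, dk in zip(k_start, delta)]
print(k_fit)
```

Working with residuals from a starting solution keeps the numbers entering the normal equations small, which is the practical advantage the text points to. With real data the residuals would of course not vanish after the correction.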
It will be found that in the three age intervals 65-69, 70-74 and 75-79, where the observations are most numerous, the observations will be approximately satisfied by the following preliminary values of $k$, viz.: $k'_0 = 7300$, $k'_3 = -340$ and $k'_4 = -50$. Multiplying the above values of $k$ with their respective columns in Table I, or in other words forming the products $k'_0\varphi_0$, $k'_3$ [...]

[...] $= 145.200$, the real root of which is $\eta = 19.0$ (on basis of a 5 year unit).

We furthermore find that $n = 0.120$ and $m = 2.9227 + \log_e 5 = 4.5321$, which finally brings about the transformation of the variate $x$ by means of the formula

$$z = [\log_e(x + 68.8) - 4.532] : 0.12$$

where $x$ is expressed in unit intervals of 1 year. The further determination of the coefficients $k_0$, $k_3$ and $k_4$ by means of the method of least squares results in the values: $k_0 = 947.4$, $k_3 = -63.4$ and $k_4 = -30.0$. Multiplying these values with their respective values of $\varphi_0(z)$, [...]

FIGURE 3. Diagram showing graduation of the $d_x$ column in the AM(5) table by a compound frequency curve of the Gram-Charlier types.

The sum of $F_I(x)$ and $F_{II}(x)$ as shown on page 255 and also in the figure gives us the final compound frequency curve or the $d_x$ curve, from which it now is a simple matter to form the $l_x$ column and its co-ordinated column of $q_x$.
Graduation of American Male Mortality Table ($AM^{(5)}$) by Means of a Compound Frequency Curve

Age    F_I(x)  F_II(x)   d_x     l_x     1000 q_x
 15      21     302      323    100532      3.21
 16      26     319      345    100209      3.44
 17      30     332      362     99864      3.62
 18      35     342      377     99502      3.79
 19      39     350      389     99125      3.92
 20      44     354      398     98736      4.03
 21      48     354      402     98338      4.11
 22      57     352      409     97936      4.18
 23      60     349      409     97527      4.19
 24      68     343      411     97118      4.23
 25      75     336      411     96707      4.25
 26      86     327      412     96296      4.28
 27      95     317      413     95884      4.31
 28     106     308      414     95471      4.33
 29     117     297      414     95057      4.36
 30     130     285      415     94643      4.38
 31     144     275      419     94228      4.45
 32     160     261      421     93809      4.49
 33     179     249      428     93388      4.58
 34     198     238      436     92960      4.69
 35     220     226      446     92524      4.82
 36     242     213      455     92078      4.94
 37     268     201      469     91623      5.12
 38     296     188      484     91154      5.31
 39     326     175      501     90670      5.53
 40     360     164      524     90169      5.81
 41     396     152      548     89645      6.11
 42     436     141      577     89097      6.47
 43     478     128      606     88520      6.80
 44     526     117      643     87914      7.32
 45     570     107      677     87271      7.76
 46     630      96      726     86594      8.39
 47     690      87      777     85868      9.05
 48     754      78      832     85091      9.78
 49     820      69      889     84259     10.55
 50     893      61      954     83370     11.44
 51     967      53     1020     82416     12.37
 52    1051      47     1098     81396     13.49
 53    1138      41     1179     80298     14.68
 54    1231      36     1267     79119     16.01
 55    1324      30     1354     77852     17.39
 56    1425      25     1450     76498     18.95
 57    1526      22     1548     75048     20.62
 58    1636      18     1654     73500     22.41
 59    1745      15     1760     71846     24.54
 60    1856      12     1868     70086     26.65
 61    1970      11     1981     68218     29.04
 62    2085       9     2092     66237     31.59
 63    2201       7     2208     64145     34.42
 64    2312       6     2318     61937     37.41
 65    2420       5     2425     59619     40.67
 66    2522       4     2526     57194     43.62
 67    2620       3     2623     54668     47.97
 68    2706       2     2708     52045     52.02
 69    2779       2     2781     49337     56.37
 70    2848       1     2849     46556     61.19
 71    2896              2896     43707     66.25
 72    2922              2922     40811     71.60
 73    2937              2937     37889     77.51
 74    2930              2930     34952     83.93
 75    2898              2898     32022     90.51
 76    2848              2848     29124     97.80
 77    2772              2772     26276    105.48
 78    2662              2662     23504    113.53
 79    2553              2553     20836    122.50
 80    2412              2412     18283    131.95
 81    2251              2251     15871    141.84
 82    2080              2080     13620    152.76
 83    1895              1895     11540    164.21
 84    1709              1709      9645    177.19
 85    1504              1504      7936    189.52
 86    1311              1311      6432    203.82
 87    1125              1125      5121    219.68
 88     947               947      3996    236.99
 89     775               775      3049    254.18
 90     624               624      2274    274.41
 91     489               489      1650    296.36
 92     374               374      1161    322.14
 93     264               264       787    335.45
 94     197               197       523    376.67
 95     135               135       326    414.11
 96      89                89       191    465.97
 97      49                49       102    480.39
 98      29                29        53    547.16
 99      18                18        24    780.00
100       6                 6         6   1000.00

It will be of interest to compare these latter values with the original values of $q_x$ as derived by Mr. Henderson's graduation. Such a comparison is shown in the appended table for quinquennial ages.

Ages   Henderson's 1000 q_x   Fisher's 1000 q_x
 15          3.46                   3.21
 20          3.92                   4.03
 25          4.31                   4.25
 30          4.46                   4.38
 35          4.78                   4.82
 40          5.84                   5.81
 45          7.94                   7.76
 50         11.58                  11.44
 55         17.47                  17.39
 60         26.68                  26.65
 65         40.66                  40.67
 70         61.47                  61.19
 75         91.94                  90.51
 80        135.74                 131.95
 85        197.07                 189.52
 90        280.35                 274.41
 95        387.76                 414.11
100        562.50                1000.00

I think that every unbiased critic will admit that there exists a satisfactory agreement between the two tables in spite of the fact that we have worked throughout with basic data in 5-year age groups. Moreover, the actual arithmetical work in the case of the graduation by means of compounded Gram or Charlier curves is much simpler than the usual methods of graduation by Makeham's formula and mechanical interpolation formulas as employed by Mr. Henderson.* Another point speaking in favor of the frequency curve graduation is that our resulting functions are continuous functions for which standard tables of definite integrals have been prepared. It is therefore possible to use the elegant and continuous method originally introduced by Mr. Woolhouse in the computation of premiums and policy values.
Unfortunately this is not the place to treat this interesting phase of the question, although we may mention in passing that a graduation of the kind here presented is, in practical computations of policy values and premiums, even easier to work with than the renowned graduation formula by Makeham, especially in the case of life contingencies involving 2 or more lives.

*I do not wish to imply those remarks as a criticism of the able graduation by Henderson, however.

130. Additional Examples. As another illustration I present the following frequency distribution (arranged in groups of 3-year intervals) of the ages of a group of 19,274 male employees of the Bell System of the American Telephone and Telegraph Company, which most kindly has been furnished to me through the courtesy of this company.

Age Distribution of Male Employees in the Bell System

Ages     x    F(x)        Ages     x    F(x)
13-15    0       1        46-48   11     380
16-18    1       9        49-51   12     272
19-21    2     745        52-54   13     186
22-24    3   2,264        55-57   14     141
25-27    4   3,828        58-60   15     110
28-30    5   3,801        61-63   16      72
31-33    6   2,711        64-66   17      43
34-36    7   1,918        67-69   18      17
37-39    8   1,339        70-72   19      14
40-42    9     884        73-75   20       3
43-45   10     533        76-78   21       2

Choosing the provisional lower limit at age 14 we find the following values for the crude moments or power sums $s$:

$s_0 = 19{,}274$, $s_1 = 112{,}363$, $s_2 = 794{,}771$, $s_3 = 6{,}790{,}761$.

The values of the semi-invariants are

$\lambda_1 = 5.830$, $\lambda_2 = 7.2478$, $\lambda_3 = 27.4191$.

The resulting cubic equation is therefore

$27.419\,\eta^3 - 157.592\,\eta^2 = 380.731$

for which the solution is $\eta = 6.1185$. We have furthermore

$6.1185 = e^{m + 1.5\omega^2}$, $\quad 7.2478 = e^{2m + 3\omega^2}(e^{\omega^2} - 1)$,

or $\omega^2 = 0.1768$, $\omega = 0.4205$ and $m = 1.5462$.

On the basis of an interval of one year we have therefore:

$z = [\log(x - 13.1) - 2.645] : 0.421$*

as the value of the variate in the generating function $\varphi_0(z)$.

*We have $m = 1.5462 + \log 3 = 2.645$.
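The numerical steps of this example can be retraced with a short script. The sketch below is not the author's worksheet: it assumes the relations in the form implied by the printed figures, namely the cubic $\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3$, then $\omega^2 = \log(1 + \lambda_2 : \eta^2)$ and $m = \log\eta - 1.5\,\omega^2$ with natural logarithms, and it places the origin of the transformed variate at $\lambda_1 - \eta$ class intervals from the provisional zero. The function name `log_transform_parameters` and the use of bisection are illustrative choices.

```python
import math

def log_transform_parameters(lam1, lam2, lam3):
    """Return (eta, omega2, m, origin) for the logarithmic transformation.

    eta is the positive root of lam3*eta**3 - 3*lam2**2*eta**2 - lam2**3 = 0,
    found by bisection (an illustrative solver; requires lam3 > 0).
    """
    def cubic(eta):
        return lam3 * eta**3 - 3 * lam2**2 * eta**2 - lam2**3

    lo, hi = 0.0, 1.0
    while cubic(hi) < 0:          # expand until the root is bracketed
        hi *= 2
    for _ in range(100):          # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0:
            lo = mid
        else:
            hi = mid
    eta = 0.5 * (lo + hi)
    omega2 = math.log(1 + lam2 / eta**2)
    m = math.log(eta) - 1.5 * omega2
    origin = lam1 - eta           # in class intervals from the provisional zero
    return eta, omega2, m, origin

eta, omega2, m, origin = log_transform_parameters(5.830, 7.2478, 27.4191)
m_years = m + math.log(3)         # 3-year classes converted to a 1-year unit
origin_age = 14 + 3 * origin      # provisional zero taken at age 14
print(round(eta, 3), round(omega2, 4), round(m, 4))
print(round(m_years, 3), round(origin_age, 1))
```

Run on the three semi-invariants quoted above, the sketch agrees with the printed $\eta$, $\omega^2$ and $m$ to about three significant figures; the small remaining differences are of the size one would expect from hand computation with rounded logarithms.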
The values of $k_0$, $k_3$ and $k_4$ as determined by the method of least squares are

$k_0 = 3064.4$, $k_3 = 45.1$, $k_4 = 80.5$, on the basis of a one-year interval.

A comparison between the calculated and observed values (the latter being shown by single ages) is given in the attached diagram, which evidently is satisfactory for all practical purposes.

FIGURE 4. Diagram showing comparison between observed and theoretical frequency distribution of active group of male employees of the Bell System.

I wish here to mention that an attempt by some of the statistical assistants of the A. T. & T. Co. to fit the above data by means of the Pearsonian curves proved futile. Personally I have not as yet made an attempt to verify this negative result.

As a final illustration we quote from Jørgensen's monograph an application of the logarithmic transformation of the previously discussed observations on the number of petal flowers in Ranunculus Bulbosus. Since the variate in this instance is integral, the observations themselves clearly indicate that there must be a lower limit, or biological zero so to speak, at 4 petal flowers. The crude moments are then

$s_0 = 222$, $s_1 = 362$, $s_2 = 794$,

from which we obtain

$m = 0.0440$, $n = 0.5445$, $k_0 = 183.2$,

so that the formula reads

$F(x) = \dfrac{183.2}{0.5445\sqrt{2\pi}}\; e^{-\frac{1}{2}\left[\frac{\log(x-4) - 0.044}{0.5445}\right]^2}$

The detailed calculations according to this formula are shown below:

(1) $\log$ nat $(x-4)$; (2) $\log(x-4) - m$; (3) (2) : $n$; (4) ...; (5) ...

...

$D^4 f(\omega)\, q = p\,e^{\omega}\left[s - Df(\omega) - 3D^2 f(\omega) - 3D^3 f(\omega) - D^4 f(\omega)\right]$

where

$f(\omega) = \dfrac{\lambda_1\omega}{1!} + \dfrac{\lambda_2\omega^2}{2!} + \dfrac{\lambda_3\omega^3}{3!} + \dots$
Letting $\omega = 0$ we have therefore successively

$\lambda_1 q = p(s - \lambda_1)$
$\lambda_2 q = p(s - \lambda_1 - \lambda_2)$
$\lambda_3 q = p(s - \lambda_1 - 2\lambda_2 - \lambda_3)$
$\lambda_4 q = p(s - \lambda_1 - 3\lambda_2 - 3\lambda_3 - \lambda_4)$

or

$\lambda_1 = sp$
$\lambda_2 = spq = \sigma^2$
$\lambda_3 = spq(q - p)$
$\lambda_4 = spq(1 - 6pq)$

The generating function $\varphi(x)$ and $c_3\varphi_3(x)$ ...

As an illustration of the above formulas we shall now try to express a few Bernoullian point binomials by means of a Gram-Charlier series.

Let us for instance try to express $(.05 + .95)^{100}$ by a Gram-Charlier series. We have in this case the following values for the parameters

$\lambda_1 = 5.0$, $\sqrt{\lambda_2} = \sigma = 2.1795$, $c_3 = -0.0688$, $c_4 = 0.00625$.

The substitution of these values in the Gram-Charlier series results in the following relative frequency distribution:

 x    F(x)        x    F(x)
 0   .0084        8   .0614
 1   .0312        9   .0343
 2   .0763       10   .0179
 3   .1356       11   .0081
 4   .1812       12   .0031
 5   .1865       13   .0009
 6   .1522       14   .0003
 7   .1028       15   .0001

A similar calculation in the case of the Bernoullian binomial $(0.1 + 0.9)^{100}$ gives

$\lambda_1 = 10$, $\sigma = 3$, $c_3 = -.0445$, $c_4 = 0.0021$

with the following distribution:

 x    F(x)        x    F(x)
 0   .0000       12   .0984
 1   .0004       13   .0732
 2   .0020       14   .0502
 3   .0065       15   .0322
 4   .0162       16   .0194
 5   .0333       17   .0109
 6   .0581       18   .0058
 7   .0875       19   .0027
 8   .1145       20   .0012
 9   .1318       21   .0005
10   .1338       22   .0002
11   .1211       23   .0001

We shall presently have occasion to compare these distributions with those obtained from a direct expansion of the point binomial.

132. Poisson's Exponential. The Law of Small Numbers. In certain statistical series it frequently happens that the semi-invariants of higher order than zero all are equal, or that

$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_r = \lambda.$

We shall for the present limit our discussion to homograde series where the variate is always positive and integral, and where therefore the definition of the semi-invariants is of the form:

$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = \varphi(0)e^{0\cdot\omega} + \varphi(1)e^{1\cdot\omega} + \varphi(2)e^{2\omega} + \varphi(3)e^{3\omega} + \dots,$

or

$e^{\lambda(e^{\omega} - 1)} = e^{-\lambda}e^{\lambda e^{\omega}} = \sum \varphi(x)e^{x\omega}$ for $x = 0, 1, 2, 3, \dots$
, which also can be written as

$e^{-\lambda}\left(1 + \dfrac{\lambda e^{\omega}}{1!} + \dfrac{\lambda^2 e^{2\omega}}{2!} + \dots\right) = \varphi(0) + \varphi(1)e^{\omega} + \varphi(2)e^{2\omega} + \dots$

The coefficient of $e^{r\omega}$ gives the relative frequency or the probability for the occurrence of $x = r$, and we henceforth find that

$\varphi(r) = \psi(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$

This is the famous Poisson Exponential, so called after the French mathematician, Poisson, who first derived this expression in his Recherches sur la probabilité des jugements, but in an entirely different manner than the one we have indicated above.

The Poisson Exponential opens now a way to the treatment of the point binomial in the exceptional cases where the product $sp$ (or $sq$) is small even when $s$ is a very large number, or when more strictly speaking the expression $\lim sp = \lambda$ where $\lambda$ is a finite number. Under such conditions $p$ (or $q$) must approach zero and its complementary probability $q$ (or $p$) must approach unity as their limiting values. The expressions for the semi-invariants as given in paragraph 131, i. e.

$\lambda_2 = spq$
$\lambda_3 = spq(q - p)$
$\lambda_4 = spq(1 - 6pq)$

will under these conditions all approach the limit $sp$, and the general term in the Bernoullian expansion of the point binomial can therefore be expressed by means of the Poisson exponential.

In all cases where the semi-invariants of various orders happen to be equal, or very nearly equal, the formula by Poisson will be preferable in place of the more general expansion by the Gram-Charlier series. As an illustration we may select the simple binomial $(.001 + .999)^{100}$ where the semi-invariants have the following values:

$\lambda_1 = 0.1$, $\lambda_2 = 0.0999$, $\lambda_3 = 0.0997002$, $\lambda_4 = 0.0993012$,

and therefore may be considered as being nearly equal. The general term, $\varphi(x)$, in this particular point binomial can therefore be written as a Poisson exponential of the form:

$\varphi(x) = \psi(r) = e^{-0.1}\,0.1^{r} : r!$ for $r = 0, 1, 2, 3, \dots$
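As a quick numerical check on the statement that the semi-invariants of $(.001 + .999)^{100}$ are nearly equal, and that the Poisson exponential then reproduces the terms of the binomial, the following sketch may be of use. It is an illustration only: the function names are hypothetical, and the semi-invariant formulas are those of paragraph 131.

```python
from math import comb, exp, factorial

def binomial_semi_invariants(s, p):
    """First four semi-invariants of the point binomial (p + q)**s."""
    q = 1 - p
    lam1 = s * p
    lam2 = s * p * q
    lam3 = s * p * q * (q - p)
    lam4 = s * p * q * (1 - 6 * p * q)
    return lam1, lam2, lam3, lam4

def poisson_term(r, lam):
    """Poisson exponential e**(-lam) * lam**r / r!."""
    return exp(-lam) * lam**r / factorial(r)

def binomial_term(r, s, p):
    """General term of the point binomial: C(s, r) * p**r * q**(s - r)."""
    return comb(s, r) * p**r * (1 - p)**(s - r)

s, p = 100, 0.001
print([round(v, 7) for v in binomial_semi_invariants(s, p)])
for r in range(5):
    print(r, round(poisson_term(r, s * p), 6), round(binomial_term(r, s, p), 6))
```

The two columns printed by the loop differ only in the fifth decimal place, which is the sense in which the Poisson exponential may stand in for the direct expansion here.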
The Russian statistician, Bortkewitsch, has given in his interesting and scholarly brochure Das Gesetz der kleinen Zahlen (1898) a four decimal place table of the Poisson exponential $e^{-\lambda}\lambda^{r} : r!$ for values of $\lambda$ from 0.1 to 10.0. The English biometrician, Soper, in 1914 published a 6 decimal place table from $\lambda = 0.1$ to $\lambda = 15.0$. This table is found in Pearson's well-known Tables for Biometricians. For the above mentioned Bernoullian point binomial $(0.001 + 0.999)^{100}$, corresponding to the Poisson exponential $e^{-0.1}\,0.1^{r} : r!$, we find from Soper's table the following values of $\psi(r)$.

r    ψ(r)
0   .904837
1   .090484
2   .004524
3   .000151
4   .000004

While the exponential of Poisson requires, theoretically at least, that the semi-invariants must all be of the same magnitude, it will, however, often be found that this exponential will give a fair approximation to the true observed values of the frequency curve in cases where the semi-invariants $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \dots$ do not differ greatly from each other. In this connection it is of interest to compare the fits of Poisson's exponential and the Gram-Charlier series with the true values in the binomial expansion in the three examples we have given above. Through the courteous efforts of my translator and co-editor, Miss C. Dickson, the three point binomials $(0.001 + 0.999)^{100}$, $(0.05 + 0.95)^{100}$ and $(0.10 + 0.90)^{100}$ have been expanded directly and the results as compared with the forms of Poisson and of Gram-Charlier are shown in the following tables:

Values of ...

... $m = 3.88$; $c_2 = \tfrac{1}{2}[\lambda_2 - m] = -0.125$

The equation for the frequency distribution of the total $N = 2608$ elements therefore becomes

$F(x) = N[\psi_{3.88}(x) + (-0.125)\,\Delta^2\psi_{3.88}(x)].$
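The Poisson-Charlier expression just written can be evaluated directly. The following sketch is an illustration under stated assumptions rather than the author's computation: it takes $\Delta^2\psi(x)$ to be the second backward difference $\psi(x) - 2\psi(x-1) + \psi(x-2)$, with $\psi$ equal to zero for negative arguments, and it uses the rounded parameters $N = 2608$, $m = 3.88$, $c_2 = -0.125$; the function names are hypothetical.

```python
from math import exp, factorial

def poisson(x, m):
    """Poisson exponential psi(x); taken as zero for negative x."""
    return exp(-m) * m**x / factorial(x) if x >= 0 else 0.0

def second_difference(x, m):
    """Backward second difference of psi (assumed convention)."""
    return poisson(x, m) - 2 * poisson(x - 1, m) + poisson(x - 2, m)

def poisson_charlier(x, n_total, m, c2):
    """F(x) = N * [psi(x) + c2 * Delta^2 psi(x)]."""
    return n_total * (poisson(x, m) + c2 * second_difference(x, m))

N, m, c2 = 2608, 3.88, -0.125
fitted = [poisson_charlier(x, N, m, c2) for x in range(18)]
for x, value in enumerate(fitted):
    print(x, round(value, 1))
```

Because the second differences sum to zero over the whole range, the correction term redistributes the $N$ elements without changing their total. With the parameters used as rounded here, the fitted numbers can be expected to agree with a table computed from unrounded values only to within about a unit.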
The table below gives the values as fitted to the curve, $F(x)$:

Alpha Particles Discharged from Film of Polonium (Rutherford and Geiger)
$N = 2608$, $m = 3.88$, $c_2 = -0.125$

(1)     (2)         (3)          (4)        (5)          (6)
 x      ψ(x)       Δ²ψ(x)       N×(2)    N×(3)×c_2    (4)+(5)
 0    .020668    +.020668       53.9      - 6.7         47
 1    .080156    +.038820      209.0      -12.7        196
 2    .155455    +.015811      405.4      - 5.2        400
 3    .201015    -.029739      524.2      + 9.7        533
 4    .194967    -.051608      508.5      +16.8        525
 5    .151265    -.037654      394.5      +12.3        407
 6    .097850    -.009714      254.9      + 3.2        258
 7    .054249    +.009814      141.2      - 3.2        138
 8    .026316    +.015668       68.7      - 5.1         64
 9    .011351    +.012968       29.6      - 4.2         25
10    .004407    +.008021       11.5      - 2.6          9
11    .001555    +.004092        4.1      - 1.2          3
12    .000503    +.001800        1.3      - 0.6          1
13    .000150    +.000699        0.4      - 0.2
14    .000042    +.000245        0.1      - 0.1
15    .000010    +.000076        0.0      - 0.0
16    .000003    +.000025
17    .000001    +.000005

Bateman has in Philosophical Transactions (1902) given a theoretical frequency distribution of the above series of observations wherein he develops the Poisson probability function, being ignorant of the previous demonstration by Poisson. In a later note he mentions that the formula was given by the French mathematician in his work on probabilities, published in 1837. Bateman's calculation includes, however, only the first term of the Poisson-Charlier series and is, therefore, not so close as the above fit.

As a second example we offer our old friend, the distribution of flower petals in Ranunculus Bulbosus. Selecting the zero point at $x = 5$ and computing the semi-invariants in the usual manner we obtain the following equation for the frequency curve.

$F(x) = 222\,\psi(x) + 31.5\,\Delta^2\psi(x)$, $\quad m = 0.631$

A comparison between calculated and observed values follows below:

 x     F(x)    Obs.
 5    134.9    133
 6     51.6     55
 7     22.5     23
 8      9.5      7
 9      2.9      2
10      0.6      2

136. Transformation of the Variate. For integral variates we have shown that the Poisson frequency curve possesses the important property that all its semi-invariants are equal.
Now while a frequency distribution of a certain integral variate, $x$, may perhaps not possess this property, it may, however, very well happen after a suitable linear transformation has been made, that the variate thus transformed will be subject to the laws of Poisson's function.

Let $z = ax - b$ represent the linear transformation which is subject to the above laws with a series of semi-invariants all equal to $m$. These semi-invariants according to the properties set forth in paragraph (104) are therefore

$m = \lambda_1(z) = a\lambda_1(x) - b$
$m = \lambda_2(z) = a^2\lambda_2(x)$
$m = \lambda_3(z) = a^3\lambda_3(x)$

and our problem is to find the unknown parameters $a$, $b$ and $m$. Simple algebraic methods, which it will not be necessary to dwell upon, give the following results:

$a = \lambda_2 : \lambda_3$, $\quad m = \lambda_2^3 : \lambda_3^2$, $\quad b = a\lambda_1 - m$.

As a numerical illustration of this transformation we choose from Jørgensen a series of observations by Davenport on the frequency distribution of glands in the right foreleg of 2000 female swine.

No. of Glands    0    1    2    3    4    5    6    7    8    9   10
Frequency       15  209  365  482  414  277  134   72   22    8    2

The values of the three first semi-invariants are

$\lambda_1 = 3.501$, $\lambda_2 = 2.825$, $\lambda_3 = 2.417$,

whence

$a = 2.825 : 2.417 = 1.168$
$m = 2.825^3 : 2.417^2 = 3.859$
$b = (1.168)(3.501) - 3.859 = 0.230$.

The new variable then becomes $z = ax - b$ and the transformed Poisson probability function takes on the form:

$\psi(z) = \dfrac{e^{-m}m^{z}}{z!}$

In general, however, we will find that $z$ is not a whole number and the expression $z!$ therefore has no meaning from the point of view of factorials at least.
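The arithmetic of this illustration can be retraced from the frequencies themselves. The sketch below is illustrative only, and its function names are hypothetical: it forms the first three semi-invariants from the power sums in the usual way, $\lambda_1 = s_1 : s_0$, $\lambda_2 = s_2 : s_0 - \lambda_1^2$, $\lambda_3 = s_3 : s_0 - 3\lambda_1 s_2 : s_0 + 2\lambda_1^3$, and then applies $a = \lambda_2 : \lambda_3$, $m = \lambda_2^3 : \lambda_3^2$, $b = a\lambda_1 - m$.

```python
def semi_invariants(freq):
    """First three semi-invariants of a frequency table indexed by x = 0, 1, 2, ..."""
    n = sum(freq)
    s1 = sum(x * f for x, f in enumerate(freq)) / n
    s2 = sum(x**2 * f for x, f in enumerate(freq)) / n
    s3 = sum(x**3 * f for x, f in enumerate(freq)) / n
    lam1 = s1
    lam2 = s2 - s1**2
    lam3 = s3 - 3 * s1 * s2 + 2 * s1**3
    return lam1, lam2, lam3

def poisson_linear_transform(lam1, lam2, lam3):
    """Parameters a, b, m of z = a*x - b whose first three semi-invariants equal m."""
    a = lam2 / lam3
    m = lam2**3 / lam3**2
    b = a * lam1 - m
    return a, b, m

# Davenport's counts of glands (x = 0..10) in 2000 female swine
glands = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]
lam1, lam2, lam3 = semi_invariants(glands)
a, b, m = poisson_linear_transform(lam1, lam2, lam3)
print(round(lam1, 3), round(lam2, 3), round(lam3, 3))
print(round(a, 3), round(b, 3), round(m, 3))
```

Carried to full precision the script gives a value of $b$ a few thousandths above the one obtained by hand, which appears to be due to $a$ having been rounded to 1.168 before the multiplication.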
This difficulty may, however, be overcome through the introduction of the well-known Gamma Function, $\Gamma(z + 1)$, which holds true for any positive or negative real value of $z$ and which in the case of integral values of $z$ reduces to

$\Gamma(z + 1) = z!$

Hence we can write the transformed Poisson probability function as

$\psi(z) = \dfrac{e^{-m}m^{z}}{\Gamma(z + 1)}$

Tables to 7 decimal places of the Gamma Function, or rather for the expression $-\log\Gamma(z + 1)$, have been computed by Jørgensen in his aforementioned book from $z = -3$ to $z = 15$, progressing by intervals of 0.01. By means of this table and the tables of ordinary logarithms it is now easy to find the values of $\psi(z)$ in the case of the example relating to the number of glands in female swine. The detailed computation is shown below.*

(1) $x$; (2) $z$; (3) $-\log\Gamma(z + 1)$; (4) $\log m^{z}$; (5) (3) + (4) + log ...; (6) ...; (7) ...

...

...while successful writer of economic subjects, posing as a critic of such intellectual giants in the realm of mathematical science as Bernoulli, Laplace and Poisson (which of course presupposes that he must have read very carefully the various writings of those old masters); who calmly and in the most innocent manner admits that of the "laborious" mathematics involved in this question he is only acquainted with a rather clumsy demonstration published in 180(). While I have not the slightest doubt as to the veracity of these facts so far as Mr. Keynes is concerned, it will be of interest to see what the actual historical facts are.

Now in so far as the measurement of the asymmetry or skewness of the Bernoullian point binomial is concerned this was already performed by Laplace himself, an accomplishment which in itself creates a degree of doubt in the reader's mind as to whether Mr. Keynes really has studied Laplace with the necessary care required of one who poses as a critic of the great Frenchman. Harald Westergaard, the eminent Danish scholar, whose fame as a statistician surely rests on a far more secure foundation than that of Keynes, takes in his Statistikens Teori i Grundrids (Copenhagen, 1915) special pains to point out that Laplace was the first to give a mathematical measure of the skewness in a Bernoullian frequency distribution. The Danish actuary, Gram, long before the intellect of our recent English critic saw the light, derived his general series for frequency curves, which of course also applies to the Bernoullian case. Thiele in his Almindelig Iagttagelseslære, at a time when the young Keynes probably was being piloted by his nursemaid or governess, discussed the same series from the point of view of semi-invariants. Later on Charlier continued in direct line from where Laplace and Poisson concluded their labours. The necessary corrections to the generating functions, whether these be Laplace's or Poisson's probability curves, as derived by these Scandinavian writers are given in Chapters XVII and XVIII of this treatise. Not a single one of these demonstrations requires the "laborious" mathematics as mentioned by Mr. Keynes. The fact that our romancing English economist evidently is in blissful ignorance of the fundamental work by the Scandinavian school is, however, no excuse for his superficial knowledge of the expansion of statistical series, since much of this work has appeared in English.

Mr. Keynes' misconception of the real significance of the Law of Small Numbers and his criticism of Bortkewicz may possibly also be traced to his apparent ignorance of the work of the Scandinavian authors. His criticism moves practically along the same lines as that of the views held by Miss Whitaker and described on page 270 of this treatise.
He, like Miss Whitaker, fails to realize that the generating Poisson probability function arises from the more general fact that all its semi-invariants are equal, rather than from the more special fact that one of the limiting values of the point binomial reduces to a Poisson frequency curve. The very fact that Mr. Keynes in his large volume never mentions the semi-invariants leads one to inquire whether those important statistical parameters remain a closed book to him.*

*Similar immature views as those held by Keynes and Miss Whitaker are also expressed by Mr. A. Mowbray in the May, 1920, Proceedings of the Casualty Actuarial Society of America (page 107). Judging from his tedious and laborious analysis Mowbray evidently never has heard of the semi-invariants.

ABRIDGED TABLE OF LAPLACE'S ...

$z$, $\varphi_0(z)$, $\varphi_3(z)$, $\varphi_4(z)$, ...

...

The transformation in a double integral implies in general three parts: (1) the expression of $\varphi(y_1, y_2)$ in terms of $y$, $z$; (2) the determination of the new system of limits; (3) the substitution of $dy_1\,dy_2$. The solution of the third part we just gave above. The solution of the two first is purely algebraic. The first part is a straightforward simple problem which should present no difficulty whatsoever to the student and which in conjunction with (3) brings the integrands on the form given in formula (V).

See Goursat: "Mathematical Analysis" (New York, 1904) pages 266-67.

The easiest way to determine the new system of limits is probably by constructing the contour in the new field of integration. The hyperbolas $y_1y_2 = \alpha$ and $y_1y_2 = \beta$ are in the new field of integration changed into the two straight lines $y = \alpha$ and $y = \beta$ which determine the limits for the variable $y$. A mere inspection of the expressions for ...

...

$0 \le f_\nu(r\alpha) \le 1$ and $\sum_{r=-\infty}^{+\infty} f_\nu(r\alpha) = 1$ for $\nu = 1, 2, 3, \dots s.$

Consider now for the moment the following expression

$F_\nu(\omega) = \sum_{r=-\infty}^{+\infty} \alpha f_\nu(r\alpha)\,e^{r\alpha\omega i}$ where $i = \sqrt{-1}$.

The coefficient of $e^{r\alpha\omega i}$ in the sum is evidently the probability for the occurrence of an error $r\alpha$ from the error source $Q_\nu$. The probability of the occurrence of an error $r\alpha$ from another error source, $Q_\mu$, may similarly be expressed as

$F_\mu(\omega) = \sum \alpha f_\mu(r\alpha)\,e^{r\alpha\omega i}$

and so on for all the $s$ independent error sources, which we assumed to be operative on our statistical object.

The probability for the resulting sum of the various combinations into which the elementary errors from the $s$ sources may enter is found by forming the product

$\Phi(\omega) = \prod_{\nu=1}^{s} F_\nu(\omega) = F_1(\omega)\,F_2(\omega)\,F_3(\omega) \dots F_s(\omega)$

in accordance with the multiplication theorem of mathematical probabilities. Writing the above product as

$\Phi(\omega) = \alpha\left[\varphi(0) + \varphi(\alpha)e^{\alpha\omega i} + \varphi(2\alpha)e^{2\alpha\omega i} + \varphi(3\alpha)e^{3\alpha\omega i} + \dots + \varphi(r\alpha)e^{r\alpha\omega i} + \dots\right]$

we notice that the coefficient, $\alpha\varphi(r\alpha)$, of $e^{r\alpha\omega i}$ is the probability that the elementary errors from the $s$ error sources will enter into such a combination that their sum will fall between $r\alpha - \tfrac{1}{2}\alpha$ and $r\alpha + \tfrac{1}{2}\alpha$.

Multiplying on both sides of the above equation by $e^{-r\alpha\omega i}$ (considering $\alpha\omega$ as the independent variable) and integrating with respect to $\alpha\omega$ between the limits $-\pi$ and $+\pi$ we find that

$\alpha\varphi(r\alpha) = \dfrac{1}{2\pi}\int_{-\pi}^{+\pi} \Phi(\omega)\,e^{-r\alpha\omega i}\,d(\alpha\omega).$

In the above integral $\alpha\omega$ is the independent variable. If $\omega$ is chosen as the independent variable, we have

$\alpha\varphi(r\alpha) = \dfrac{\alpha}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} \Phi(\omega)\,e^{-r\alpha\omega i}\,d\omega.$

If now we let $r\alpha = x$ and let $\alpha$ converge towards zero, in which case $\alpha = dx$, we evidently find that

$\varphi(x)\,dx = \dfrac{dx}{2\pi}\int_{-\infty}^{+\infty} \Phi(\omega)\,e^{-x\omega i}\,d\omega$

is the infinitely small probability that the sum of the elementary errors from the $s$ sources will fall in the infinitely small interval $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$.
It is evident that by introducing a new function $\psi(\omega)$, defined by the relation $\Phi(\omega) = \sqrt{2\pi}\,\psi(\omega)$, the above equation reduces to (5b) on page 195 if we let ...

... If we now let $[t - n(r + 1)] = u$, we have $dt = du$, and the last expression reduces to ..., since the latter integral

$\int_{-\infty}^{+\infty} e^{-\frac{u^2}{2}}\,du = \sqrt{2\pi}.$

Now in the press. Ready for distribution about August 1, 1922.

An Elementary Treatise on Frequency Curves And their Application to the Construction of Mortality Tables

By ARNE FISHER

English translation by E. A. VIGFUSSON

With an Introduction by Professor RAYMOND PEARL, Department of Biometry and Vital Statistics of the Johns Hopkins University

(Pp. 225 + XV)

This book falls into two parts of which the first gives an elementary presentation of the theory of frequency functions along similar lines as those developed by Mr. Fisher in his book on Probabilities. The second part, as pointed out in Mr. Vigfusson's preface, constitutes an entirely new departure in the analysis of mortality statistics. The author has set himself the difficult task to construct a complete mortality table from mortuary records by sex, attained age at death and causes of death, but without knowledge of the exposed to risk at various ages. The accomplishment of this problem has been made possible by means of a biological hypothesis and a proper classification of the causes of death upon biological principles. Once accepted the proposed hypothesis will make it possible to study the laws of human mortality in directions which hitherto have been regarded as impossible. Mr. Fisher has applied his new method to more than 25 population or occupational groups and gives in this book the detailed results of some of his investigations in the way of 6 complete mortality tables for Michigan Males (1909-1915), Massachusetts Males (1914-1916), American Locomotive Engineers (1913-1917), American Coal Miners (1913-1917), Japanese Assured Males (1914-1917) and White Industrial Assured Males of the Metropolitan Life Insurance Co. (1911-1916).

As a systematic treatise on frequency curves and their application to mortality studies this book should prove of great practical value not only to students of statistical methods, but to actuaries, statisticians, health officers, biologists and students of general science as well.

Comments of Specialists

"Orthodoxy and discovery are as incompatible intellectually as oil and water are physically, a cosmic law often overlooked by our "safe and sane" scientific gentry. This book is an outstanding feature that this law is still in operation ... It may fairly be regarded as fundamentally the most significant advance in actuarial theory since Halley ... It opens out wonderful possibilities of research on the laws of mortality in directions which hitherto have been wholly impossible of attack. The criterion by which the significance of a new technique in any branch of science is evaluated, is just this: the degree to which it opens up new fields of research. By this criterion Fisher's work stands in a high and secure position." (Extract from Professor Pearl's Introduction.)

"Fisher's novel method has injected new blood in the old body of actuarial science." (C. Burrau.)

"This new and novel idea meets in reality a very frequent need. It represents a supplement to the former tools of the actuary and makes possible the utilization of a statistical material, which according to the requirements of the older systems was considered as being of no value." (Extract from Forsikringstidende's report of discussion in the Norwegian Actuarial Society, June, 1920.)

"Since particularly in industrial statistics, or in general statistical in