key: cord-0277762-noa50kpj
authors: Skarda, I.; Asaria, M.; Cookson, R.
title: LifeSim: A Lifecourse Dynamic Microsimulation Model of the Millennium Birth Cohort in England
date: 2021-02-16
journal: nan
DOI: 10.1101/2021.02.12.21251642
sha: 93a9d5ddfde3d8f50794cff7d3d6c4fe979c66e3
doc_id: 277762
cord_uid: noa50kpj

We present a novel dynamic microsimulation model that undertakes stochastic transition modelling of a rich set of developmental, economic, social and health outcomes from birth to death for each child in the Millennium Birth Cohort (MCS) in England. The model is implemented in R and draws initial conditions from the MCS by re-sampling a population of 100,000 children born in the year 2000, and simulates long-term outcomes using life-stage specific stochastic equations. Our equations are parameterised using effect estimates from existing studies combined with target outcome levels from up-to-date administrative and survey data. We present our baseline projections and a simple validation check against external data from the British Cohort Study 1970 and Understanding Society survey.

In this paper we introduce a forward-looking dynamic childhood policy microsimulation model, "LifeSim", which models the co-evolution of many economic, social and health outcomes from birth to death for each child in a general population birth cohort of 100,000 English children born in 2000-1. In addition to modelling the individual outcomes, LifeSim also models the costs and savings to the public budget associated with these outcomes. The version of LifeSim presented in this paper focuses on conduct problems, conduct disorder and cognitive skills as the core childhood outcomes, and hence is most useful for analysing childhood policies that drive these outcomes. However, the model can be easily extended to analyse policies with direct effects on a wide range of other childhood outcomes. Outcomes up to age 14 which include child-specific variables (e.g.
the child's sex, cognitive skills, conduct problems) and family-specific variables (e.g. parental income, parental education, parental mental health) are primarily based on data from the Millennium Cohort Study (MCS). The dynamic year-by-year co-evolution of later life outcomes is modelled using life-stage specific equations representing stochastic transition pathways. These equations were parameterised using transition pathway estimates from previous peer-reviewed studies based on data representative of the British cohort that we simulate. The structure of our model, a set of causal pathway networks in early childhood, school age, working age and retirement, was designed to align with the large body of theory and knowledge about human capital formation in childhood and later life economic and health outcomes.

Policymakers have indicated a strong need for better childhood policy simulation models (Feinstein, Chowdry and Asmussen, 2017; Allen, 2011; Dalziel, Halliday and Segal, 2015). As a response, LifeSim has several new features that can contribute to more informative childhood policy analysis.
More specifically, LifeSim:
(i) jointly models the co-evolution of many economic, social and health outcomes, capturing how outcomes in multiple domains interact, compound and cluster over time, and emphasising how early-life disadvantages can compound over life, creating a spiral of multiple disadvantage;
(ii) simulates long-run outcomes for a whole general population cohort of children, not just one specific subpopulation of trial participants, allowing more informative policy analysis, including optimal policy targeting analysis, population-wide distributional impact analysis and assessment of the opportunity costs falling on the individuals not directly affected by the intervention;
(iii) simulates individual-level outcomes for each heterogeneous child in the cohort, instead of only producing average-level outcomes, allowing us to produce multidimensional individual wellbeing measures, which have been discussed in the literature and have well-known advantages over unweighted cost-benefit analysis (Adler and Fleurbaey, 2016);
(iv) simulates outcomes over the whole lifecourse from birth to death, enabling policy analysis to adopt a broad lifetime perspective;
(v) is forward-looking and therefore relevant to drawing conclusions about the long-term consequences for cohorts living in the present, rather than historical cohorts born many decades ago that are less relevant to the current childhood policy context.

The price of all these advantages lies in making numerous strong assumptions in order to combine multiple sources of data. We believe this is a price worth paying to provide decision makers with useful policy insights, and that it is better to make such assumptions explicit and subject to scrutiny rather than to leave them implicit.
We use longitudinal data on children born in 2000 as our primary data source but supplement this with many other sources of data, including more up-to-date cross-sectional administrative and survey data as well as older sources of longitudinal data on children born in earlier decades. In choosing how many assumptions to make and how many sources of data to use, there are trade-offs between internal and external validity (see footnote 1). Using a single source of experimental data with long-term follow-up over many decades would maximise internal validity, but is only possible for backward-looking evaluation of policy experiments many decades ago. Using assumptions and multiple sources of data is necessary to achieve external validity for forward-looking economic appraisal of current policy options in the current policy environment.

Footnote 1: Internal validity relates to claims about cause and effect within the study population, whereas external validity relates to how applicable the findings are to real world policy settings.

Since our model is designed for the purpose of policy analysis rather than forecasting, the most important criteria for model credibility arguably relate to the quality of the underlying conceptual framework and data sources rather than the ability to predict external data sources or future trends (Kopec et al., 2010). Nevertheless, we provide a simple comparison of our simulation with external data. First, we provide a comparison with data from the 1970 Birth Cohort Study up to age 46. We find that our simulation is broadly consistent with the external data and diverges substantially where appropriate. For example, our simulation for people born in 2000 has a much lower proportion of people smoking than the 1970 cohort, reflecting the reduction in smoking rates in the UK since the 1970s.
Also, our simulation for people born in 2000 has a much larger proportion of young people with a university degree at age 26 than the 1970 cohort at that age, reflecting the massive expansion in university provision in the UK since the 1970s. We also provide a comparison with a recent external cross-sectional dataset, Understanding Society (year 2016). Our simulated earnings outcome replicates reasonably well the sex- and age-specific distributions observed in the Understanding Society data. Also, for our simulated key discrete outcomes, including health-related outcomes and unemployment, the sex-specific prevalence trends against age do not deviate much from the trends observed in the Understanding Society data. Any minor discrepancies can be explained by differences in data collection methods between Understanding Society and our target datasets. Finally, we provide an additional check of LifeSim output against the various target datasets that we directly use to calibrate our equations, such as the Health Survey for England and Office for National Statistics datasets. As expected, our simulated outcomes closely match the trends and patterns observed in the target data. Because our model is flexible and can be used together with many data sources, one can easily substitute our target datasets with alternative datasets if needed, to match the trends and patterns observed in these alternative sources.

The rest of the paper proceeds as follows. Section 2 outlines the methods. Section 3 summarises our baseline simulation results. Section 4 provides a simple comparison of our simulation with existing datasets. Section 5 discusses and concludes. Additional details can be found in the supplementary appendices.

It is made available under a CC-BY-NC-ND 4.0 International license.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted February 16, 2021; https://doi.org/10.1101/2021.02.12.21251642

To keep things simple, we focus on a single general population birth cohort of 100,000 children, rather than seeking to model the entire all-age population. We draw heavily on a recent longitudinal survey, the Millennium Cohort Study (MCS) of English children born in 2000-2001, both to describe the initial characteristics of the cohort and to model most of the childhood outcomes up to age 14. After age 14, we model outcomes using equations consistent with our model structure and a set of key principles to combine quasi-experimental evidence and external sources of target data.

The model links together a diverse set of individual-level life outcomes of interest to policymakers (see Figure 1). In choosing the model outcomes and formulating the model structure we consulted with experts in childhood development and childhood policy, demography, epidemiology, human capital economics and labour economics (see the list of advisory group members in the acknowledgements) and were also guided by inter-disciplinary theory on human capital formation in childhood and how this influences educational attainment, earnings, physical illness, mental illness, mortality and other outcomes with important impacts on individual wellbeing and public cost (Almond, Currie and Duque, 2018; Nelson et al., 2020; Cunha and Heckman, 2010; Adler and Stewart, 2010; O'Donnell, Van Doorslaer and Van Ourti, 2015; Layard et al., 2014; Shonkoff, 2010; Black et al., 2017).

The model structure changes as individuals progress through key life stages. In each life stage, the dependencies between the initial conditions and the life-course outcomes are represented by a model structure diagram (e.g. Figure 2 and Figure 3).
Each solid arrow in these diagrams is modelled using equations, as we will explain in Section 2.3.

Note: The solid arrows represent the causal pathways that we model using modelling equations as described in Section 2.3; the dashed arrows represent implicit causal pathways that we do not explicitly model, but that exist in the childhood dataset. The boxes represent life outcomes: the dashed boxes represent exogenous inputs into our model that are taken as given either from the childhood dataset or the previous life-stage, the thick boxes represent final outcomes that directly influence wellbeing or impose a cost to the public budget.

Note: The solid arrows represent the causal pathways that we model using modelling equations as described in Section 2.3. The boxes represent life outcomes: the dashed boxes represent exogenous inputs into our model that are taken as given either from the childhood dataset or the previous life-stage, the thick boxes represent final outcomes that directly influence wellbeing or impose a cost to the public budget.
LifeSim also models variables relevant to the public budget (Figure 4). This includes modelling the public costs over time associated with certain life outcomes, such as conduct disorder, being in prison, mental illness and coronary heart disease, as well as cash benefits paid to people who are in poverty and/or unemployed. This also includes modelling the taxes paid over time on individual earnings and financial gains. These can be aggregated to assess the overall impact on the public budget, as well as cost savings under different policy scenarios and over various time spans. Details of the evidence and assumptions about the unit costs of public services and our simple approach to modelling long-run taxes and benefits are found in Appendix A.

We use random draws of individuals from the childhood survey dataset (MCS) to create distributions of the initial conditions, as well as the child's cognitive skills and conduct problems measures, for the simulated cohort of 100,000 individuals (see Table 1 for definitions and summary statistics). We re-sample the MCS dataset by drawing random observations with replacement, using the sampling weights.

We measure conduct problem severity during childhood using the parent-reported Strengths and Difficulties Questionnaire (SDQ) conduct problem subscale score, reported in the MCS. This score ranges from 0 to 10, with a higher score representing more conduct problems. We then model the child's actual probability of developing a conduct disorder using a more sophisticated predictive algorithm based on a combination of the SDQ conduct problem score
and a further parent-reported "behavioural impact" score, which provides a specific probability of conduct disorder based on a classification as either "possible" or "probable". This modelled probability is then combined with a random draw from a uniform distribution over 0-1, which allows us to simulate the discrete outcome of whether a child develops a conduct disorder or not.

Our cognitive skills measure is an age-specific common factor extracted from the cognitive skills measures available in MCS, including the British Ability Scales II (for ages 3, 5, 7, 11), Bracken School Readiness Assessment (for age 3), National Foundation for Educational Research (NFER) Progress in Maths (for age 7), Cambridge Neuropsychological Test Automated Battery tests (for ages 11 and 14) and Applied Psychology Unit (for age 14). We extract a common factor for each age where test results are available using principal component analysis, and standardise it to have a mean of 1.00 and a standard deviation of 0.15 (following Jones and Schoon (2008)). More details on the conduct problems and skills measures can be found in Appendix A.

The other simulated characteristics of children and their parents are summarised in Table 1. It should be noted that any other variable of interest that is reported in MCS, or can be linked to MCS, can be easily added to LifeSim.
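The two sampling steps described above (weighted re-sampling with replacement, and turning a modelled probability into a discrete outcome with a uniform draw) can be sketched as follows. This is a minimal illustration in Python rather than the R code of LifeSim itself; the function names, the three toy survey records, their weights and their probabilities are all hypothetical and are not taken from the MCS or from the LifeSim code.

```python
import random


def resample_cohort(records, weights, n, rng):
    """Draw n records with replacement, with probability proportional to weight."""
    return rng.choices(records, weights=weights, k=n)


def simulate_discrete_outcome(prob, rng):
    """Return True when a uniform draw on [0, 1) falls below the modelled probability."""
    return rng.random() < prob


rng = random.Random(2000)  # fixed seed so the sketch is reproducible

# Hypothetical survey records: (child id, modelled probability of conduct disorder)
survey = [("a", 0.05), ("b", 0.40), ("c", 0.70)]
survey_weights = [2.0, 1.0, 0.5]  # hypothetical sampling weights

cohort = resample_cohort(survey, survey_weights, n=1000, rng=rng)
has_disorder = [simulate_discrete_outcome(p, rng) for _, p in cohort]
prevalence = sum(has_disorder) / len(has_disorder)
```

Averaged over many simulated children, the share of `True` outcomes approaches the mean modelled probability in the re-sampled cohort, which is what makes the uniform-draw step a valid way to simulate a binary outcome.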
Note: The analysis is for 100,000 individuals in the LifeSim cohort. MCSj denotes MCS sweep j (6 sweeps in total). Children were 9 months old in MCS1, 3 years old in MCS2, 5 years old in MCS3, 7 years old in MCS4, 11 years old in MCS5 and 14 years old in MCS6. SDQ conduct problem and impact scores have a scale of 0-10, with a higher value representing more problems or a higher impact of problems. The cognitive skills measure is an age-specific common factor extracted from the various cognitive skills measures in MCS, including the British Ability Scales II, Bracken School Readiness Assessment, National Foundation for Educational Research (NFER) Progress in Maths, Cambridge Neuropsychological Test Automated Battery tests and Applied Psychology Unit.

To model later life outcomes, we use equations which we: (i) calibrate using target data from observational studies, which describe expected levels of and associations between variables at a point in time; (ii) parameterise using effect estimates, which attempt to draw inferences about the effect of one variable on another variable, either at the same time or at a future point in time. Table 2 summarises the target datasets that we use. Table 3 summarises the determinants of the modelled outcomes, as well as the parameter sources used, if applicable. More details are found in Appendix A. Our target data comes from the most up-to-date and nationally representative available surveys and administrative records in England.
Our effect estimates come from studies based on longitudinal data in a UK context, unless robust estimates are only available from other high-income countries. Where possible, we try to use causal inference studies, e.g. based on quasi-experimental design. Using estimates based on past cohorts of individuals relies on the assumption that historical cohorts are a reliable proxy for the modelled cohort.

Note: MCS: Millennium Cohort Study; ONS: Office for National Statistics; IMD: Index of Multiple Deprivation. Our notation uses an overline to denote averages from a target dataset.

... (2003)); depression (Lasser et al., 2000), prison (Singleton, Farrell and Meltzer, 2003); Teenage smoking rates by sex (MCS, age 14) (param. from Jefferis et al. (2003)), smoking rate in England by age, sex and IMD quintile group (HSE, 2006); Coronary heart disease (CHD) P, I L.prob. of CHD; L.smoking (Bazzano et al., 2003; Critchley and Capewell, 2003); L.poverty (Marmot et al., 1997);
Most of our equations can be described as one of the following: (i) simple level equations based on target data only; (ii) complex level equations based on target data supplemented with effect estimates; (iii) simple difference equations based on target data only; (iv) complex difference equations based on target data supplemented with effect estimates. We illustrate each below in turn with a simple example.

To model the individual probability of dying, the simplest approach is to use historical mortality rates (equation 1), where $\overline{\text{dead}}[\text{age}_i, \text{sex}_i, \text{imd}_i]$ is the mean probability of dying conditional on age, sex and English index of multiple deprivation (IMD) quintile group, calculated using a target dataset such as the Office for National Statistics mortality data (see Table 2). We denote means from a target dataset using an overline.

We can also supplement equation (1) with an effect estimate, for example for coronary heart disease (CHD). Mortality is not independent from CHD, but the variable CHD is not observable in the ONS mortality target dataset, so we cannot directly condition the target mortality mean on the CHD status. After multiplying each term in the brackets by the beta coefficient, it can be seen that our approach is equivalent to subtracting the 'population attributable risk' from the risk of the simulated individual (Webb, Bain and Page, 2016).

If a level of a variable is already known, we can proceed by modelling the evolution of a variable as a difference from the previous time period. For example, we can model
individual earnings during the subsequent periods as in equation (4), where $\Delta \text{earnings}_{i,\text{age}} = \text{earnings}_{i,\text{age}} - \text{earnings}_{i,\text{age}-1}$ is the change in earnings from the previous year, and $\text{trend.earnings}[\text{age}_i, \text{sex}_i]$ is a trend that governs the changes in earnings over time, calculated from a target dataset on earnings by age and sex.

Similar to level equations, we can supplement equation (4) with an effect estimate. For example, to model that developing depression reduces earnings by a certain amount represented by $\beta^{\text{earnings}}_{\text{depressed}}$, we use equation (5), where $\text{depressed}_{i,\text{age}}$ is an indicator of an individual having depression at a given age and $\Delta \text{depressed}_{i,\text{age}} = \text{depressed}_{i,\text{age}} - \text{depressed}_{i,\text{age}-1}$. More details on the modelling equations are found in Appendix A.

Conventional methods of unweighted benefit-cost analysis can be criticised on two important [...] individual wellbeing in year t by a function $w_t(\cdot)$ increasing in both consumption and health. More specifically,

$$w(\cdot) = \text{health}_{i,\text{age}} + u(\text{consumption}_{i,\text{age}})$$

where $u(\cdot)$ is a standard isoelastic utility of income function defined as

$$u(\cdot) = A - B \times \text{consumption}_{i,\text{age}}^{\,1-\eta}.$$

The parameter $\eta > 1$ captures the diminishing marginal value of income, and $A$ and $B$ are constants which depend on normative parameters: $\eta$ (already mentioned), minimal consumption for a life worth living and standard consumption for a good life. In the current application we set minimal consumption at £1,000 (the estimated amount required to buy basic food supplies in the UK for a year), standard consumption at £24,000 (the mean consumption in the LifeSim simulated cohort), and $\eta = 1.26$ (see Cookson et al. (2016)).
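The equation types and the wellbeing function described above can be illustrated with a short sketch. It is written in Python for illustration only (LifeSim itself is implemented in R) and rests on several assumptions that are not confirmed by the text: the complex level equation is taken to be the target mean plus beta times the deviation of the individual's CHD status from the target CHD mean; the complex difference equation is taken to add beta times the change in the depression indicator to the earnings trend; and the constants A and B are pinned down by assuming that utility equals 0 at minimal consumption and 1 at standard consumption. All function names and numeric inputs other than £1,000, £24,000 and η = 1.26 are hypothetical.

```python
def death_probability(target_mean, beta_chd=0.0, has_chd=0, target_chd_mean=0.0):
    """Level equation: target mean, optionally shifted by an effect estimate.

    Assumed form: mean + beta * (CHD_i - mean CHD), so that the average over
    the simulated population still equals the target mean.
    """
    return target_mean + beta_chd * (has_chd - target_chd_mean)


def next_earnings(prev_earnings, trend, beta_depressed=0.0, delta_depressed=0):
    """Difference equation: previous level plus trend, plus an optional effect.

    Assumed form: delta earnings = trend + beta * delta depressed, where beta
    is negative if developing depression reduces earnings.
    """
    return prev_earnings + trend + beta_depressed * delta_depressed


def utility_constants(eta, c_min, c_std):
    """Solve for A and B under the ASSUMED normalisation u(c_min)=0, u(c_std)=1."""
    b = 1.0 / (c_min ** (1 - eta) - c_std ** (1 - eta))
    a = b * c_min ** (1 - eta)
    return a, b


def wellbeing(health, consumption, a, b, eta):
    """w = health + u(consumption), with u(c) = A - B * c**(1 - eta)."""
    return health + (a - b * consumption ** (1 - eta))


ETA, C_MIN, C_STD = 1.26, 1000.0, 24000.0  # values quoted in the text
A, B = utility_constants(ETA, C_MIN, C_STD)

# Hypothetical inputs, purely for illustration
p_dead = death_probability(0.01, beta_chd=0.02, has_chd=1, target_chd_mean=0.1)
earn = next_earnings(20000.0, 500.0, beta_depressed=-3000.0, delta_depressed=1)
w = wellbeing(health=0.8, consumption=C_STD, a=A, b=B, eta=ETA)
```

Under the assumed normalisation, a year lived at standard consumption adds exactly 1 to the health term, and a year at minimal consumption adds 0. Other normalisations would change A and B but not the shape of the utility function.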
LifeSim is implemented in R (tested on R version 3.6.2) using object-oriented programming for R (it requires the R6 and tidyverse packages). The code and related data files, compressed in a zip file 'LifeSim.zip', can be extracted and run on a high performance computing (HPC) cluster (Slurm Workload Manager). When we split the simulation into 500 parts, it takes 28 minutes to run on the HPC cluster. The simulation can also be run on a standard PC, for any chosen number of individuals. The current code is written in a 'user-friendly' object-oriented way, which makes it easy to add additional variables of interest. However, it would be possible to speed up the code by vectorising the simulation, at the cost of making it less user friendly.

In this section we show our baseline simulation results, and demonstrate some formats in which they can be analysed. Table 4 provides key summary statistics for the simulated outcomes, including child outcomes, adult outcomes and final wellbeing outcomes. We show means, standard deviations, and the minimum and maximum value of an outcome in the total distribution of the simulated individuals. Table 4 does not present the summary statistics of the initial conditions, or of the child's cognitive skills and conduct problems measures that we obtain from the childhood survey dataset (MCS), as these variables have already been summarised in Table 1.

Approximately 9% of 18-year-old adults develop conduct disorder in the LifeSim simulation. This estimate fits within the range of 1-10% commonly reported in the epidemiology literature on conduct disorder (see a review in Hinshaw and Lee (2003), also Patel et al.
(2018)). Our estimate, however, slightly exceeds the 8% of young men and 5% of young women with conduct disorder estimated by the Mental Health of Children and Young People in England survey in 2017. This small difference may arise because the algorithm that we use to simulate conduct disorder incidence was developed and validated on samples of children attending child mental health clinics, and it may overestimate the actual conduct disorder prevalence in the general population. On the other hand, conduct disorder diagnosis in the clinic sample can be argued to be more precise and sensitive than in the survey sample, because in the clinic sample diagnosis was made by mental health specialists using detailed information on symptoms and resultant impairments gathered from multiple informants, whereas in the survey sample diagnosis was based on a single specific tool, the Development and Well-Being Assessment. In conclusion, we find this difference in conduct disorder prevalence rates small, and our estimate is consistent with the more general findings in the literature and with the concern that conduct disorder prevalence is often under-estimated in survey data.

Note: Mean calculated for the simulated population of 100,000 (for the lifetime aggregates, or yearly for the annual variables); SD: standard deviation; Min: minimum value; Max: maximum value; CHD: coronary heart disease.
The time periods for calculating life-stage proportions are as follows: 'working years' refers to the period between ages 19-69; 'retirement' refers to the time period from age 70 up to death; 'adult years' refers to the time period from age 19 up to death; 'lifetime' refers to the entire period from birth to death. We use year 2015/16 prices and an annual discount rate of 1.5% (Paulden and Claxton, 2012).

Note: All values are calculated per simulated individual, shown in year 2015/16 prices, and discounted at a 1.5% annual rate. See details on cost sources in Table A.5 in the Appendix.

Table 6 provides two summary measures of inequality, based on differences in lifetime expected wellbeing between best off and worst off groups on the basis of the following early childhood circumstances: sex, parental income quintile group (poorest vs. richest 20%), parental mental health, parental education, and high baseline conduct problems (SDQ conduct problem score at age 5 equal to 7 or above). Our "extreme best off group" focuses on individuals in the top category of all four main markers of social disadvantage in early life (top 20% parental income, high parental education, no parental mental illness, no high baseline conduct problems). Our "best off 20% group" focuses on the best off 20% of individuals in terms of predicted lifetime wellbeing based on all four main markers of social disadvantage in early life.
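The discounting used for the lifetime aggregates in the table notes above can be illustrated with a small worked example. This is a Python sketch for illustration only (LifeSim is implemented in R); it assumes that annual values are discounted back to birth (age 0) with standard compound discounting at the 1.5% rate quoted in the notes. The function name and the example cost stream are hypothetical.

```python
def present_value(annual_values, rate=0.015, start_age=0):
    """Sum a stream of annual values, each discounted back to age 0.

    annual_values[t] is the value incurred at age start_age + t.
    Assumes compound discounting: value / (1 + rate) ** age.
    """
    return sum(
        value / (1.0 + rate) ** (start_age + t)
        for t, value in enumerate(annual_values)
    )


# Hypothetical example: a public cost of 1,000 per year at ages 19, 20 and 21
cost_stream = [1000.0, 1000.0, 1000.0]
discounted_cost = present_value(cost_stream, rate=0.015, start_age=19)
undiscounted_cost = sum(cost_stream)
```

Because costs and taxes that arise late in life are discounted more heavily, lifetime aggregates computed this way weight childhood and early-adult outcomes more than retirement-age outcomes, even at a rate as low as 1.5%.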
We would expect some adult outcomes to be similar (e.g. health) but others to be substantially different (e.g. earnings, rates of smoking and university education), and so this can be seen as a simple validation check to ensure that our model provides broadly similar findings where appropriate, and substantially different findings where we know different generations had very different experiences, e.g. smoking. Most variables do not deviate substantially from the same quantities characterising the cohort born in 1970. One exception already mentioned is smoking, which is expected and can be explained by the change in smoking rates over time. Another exception is education: the proportion of people under 30 years old with a degree is much higher in the LifeSim cohort. This can be explained by the change in higher education participation rates over time, and increased equality between the genders in the cohort born in 2000.

Footnote 4: Standard estimates of gaps in healthy life expectancy by current socioeconomic status are substantially larger than our estimate of gaps by childhood circumstance, due to dynamic interdependence between health and social status over the lifecourse.
Over time the 1970 cohort partially catches up with the LifeSim cohort by obtaining qualifications at a later age: at age 46 the proportion of people with a university degree is more similar in both samples than at age 26. Finally, LifeSim earnings at all ages exceed the 1970 cohort earnings on average. This can be explained by cohort effects, such as general differences in economy, society, culture and politics experienced by the two cohorts.

Note: N: number of observations; SD: standard deviation. We quantify the difference between the LifeSim distribution and the BCS70 distribution in terms of the absolute difference in their means and standard deviations. Earnings is the net pay from employment.

To avoid such general cohort effects, which arise when comparing two generations born 30 years apart, we also carry out a simple validity check using more recent cross-sectional datasets. More specifically, we compare our age-specific LifeSim outcomes with age-specific outcomes in cross-sectional data.

[...] be addressed as part of future work is modelling of the relatively longer right-hand tail which can be observed for the Understanding Society data and not for the LifeSim data. This tail represents the highest-earning people in the distribution, and the LifeSim earnings output does not have this tail, as we do not model the outcome of being employed in extremely high-earning jobs.
Addressing this feature in LifeSim would require modelling the link with variables in early life that would lead to such extremely high-earning states.

In Figure 8 we compare the prevalence of the different discrete outcomes in the LifeSim cohort and in our corresponding target datasets, which include the Health Survey for England for the health-related outcomes, the ONS Labour Force Survey for unemployment and Department for Education estimates for participation in higher education. The simulated outcomes match the target data well, but there is a small discrepancy with the Understanding Society data, which can be explained by differences in how data on similar outcomes are collected across surveys.

We present LifeSim, a novel microsimulation model for analysing the long-term consequences of childhood policies. Unlike previous models, LifeSim is capable of modelling a rich set of developmental, social, economic and health outcomes from birth to death for each child in a general population birth cohort of 100,000 English children born in 2000-1. The main strength of our model is that it captures the dynamic individual-level interaction between many outcomes across the social, economic and health domains over the entire lifecourse. Previous models have either modelled only a few individual-level outcomes over part of the lifecourse, or looked at aggregate-level outcomes only.
Simultaneously analysing many outcomes is more informative, as it captures how multiple early life disadvantages can compound over the lifecourse, creating a spiral of multiple disadvantage. Another strength of LifeSim is that it simulates the long-run outcomes for a whole general population cohort of children, not just a narrow group of trial participants, which allows more complex and policy-relevant analysis. Our model is forward-looking: it allows analysis of the long-term consequences of childhood policies for cohorts born now, rather than of past policies affecting historical cohorts, which are less relevant to the current childhood policy context.

LifeSim generates long-term individual-level data, which makes it compatible with new multidimensional summary indices of wellbeing recently proposed in the theoretical literature (Cookson et al., 2020; O'Donnell et al., 2014). These indices are more informative than conventional monetary valuation based on aggregate outcomes, as they can account for the diminishing marginal value of consumption and other sources of heterogeneity in the marginal value of different life outcomes to different individuals. However, applying these indices in practice requires individual-level long-term time series data on many outcomes across the health, social and economic domains. Such rich long-term data are difficult to obtain from existing datasets, especially if we are interested in analysing cohorts living in the present rather than historical cohorts born decades ago. Models such as LifeSim can combine many data sources to extrapolate the required individual-level long-term outcomes.
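To make the point about diminishing marginal value of consumption concrete, the following Python sketch applies a deliberately simple index to individual-level trajectories. The health-weighted logarithmic form, the `min_consumption` reference level and all numbers are assumptions chosen for illustration; this is not the index proposed in the cited literature, nor LifeSim code.

```python
import math


def period_wellbeing(consumption, health, min_consumption=1000.0):
    """Health-weighted log utility of consumption (illustrative form only)."""
    c = max(consumption, min_consumption)  # floor keeps the log non-negative
    return health * math.log(c / min_consumption)


def lifetime_wellbeing(consumption_path, health_path):
    """Sum period wellbeing over an individual's simulated years."""
    return sum(
        period_wellbeing(c, h) for c, h in zip(consumption_path, health_path)
    )


# Hypothetical three-year trajectories for two individuals with equal health.
health = [1.0, 0.9, 0.8]
low_consumption = [10000.0, 10000.0, 10000.0]
high_consumption = [40000.0, 40000.0, 40000.0]
extra = 1000.0  # the same annual consumption gain for both individuals

gain_low = lifetime_wellbeing(
    [c + extra for c in low_consumption], health
) - lifetime_wellbeing(low_consumption, health)
gain_high = lifetime_wellbeing(
    [c + extra for c in high_consumption], health
) - lifetime_wellbeing(high_consumption, health)

print(f"wellbeing gain, low-consumption individual:  {gain_low:.3f}")
print(f"wellbeing gain, high-consumption individual: {gain_high:.3f}")
```

Under this assumed concave form, the same monetary gain produces a larger wellbeing gain for the lower-consumption individual, which an aggregate monetary valuation would not distinguish. The calculation needs each individual's full consumption and health path, which is the kind of output a lifecourse microsimulation provides.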
LifeSim can easily be extended to incorporate additional features. One extension would be to incorporate more outcomes. Our model includes many different categories of human capital (e.g. cognitive skills, social skills, educational attainment, health, employment), but within each category more nuanced distinctions could be made. Health outcomes are modelled using just three binary variables: mental illness (depression), physical illness (CHD) and mortality. Educational outcomes focus only on gaining a university degree; employment outcomes focus only on unemployment, not precarious employment; and our modelling of the tax and benefit system and retirement savings is extremely stylised. Similarly, more individual-level factors (e.g. ethnicity), family-level factors (e.g. child abuse) and neighbourhood-level factors (e.g. air quality) could be included. The tax and benefit modelling in particular could be improved by incorporating a standard static tax-benefit calculator, such as Euromod (Sutherland and Figari, 2013).

Another extension would be to produce a more joined-up set of transition probability estimates by conducting a comprehensive re-analysis of longitudinal data, rather than piecing together estimates from existing peer-reviewed studies, as set out in detail in Appendix A. Specific transition pathway estimates could also be modified to strengthen external validity for specific populations. For example, estimates based on long-term outcomes for mostly white children born in the 1970s may not be applicable to Asian British populations.
Using external data sources to estimate long-run health effects for Asian British populations would produce more applicable estimates for those populations. Another extension would be to re-calibrate our model to other populations (e.g. the UK in 2025, or England or Scotland, or a sub-national area of England) by updating the initial conditions of the birth population and the external macro target data on average population-level outcomes and associations within that birth population in subsequent years. Finally, our model structure could also be extended in more fundamental ways, for example to model the all-age population rather than just a birth cohort, to model the dynamics of family formation and dissolution and spillover effects on other family members, and to model parental investment choices and other behavioural responses.

It should be acknowledged that considering such extensions involves trade-offs between model complexity and tractability, and in some cases it may be preferable to use other more specialist models and combine the findings from different models, rather than expand an existing model. For example, as already mentioned, our model could be combined with Euromod (Sutherland and Figari, 2013), the tax and benefit microsimulation model, to generate more comprehensive output on taxes and benefits for assessing the consequences for the public budget.

Overall, LifeSim is a flexible and policy-relevant model which can be readily used to carry out long-term childhood policy analysis.
New variables of interest, in particular childhood variables, can be easily incorporated within LifeSim, and the input datasets can be updated as required.
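The re-calibration extension discussed above, in which external target data on average population-level outcomes are updated, can be illustrated with a minimal Python sketch. It assumes a logistic transition equation whose effect estimate is held fixed while the intercept is shifted until the simulated prevalence matches a target level. This intercept-shifting approach, the function names and every number below are assumptions for illustration only; the paper does not state that LifeSim is calibrated this way.

```python
import math


def mean_probability(intercept, linear_predictors):
    """Average predicted probability across individuals (logistic link)."""
    total = sum(1.0 / (1.0 + math.exp(-(intercept + x))) for x in linear_predictors)
    return total / len(linear_predictors)


def calibrate_intercept(linear_predictors, target, lo=-20.0, hi=20.0, tol=1e-9):
    """Find the intercept that reproduces a target prevalence by bisection.

    Works because the mean probability is strictly increasing in the intercept.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_probability(mid, linear_predictors) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical inputs: a standardised childhood risk score per individual,
# a log-odds effect taken as given, and a target prevalence for the outcome.
risk_scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
log_odds_effect = 0.4
target_prevalence = 0.06

linear_predictors = [log_odds_effect * score for score in risk_scores]
intercept = calibrate_intercept(linear_predictors, target_prevalence)

print(f"calibrated intercept: {intercept:.4f}")
print(f"simulated prevalence: {mean_probability(intercept, linear_predictors):.4f}")
```

Under this assumption, moving to a different population would mean replacing `risk_scores` with that population's initial conditions and `target_prevalence` with its external target, then re-running the calibration while leaving the effect estimate unchanged.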
The Oxford Handbook of Well-being and Public Policy
Preface to the Biology of Disadvantage: Socioeconomic Status and Health
Early Intervention: the Next Steps, an Independent Report to Her Majesty's Government by Graham Allen MP. The Stationery Office
Childhood Circumstances and Adult Outcomes: Act II
Early Childhood Development Coming of Age: Science Through the Life Course
Axiomatic Foundations For Cost-Effectiveness Analysis
Quality Adjusted Life Years Based on Health and Consumption: A Summary Wellbeing Measure for Cross-Sectoral Economic Evaluation
Years of Good Life Based on Income and Health: Re-Engineering Cost-Benefit Analysis to Examine Policy Impact on Wellbeing and Distributive Justice
Investing in Our Young People
Assessment of the Cost-Benefit Literature on Early Childhood Education for Vulnerable Children: What the Findings Mean for Policy
On Estimating the Fiscal Benefits of Early Intervention
Behavioral Welfare Economics and Redistribution
Equivalent Income and Fair Evaluation of Health Care
Social and Emotional Skills in Childhood and Their Long-Term Effects on Adult Life
Predicting Type of Psychiatric Disorder From Strengths and Difficulties Questionnaire (SDQ) Scores in Child Mental Health Clinics in London and Dhaka
Using the Strengths and Difficulties Questionnaire (SDQ) to Screen for Child Psychiatric Disorders in a Community Sample
Conduct and oppositional defiant disorders
Millennium Cohort Study Third Survey: A User's Guide to Initial Findings
Validation of Population-Based Disease Simulation Models: A Review of Concepts and Methods
What Predicts a Successful Life? A Life-Course Model of Well-Being
Adversity in Childhood is Linked to Mental and Physical Health Throughout Life
Wellbeing and Policy. London, United Kingdom: Legatum Institute
Health and inequality
Understanding the demographic predictors and associated comorbidities in children hospitalized with conduct disorder
Budget Allocation and the Revealed Social Rate of Time Preference for Health
Building a New Biodevelopmental Framework to Guide the Future of Early Childhood Policy
EUROMOD: the European Union tax-benefit microsimulation model
Essential Epidemiology: An Introduction for Students and Health Professionals

Parameter Sources

Youth Depression and Future Criminal Behavior
Health Care Costs in the English NHS: Reference Tables for Average Annual NHS Spend by Age, Sex and Deprivation Group
Relationship Between Cigarette Smoking and Novel Risk Factors for Cardiovascular Disease in the United States
The Returns to Higher Education in Britain: Evidence From a British Cohort
Costs and Longer-Term Savings of Parenting Programmes for the Prevention of Persistent Conduct Disorder: A Modelling Study
All-Cause Mortality Among People With Serious Mental Illness (SMI), Substance Use Disorders, and Depressive Disorders in Southeast London: A Cohort Study
Mortality Risk Reduction Associated With Smoking Cessation in Patients With Coronary Heart Disease: A Systematic Review
Unit Costs of Health and Social Care 2017
The economic and social costs of crime against individuals and households 2003/04
Parenting Programme for Parents of Children at Risk of Developing Conduct Disorder: Cost Effectiveness Analysis
The Effects of Parents' Psychiatric Disorders on Children's High School Dropout
Show Me the Child at Seven: The Consequences of Conduct Problems in Childhood for Psychosocial Functioning in Adulthood
Adolescent Depression and Educational Attainment: Results Using Sibling Fixed Effects
Could Scale-Up of Parenting Programmes Improve Child Disruptive Behaviour and Reduce Social Inequalities? Using Individual Participant Data Meta-Analysis to Establish for Whom Programmes Are Effective and Cost-Effective
Social and Emotional Skills in Childhood and Their Long-Term Effects on Adult Life
Predicting Type of Psychiatric Disorder From Strengths and Difficulties Questionnaire (SDQ) Scores in Child Mental Health Clinics in London and Dhaka
Using the Strengths and Difficulties Questionnaire (SDQ) to Screen for Child Psychiatric Disorders in a Community Sample
Cigarette Consumption and Socio-Economic Circumstances in Adolescence as Predictors of Adult Smoking
Smoking and Mental Illness: A Population-Based Prevalence Study
The Economic Burden of Coronary Heart Disease in the UK
The Social Distribution of Health: Estimating Quality-Adjusted Life Expectancy in England
Trajectories of Preschool Disorders to Full DSM Depression at School Age and Early Adolescence: Continuity of Preschool Depression
Contribution of Job Control and Other Risk Factors to Social Variations in Coronary Heart Disease Incidence
Paying the Price: The Cost of Mental Health Care in England to 2026
Prevalence of Depression in Older People in England and Wales: The MRC CFA Study
Economic Cost of Severe Antisocial Behaviour in Children - and Who Pays It
Financial Cost of Social Exclusion: Follow up Study of Antisocial Children into Adulthood
Substance Misuse among Prisoners in England and Wales
Current Prevalence of Dementia, Depression and Behavioural Problems in the Older Adult Care Home Sector: The South East London Care Home Survey
Catalogue of EQ-5D Scores for the United Kingdom
Employment Transitions and Mental Health: An Analysis from the British Household Panel Survey
Material Standard of Living, Social Class, and the Prevalence of the Common Mental Disorders in Great Britain

We would first like to thank the members of our advisory group: Annalisa Belloni, Sarah Cattan, Leon Feinstein, Paul Frijters, Peter Goldblatt, Heather [...] The errors and opinions expressed in this paper are our own.