key: cord-0768344-sfmhhyky authors: Roth, Gregory A.; Mensah, George A.; Johnson, Catherine O.; Addolorato, Giovanni; Ammirati, Enrico; Baddour, Larry M.; Barengo, Noël C.; Beaton, Andrea Z.; Benjamin, Emelia J.; Benziger, Catherine P.; Bonny, Aimé; Brauer, Michael; Brodmann, Marianne; Cahill, Thomas J.; Carapetis, Jonathan; Catapano, Alberico L.; Chugh, Sumeet S.; Cooper, Leslie T.; Coresh, Josef; Criqui, Michael; DeCleene, Nicole; Eagle, Kim A.; Emmons-Bell, Sophia; Feigin, Valery L.; Fernández-Solà, Joaquim; Fowkes, Gerry; Gakidou, Emmanuela; Grundy, Scott M.; He, Feng J.; Howard, George; Hu, Frank; Inker, Lesley; Karthikeyan, Ganesan; Kassebaum, Nicholas; Koroshetz, Walter; Lavie, Carl; Lloyd-Jones, Donald; Lu, Hong S.; Mirijello, Antonio; Temesgen, Awoke Misganaw; Mokdad, Ali; Moran, Andrew E.; Muntner, Paul; Narula, Jagat; Neal, Bruce; Ntsekhe, Mpiko; Moraes de Oliveira, Glaucia; Otto, Catherine; Owolabi, Mayowa; Pratt, Michael; Rajagopalan, Sanjay; Reitsma, Marissa; Ribeiro, Antonio Luiz P.; Rigotti, Nancy; Rodgers, Anthony; Sable, Craig; Shakil, Saate; Sliwa-Hahnle, Karen; Stark, Benjamin; Sundström, Johan; Timpel, Patrick; Tleyjeh, Imad M.; Valgimigli, Marco; Vos, Theo; Whelton, Paul K.; Yacoub, Magdi; Zuhlke, Liesl; Murray, Christopher; Fuster, Valentin title: Global Burden of Cardiovascular Diseases and Risk Factors, 1990–2019: Update From the GBD 2019 Study date: 2020-12-22 journal: J Am Coll Cardiol DOI: 10.1016/j.jacc.2020.11.010 sha: 34a9a8cfe00a5adadaa0c0fa12007feffc168205 doc_id: 768344 cord_uid: sfmhhyky Cardiovascular diseases (CVDs), principally ischemic heart disease (IHD) and stroke, are the leading cause of global mortality and a major contributor to disability. This paper reviews the magnitude of total CVD burden, including 13 underlying causes of cardiovascular death and 9 related risk factors, using estimates from the Global Burden of Disease (GBD) Study 2019. 
GBD, an ongoing multinational collaboration to provide comparable and consistent estimates of population health over time, used all available population-level data sources on incidence, prevalence, case fatality, mortality, and health risks to produce estimates for 204 countries and territories from 1990 to 2019. Prevalent cases of total CVD nearly doubled from 271 million (95% uncertainty interval [UI]: 257 to 285 million) in 1990 to 523 million (95% UI: 497 to 550 million) in 2019, and the number of CVD deaths steadily increased from 12.1 million (95% UI:11.4 to 12.6 million) in 1990, reaching 18.6 million (95% UI: 17.1 to 19.7 million) in 2019. The global trends for disability-adjusted life years (DALYs) and years of life lost also increased significantly, and years lived with disability doubled from 17.7 million (95% UI: 12.9 to 22.5 million) to 34.4 million (95% UI:24.9 to 43.6 million) over that period. The total number of DALYs due to IHD has risen steadily since 1990, reaching 182 million (95% UI: 170 to 194 million) DALYs, 9.14 million (95% UI: 8.40 to 9.74 million) deaths in the year 2019, and 197 million (95% UI: 178 to 220 million) prevalent cases of IHD in 2019. The total number of DALYs due to stroke has risen steadily since 1990, reaching 143 million (95% UI: 133 to 153 million) DALYs, 6.55 million (95% UI: 6.00 to 7.02 million) deaths in the year 2019, and 101 million (95% UI: 93.2 to 111 million) prevalent cases of stroke in 2019. Cardiovascular diseases remain the leading cause of disease burden in the world. CVD burden continues its decades-long rise for almost all countries outside high-income countries, and alarmingly, the age-standardized rate of CVD has begun to rise in some locations where it was previously declining in high-income countries. 
There is an urgent need to focus on implementing existing cost-effective policies and interventions if the world is to meet the targets for Sustainable Development Goal 3 and achieve a 30% reduction in premature mortality due to noncommunicable diseases.

The general approach to estimating causes of death and disease incidence and prevalence for GBD 2019 is the same as for GBD 2017 (3, 4). Here, we provide an overview of the methods, with an emphasis on the main methodology changes since GBD 2017. For each iteration of GBD, the estimates for the whole time series are updated on the basis of new data and changes in methods where appropriate. Thus, the GBD 2019 results supersede those from previous rounds of GBD.

Geographical units, age groups, time periods, and cause levels

GBD 2019 estimated each epidemiological quantity of interest (incidence, prevalence, mortality, years lived with disability [YLDs], years of life lost [YLLs], and disability-adjusted life-years [DALYs]) for 23 age groups; males, females, and both sexes combined; and 204 countries and territories that were grouped into 21 regions and seven super-regions. For GBD 2019, nine countries and territories (Cook Islands, Monaco, San Marino, Nauru, Niue, Palau, Saint Kitts and Nevis, Tokelau, and Tuvalu) were added, such that the GBD location hierarchy now includes all WHO member states. GBD 2019 includes subnational analyses for Italy, Nigeria, Pakistan, the Philippines, and Poland, and 16 countries previously estimated at subnational levels (Brazil, China, Ethiopia, India, Indonesia, Iran, Japan, Kenya, Mexico, New Zealand, Norway, Russia, South Africa, Sweden, the UK, and the USA). All subnational analyses are at the first level of administrative organisation within each country except for New Zealand (by Māori ethnicity), Sweden (by Stockholm and non-Stockholm), the UK (by local government authorities), and the Philippines (by province).
At the most detailed spatial resolution, we generated estimates for 990 locations. The GBD diseases and injuries analytical framework generated estimates for every year from 1990 to 2019. Diseases and injuries were organised into a levelled cause hierarchy from the three broadest causes of death and disability at Level 1 to the most specific causes at Level 4. Within the three Level 1 causes (communicable, maternal, neonatal, and nutritional diseases; non-communicable diseases; and injuries) there are 22 Level 2 causes, 174 Level 3 causes, and 301 Level 4 causes (including 131 Level 3 causes that are not further disaggregated at Level 4). In total, 364 causes are non-fatal and 286 are fatal.

The GBD estimation process is based on identifying multiple relevant data sources for each disease or injury, including censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Each of these types of data is identified from systematic review of published studies, searches of government and international organisation websites, published reports, primary data sources such as the Demographic and Health Surveys, and contributions of datasets by GBD collaborators. In total, 86,249 sources were used in this analysis, including 19,354 sources reporting deaths, 31,499 reporting incidence, 19,773 reporting prevalence, and 26,631 reporting other metrics. Each newly identified and obtained data source is given a unique identifier by a team of librarians and included in the Global Health Data Exchange (GHDx; http://ghdx.healthdata.org/). The GHDx makes publicly available the metadata for each source included in GBD as well as the data, where allowed by the data provider. Readers can use the GHDx source tool (http://ghdx.healthdata.org/gbd-2019/data-input-sources) to identify which sources were used for estimating any disease or injury outcome in any given location.
A crucial step in the GBD analytical process is correcting for known bias by redistributing deaths from unspecified codes to more specific disease categories, and by adjusting data with alternative case definitions or measurement methods to the reference method. We highlight several major changes in data processing that in some cases have affected GBD results. Vital registration with medical certification of cause of death is a crucial resource for the GBD cause of death analysis in many countries. Cause of death data obtained using various revisions of the International Classification of Diseases and Injuries (ICD) (7) were mapped to the GBD cause list. Many deaths, however, are assigned to causes that cannot be the underlying cause of death (eg, cardiopulmonary failure) or are inadequately specified (eg, injury from undetermined intent). These deaths were reassigned to the most probable underlying causes of death as part of the data processing for GBD. Redistribution algorithms can be divided into three categories: proportionate redistribution, fixed proportion redistribution based on published studies or expert judgment, or statistical algorithms. For GBD 2019, data for 116 million deaths attributed to multiple causes were analysed to produce more empirical redistribution algorithms for sepsis (8) , heart failure, pulmonary embolism, acute kidney injury, hepatic failure, acute respiratory failure, pneumonitis, and five intermediate causes (hydrocephalus, toxic encephalopathy, compression of brain, encephalopathy, and cerebral oedema) in the central nervous system. To redistribute unspecified injuries, we used a method similar to that of intermediate cause redistribution, using the pattern of the nature of injury codes in the causal chain where the ICD codes X59 ("exposure to unspecified factor") and Y34 ("unspecified event, undetermined intent") and GBD injury causes were the underlying cause of death. 
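The first of the three redistribution categories named above, proportionate redistribution, can be illustrated with a short Python sketch. It reassigns deaths coded to an unspecified cause across a set of target causes in proportion to the deaths already coded to those targets. The function name, cause labels, and counts are hypothetical and are not taken from the GBD code base; the actual algorithms operate within age, sex, location, and year strata and use cause-specific target lists.

```python
def redistribute_proportionately(specific_deaths, garbage_deaths):
    """Spread garbage-coded deaths over target causes in proportion to
    the deaths already coded to each target (hypothetical helper)."""
    total_specific = sum(specific_deaths.values())
    if total_specific <= 0:
        raise ValueError("need at least one death coded to a target cause")
    return {
        cause: deaths + garbage_deaths * deaths / total_specific
        for cause, deaths in specific_deaths.items()
    }


# Illustrative counts only; not GBD data.
coded = {
    "ischaemic heart disease": 60.0,
    "stroke": 30.0,
    "hypertensive heart disease": 10.0,
}
redistributed = redistribute_proportionately(coded, garbage_deaths=20.0)
```

The total number of deaths is conserved: the 100 specifically coded deaths plus the 20 redistributed deaths sum to 120 after reassignment.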
These new algorithms led to important changes in the causes to which these intermediate outcomes were redistributed. Additionally, data on deaths from diabetes and stroke lack the detail on subtype in many countries; we ran regressions on vital registration data with at least 50% of deaths coded specifically to type 1 or 2 diabetes and ischaemic, haemorrhagic, or subarachnoid stroke to predict deaths by these subtypes when these were coded to unspecified diabetes or stroke. In previous cycles of GBD, data reported using alternative case definitions or measurement methods were corrected to the reference definition or measurement method primarily as part of the Bayesian meta-regression models. For example, in DisMod-MR, the population data were simultaneously modelled as a function of country covariates for variation in true rates and as a function of indicator variables capturing alternative measurement methods. To enhance transparency and to standardise and improve methods in GBD 2019, we estimated correction factors for alternative case definitions or measurement methods using network meta-regression, including only data where two methods were assessed in the same location-time period or in the exact same population. This included validation studies where two methods had been compared in populations that were not necessarily random samples of the general population. Clinical informatics data include inpatient admissions, outpatient (including general practitioner) visits, and health insurance claims. Several data processing steps were undertaken. Inpatient hospital data with a single diagnosis only were adjusted to account for non-primary diagnoses as well as outpatient care. For each GBD cause that used clinical data, ratios of non-primary to primary diagnosis rates were extracted from claims in the USA, Taiwan (province of China), New Zealand, and the Philippines, as well as USA Healthcare Cost and Utilization Project inpatient data. 
Ratios of outpatient to inpatient care for each cause were extracted from claims data from the USA and Taiwan (province of China). The log of the ratios for each cause were modelled by age and sex using MR-BRT (Meta-Regression-Bayesian Regularised Trimmed), the Bayesian meta-regression tool. To account for the incomplete health-care access in populations where not every person with a disease or injury would be accounted for in administrative clinical records, we transformed the adjusted admission rates using a scalar derived from the Healthcare Access and Quality Index (9) . We used this approach to produce adjusted, standardised clinical data inputs. For most diseases and injuries, processed data are modelled using standardised tools to generate estimates of each quantity of interest by age, sex, location, and year. There are three main standardised tools: Cause of Death Ensemble model (CODEm), spatiotemporal Gaussian process regression (ST-GPR), and DisMod-MR. Previous publications (3, 4, 10) provide more details on these general GBD methods. Briefly, CODEm is a highly systematised tool to analyse cause of death data using an ensemble of different modelling methods for rates or cause fractions with varying choices of covariates that perform best with out-of-sample predictive validity testing. DisMod-MR is a Bayesian meta-regression tool that allows evaluation of all available data on incidence, prevalence, remission, and mortality for a disease, enforcing consistency between epidemiological parameters. ST-GPR is a set of regression methods that borrow strength between locations and over time for single metrics of interest, such as risk factor exposure or mortality rates. In addition, for select diseases, particularly for rarer outcomes, alternative modelling strategies have been developed. In GBD 2019, we designated a set of standard locations that included all countries and territories as well as the subnational locations for Brazil, China, India, and the USA. 
Coefficients of covariates in the three main modelling tools were estimated for these standard locations only; ie, we ignored data from subnational locations other than those for Brazil, China, India, and the USA. Using this set of standard locations will prevent changes in regression coefficients from one GBD cycle to the next that are solely due to the addition of new subnational units in the analysis that might have lower quality data or small populations. Changes to CODEm for GBD 2019 included the addition of count models to the model ensemble for rarer causes. We also modified DisMod-MR priors to effectively increase the out-of-sample coverage of uncertainty intervals (UIs) as assessed in simulation testing. DisMod-MR was used to estimate deaths from three outcomes (dementia, Parkinson's disease, and atrial fibrillation), and to determine the proportions of deaths by underlying aetiologies of cirrhosis, liver cancer, and chronic kidney disease deaths.

The GBD 2019 estimation of attributable burden followed the general framework established for comparative risk assessment (CRA) (11, 12) used in GBD since 2002. Here, we provide a general overview and details on major innovations since GBD 2017. CRA can be divided into six key steps: inclusion of risk-outcome pairs in the analysis; estimation of relative risk as a function of exposure; estimation of exposure levels and distributions; determination of the counterfactual level of exposure, ie, the level of exposure with minimum risk, called the theoretical minimum risk exposure level (TMREL); computation of population attributable fractions and attributable burden; and estimation of the mediation of risk factors through other risk factors (eg, the effect of high body-mass index [BMI] on ischaemic heart disease is mediated through elevated systolic blood pressure [SBP], elevated fasting plasma glucose [FPG], and elevated LDL cholesterol) to compute the burden attributable to various combinations of risk factors (13).
GBD 2019 estimated prevalence of exposure and attributable deaths, YLLs, YLDs, and DALYs for 23 age groups; males, females, and both sexes combined; and 204 countries and territories that were grouped into 21 regions and seven super-regions. GBD 2019 includes subnational analyses for Italy, Nigeria, Pakistan, the Philippines, and Poland, and 16 countries previously estimated at subnational levels (Brazil, China, Ethiopia, India, Indonesia, Iran, Japan, Kenya, Mexico, New Zealand, Norway, Russia, South Africa, Sweden, the UK, and the USA). All subnational analyses are at the first level of administrative organisation within each country except for New Zealand (by Māori ethnicity), Sweden (by Stockholm and non-Stockholm), the UK (by local government authorities), and the Philippines (by province). For this cycle, nine countries and territories (Cook Islands, Monaco, San Marino, Nauru, Niue, Palau, Saint Kitts and Nevis, Tokelau, and Tuvalu) were added, such that the GBD location hierarchy now includes all WHO member states. These new locations were previously included in regional totals by assuming that age-specific rates were equal to the regional rates. At the most detailed level, we generated estimates for 990 locations. The GBD diseases and injuries analytical framework generated estimates for every year from 1990 to 2019. Individual risk factors such as low birthweight or ambient ozone pollution are evaluated in the GBD CRA. In addition, there has been policy interest in groups of risk factors such as household air pollution combined with ambient particulate matter. To accommodate these diverse interests, the GBD CRA has a risk factor hierarchy. Level 1 risk factors are behavioural, environmental and occupational, and metabolic; Level 2 risk factors include 20 risks or clusters of risks; Level 3 includes 52 risk factors or clusters of risks; and Level 4 includes 69 specific risk factors. 
Counting all specific risk factors and aggregates computed in GBD 2019 yields 87 risks or clusters of risks. Since GBD 2010, we have used the World Cancer Research Fund criteria for convincing or probable evidence of risk-outcome pairs (14). For GBD 2019, we completely updated our systematic reviews for 81 risk-outcome pairs. Convincing evidence requires more than one study type, at least two cohorts, no substantial unexplained heterogeneity across studies, good-quality studies to exclude the risk of confounding and selection bias, and biologically plausible dose-response gradients. For GBD, for a newly proposed or evaluated risk-outcome pair, we additionally required that there was a significant association (p<0·05) after taking into account sources of potential bias. To avoid risk-outcome pairs repeatedly entering and leaving the analysis with each cycle of GBD, the criterion for exclusion requires that, with the available studies, the association has a p value greater than 0·1. On the basis of these reviews and meta-regressions, 12 risk-outcome pairs included in GBD 2017 were excluded from GBD 2019: vitamin A deficiency and lower respiratory infections; zinc deficiency and lower respiratory infections; diet low in fruits and four outcomes: lip and oral cavity cancer, nasopharynx cancer, other pharynx cancer, and larynx cancer; diet low in whole grains and two outcomes: intracerebral haemorrhage and subarachnoid haemorrhage; intimate partner violence and maternal abortion and miscarriage; and high FPG and three outcomes: chronic kidney disease due to hypertension, chronic kidney disease due to glomerulonephritis, and chronic kidney disease due to other and unspecified causes. In addition, on the basis of multiple requests to begin capturing important dimensions of climate change in GBD, we evaluated the direct relationship between high and low non-optimal temperatures and all GBD disease and injury outcomes.
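The two p-value thresholds described above (p<0·05 to enter the analysis, p>0·1 to leave it) create a band that keeps borderline pairs from flipping in and out between cycles. A minimal Python sketch of that rule follows. It covers only the p-value thresholds, not the evidence grading; the function and constant names are hypothetical, and the treatment of a p value of exactly 0·1 is an assumption.

```python
ENTRY_P = 0.05  # a newly proposed pair must show p < 0.05 to be included
EXIT_P = 0.10   # an already included pair is dropped only if p > 0.1


def keep_pair(previously_included, p_value):
    """Return True if the risk-outcome pair stays in (or enters) the analysis."""
    if previously_included:
        # Included pairs are retained unless the evidence weakens past 0.1.
        return p_value <= EXIT_P
    # New pairs face the stricter entry threshold.
    return p_value < ENTRY_P
```

Under this rule, a pair with p = 0.07 is retained if it was already in the analysis but would not be admitted as a new pair.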
Rather than rely on a heterogeneous literature with a small number of studies examining relationships with specific diseases and injuries, we analysed individual-level cause of death data for all locations with available information on daily temperature, location, and International Classification of Diseases-coded cause of death. These data totalled 58.9 million deaths covering eight countries. On the basis of this analysis, 27 GBD cause Level 3 outcomes met the inclusion criteria for each non-optimal temperature risk factor and were included in this analysis. Other climate-related relationships, such as between precipitation or humidity and health outcomes, have not yet been evaluated.

Estimating relative risk as a function of exposure for each risk-outcome pair

In GBD, we use published systematic reviews, and for GBD 2019 we updated these where necessary to include any new studies that became available before Dec 31, 2019. We did meta-analyses of relative risks from these studies as a function of exposure. For GBD 2019, 81 new systematic reviews were done, including for 44 diet risk-outcome pairs. To allow for risk functions that might not be log-linear, we relaxed the meta-regression assumptions to allow for monotonically increasing or decreasing but potentially non-linear functions for 147 risk-outcome pairs. A further 218 risk-outcome pairs were estimated assuming log-linear relationships. For 126 risk-outcome pairs, exposure was dichotomous or polytomous. For 37 risk-outcome pairs, the population attributable fractions were assumed by definition to be 100% (eg, 100% of diabetes is assumed to be, by definition, related to elevated FPG). For 32 risk-outcome pairs, other approaches were used that reflected the nature of the evidence that has been collected for those risks. For risks that affect cardiovascular outcomes, we adjusted relative risks by age such that they follow the empirical pattern of attenuation seen in published studies for elevated SBP, FPG, and LDL cholesterol.
For each risk factor, we systematically searched for published studies, household surveys, censuses, administrative data, ground monitor data, or remote sensing data that could inform estimates of risk exposure. To estimate mean levels of exposure by age-sex-location-year, specific methods varied across risk factors. For many risk factors, exposure data were modelled using either spatiotemporal Gaussian process regression or DisMod-MR 2.1 (4, 15) , which are Bayesian statistical models developed over the past 12 years for GBD analyses. For most risk factors, the distribution of exposure across individuals was estimated by modelling a measure of dispersion, usually the SD, and fitting an ensemble of parametric distributions to the predicted mean and SD. Ensemble distributions for each risk were estimated based on individual-level data. Because of the strong dependency between birthweight and gestational age, exposure for these risks was modelled as a joint distribution using the copula method (16) . In many cases, exposure data were available for the reference method of ascertainment and for alternative methods, such as tobacco surveys reporting daily smoking versus total smoking; in these cases, we estimated the statistical relationship between the reference and alternative methods of ascertainment using network meta-regression and corrected the alternative data using this relationship. For harmful risk factors with monotonically increasing risk functions, the theoretical minimum risk level was set to 0. For risk factors with J-shaped or U-shaped risk functions, such as for sodium and ischaemic heart disease or BMI and ischaemic heart disease, the TMREL was determined as the low point of the risk function. When the bottom of the risk function was flat or poorly determined, the TMREL uncertainty interval (UI) captured the range over which risks are indistinguishable. 
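The step described above, in which an ensemble of parametric distributions is fitted to a predicted mean and SD, can be sketched in Python. The example matches a lognormal and a gamma distribution to the same mean and SD by the method of moments and mixes them with fixed weights. The choice of these two families, the weights, and the example values (mean 130, SD 15) are assumptions for illustration; the GBD ensemble uses more families and estimates the weights from individual-level data.

```python
import math


def lognormal_pdf(x, mean, sd):
    """Lognormal density whose mean and SD match the inputs (method of moments)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma2)) / (
        x * math.sqrt(2.0 * math.pi * sigma2)
    )


def gamma_pdf(x, mean, sd):
    """Gamma density whose mean and SD match the inputs (method of moments)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    # Work on the log scale to avoid overflow for large shape parameters.
    log_pdf = (
        (shape - 1.0) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)


def ensemble_pdf(x, mean, sd, weights):
    """Weighted mixture of the two moment-matched densities."""
    return (
        weights["lognormal"] * lognormal_pdf(x, mean, sd)
        + weights["gamma"] * gamma_pdf(x, mean, sd)
    )


def numeric_mean(pdf, lo, hi, steps=20000):
    """Trapezoid-rule estimate of the mean of a density on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x * pdf(x)
    return total * h


# Hypothetical weights; in practice these would be estimated from data.
weights = {"lognormal": 0.6, "gamma": 0.4}
mean_check = numeric_mean(lambda x: ensemble_pdf(x, 130.0, 15.0, weights), 1.0, 400.0)
```

Because every component is matched to the same mean, the mixture reproduces that mean whatever the weights; the weights change the shape of the tails, which is what matters for the attributable burden.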
For protective risks with monotonically declining risk functions with exposure, namely risk factors where exposure lowers the risk of an outcome, the challenge is selecting the level of exposure with the lowest level of risk strongly supported by the available data. Projecting beyond the level of exposure supported by the available studies could exaggerate the attributable burden for a risk factor. In these cases, for each risk-outcome pair, we determined the exposure level at the 85th percentile of exposure in the cohorts or trials used in the risk meta-regression. We then generated the TMREL by weighting each risk-outcome pair by the relative global magnitude of each outcome.

For each risk factor j, we computed the population attributable fraction (PAF) by age-sex-location-year using the following general formula for a continuous risk:

$$\mathrm{PAF}_{joasgt} = \frac{\int_{x=l}^{u} RR_{joasg}(x)\,P_{jasgt}(x)\,dx - RR_{joasg}(\mathrm{TMREL}_{jas})}{\int_{x=l}^{u} RR_{joasg}(x)\,P_{jasgt}(x)\,dx}$$

where $\mathrm{PAF}_{joasgt}$ is the PAF for cause o, for age group a, sex s, location g, and year t; $RR_{joasg}(x)$ is the relative risk as a function of exposure level x for risk factor j, for cause o controlled for confounding, age group a, sex s, and location g, with the lowest level of observed exposure as l and the highest as u; $P_{jasgt}(x)$ is the distribution of exposure at x for age group a, sex s, location g, and year t; and $\mathrm{TMREL}_{jas}$ is the TMREL for risk factor j, age group a, and sex s. Where risk exposure is dichotomous or polytomous, this formula simplifies to the discrete form of the equation. Estimation of the PAF took into account the risk function and the distribution of exposure across individuals in each age-sex-location-year. By drawing 1000 samples from the risk function, 1000 distributions of exposure for each age-sex-location-year, and 1000 samples from the TMREL, we propagated all of these sources of uncertainty into the PAF distributions. PAFs were also applied at the draw level to the uncertainty distributions of each associated outcome for that age-sex-location-year.
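The continuous-risk PAF and its draw-level uncertainty can be sketched numerically in Python. The sketch computes the exposure-weighted mean relative risk over the observed exposure range, subtracts the relative risk at the TMREL, and divides by the weighted mean; it then repeats the calculation over 1000 draws of the risk function, the exposure distribution, and the TMREL. The normal exposure density, the log-linear risk curve, and every numeric value are assumptions chosen for illustration, not GBD inputs.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_linear_rr(beta, tmrel):
    """Relative risk equal to 1 at or below the TMREL, rising log-linearly above it."""
    return lambda x: math.exp(beta * max(x - tmrel, 0.0))


def paf_continuous(rr, density, tmrel, lower, upper, steps=600):
    """PAF = (integral of RR*P - RR(TMREL)) / integral of RR*P, by the trapezoid rule."""
    h = (upper - lower) / steps
    weighted_rr = 0.0
    mass = 0.0
    for i in range(steps + 1):
        x = lower + i * h
        w = 0.5 if i in (0, steps) else 1.0
        p = density(x)
        weighted_rr += w * rr(x) * p
        mass += w * p
    # Renormalise so the density integrates to 1 over [lower, upper].
    mean_rr = weighted_rr / mass
    return (mean_rr - rr(tmrel)) / mean_rr


LOWER, UPPER = 80.0, 200.0  # hypothetical observed exposure range (l, u)

# Point estimate with hypothetical inputs.
point = paf_continuous(
    log_linear_rr(0.02, 110.0), lambda x: normal_pdf(x, 130.0, 15.0), 110.0, LOWER, UPPER
)

# Draw-level propagation: each draw samples the risk slope, the mean
# exposure, and the TMREL, then recomputes the PAF.
rng = random.Random(2019)
draws = []
for _ in range(1000):
    beta = rng.gauss(0.02, 0.003)
    exposure_mean = rng.gauss(130.0, 2.0)
    tmrel = rng.uniform(105.0, 115.0)
    draws.append(
        paf_continuous(
            log_linear_rr(beta, tmrel),
            lambda x, m=exposure_mean: normal_pdf(x, m, 15.0),
            tmrel,
            LOWER,
            UPPER,
        )
    )
draws.sort()
lower_ui, upper_ui = draws[24], draws[974]  # 25th and 975th ordered draws
```

Taking the 25th and 975th of 1000 ordered draws gives a 95% uncertainty interval that reflects all three sources of uncertainty at once.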
For the estimation of each specific risk factor, the counterfactual distribution of exposure is the TMREL for that specific risk with no change in other risk factors. Thus, the sum of these risk-specific estimates of attributable burden can exceed 100% for some causes, such as cardiovascular diseases. It is also useful to assess the PAF and attributable burden for combinations of risk factors, such as all diet components together or household air and ambient particulate matter pollution. To estimate the combined effects of risk factors, we should take into account how one risk factor might be mediated through another (eg, the effect of fruit intake might be partly mediated through fibre intake). We used the mediation matrix as developed in GBD 2017 (6) to try to correct for overestimation of the PAF and the attributable burden for combinations of risks if we were to simply assume independence without any mediation. We computed risk-deleted death rates as the death rates that would be observed if all risk factors were set to their respective TMRELs. This was calculated as the death rate in each age-sex group multiplied by 1 minus the all-risk PAF for that age-sex group in each location. Vital registration and verbal autopsy data were used to model the parent cardiovascular envelope. We outliered non-representative subnational verbal autopsies from a number of Indian states and verbal autopsy data in Nepal and Papua New Guinea that were implausible in terms of time and age trends. We also outliered verbal autopsy data sources that were implausibly low in all age groups and ICD8 and ICD9BTL data points that were inconsistent with the rest of the data and created implausible time trends. We used a standard CODEm approach to model deaths from cardiovascular diseases. The covariates included in the ensemble modelling process are listed in the table below. 
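The combination of risk-specific PAFs and the risk-deleted death rate described above can be illustrated with a small Python sketch. It combines PAFs multiplicatively, optionally scaling each PAF by the share of its effect that is not mediated through another risk in the set, and then multiplies a death rate by 1 minus the all-risk PAF. The multiplicative form with a single unmediated share per risk is a simplifying assumption standing in for the full mediation matrix, and all names and numbers are hypothetical.

```python
def joint_paf(pafs, unmediated_share=None):
    """Combine risk-specific PAFs as 1 - prod(1 - PAF_j * share_j).

    With every share equal to 1 this is the independence assumption,
    which the text notes would overestimate the joint burden.
    """
    if unmediated_share is None:
        unmediated_share = [1.0] * len(pafs)
    remaining = 1.0
    for paf, share in zip(pafs, unmediated_share):
        remaining *= 1.0 - paf * share
    return 1.0 - remaining


def risk_deleted_rate(death_rate, all_risk_paf):
    """Death rate expected if all risks were set to their TMRELs."""
    return death_rate * (1.0 - all_risk_paf)


single_pafs = [0.5, 0.4, 0.3]  # sums to 1.2, ie, more than 100%
independent = joint_paf(single_pafs)
mediated = joint_paf(single_pafs, unmediated_share=[1.0, 0.5, 1.0])
deleted = risk_deleted_rate(200.0, mediated)  # rate per 100,000, illustrative
```

With these inputs the individual PAFs sum to 120%, the joint PAF under independence is 79%, and accounting for half of the second risk being mediated lowers it to 72%, giving a risk-deleted rate of 56 per 100,000.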
For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from cardiovascular disease. In addition, the dietary covariate for whole grains (kcal/capita, adjusted) and the covariate for socio-demographic index were dropped, as exploratory analyses indicated that these covariates were not predictive of the outcome. The summary exposure value scalar for CVD was dropped as this covariate was not produced for Level 2 causes in GBD 2019. Apart from these changes to the covariates, there are no other substantive changes from the approach used in GBD 2017.

Vital registration and verbal autopsy data were used to model ischaemic heart disease. We outliered verbal autopsy data in countries and subnational locations where high-quality vital registration data were also available. We also outliered non-representative subnational verbal autopsy data points, ICD8 and ICD9BTL data points which were inconsistent with the rest of the data and created implausible time trends, and data in a number of Indian states identified by experts as poor-quality. We used a standard CODEm approach to model deaths from ischaemic heart disease. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from ischaemic heart disease.
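The covariate direction codes mentioned above (-1, 0, and 1) can be read as sign constraints on regression coefficients. The Python sketch below shows one way such a constraint could be checked for a candidate model: a direction of 1 requires a positive coefficient, -1 a negative coefficient, and 0 imposes no constraint. This reading of the codes, the function name, and the covariate names are assumptions for illustration and do not reproduce the CODEm implementation, which also applies further selection criteria.

```python
def respects_direction(coefficients, directions):
    """Return True if every coefficient has the sign required by its prior direction.

    directions maps covariate name to 1 (positive expected), -1 (negative
    expected), or 0 (no prior); covariates missing from the map are unconstrained.
    """
    for name, coef in coefficients.items():
        direction = directions.get(name, 0)
        if direction > 0 and coef <= 0:
            return False
        if direction < 0 and coef >= 0:
            return False
    return True


# Hypothetical covariate names and fitted coefficients.
directions = {"sev_diet_low_fruit": 1, "alcohol_litres_per_capita": 1, "cholesterol": 0}
candidate_a = {"sev_diet_low_fruit": 0.12, "alcohol_litres_per_capita": 0.03, "cholesterol": -0.01}
candidate_b = {"sev_diet_low_fruit": -0.05, "alcohol_litres_per_capita": 0.03, "cholesterol": 0.02}

kept = [
    label
    for label, coefs in (("a", candidate_a), ("b", candidate_b))
    if respects_direction(coefs, directions)
]
```

On this reading, replacing an intake covariate (direction -1) with a summary exposure value for low intake flips the expected sign to 1, because higher values of the new covariate mean more exposure to the risk.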
We changed the direction of the alcohol variable from 0 to 1 to reflect our a priori hypothesis about the expected direction of the association between this risk factor and mortality risk of ischaemic heart disease. In addition, we changed the level of the covariate for trans fatty acid from 1 to 3. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.

Verbal autopsy and vital registration data were used to model cerebrovascular disease (stroke). We reassigned deaths from verbal autopsy reports for cerebrovascular disease to the parent cardiovascular disease for both sexes for those under 20 years of age. We outliered non-representative subnational verbal autopsy data points. We also outliered ICD8, ICD9BTL, and tabulated ICD10 data points which were inconsistent with the rest of the data and created implausible time trends. Data points from sources which were implausibly low in all age groups and data points that were causing the regional estimates to be improbably high were outliered. We used a standard CODEm approach to model deaths from stroke. The covariates included in the ensemble modelling process are listed in the table below. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids (PUFA) were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from stroke. We dropped the dietary covariate for whole grains (kcal/capita, adjusted) and the socio-demographic index covariate as exploratory analyses indicated that these variables were not predictive of stroke mortality.
In addition, we changed the direction of the alcohol consumption covariate from 0 to 1 to reflect the expected direction of the association for this risk factor with stroke mortality. Apart from these covariate changes, there are no substantive changes from the approach used in GBD 2017.

Vital registration data were used to model deaths from ischaemic stroke. We outliered ICD8 data points which were inconsistent with the rest of the data and created implausible time trends. We also outliered ICD10 data points in the Republic of Tajikistan due to unstable and implausible estimates in similar age groups. We used a standard CODEm approach to model deaths from ischaemic stroke. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from ischaemic stroke. In addition, the dietary covariate for whole grains (kcal/capita, adjusted) and the socio-demographic index covariate were dropped as exploratory analyses indicated that the covariates were not predictive of the outcome. We changed the direction of the alcohol variable from 0 to 1 to reflect our a priori hypothesis about the expected direction of the association between this risk factor and mortality risk of ischaemic stroke. We also changed the level of the trans fatty acid covariate from 1 to 3. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.

Vital registration data were used to model intracerebral haemorrhage. We outliered ICD8 data points which were inconsistent with the rest of the data and created implausible time trends.
In addition, we outliered vital registration data points in certain Latin American countries due to implausibly high values in the oldest age groups, which resulted in inconsistencies in time trends.
We used a standard CODEm approach to model deaths from intracerebral haemorrhage. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from intracerebral haemorrhage. In addition, the dietary covariate for whole grains (kcal/capita, adjusted) and the socio-demographic index covariate were dropped as exploratory analyses indicated that these covariates were not predictive of the mortality risk from intracerebral haemorrhage. We changed the direction of the covariate for alcohol from 0 to 1 due to our a priori hypothesis about the direction of the association for this covariate. We also changed the level of the cholesterol covariate from 1 to 3 and the direction from 0 to -1 to reflect the mixed and inconclusive evidence regarding cholesterol levels and risk of intracerebral haemorrhage. In addition, we changed the level of the trans fatty acid covariate from 1 to 3 in accordance with the expected importance of this risk factor for mortality from intracerebral haemorrhage. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.
Vital registration data were used to model subarachnoid haemorrhage. We outliered ICD8 data points which were inconsistent with the rest of the data and created implausible time trends. In addition, we outliered vital registration data in Tibet that were implausibly high for all years and age groups.
We used a standard CODEm approach to model deaths from subarachnoid haemorrhage. The covariates chosen for inclusion in the ensemble modelling process are listed in the table below. For GBD 2019, we dropped the Socio-demographic Index covariate as exploratory analyses indicated that it was not predictive of the outcome. We also changed the direction of the alcohol covariate from 0 to 1 to reflect the expected direction of the association of this risk factor with mortality risk. Apart from these changes to the covariates, there are no substantive changes from the approach used in GBD 2017.
Vital registration data were used to model cause-specific mortality for hypertensive heart disease. We outliered ICD9BTL data points, which were inconsistent with the rest of the data and created implausible time trends. In addition, we outliered vital registration data from Grenada in 2017 for being implausibly low across all age groups.
We used a standard CODEm approach to model deaths from hypertensive heart disease. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from hypertensive heart disease. We also changed the direction of the covariates for alcohol and socio-demographic index from 0 to 1 to reflect the expected direction of the association of these covariates with mortality risk. Apart from these covariate updates, there are no other substantive changes from the approach used in GBD 2017.
For GBD 2019, input data for estimating mortality due to congenital anomalies were centrally extracted, processed, and stored in the cause of death (CoD) database.
Vital registration (VR) was the dominant data type, followed by verbal autopsy (VA) and surveillance. Those CoD data sources that specified the subcause of birth defect were included in estimation of both the parent congenital anomalies model and the subtype-specific models. For GBD 2019, data exclusions were limited. The majority of VA data were outliered in those over 5 years old as the age patterns were unreliable and led to poor model performance in the under-5 age groups. We also excluded some data sources from the parent model where only a subset of subcauses was specified (e.g., congenital heart disease, neural tube defects, and other congenital anomalies) and the sum of the subcauses clearly represented systematic underreporting of one of the subcauses. Systematic underreporting was suspected when sex- and age-specific rates were more than an order of magnitude lower than those in neighbouring or comparable locations. Data sources for those locations were still included by default in subcause-specific models because underreporting of the total was not assumed to be necessarily associated with underreporting of all of the component conditions.
All types of congenital anomalies were estimated using cause of death ensemble modelling (CODEm) for GBD 2019, as was done for previous iterations of the GBD study. Specific causes included neural tube defects, congenital heart anomalies, orofacial clefts, Down syndrome, other chromosomal anomalies, congenital musculoskeletal anomalies, urogenital congenital anomalies, digestive congenital anomalies, and other congenital birth defects. We assumed no mortality from either Klinefelter syndrome or Turner syndrome, for which we model nonfatal outcomes only. For GBD 2019, we modelled congenital anomalies as a cause of death for ages 0-69 years only, assuming that all mortality from congenital conditions occurs before 70 years of age.
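The underreporting screen described above (rates more than an order of magnitude lower than those in comparable locations) can be illustrated with a minimal sketch. The function name, the single comparator rate, and the example values are hypothetical; the text does not specify how comparator locations were chosen or how their rates were summarised.

```python
def is_suspected_underreporting(rate, comparator_rate, factor=10.0):
    """Flag a sex- and age-specific rate that is more than `factor` times
    lower than the rate in a neighbouring or comparable location.

    `factor=10.0` encodes "more than an order of magnitude"; the choice of
    a single comparator rate is an assumption made for illustration.
    """
    if comparator_rate <= 0:
        # No meaningful comparison is possible against a zero comparator.
        return False
    return rate * factor < comparator_rate


# Hypothetical rates per 100,000 for one age-sex group.
print(is_suspected_underreporting(0.5, 10.0))  # 20-fold lower
print(is_suspected_underreporting(2.0, 10.0))  # 5-fold lower
```

A strict inequality is used so that a rate exactly ten times lower is not flagged, matching the "more than" wording.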
For GBD 2016, we added three new causes to the congenital anomalies: congenital musculoskeletal and limb anomalies; urogenital congenital anomalies; and digestive congenital anomalies. We made no additions to the causes of congenital anomalies for GBD 2017 or 2019. Vital registration and surveillance data were used to model rheumatic heart disease. We outliered ICD8 and ICD9 BTL datapoints which were inconsistent with the rest of the data and created implausible time trends. We also outliered datapoints which were too high after the redistribution process in a number of age groups. In addition, we outliered verbal autopsy datapoints in Nepal and Pakistan which created an implausibly low cause fraction. We used a standard CODEm approach to model deaths from rheumatic heart disease. There have been no substantive changes from the approach used in GBD 2017, including any covariate changes. Vital registration data were used to model deaths due to cardiomyopathy and myocarditis. We outliered data points in Central Asia, Central Europe, and Eastern Europe due to implausibly high values which we attributed to variation in local coding practices. We also outliered ICD8 and ICD9BTL data points in countries where they were discontinuous with other data in the time series or were implausibly high or low. Additionally, we outliered ICD10 data points in Grenada that were improbably low and causing inconsistencies in the time pattern. We used a standard CODEm approach to model deaths from cardiomyopathy and myocarditis. The covariates selected for inclusion in the CODEm modelling process can be found in the table below. A select few changes were made to the covariates as compared with GBD 2017. We dropped the alcohol (litres per capita) covariate as exploratory analyses indicated that it was not predictive of the outcome. 
We also changed the directions of the socio-demographic index covariate and lag distributed income (per capita) covariate from 0 to -1 to reflect our a priori hypotheses about the relationships of these covariates with mortality risk from cardiomyopathy and myocarditis. Aside from these covariate changes, there have been no substantive changes to the modelling strategy since GBD 2017.
Vital registration data were used to model deaths due to myocarditis. We used a standard CODEm approach to model deaths from myocarditis. The covariates selected for evaluation in the CODEm ensemble modelling process can be found in the table below. We changed the direction on the lag distributed income per capita and socio-demographic index covariates from 0 for both to -1 and 1, respectively, to reflect our a priori hypotheses regarding these associations. Aside from these changes, there have been no substantive changes to the modelling strategy since GBD 2017.
Covariate | Transformation | Level | Direction
(mm Hg) | none | 1 | 1
Healthcare access and quality index | none | 2 | -1
Lag distributed income per capita (I$) | log | 3 | -1
Socio-demographic Index | none | 3 | 1
Other cardiomyopathy
Vital registration data were used to model deaths due to other cardiomyopathy. We outliered data points in Central Asia and Central and Eastern Europe due to implausibly high values which we attributed to variation in local coding practices after review with experts. We used a standard CODEm approach to model deaths from other cardiomyopathy. The covariates selected for inclusion in the CODEm modelling process can be found in the table below. We changed the directions of the Socio-demographic Index and lag distributed income per capita covariates from 0 for both to 1 and -1, respectively. Aside from these covariate changes, there have been no substantive changes to the modelling process since GBD 2017.
Vital registration data were used to model deaths due to alcoholic cardiomyopathy.
We outliered ICD9 data points in Cyprus that were implausibly high and discontinuous with the rest of the time series. We also dropped ICD9BTL data points in locations in Central and Eastern Europe where we were unable to disaggregate them appropriately. Additionally, we outliered tabulated ICD10 data points in locations where unreliable estimates caused an abrupt inconsistency with detailed ICD10 data. We used a standard CODEm approach to model deaths from alcoholic cardiomyopathy. The covariates selected for inclusion in the CODEm modelling process can be found in the table below. For GBD 2019, we dropped the covariate on socio-demographic index as exploratory analyses indicated that it was not predictive of the outcome. Additionally, we changed the direction of the lag distributed income per capita covariate from 0 to -1 to reflect our a priori hypothesis about the expected relationship between this covariate and deaths from alcoholic cardiomyopathy. Aside from these covariate changes, there have been no substantive changes from the approach used in GBD 2017. Vital registration (VR) data: We outliered ICD8 and ICD9 data points that were discontinuous from other data in the time series and created an unlikely time trend. We also outliered data points that were implausibly low in multiple age groups. In order to address changes in coding practices for atrial fibrillation, we used an integrated approach that combined DisMod-MR 2.1 and CODEm models to estimate deaths from atrial fibrillation and flutter. This approach allowed us to adjust estimates to more accurately reflect the number of deaths for which atrial fibrillation was the true underlying cause of death. Due to the restrictions of the decomposition analysis implemented for GBD 2019, we utilized the CSMR from the final GBD 2017 DisMod-MR 2.1 model to inform the misdiagnosis correction described below. The modelling steps are illustrated in the above flowchart. 
Covariates included in both the DisMod-MR 2.1 and CODEm models can be found in the table below.
In Step 1, we estimated deaths for atrial fibrillation using a standard CODEm approach.
In Step 2, we estimated prevalence rates in DisMod-MR 2.1 using data from published reports of cross-sectional and cohort surveys, as well as primary care facility data. We also used claims data covering inpatient and outpatient visits for the United States along with inpatient hospital data from 163 locations in 15 countries. Inpatient hospital data were adjusted using age- and sex-specific information for: 1) readmission within one year; 2) primary diagnosis code to secondary codes; and 3) the ratio of inpatient to outpatient visits. We set priors of no remission and no excess mortality prior to age 30.
In Step 3, we calculated the excess mortality rate (EMR) for 2017, defined as the cause-specific mortality rate (CSMR) estimated from CODEm divided by the prevalence rate from DisMod-MR 2.1. We then selected 17 countries based on four conditions: 1) ranking of 4 or 5 stars on the newly developed system for assessing the quality of VR data; 2) prevalence data available from the literature were included in the DisMod-MR 2.1 estimation; 3) prevalence rate ≥ 0.005; and 4) CSMR ≥ 0.00002. Using information from these countries as input data, we ran a linear mixed-effects regression of log EMR on sex, age, and location. Sex and age were treated as fixed effects for the regression, while location was considered a random effect. We then predicted age- and sex-specific EMR using the results of this regression for all non-selected countries. Countries included in the regression were assigned their directly calculated values. These EMR data points were assigned to the time period 1990-2017 and uploaded into the nonfatal database in order to be used in modelling.
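The arithmetic in Step 3 can be sketched as follows. This is a minimal illustration, not the GBD code: the `CountryInputs` container, its field names, and the example values are hypothetical, and the mixed-effects regression itself is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class CountryInputs:
    """Hypothetical per-country inputs for the Step 3 screen."""
    name: str
    vr_stars: int                    # VR data quality rating (1-5 stars)
    has_literature_prevalence: bool  # literature prevalence used in DisMod-MR 2.1
    prevalence_rate: float           # prevalence from DisMod-MR 2.1
    csmr: float                      # cause-specific mortality rate from CODEm


def excess_mortality_rate(csmr, prevalence_rate):
    """EMR defined as CSMR divided by the prevalence rate."""
    return csmr / prevalence_rate


def meets_selection_criteria(c):
    """Apply the four conditions used to select countries for the regression."""
    return (
        c.vr_stars >= 4
        and c.has_literature_prevalence
        and c.prevalence_rate >= 0.005
        and c.csmr >= 0.00002
    )


# Hypothetical example country.
example = CountryInputs("Example", 5, True, 0.01, 0.0001)
if meets_selection_criteria(example):
    emr = excess_mortality_rate(example.csmr, example.prevalence_rate)
    # log(EMR) is the quantity regressed on sex, age, and location.
    print(emr, math.log(emr))
```

Countries passing the screen would contribute log EMR values as regression inputs; all other countries would receive predicted values.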
In Step 4, we reran DisMod-MR 2.1 including the EMR estimated in Step 3 as input data, using the same priors as in Step 2, to obtain CSMR estimates from DisMod-MR 2.1 that are consistent with the available data for incidence and prevalence. As DisMod-MR 2.1 only generates estimates for six years (1990, 1995, 2000, 2005, 2010, 2017), we interpolated using a log-linear approach for 1990-2017. Estimates for 1980-1990 were generated via regression on the entire time series, using sociodemographic index as a predictor.
In Step 5, the CSMR estimates were divided by the all-cause mortality estimates used in DisMod-MR 2.1 to calculate the cause fraction for atrial fibrillation and flutter. We then calculated the difference between the cause fraction estimated by DisMod-MR 2.1 and the cause fraction in the VR data generated by the Cause of Death data preparation process. This yielded the cause fraction that would need to be retrieved from other causes via the process described in Section 2.6: Correction for miscoding of Alzheimer's and other dementias and Parkinson's disease. After this correction process, the cause fraction data are processed through the standard redistribution and noise reduction processes.
In Step 6, these adjusted cause fraction data are then used as inputs for a final CODEm model, using the covariates described below. The results from the CODEm model are processed through CoDCorrect; these post-CoDCorrected results are the final estimates for cause-specific mortality for atrial fibrillation and flutter.
We used a standard CODEm approach to model deaths from ischaemic heart disease. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors.
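The calculations in Steps 4 and 5 of the atrial fibrillation correction described above can be sketched as follows. This is an illustration under stated assumptions: "log-linear" is taken to mean linear interpolation of the log rate between adjacent estimation years, and all function names and example values are hypothetical.

```python
import math


def log_linear_interpolate(year, y0, v0, y1, v1):
    """Interpolate between (y0, v0) and (y1, v1), linearly in log space.

    Assumes v0 and v1 are strictly positive rates.
    """
    w = (year - y0) / (y1 - y0)
    return math.exp((1 - w) * math.log(v0) + w * math.log(v1))


def cause_fraction(csmr, all_cause_mortality_rate):
    """Cause fraction: CSMR divided by the all-cause mortality rate."""
    return csmr / all_cause_mortality_rate


def cause_fraction_gap(cf_dismod, cf_vr):
    """Cause fraction to be retrieved from other causes (DisMod minus VR)."""
    return cf_dismod - cf_vr


# Hypothetical CSMR values for two estimation years.
csmr_2012 = log_linear_interpolate(2012, 2010, 2.0e-5, 2017, 3.0e-5)
cf = cause_fraction(csmr_2012, 8.0e-3)
print(csmr_2012, cf, cause_fraction_gap(cf, 0.002))
```

Interpolating in log space keeps the interpolated rates positive and implies a constant relative rate of change between estimation years.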
The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from ischaemic heart disease. We changed the direction of the alcohol variable from 0 to 1 to reflect our a priori hypothesis about the expected direction of the association between this risk factor and mortality risk of ischaemic heart disease. In addition, we changed the level of the covariate for trans fatty acid from 1 to 3. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.
For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from atrial fibrillation. In addition, the dietary covariate for whole grains (kcal/capita, adjusted) was dropped as exploratory analyses indicated that it was not associated with mortality risk. The direction for the alcohol and socio-demographic index covariates was changed from 0 to 1 to reflect our a priori hypotheses about the expected directions of the associations between these covariates and mortality risk of atrial fibrillation. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.
Vital registration data were used to model cause-specific mortality for aortic aneurysm. We outliered data in Oman as they were improbably high in comparison with the rest of the region. We also outliered ICD8 data that were discontinuous with the rest of the time series and created implausible time trends.
In addition, we outliered a subset of vital registration data points in Latin America due to implausibly high values in the oldest age groups that resulted in inconsistencies in time trends.
We used a standard CODEm approach to model deaths from aortic aneurysm. The covariates selected for inclusion in the CODEm modelling process can be found in the table below. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from aortic aneurysm. We also changed the direction of the covariates for alcohol consumption and the socio-demographic index from 0 to 1. Besides these covariate changes, there are no other substantive changes from the approach used in GBD 2017.
Non-rheumatic valvular heart disease: Non-rheumatic calcific aortic valve disease, non-rheumatic degenerative mitral valve disease, and other non-rheumatic valvular heart diseases
Vital registration data were used to model non-rheumatic valvular heart disease, non-rheumatic calcific valve disease, non-rheumatic degenerative mitral valve disease, and other non-rheumatic valve diseases. We outliered ICD8, ICD9BTL, and tabulated ICD10 data points which were inconsistent with the rest of the data and created implausible time trends. Data points from sources which were implausibly low in all age groups, and data points that were causing the regional estimates to be improbably high, were also outliered.
We used a standard CODEm approach to model deaths from non-rheumatic valvular heart disease, non-rheumatic calcific valve disease, non-rheumatic degenerative mitral valve disease, and other non-rheumatic valvular diseases.
The covariates used in the GBD 2019 models, along with their transformations, importance levels, and imposed directions are reported by cause in the tables below. For non-rheumatic valvular heart disease and non-rheumatic calcific aortic valve disease, we added the appropriate summary exposure value, setting both the direction and level to 1. We changed the direction of the Socio-demographic Index covariate from 0 to 1; this change affected the non-rheumatic valve disease, non-rheumatic calcific aortic valve disease, and non-rheumatic degenerative mitral valve disease models. We also changed the direction of the alcohol consumption variable from 0 to 1; this update affected the non-rheumatic valvular heart disease and calcific aortic valve disease models. All covariates for the other non-rheumatic valvular heart disease model were changed. In GBD 2017, we had included only the summary exposure value for cardiovascular diseases in the model. For GBD 2019, we updated the model to include the summary exposure value for non-rheumatic valvular heart disease (level 1, direction 1), Healthcare Access and Quality Index (level 1, direction -1), and Socio-demographic Index (level 2, direction -1). Vital registration data were used to model endocarditis. We outliered data in Mozambique as these were non-representative for sub-Saharan Africa and were causing regional estimates to be implausibly low. We also outliered ICD8 data that were discontinuous from the rest of the data series and created an implausible time trend. We used a standard CODEm approach to model deaths from endocarditis. Covariates selected for inclusion in the CODEm ensemble modelling process are listed in the table below. For GBD 2019, the same covariates as GBD 2017 were used. We changed the level of the healthcare access and quality index covariate from 1 to 2 for consistency with our a priori hypothesis about the relative impact of the covariate on mortality from endocarditis. 
We also changed the direction of the socio-demographic index covariate from 0 to -1. Apart from these updates to the covariates, there have been no substantive changes from the approach used in GBD 2016.
Vital registration data were used to model peripheral artery disease. We outliered all data points with fewer than 1 death in Egypt per expert review.
We used a standard CODEm approach to model deaths from peripheral artery disease. For GBD 2019, adjusted dietary covariates for consumption of fruits, omega-3 fatty acids, vegetables, nuts and seeds, and polyunsaturated fatty acids were replaced with the summary exposure value scalars for diet low in each of these factors. The direction for each dietary covariate was changed from -1 to 1, as our a priori assumption is that low levels of intake of these dietary factors are associated with increasing mortality risk from peripheral arterial disease. In addition, we dropped the dietary covariates for whole grains (kcal/capita, adjusted) and trans fatty acid (percent). We changed the direction of the alcohol and the Socio-demographic Index covariates from 0 to 1 to reflect the expected direction of the association for these risk factors with mortality risk. Apart from these changes, there are no substantive changes from the approach used in GBD 2017.
Case definitions:
1) Acute myocardial infarction (MI): Definite and possible MI according to the third universal definition of myocardial infarction:
a. When there is clinical evidence of myocardial necrosis in a clinical setting consistent with myocardial ischaemia or
b. Detection of a rise and/or fall of cardiac biomarker values and with at least one of the following: i) symptoms of ischaemia, ii) new or presumed new ST-segment-T wave changes or new left bundle branch block, iii) development of pathological Q waves in the ECG, iv) imaging evidence of new loss of viable myocardium or new regional wall motion abnormality, or v) identification of an intracoronary thrombus by angiography or autopsy.
c. Sudden (abrupt) unexplained cardiac death, involving cardiac arrest or no evidence of a non-coronary cause of death
d. Prevalent MI is considered to last from the onset of the event to 28 days after the event and is divided into an acute phase (0-2 days) and a subacute phase (3-28 days).
2) Chronic IHD
a. Angina: clinically diagnosed stable exertional angina pectoris or definite angina pectoris according to the Rose Angina Questionnaire (RAQ), physician diagnosis, or taking nitrate medication for the relief of chest pain.
b. Asymptomatic ischaemic heart disease following myocardial infarction: survival to 28 days following incident MI. The GBD study does not use estimates based on ECG evidence for prior MI, due to its limited specificity and sensitivity (1).
ICD codes used for inclusion of hospital and claims data for MI and angina can be found elsewhere in the appendix. The total source counts for non-fatal ischaemic heart disease are shown in the table below by measure.
Myocardial infarction
A systematic review was done for myocardial infarction for GBD 2019 in order to update our current database. The search strings used were (("myocardial infarction"[tiab] AND (incidence OR "case fatality" OR "excess mortality")) OR ("acute coronary syndrome"[tiab] AND (incidence OR "case fatality" OR "excess mortality")) OR (angina[tiab] AND (incidence OR prevalence OR "case fatality" OR "excess mortality"))) AND (
The last systematic review for myocardial infarction was done for GBD 2015. The dates of the search were 1/1/2009 to 2/3/2015.
38,522 studies were returned; 194 were extracted (this number includes extractions that were done for STEMI/NSTEMI models and revascularisation models that are not currently part of the MI modelling process but may be in the future). A systematic review for myocardial infarction was also done for GBD 2013. The extensive search terms for that review will be provided on request. Apart from inpatient hospital and inpatient claims data, we did not include any data from sources other than the literature for myocardial infarction.
We also split excess mortality data points where the age range was greater than 25 years. Age splitting was based on the global sex-specific age pattern from a DisMod model that only used excess mortality input data from the scientific literature with an age range of less than 25 years. We excluded incidence data with broad age ranges where it was impossible to obtain more granular data, as these data caused the known age pattern for increased risk of myocardial infarction to be masked in the estimates generated from DisMod.
We crosswalked incidence measurements from myocardial infarction literature data with alternative definitions to agree with our reference case definition using the MR-BRT (Meta Regression - Bayesian, Regularized, Trimmed) modelling tool. MR-BRT and the process of data adjustment are discussed elsewhere in the appendix. For myocardial infarction we crosswalked using several covariates: a covariate to capture only first-ever MI, using studies where all events were included as the reference; a covariate to adjust estimates from studies that only included non-fatal cases, using sources that included fatal and non-fatal cases as the reference; and a covariate to adjust for studies that did not use troponin measurements in their case diagnosis, using sources that did include troponin measurements in their diagnostic method as the reference. The coefficients in Table 2 below can be used to calculate adjustment factors for alternative definitions.
The formula for computing adjustment factors is given in equation 1 below. We also included a standardized age variable (age scaled) and a sex variable in the regression to adjust for the possibility of bias.
We included survey data (including NHANES and World Health Study questionnaires) which included the RAQ items. Prevalence of angina was calculated using the standard algorithm to determine whether the RAQ was positive or negative. We excluded data with broad age ranges where it was impossible to obtain more granular data, as these data caused the known age pattern for increased risk of angina to be masked in the estimates generated from DisMod. We also included US claims data, but did not include inpatient hospital data from any locations. Stable angina (unstable angina is modelled as part of MI) is expected to be rare in inpatient data but common in outpatient data, as it is a condition usually managed on an outpatient basis, except for specific surgical interventions. This discrepancy leads to implausible correction factors based on inpatient/outpatient information from claims data (~150X); thus adjusted data cannot be used. Including uncorrected data in the model is likely to lead to incorrect estimates as hospitalisation and procedure rates are likely to vary between geographies based on access to and patterns of care. All outpatient data were excluded as they were implausibly low for all locations when compared with literature and claims data.
We crosswalked prevalence data obtained from surveys using the RAQ, with claims data as the reference, since the RAQ has been shown to be neither sensitive nor specific. Specifics on the crosswalking process are discussed elsewhere in the appendix. Table 2b shows the coefficient adjustments made to the alternative definition.
Acute myocardial infarction was split into two severity levels by length of time since the event: days 1 and 2 versus days 3 through 28.
Disability weights were established for these two severities using the standard approach for GBD 2019. All asymptomatic ischaemic heart disease following myocardial infarction was assigned to the asymptomatic severity level. No disability weight is assigned to this level. Angina was split into asymptomatic, mild, moderate, and severe groups using information from MEPS. Disability weights were established for these severities using the standard approach for GBD 2019.
Gets short of breath after heavy physical activity, and tires easily, but has no problems when at rest. The person has to take medication every day and has some anxiety.
Asymptomatic ischaemic heart disease following myocardial infarction
Myocardial infarction
• We first calculated custom cause-specific mortality estimates using cause of death data prior to garbage code redistribution, generating age-sex-country-specific proportions of IHD deaths that were due to MI (acute IHD) versus those due to other causes of IHD (chronic IHD). Estimates of this proportion for all locations were then generated using a DisMod proportion-only model. Due to a high degree of variability in pre-redistribution coding practices by location, we used the global age-, sex-, and year-specific proportions of acute deaths in subsequent calculations. The global proportions were multiplied by post-Fauxcorrect (final GBD 2019 CoD estimates with GBD 2017 scalars) IHD deaths by location to generate CSMR estimates for MI. These data, along with incidence and excess mortality data, informed a DisMod model to estimate the prevalence and incidence of myocardial infarction due to ischaemic heart disease.
• These estimates were split into estimates for days 1-2 and days 3-28 post-event. Disability weights were assigned to each of these two groupings.
• We set a value prior of one month for remission (11/13) from the MI model. We also set a value prior for the maximum excess mortality rate of 10 for all ages.
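The split of IHD deaths into acute (MI) and chronic components described above can be illustrated with a short sketch. The function names and example numbers are hypothetical, and dividing deaths by population to obtain a rate is an assumption made here for illustration; the text states only that global proportions were multiplied by IHD deaths by location.

```python
def split_ihd_deaths(ihd_deaths, acute_proportion):
    """Split IHD deaths into MI (acute) and chronic IHD deaths.

    `acute_proportion` stands in for the global age-, sex-, and
    year-specific proportion of IHD deaths due to MI.
    """
    if not 0.0 <= acute_proportion <= 1.0:
        raise ValueError("acute_proportion must lie in [0, 1]")
    acute = ihd_deaths * acute_proportion
    chronic = ihd_deaths - acute  # the remaining part of the proportion
    return acute, chronic


def csmr(deaths, population):
    """Cause-specific mortality rate as deaths per person (assumed form)."""
    return deaths / population


# Hypothetical location-age-sex-year cell.
acute, chronic = split_ihd_deaths(ihd_deaths=1000.0, acute_proportion=0.4)
print(csmr(acute, 2_000_000), csmr(chronic, 2_000_000))
```

The acute component would feed the MI model, and the chronic remainder corresponds to the CSMR due to chronic IHD used in the model for asymptomatic ischaemic heart disease.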
We included the Healthcare Access and Quality (HAQ) Index as a fixed-effect country-level covariate on excess mortality, forcing an inverse relationship.
Asymptomatic ischaemic heart disease
• Excess mortality estimates from the myocardial infarction model were used to generate data on the incidence of surviving 28 days post-event.
• We used these data, along with the estimates of CSMR due to chronic IHD (the other part of the proportion described in step 1) and excess mortality data, in a DisMod model to estimate the prevalence of persons with IHD following myocardial infarction. This estimate included subjects with angina and heart failure; a proportion of this prevalence was removed in order to avoid double-counting, based on evidence from the literature (2). This step generates estimates of asymptomatic ischaemic heart disease following myocardial infarction.
• We set a value prior of 0 for remission for all ages.
• We also included the log-transformed, age-standardised SEV scalar for IHD as a fixed-effect country-level covariate on prevalence and LDI (I$ per capita) as a fixed-effect country-level covariate on excess mortality, forcing an inverse relationship for LDI.
Angina
• We used prevalence data from the literature and USA claims databases, along with data on mortality risk, to estimate the prevalence and incidence of angina for all locations. Data which used the Rose Angina Questionnaire to determine prevalence of angina were adjusted using MR-BRT as described above.
• The proportion of mild, moderate, and severe angina was determined by the standard approach for severity splitting for GBD 2019.
• We included a value prior of 0 for remission for all ages. We also included a value prior of 1 for excess mortality for all ages.
• We also included the log-transformed, age-standardised SEV scalar for IHD as a fixed-effect, country-level covariate on prevalence and LDI (I$ per capita) as a fixed-effect, country-level covariate on excess mortality, forcing an inverse relationship for LDI.
Stroke was defined according to WHO criteria: rapidly developing clinical signs of focal (at times global) disturbance of cerebral function lasting more than 24 hours or leading to death with no apparent cause other than that of vascular origin (1). Data on transient ischaemic attack (TIA) were not included.
Acute stroke: Stroke cases are considered acute from the day of incidence of a first-ever stroke through day 28 following the event.
Chronic stroke: Stroke cases are considered chronic beginning 28 days following the occurrence of an event. Chronic stroke includes the sequelae of an acute stroke AND all recurrent stroke events. GBD 2015 adopted this broader definition of chronic stroke than was used in prior iterations in order to model acute strokes using only first-ever incident events.
Ischaemic stroke: an episode of neurological dysfunction caused by focal cerebral, spinal, or retinal infarction.
Intracerebral haemorrhage: a focal collection of blood within the brain parenchyma or ventricular system that is not caused by trauma.
Subarachnoid haemorrhage: bleeding into the subarachnoid space (the space between the arachnoid membrane and the pia mater of the brain or spinal cord).
ICD codes used for inclusion of hospital and claims data can be found elsewhere in the appendix. Tables 1a, 1b, and 1c display source count information for non-fatal ischaemic stroke, intracerebral haemorrhage, and subarachnoid haemorrhage respectively.
Excess mortality rate 88 28
A systematic review was not performed for GBD 2019. However, a systematic review was performed for GBD 2017. Search terms, dates of search, and databases queried follow:
1) Ischaemic stroke a.
Google scholar: ("ischemic stroke" OR "cerebral infarction" OR "ischaemic stroke") AND (incidence OR prevalence OR mortality OR epidemiology). Reviewed first 1000 hits, sorted by relevance b. Global Index Medicus search: (tw:("ischemic stroke") OR tw:("cerebral infarction" OR tw:("ischaemic stroke")) AND (tw:(incidence) OR tw:(prevalence) OR tw:(mortality) OR tw:(epidemiology)) AND NOT (tw:(rats) OR tw:(mice) OR tw:(dogs) OR tw:(apes) OR tw:(monkeys)). Dates of search: 01Jan2010 -31Aug2017 2) Intracerebral haemorrhage a. Google scholar: ("hemorrhagic stroke" OR "intracerebral hemorrhage" OR "haemorrhagic stroke" OR "intracerebral haemorrhage") AND (incidence OR prevalence OR mortality OR epidemiology). Reviewed first 1000 hits, sorted by relevance b. GIM search: (tw:("intracerebral hemorrhage") OR tw:("intracerebral haemorrhage") OR tw:("hemorrhagic stroke") OR tw:("haemorrhagic stroke")) AND (tw:(incidence) OR tw:(prevalence) OR tw:(mortality) OR tw:(epidemiology)) AND NOT (tw:(rats) OR tw:(mice) OR tw:(dogs) OR tw:(apes) OR tw:(monkeys)). Dates of search: 01Jan2010 -31Aug2017 3) Subarachnoid haemorrhage a. Google scholar search: ("subarachnoid hemorrhage" OR "subarachnoid haemorrhage") AND (incidence OR prevalence OR mortality OR epidemiology). Reviewed first 1000 hits, sorted by relevance. b. GIM search: (tw:("subarachnoid hemorrhage") OR tw:("subarachnoid haemorrhage")) AND (tw:(incidence) OR tw:(prevalence) OR tw:(mortality) OR tw:(epidemiology)) AND NOT (tw:(rats) OR tw:(mice) OR tw:(dogs) OR tw:(apes) OR tw:(monkeys)). Dates of search: 01Jan2010 -31Aug2017 We included inpatient hospital data, adjusted for readmission and primary to any diagnosis using correction factors estimated from US claims data. We excluded data for locations where the data points were implausibly low (Vietnam, Philippines, India). In addition, we included unpublished stroke registry data for acute ischaemic stroke, acute intracerebral haemorrhage, and acute subarachnoid haemorrhage. 
We also included survey data for chronic stroke. These surveys were identified based on expert opinion and review of major survey series focused on world health that included questions regarding self-reported history of stroke. For GBD 2019, we split unspecified strokes (ICD-10 I64) into ischaemic stroke, intracerebral haemorrhage, and subarachnoid haemorrhage according to the proportions of subtype-specific coded strokes in the original data. We also split ICD-10 I62 into intracerebral haemorrhage and subarachnoid haemorrhage using the same approach. As with many models in GBD, the diversity of data sources available means that we needed to adjust available data to our reference case definition. We thus crosswalked incidence and excess mortality data that did not meet our reference case definitions using MR-BRT, a Bayesian meta-regression tool developed for the GBD. More information on MR-BRT can be found elsewhere in the appendix. We adjusted data points for first and recurrent strokes combined, using data for first strokes only as reference. For ischaemic stroke and intracerebral haemorrhage, we also adjusted data points that reported all stroke subtypes combined, using as reference studies with subtype-specific information. We also adjusted data which included only persons who survived to hospital admission, using as reference data on both fatal and nonfatal strokes. In addition, we adjusted subtype-specific, inpatient clinical informatics data using subtype-specific literature estimates as a reference. These adjustments can be examined more closely in Table 2. The coefficients in Tables 2a, 2b, and 2c below can be used to calculate adjustment factors for alternative definitions. The formula for computing adjustment factors is given in equation 1 below. We also included a standardized age variable (age scaled) and a sex variable in the crosswalking procedure to adjust for the possibility of bias. No data adjustments were necessary for the chronic stroke models.
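The proportional split of unspecified strokes (ICD-10 I64) described in this section can be illustrated with a minimal sketch. The function name, the dictionary layout, and the counts are hypothetical and serve only as an illustration; the actual processing would be applied within each data source and demographic stratum.

```python
def split_unspecified(coded_counts, unspecified_count):
    """Allocate unspecified stroke cases to subtypes in proportion to
    the subtype-specific coded cases from the same source.

    coded_counts: dict mapping subtype name -> number of coded cases
    unspecified_count: number of cases coded as unspecified stroke
    """
    total_coded = sum(coded_counts.values())
    if total_coded == 0:
        raise ValueError("no subtype-specific cases to derive proportions from")
    return {
        subtype: count + unspecified_count * count / total_coded
        for subtype, count in coded_counts.items()
    }


# Hypothetical counts for one source, chosen only for illustration.
coded = {"ischaemic": 700, "intracerebral": 200, "subarachnoid": 100}
adjusted = split_unspecified(coded, unspecified_count=50)
```

The total number of cases is preserved: the adjusted counts sum to the coded cases plus the unspecified cases.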
The table below illustrates the severity level, lay description, and disability weights for GBD 2019. In previous iterations of GBD, severity splits for stroke were based on the standard approach described elsewhere (3) . For GBD 2016, we undertook a review to identify epidemiologic literature which reported the degree of disability at 28 days (for acute stroke) or one year (for chronic stroke) using the modified Rankin scale (mRS) and the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). The mRS assesses functional capabilities, while the MMSE and MoCA tests provide evaluations of cognitive functioning. We then mapped these measures to the existing GBD categories as indicated below. This approach allowed us to include location-specific information and can be updated as more data on functional or cognitive status become available. Table 4 : Data input counts for the estimation process for the custom severity splits. Site-years (total) 9 16 Number of countries with data 6 13 Number of GBD regions with data (out of 21 regions) 6 7 Number of GBD super-regions with data (out of 7 super-regions) 4 5 We used DisMod-MR, a Bayesian meta-regression tool, to model the six severity levels, with an independent proportion model for each. Reports which grouped mRS scores differently than our mapping (eg, 0-2) were adjusted in DisMod by estimating the association between these alternate groupings and our preferred mappings. These statistical associations were used to adjust data points to the referent category as necessary. The six models were scaled such that the sum of the proportions for all levels equaled 1. The general approach employed for all of the components of the stroke modelling process is detailed in the table below. o Data points were adjusted from alternative to reference case definitions using estimates from statistical models generated by MR-BRT (discussed elsewhere in the appendix) for the acute models. 
Coefficients for these crosswalks can be found in Tables 2a, 2b, and 2c.
o The GBD summary exposure values (SEV), which are the relative risk-weighted prevalence of exposure, were included as covariates for the ischaemic stroke or intracerebral haemorrhage models as appropriate, and a covariate for country income was used as a country-level covariate for both models (4). Subarachnoid haemorrhage did not include an SEV covariate, but did include a covariate for country income for excess mortality. Coefficients for these fixed-effect covariates can be found in Tables 5a, 5b, and 5c below.
o We used the ratio of acute:chronic cause-specific mortality estimated by the final GBD 2017 DisMod model estimates to divide GBD 2019 stroke deaths into acute and chronic stroke deaths, using the global average for the proportion of acute:chronic stroke mortality. The acute and chronic models were then run using the same incidence, prevalence, and case fatality data as well as the custom cause-specific mortality rates as input data.
o We ran the first-ever acute subtype-specific models with CSMR as derived from FauxCorrect and epidemiological data as described above using DisMod-MR.
o We then calculated the rate of surviving until 28 days after an acute event for all three subtypes using the modelled estimates of excess mortality and incidence from the acute stroke models.
o Twenty-eight-day survivorship data were uploaded into the chronic subtype-specific models with CSMR. These chronic models also use CSMR as derived from FauxCorrect and epidemiological data as described above.
Models were evaluated based on expert opinion, comparison with previous iterations, and model fit. Tables 5a, 5b, and 5c below indicate the covariates used by cause in the estimation process, as well as the beta and exponentiated beta values.
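Two of the arithmetic steps in the stroke modelling process above can be sketched briefly: scaling the six severity proportions so that they sum to 1, and dividing stroke deaths into acute and chronic deaths using a single global proportion. The function names and all numbers below are hypothetical; the sketch assumes simple proportional rescaling, which the text does not spell out.

```python
def rescale_proportions(proportions):
    """Scale a list of modelled proportions so that they sum to 1."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    return [p / total for p in proportions]


def split_deaths(total_deaths, acute_proportion):
    """Divide stroke deaths into acute and chronic deaths.

    acute_proportion: assumed global share of stroke deaths that are acute.
    """
    if not 0.0 <= acute_proportion <= 1.0:
        raise ValueError("acute_proportion must lie in [0, 1]")
    acute = total_deaths * acute_proportion
    return acute, total_deaths - acute


# Hypothetical outputs of six independent severity models (sum is 1.1).
severity = rescale_proportions([0.30, 0.25, 0.20, 0.15, 0.10, 0.10])

# Hypothetical death count and acute share.
acute_deaths, chronic_deaths = split_deaths(1000.0, 0.25)
```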
We have estimated the prevalence and associated disability of the following categories of congenital birth defects (those in bold are GBD causes): congenital heart anomalies, neural tube defects, orofacial clefts, urogenital congenital anomalies, digestive congenital anomalies, congenital musculoskeletal and limb anomalies, and other chromosomal abnormalities.
This appendix will first describe the input data sources and aspects of the modelling strategy that are common to all sub-types of congenital anomalies. We will then provide a description of the case definitions, ICD-10 codes, and health states associated with each of the component congenital causes, as well as the specific modelling strategies employed in each congenital cause, including the model settings, study-level and country-level covariates, and other modelling decisions made.
The GBD case definition of congenital anomalies includes any condition present at birth that is a result of abnormalities of embryonic development. It excludes conditions that are directly the result of infections or substance abuse (e.g. fetal alcohol syndrome, congenital syphilis), which are modeled elsewhere in GBD, and it excludes minor anomalies as they are defined by EUROCAT.
Several types of data sources are used in the estimation of congenital anomalies: literature prevalence, with-condition mortality and excess mortality data, birth prevalence and neonatal with-condition mortality data from a number of international birth defects registries and surveillance systems, inpatient hospital and Marketscan claims data prepared internally by the GBD research team, and cause-specific mortality estimates produced by the causes of death analysis. Second, we used inpatient hospital and claims data (from USA, Taiwan, and Singapore) for all congenital anomalies causes and sub-cause models. These data were prepared centrally by the clinical informatics research team and are described in detail in the Clinical Informatics section of this appendix.
Four rounds of data bias correction were employed in the processing of clinical data. This included 1) adjustment for readmission, 2) correction of primary diagnoses to all diagnoses, 3) adjustment for inpatient-to-outpatient ratio, and 4) adjustment based on Healthcare Access and Quality Index (HAQI). Of note, in GBD 2017 we used clinical data for congenital birth defects with only the first two corrections applied, but changed in GBD 2019 to use clinical data that had all four corrections applied. This change was facilitated by improvements in analysis of corrections by the clinical informatics team and was a change made across GBD. Of note, we also changed the mapping of club foot and hip dysplasia in GBD 2019. Previously they were mapped to "limb reduction defects," but in preparation for disaggregated models (which are planned for the next time they are estimated in GBD), they are now included only in the total for musculoskeletal birth defects. Third, we included data from a systematic review of the available literature for all types of congenital birth defects that was completed in GBD 2015 by constructing search strings designed to capture information on the prevalence, associated mortality, and long-term health outcomes associated with each sub-category of congenital anomalies. All results were screened (first abstracts, then full text) to ensure the availability of required information, the representativeness of the reported population, and the exclusion of duplicate data also reported as part of the birth registry data inputs.
Data processing
Any data that were not sex-specific or did not fit entirely within GBD age groups were age- and sex-split to fit these groups prior to modeling, using empirical age and sex patterns derived from previous DisMod-MR 2.1 models of the same condition.
This is a change from GBD 2017, when age- and sex-splitting of data was not completed prior to modeling, and it had a substantial effect on the magnitude of estimates in those causes for which cause-specific mortality rate (CSMR) data were used in modeling. This is described further below. A number of the input data sources used for the estimation of congenital birth defects are known to have biases leading to under-reporting or over-reporting relative to the true prevalence of congenital anomalies among live births and all subsequent age groups. We used Meta-Regression, Bayesian, Regularised, Trimmed (MR-BRT) to develop statistical models that were used to adjust non-reference data. The alternate definitions that were crosswalked are described below. The specifics of each MR-BRT crosswalk are shown in the corresponding cause-specific sections.
Live/Stillbirths: Where necessary, we used a crosswalk to adjust for the inclusion of stillbirths in the reported birth prevalence estimates in literature and registry data sources, as stillbirths are not included in our case definition of prevalence among live births. Each of these crosswalks used a spline on log-transformed neonatal mortality rate.
Exclusion of chromosomal conditions: Some sources report birth defects only in isolation (i.e. excluding any persons who have a coexisting genetic or chromosomal disorder). Our reference definition is the inclusion of chromosomal diagnoses. No splines were used in these crosswalks.
Registry to total: For a subset of congenital causes, particularly the congenital heart defects, we noted substantial differences in the lists of case definitions being reported to the various congenital registries. Across all types of congenital heart defects, the National Birth Defects Prevention Network (NBDPN) had the most complete list of reported case definitions (i.e. the highest case ascertainment) and was considered the gold standard among all birth registry data sources.
We used registry-specific crosswalks to adjust all other birth defects registries to match the case ascertainment seen in the NBDPN. No splines were used in these crosswalks. Underreporting of congenital birth defects is common and can vary by source, location, year, sex, and age. In order to have an empirical, systematic approach to outliering of data, we adapted the non-zero floor approach used by the GBD cause-specific mortality analysis. After all age-sex splitting and crosswalking was complete, the first step was to calculate median absolute deviation (MAD) for the age group of birth, where registry and literature data were combined with all clinical data for the early neonatal age group (0 to 6 days). The thresholds chosen were -0.5 MAD and +3 MAD, with any data outside of these bounds being identified as outliers. This was determined based on the right-skewed distribution observed in most of the congenital data and the expert prior that underreporting is far more prevalent than overreporting, so the bias is asymmetric. In any case where the lower MAD bound was negative, we used a threshold of 0. For most models, we calculated the MADs using only the EUROCAT data, which we found to be the most reliable source for prevalence of congenital disorders. Exceptions were neural tube defects (all data sources), urinary birth defects (EUROCAT and USA claims data), musculoskeletal defects (only USA claims data), and chromosomal anomalies, which differed by condition given the high volume of zeroes in the data. For Down Syndrome, we used all data. For Edward Syndrome and Patau Syndrome, we used all non-zero EUROCAT data. For Turner and Klinefelter syndrome, we used EUROCAT data and logged mean absolute deviation and exponentiated this to determine bounds for these data. To evaluate data for older age groups, we employed two approaches. First, we outliered data from any location-year-source that was outliered by the first-stage MAD algorithm.
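The asymmetric MAD thresholds described above can be sketched as follows. This is a minimal illustration that assumes the bounds are centred on the median of the reference data (for example, EUROCAT birth prevalence); the text does not state the centring explicitly, and the function names and values are hypothetical.

```python
import statistics


def mad_bounds(reference, lower_k=0.5, upper_k=3.0):
    """Return (lower, upper) outlier bounds from reference data.

    Assumes bounds of median - lower_k * MAD and median + upper_k * MAD,
    with the lower bound floored at 0 when it would be negative.
    """
    med = statistics.median(reference)
    mad = statistics.median([abs(x - med) for x in reference])
    lower = max(0.0, med - lower_k * mad)
    upper = med + upper_k * mad
    return lower, upper


def flag_outliers(values, bounds):
    """Mark each value that falls outside the (lower, upper) bounds."""
    lower, upper = bounds
    return [v < lower or v > upper for v in values]


# Hypothetical birth prevalence values (per 1,000 live births).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
bounds = mad_bounds(reference)
flags = flag_outliers([0.5, 2.6, 5.9, 7.0], bounds)
```

The tighter lower threshold reflects the stated prior that underreporting is more common than overreporting, so low values are excluded more readily than high ones.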
Second, using all clinical and literature data, we developed a model with fixed effects by age to estimate implied MAD bounds for each non-zero age group and again applied the same thresholds of -0.5 MAD and +3 MAD.
Overview
All available input data were used in a series of DisMod-MR 2.1 models in order to estimate the prevalence of each category of congenital anomalies across the full life course for each location/age/sex combination. Incidence was set to 0 for all congenital models, as congenital conditions occur at the time of birth and, by GBD case definition, congenital cases do not occur after birth. Remission was allowed only in the models of a select subset of causes for which surgical intervention or spontaneous remission can completely eliminate the disability due to that congenital condition. Cause-specific priors and slope priors were used to guide biologically plausible DisMod-MR 2.1 estimates of excess mortality and remission where applicable. For most of the congenital birth defects causes, we ran DisMod-MR 2.1 models of all defects combined (termed "parent" models). This allowed us to use data on all anomalies within each cause as well as to leverage cause-specific mortality rate (CSMR) results from the GBD cause of death (COD) analysis. When CSMR data are used as an input, DisMod-MR 2.1 pairs each CSMR datum with a matching prevalence data point by age, sex, location, and year. After matching, CSMR is divided by prevalence to calculate an implied excess mortality rate (EMR) datum. All EMR data are then used in driving the model. Of note, EMR data are not calculated when prevalence data cover broader than GBD age groups or are for both sexes combined. We used CSMR as input to all of the models except congenital heart disease, chromosomal anomalies, digestive anomalies, musculoskeletal birth defects, and urogenital congenital anomalies.
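The pairing of CSMR with prevalence to obtain an implied EMR can be sketched as below. This is an illustration of the arithmetic only, not of DisMod-MR 2.1 internals; the function name, the key layout, and the values are hypothetical.

```python
def implied_emr(csmr, prevalence):
    """Compute implied excess mortality rate (EMR) data.

    csmr, prevalence: dicts keyed by (age_group, sex, location, year).
    An EMR datum is produced only where a matching, non-zero
    prevalence data point exists for the same key.
    """
    emr = {}
    for key, rate in csmr.items():
        prev = prevalence.get(key)
        if prev:  # skips missing keys and zero prevalence
            emr[key] = rate / prev
    return emr


# Hypothetical inputs; the second CSMR datum has no matching prevalence.
matched = ("0-6 days", "female", "Location A", 2019)
unmatched = ("0-6 days", "male", "Location A", 2019)
csmr = {matched: 0.0002, unmatched: 0.0003}
prevalence = {matched: 0.004}
emr = implied_emr(csmr, prevalence)
```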
For congenital heart defects, CSMR was not used as an input because excess mortality would be underestimated in older ages: despite continuing higher rates of mortality through adolescence and adulthood, many of these deaths are not coded as being due to congenital heart disease. Similarly, CSMR estimates for musculoskeletal and gastrointestinal anomalies in older children, adolescents, and adults are much lower than would be suggested by cohort and cross-sectional studies of survival, as few of these deaths are coded as being due to the congenital birth defect present. Finally, for urogenital congenital anomalies, in addition to our modeling urinary and genital anomalies separately, the mechanism of death in older ages will typically be via development of chronic kidney disease, and these deaths are classified in GBD as being due to chronic kidney disease due to other conditions. Details are in each cause-specific section below. Location-level covariates were used in each of the congenital DisMod-MR 2.1 models based on published information about the risk factors for these birth defects. Folic acid availability was used as a covariate on prevalence for all neural tube defects models and a subset of the congenital musculoskeletal anomalies models. A folic acid fortification covariate was used in the neural tube defects and cleft models, which was modelled based on data from the Global Fortification Data Exchange. The legality of abortion was used as a covariate on prevalence for conditions in which prenatal diagnosis is commonly available and the prognosis is severe enough to cause a high rate of termination of pregnancy following prenatal diagnosis: these include all chromosomal conditions and a subset of the congenital heart defects. Maternal consumption of alcohol during pregnancy, as a proportion of all pregnancies, was used as a covariate on prevalence for all congenital heart defects.
The proportion of live births by mothers age 35+ was used as a covariate on all chromosomal models. Across many of the congenital models, the Healthcare Access and Quality Index (HAQI) covariate was used to guide the global pattern of with-condition mortality and excess mortality, as was the natural log of the lag-distributed income per capita (LN-LDI). For most of the severe congenital conditions, the mortality associated with the condition is highly dependent on access to adequate surgical interventions and other medical care during the first hours, weeks, and years of life. For those causes with a parent model (such as neural tube defects), we then squeezed the sum of the specific sub-cause prevalence estimates to these total prevalence estimates in order to ensure internal consistency of our cause-level and sub-cause estimates. The prevalence of other heart, musculoskeletal, and gastrointestinal anomalies was derived by reducing the total envelope model for each cause by its sub-causes to derive the difference that was attributable to other anomalies in that category.
Assigning health states and sequelae for long-term outcomes
To determine the distribution of health outcomes associated with the congenital causes, we performed a review of available literature on the long-term health outcomes of survivors in cohorts born with each type of congenital malformation. For conditions requiring surgical intervention shortly after birth to ensure survival, the health states included in the disability weight calculations correspond to the post-surgery outcomes reported in cohorts of individuals born with these life-threatening congenital conditions. Where data were available from multiple cohorts, we pooled these cohorts together to calculate the proportion of individuals with each health state. Where data on the joint distribution of the long-term health outcomes were not available, we assumed independence of each long-term health outcome.
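The squeezing of sub-cause prevalence to a parent total, and the derivation of an "other" category as the remainder of an envelope, can be sketched as follows. The sketch assumes proportional scaling and a floor of zero on the remainder; neither detail is stated in the text, and the function names and values are hypothetical.

```python
def squeeze_to_parent(sub_prevalence, parent_prevalence):
    """Scale sub-cause prevalences proportionally so they sum to the parent."""
    total = sum(sub_prevalence.values())
    if total <= 0:
        raise ValueError("sub-cause prevalences must have a positive sum")
    factor = parent_prevalence / total
    return {cause: p * factor for cause, p in sub_prevalence.items()}


def residual_other(envelope_prevalence, sub_prevalence):
    """Prevalence attributable to 'other' anomalies: envelope minus sub-causes.

    Floored at 0 (an assumption made here) so the result is never negative.
    """
    return max(0.0, envelope_prevalence - sum(sub_prevalence.values()))


# Hypothetical prevalences for one location, age, sex, and year.
subs = {"sub_cause_a": 0.002, "sub_cause_b": 0.001}
squeezed = squeeze_to_parent(subs, parent_prevalence=0.0024)
other = residual_other(0.005, subs)
```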
Combined disability weights were calculated for all necessary combinations of existing disability weights. Summary and associated health states There are many distinct types of congenital heart anomalies with a range of anatomical patterns, severities, and requirements for medical treatment. For the purpose of estimating nonfatal outcomes, in GBD 2017 congenital heart anomalies were split into five-sub categories based on both the anatomical characteristics and the treatment requirements of each condition. We also began development of a model of total congenital heart anomalies, but this was not used in scaling the subcauses for GBD 2019. Instead, we used claims data to calculate a ratio of other-to-total and this was applied to the sum of the other four subcauses for each location, age group, sex, and year. Every case of congenital heart defects was associated with a health state of congenital heart disease, except for a proportion of ventricular and atrial septal defects which are considered asymptomatic. All congenital heart defects cases were split into a proportion without intellectual disability and a proportion with every severity from borderline to profound intellectual disability. The proportion of congenital heart anomalies cases experiencing each severity of intellectual disability were calculated using available literature sources on the prevalence and severity of intellectual disability in congenital heart defect populations 1,2,3 . The proportion of VSD/ASD cases attributed to the asymptomatic category was derived from literature sources on the long-term outcomes of patients diagnosed with septal defects at birth 4,5,6 . GBD estimates of congenital heart failure were assigned to the congenital heart defect categories according to the proportion of total congenital heart cause-specific mortality assigned to each category of congenital heart defects. The MR-BRT crosswalk results are shown below. 
In the DisMod-MR 2.1 model of total congenital heart anomalies, random effects on prevalence were limited to ±0.5 in order to limit geographic variation in the estimates of birth prevalence. The minimum excess mortality rate for the neonatal age range was set to 5.0. The smoothness on excess mortality rate was increased to Xi = 5.0 in order to allow high excess mortality in the neonatal age groups and lower excess mortality rates in older ages. The MR-BRT crosswalk results are shown below.
In the DisMod-MR 2.1 model of single ventricle and single ventricle pathway heart defects, random effects on prevalence were limited to ±0.5 in order to limit the estimated geographic variation in birth prevalence. A minimum excess mortality rate of 8 was set for the early neonatal period in order to capture the high mortality risk, based on expert priors and a review of available literature on the mortality risk among infants born with single ventricle and single ventricle pathway heart defects. The smoothness on excess mortality rate was set to 5.0 in order to fit steep changes in the excess mortality rate during the first weeks of life, as the risk of death due to these congenital heart anomalies is greatest shortly after birth and diminishes over the life course. The MR-BRT crosswalk results are shown below.
In the DisMod-MR 2.1 model of congenital heart defects excluding single ventricle and single ventricle pathway defects, random effects on prevalence were limited to ±0.5. A minimum excess mortality rate of 1.0 for the early neonatal period was enforced in order to capture the high risk of mortality associated with these conditions, and a decreasing slope prior on excess mortality rate was applied for all ages. The smoothness on excess mortality rate was set to Xi = 3.0 in order to allow the model to fit steep changes in the mortality rate of these conditions in the neonatal age period. The MR-BRT crosswalk results are shown below.
In the DisMod-MR 2.1 model of critical malformations of great vessels, congenital valvular heart disease, and patent ductus arteriosus, random effects on prevalence were limited to ±0.5. A minimum excess mortality rate of 1.0 was set for the early neonatal period in order to capture the high mortality risk associated with these conditions. The smoothness on excess mortality was increased to Xi = 3.0 in order to fit steep changes in the mortality associated with these conditions during and after the neonatal period, as the risk of death due to congenital heart anomalies is highest shortly after birth.
Ventricular septal defects and atrial septal defects are holes in the walls separating the chambers of the heart. Many of these septal defects close spontaneously, while others require surgical care. The ICD-10 codes corresponding to ventricular septal defect and atrial septal defect are Q21.0 and Q21.1, respectively. The MR-BRT crosswalk results are shown below.
In the DisMod-MR 2.1 model of ventricular septal defects and atrial septal defects (VSD/ASD), remission was set to zero for all ages. Cases of septal defects that spontaneously close over time were considered as part of the asymptomatic proportion of VSD/ASD rather than remitted cases. Random effects on prevalence were limited to ±0.3 in order to limit the random geographic variation in the estimated birth prevalence. No minimum excess mortality rate was set in this model, as VSD/ASD cases are not associated with excess mortality rates as high as the other subtypes of congenital heart defects. The smoothness on excess mortality rate was set to Xi = 3.0, and a decreasing slope prior was set on remission for all ages, with remission set to 0 past age 10.
Other congenital cardiovascular anomalies are modeled by applying the ratio of other congenital heart anomalies to total congenital heart anomalies, as reflected in Marketscan data (a trusted data source), to the sum of the sub-causes of congenital cardiovascular anomalies. The result is prevalence of other congenital cardiovascular anomalies by age/year/sex/location. Specifically, we use claims data to calculate the proportion of cases that are due to the other causes. To do that, we sum the cases for the specified congenital subcauses and the other category subcauses. We divide the number of other subcause cases by the total number of cases to obtain the proportion. In order to have a valid proportion, we only use datapoints for which we have the combination of age, sex, location, and year for all subcauses. Because the specified subcauses make up a share (1 - prop_other) of the total, the total prevalence is p_sum_subcauses / (1 - prop_other), and the prevalence of other is the remainder: p_other = p_sum_subcauses / (1 - prop_other) - p_sum_subcauses.
Rheumatic heart disease (RHD) was defined as a clinical diagnosis by a physician with or without confirmation using echocardiography. This case definition for echocardiographic confirmation of RHD follows the World Heart Federation criteria for echocardiographic diagnosis of rheumatic heart disease (1).
1. Echocardiography: Prevalent rheumatic heart disease based on echocardiographic assessment and clinical confirmation
2. Clinical diagnosis: Prevalent rheumatic heart disease based on physician diagnosis
ICD codes for data included from hospital records can be found elsewhere in the appendix. We did not include any non-literature-based data types other than the hospital and claims data described elsewhere. Prevalence from hospital and claims data sources were included only for the non-endemic country model. Inpatient data were adjusted for multiple visits, non-primary diagnoses, and inpatient to outpatient utilisation ratios. This methodology is detailed elsewhere in the appendix.
Severity level: Rheumatic heart disease, not including heart failure
Lay description: Has a chronic disease that requires medication every day and causes some worry but minimal interference with daily activities.
DW (95% CI): 0.049 (0.031-0.072)
For GBD 2019 estimation, we ran two models using DisMod-MR: one for non-endemic countries and one for endemic countries. For GBD 2016, we identified locations as endemic if the estimated death rate due to RHD was greater than 0.15 per 100,000 in the 5 to 9 age group, or if that location had an SDI less than 0.6. Beginning in GBD 2017, we identified locations as endemic if the estimated death rate due to RHD was greater than 0.15 per 100,000 in the 10 to 14 age group, or if that location had an SDI less than 0.6. This change in age group was made based on feedback from RHD expert reviewers due to concerns that the death rate in the 5 to 9 age group would not capture endemicity in locations where RHD is common only in later age groups. Each location estimated as part of GBD 2019 is listed below as either "Endemic" or "Non-endemic". In GBD 2016, we assumed that there was no remission from RHD. Beginning in GBD 2017, we estimated remission in both the endemic and non-endemic DisMod models. This decision was based on two studies (2,3) that observed remission among confirmed RHD cases. We used the equation below to convert the reported proportion of remitted individuals in each study to a remission rate, defined as the number of remitted cases divided by the total person-years of disease, where proportion remitted is the reported proportion of all individuals with RHD at baseline who ended up remitting, and years of follow-up is the mean follow-up time in the study. In order to acknowledge the uncertainty in these calculated remission rates and to allow DisMod flexibility in estimating remission, we input 0.2 as the upper bound and 0.00 as the lower bound for the remission prior.
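The conversion equation itself is not reproduced in this text. As a hedged illustration only, one common way to convert a cumulative proportion into a rate assumes a constant hazard over follow-up; the function below uses that assumption and may differ from the equation the authors applied. The function name and the example inputs are hypothetical.

```python
import math


def remission_rate(proportion_remitted, years_of_followup):
    """Convert a cumulative proportion remitted into an annual rate.

    Assumes a constant remission hazard, so that
    proportion = 1 - exp(-rate * years). This is an assumption made
    for illustration, not the source's stated equation.
    """
    if not 0.0 <= proportion_remitted < 1.0:
        raise ValueError("proportion_remitted must lie in [0, 1)")
    if years_of_followup <= 0:
        raise ValueError("years_of_followup must be positive")
    return -math.log(1.0 - proportion_remitted) / years_of_followup


# Hypothetical study: 30% of cases remitted over a mean of 5 years.
rate = remission_rate(0.30, 5.0)

# Check against the prior bounds stated in the text (0.00 to 0.2).
within_prior = 0.0 <= rate <= 0.2
```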
Because the two studies used to estimate remission were done only in children, we applied these remission priors only to those younger than age 20 and set a remission prior of zero for adults older than age 20.

Non-endemic model: We included hospital data, claims data, and limited literature data on prevalence. We also included CSMR from our mortality estimates of RHD for non-endemic locations only. A prior of no remission was set, and excess mortality was capped at 0.1 for all ages. Coefficients for selected covariates are listed in the table below.

Endemic model: We included prevalence data from surveys published in the literature. As with the high-income model, we included CSMR from our mortality estimates of RHD for endemic locations only. A prior of no remission was set for all ages, and excess mortality was capped at 0.07, the highest mean excess mortality rate data point observed in this model. We also set priors of 0 on incidence for ages 0 to 1 and 50 to 100 to account for patterns of incidence in endemic countries. We used lnLDI as a fixed-effect country-level covariate on prevalence and excess mortality, enforcing an inverse relationship for both. The log-transformed, age-standardised SEV scalar was also used as a fixed-effect country-level covariate on prevalence.

We combined estimates from the endemic and non-endemic models, selecting estimates for the locations identified as non-endemic from the non-endemic model and estimates for the locations identified as endemic from the endemic model. Estimates of heart failure due to RHD were then subtracted from the estimates for RHD, giving the overall prevalence of RHD without heart failure. A description of the modelling strategy for heart failure due to RHD can be found in the heart failure appendix. We evaluated models by comparing estimates with input data as well as with estimates from previous rounds of GBD.
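The endemicity rule and the way the two RHD models are combined can be summarised in a minimal sketch. The function names are hypothetical, and the floor at zero after subtracting heart failure is an added assumption for numerical safety, not something stated in the text.

```python
def is_endemic(death_rate_10_14_per_100k: float, sdi: float) -> bool:
    """GBD 2017 onward rule as described in the text: a location is endemic if
    the RHD death rate in ages 10 to 14 exceeds 0.15 per 100,000 or if SDI is
    below 0.6."""
    return death_rate_10_14_per_100k > 0.15 or sdi < 0.6


def rhd_without_heart_failure(
    endemic_estimate: float,
    non_endemic_estimate: float,
    hf_due_to_rhd: float,
    endemic: bool,
) -> float:
    """Pick the estimate from the model matching the location's status, then
    subtract heart failure due to RHD.

    ASSUMPTION: the result is floored at zero; the source does not say how
    negative differences are handled.
    """
    chosen = endemic_estimate if endemic else non_endemic_estimate
    return max(chosen - hf_due_to_rhd, 0.0)


# Illustrative values only.
status = is_endemic(death_rate_10_14_per_100k=0.05, sdi=0.55)  # endemic via SDI
prevalence = rhd_without_heart_failure(0.012, 0.004, 0.002, status)
```

In this example the location qualifies as endemic through the SDI criterion alone, so the endemic-model estimate is the one carried forward.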
Myocarditis refers to a heterogeneous group of diseases with variable clinical and pathological features. Acute myocarditis was defined for GBD as the acute and time-limited symptoms of myocarditis, separate from its chronic heart failure-related sequelae. Heart failure due to myocarditis is estimated separately in GBD (see methods for heart failure). Symptoms of acute myocarditis are nonspecific and include a flu-like or gastrointestinal syndrome, followed by anginal-type chest pain, arrhythmias, syncope, or heart failure. A list of the ICD codes included can be found elsewhere in the appendix.

The preferred data sources for acute myocarditis were hospital admission data and other health facility data identifying cases of acute myocarditis. Table 1 shows the source counts for acute myocarditis. We did not include any non-literature-based data, apart from the hospital and claims data described elsewhere. We used inpatient hospital data adjusted for readmission, primary to any diagnosis, and inpatient to outpatient utilisation based on correction factors generated using USA claims data. We excluded all outpatient data, as they were implausibly low when compared with inpatient data from the same locations and with claims data. Inpatient hospital data points that were more than two-fold higher or 0.5-fold lower than the median absolute deviation value for high-income North America, Central Europe, and Western Europe for that age-sex group were excluded.

For GBD 2019, we estimated acute myocarditis using a DisMod-MR Bayesian meta-regression model, setting a minimum of 3 and maximum of 5 as value priors on remission to establish an average duration of three months. We set a value prior of 0 for all ages on excess mortality. In GBD 2017, the country-level covariates used included the cardiomyopathy and myocarditis summary exposure variable (SEV) on incidence and the Healthcare Access and Quality Index (HAQ Index) on excess mortality.
For GBD 2019, the only country-level covariate used was the Healthcare Access and Quality Index (HAQ Index) on excess mortality. Table 3 below gives the parameters, betas, and exponentiated betas for study-level and country-level covariates used in the model.

Atrial fibrillation is a supraventricular arrhythmia due to disorganised depolarisation of the atrium. Atrial flutter is a macro-reentrant supraventricular arrhythmia, usually involving the cavo-tricuspid isthmus. Diagnosis requires an ECG demonstrating: 1) irregularly irregular RR intervals (in the absence of complete AV block); 2) no distinct P waves on the surface ECG; and 3) an atrial cycle length (when visible) that is usually variable and less than 200 milliseconds. ICD codes used for inclusion of hospital and claims data can be found elsewhere in the appendix.

Model inputs With-condition mortality rate 6 6

We did not perform a systematic review for GBD 2019. A systematic review was performed for GBD 2015 with the following search terms:

("atrial fibrillation" AND epidemiology[MeSH Subheading]) OR ("atrial flutter" AND epidemiology[MeSH Subheading]) OR ("atrial fibrillation" AND (prevalence OR incidence OR "case fatality")) OR ("atrial flutter" AND (prevalence OR incidence OR "case fatality")) OR ("heart atrium fibrillation" AND epidemiology[MeSH Subheading]) OR ("heart atrium fibrillation" AND (prevalence OR incidence OR "case fatality"))

Apart from hospital and claims data points on prevalence, no non-literature-based data were included. We included hospital data corrected for readmission, primary to any diagnosis, and inpatient to outpatient utilisation ratios using adjustment factors calculated from US claims data. We excluded hospital data in certain geographies (eg, Philippines, China, India, Mexico, Botswana) where the data were implausibly low. We also excluded all outpatient administrative data, as the values for all locations were implausibly low.
We adjusted claims and inpatient hospital data with MR-BRT crosswalking procedures, using literature data based on an ECG reading as the reference. These procedures are discussed in detail elsewhere in the appendix. Table 2 shows the adjustment factors produced by the crosswalking procedure. The crosswalking coefficients in Table 2 below can be used to calculate adjustment factors for alternative definitions. The formula for computing adjustment factors is given in equation 1 below. We also included a standardised age variable (age scaled) and a sex variable in the crosswalking procedure to adjust for the possibility of bias.

Atrial fibrillation is split into symptomatic and asymptomatic based on standard GBD proportion information. The table below includes lay descriptions and disability weights for the severity levels of atrial fibrillation:

In order to address changes in coding practices for atrial fibrillation that resulted in an implausible trend of increasing death-certificate-based mortality rates, we used a prevalence-based modelling approach that combined DisMod-MR and CODEm models to generate estimates for atrial fibrillation and flutter. This approach, first used in GBD 2015, allowed us to generate more accurate estimates, using observed prevalence and incidence rates along with modelled excess mortality rates generated from prevalence and cause-specific mortality estimates. The modelling steps are illustrated in the above flowchart. Effect sizes for covariates included in both the DisMod-MR 2.1 and CODEm models can be found in the table below.

In Step 1, we estimated deaths for atrial fibrillation using a standard CODEm approach.

In Step 2, we estimated prevalence rates in DisMod-MR using data from published reports of cross-sectional and cohort surveys, as well as primary care facility data.
We also used claims data covering inpatient and outpatient visits for the United States along with inpatient hospital data from 247 locations in 15 countries. For GBD 2019, inpatient hospital data were adjusted using age- and sex-specific information for: 1) readmission within one year; 2) primary diagnosis code to secondary codes; and 3) the ratio of inpatient to outpatient visits. These clinical informatics data were then further adjusted using MR-BRT to account for misclassification compared with reference data. We set priors of no remission and capped excess mortality at 0.4 for all ages. We included the Healthcare Access and Quality (HAQ) Index as a country-level, fixed-effect covariate on excess mortality and the log-transformed, age-standardised SEV scalar for atrial fibrillation and flutter as a country-level, fixed-effect covariate on prevalence.

In Step 3, we calculated the excess mortality rate (EMR) for 2019 (defined as the cause-specific mortality rate [CSMR] estimated from CODEm divided by the prevalence rate from DisMod-MR). We then selected 17 countries based on four conditions: 1) ranking of 4 or 5 stars on the system for assessing the quality of VR data; 2) prevalence data available from the literature were included in the DisMod-MR estimation; 3) prevalence rate ≥ 0.005; and 4) CSMR ≥ 0.00002. Using information from these countries as input data, we ran an MR-BRT model of logEMR on sex, a cubic spline of age, and HAQI. Specifics on the MR-BRT framework can be found elsewhere in the appendix. We then predicted year-, age-, and sex-specific EMR using the results of this regression for all non-selected countries. Countries included in the regression were assigned their directly calculated values. These EMR data points were assigned to the time period 1990-2017 and uploaded into the non-fatal database in order to be used in modelling.

In Step 4, we re-ran DisMod-MR using the input data described in Step 2 along with the EMR estimated in Step 3.
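The Step 3 calculation and country-selection rule can be sketched in a few lines. The `CountryInput` container and the function names are hypothetical; only the EMR definition (CSMR divided by prevalence) and the four thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class CountryInput:
    """Hypothetical container for the quantities used in Step 3."""
    vr_stars: int                    # vital registration quality rating, 0 to 5
    has_literature_prevalence: bool  # literature prevalence data used in DisMod-MR
    prevalence: float                # prevalence rate from DisMod-MR
    csmr: float                      # cause-specific mortality rate from CODEm


def excess_mortality_rate(csmr: float, prevalence: float) -> float:
    """EMR as defined in the text: CSMR divided by the prevalence rate."""
    if prevalence <= 0:
        raise ValueError("prevalence must be positive")
    return csmr / prevalence


def eligible_for_emr_regression(c: CountryInput) -> bool:
    """Apply the four selection conditions listed in Step 3."""
    return (
        c.vr_stars >= 4
        and c.has_literature_prevalence
        and c.prevalence >= 0.005
        and c.csmr >= 0.00002
    )


# Illustrative values only.
country = CountryInput(vr_stars=5, has_literature_prevalence=True,
                       prevalence=0.02, csmr=0.0004)
emr = excess_mortality_rate(country.csmr, country.prevalence)
```

Countries passing `eligible_for_emr_regression` would keep their directly calculated EMR, while the remaining countries would receive regression predictions, as described in the text.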
We included Healthcare access and quality index (HAQI) as a fixed-effect, country-level covariate on excess mortality and the log-transformed, age-standardised SEV scalar for atrial fibrillation and flutter as a fixed-effect, country-level covariate on prevalence. We included a value prior of 0 for remission for all ages and set a value prior of 0 for excess mortality for ages 0-30. The prevalence from the DisMod-MR model in Step 4 was used as the finalised output for upload to COMO and further processing into YLDs and DALYs. Models were evaluated based on expert opinion, comparison with results from previous rounds of GBD, and model fit. The tables below include the study covariates, parameters, betas, and exponentiated betas. Calcific aortic valve disease was defined as clinical diagnosis of aortic valve stenosis or regurgitation due to progressive calcification of the aortic valve or annulus leading to haemodynamically moderate or severe aortic stenosis or regurgitation. Cases were determined by echocardiography. Calcific aortic valve disease in the GBD did not include aortic valve disease with an aetiology that was congenital, rheumatic, or infectious. Disease due to these aetiologies are modelled in other causes in the GBD. Information on unicuspid or bicuspid valves was generally not available and is often unknown in advanced calcific disease. Therefore, we included cases of unicuspid or bicuspid valves in our case definition if they developed clinically significant aortic stenosis. The criteria for aortic stenosis follow the American Heart Association/American College of Cardiology definition of haemodynamically moderate or severe aortic stenosis and are listed in Table 1 . The criteria for aortic regurgitation follow the American Heart Association/American College of Cardiology definition of haemodynamically moderate or severe aortic regurgitation and are listed in Table 2 . 
Mild haemodynamic aortic stenosis or regurgitation was not included in our case definition because mildly abnormal haemodynamic parameters are difficult to differentiate from non-pathological stenosis and/or regurgitation, and are generally not reported in population-based studies. Degenerative mitral valve disease was defined as myxomatous degeneration of the mitral valve leading to regurgitation or prolapse. Cases were determined by echocardiography by a physician. Degenerative mitral valve disease did not include mitral valve disease with an aetiology that was congenital, rheumatic, infectious, traumatic, carcinoid, or functional (ie, secondary to left ventricular remodeling due to heart failure from another cause). Mitral valve stenosis was always considered to have a rheumatic aetiology and therefore was not included in the definition of degenerative mitral valve disease. Degenerative mitral valve disease was restricted to persons at or above the age of 15 in order to exclude congenital mitral valve disorders. This age restriction is consistent with other progressive cardiovascular diseases modelled in the GBD. The criteria for mitral regurgitation follow the American Heart Association/American College of Cardiology definition of haemodynamically progressive or severe mitral regurgitation and are listed in Table 3 . Mild haemodynamic mitral regurgitation was not included in our case definition because mild mitral valve disease cannot be differentiated from nonpathological regurgitation and is generally not reported in population-based studies. Other non-rheumatic valve disease Other non-rheumatic valve disease is a residual category that captures non-rheumatic, non-congenital valve disorders of the tricuspid and pulmonary valves. This includes tricuspid regurgitation, tricuspid stenosis, pulmonary regurgitation, and pulmonary stenosis. 
Other non-rheumatic valve disease did not include tricuspid or pulmonary valve disease with an aetiology that was congenital, rheumatic, infectious, traumatic, carcinoid, or functional (ie, secondary to heart failure due to another cause).

Data on the prevalence, incidence, treatment, haemodynamic severity, and asymptomatic status were collected from PubMed using the following search strings on 8/21/2017:

Other non-rheumatic valve disease
We did not run a literature review for "other non-rheumatic valve diseases" because we did not directly model non-fatal burden due to this cause.

We excluded literature that was not representative, included rheumatic, endocarditic, or congenital heart disease in its case definition, or included haemodynamically mild valve disease in its case definition. Data on the prevalence of calcific aortic valve and degenerative mitral valve disease were also obtained from inpatient hospital data. These data were adjusted for multiple visits, non-primary diagnoses, and inpatient to outpatient utilisation ratios. Hospital data were excluded below age 30 or if the age-series for a given hospital data source was implausible. Prevalence data from both inpatient and outpatient hospital claims were used in the United States.

For GBD 2019, we used the modelling software Meta-Regression, Bayesian Regularized Trimming (MR-BRT) to correct for biases in data types, replacing the in-DisMod crosswalks used in GBD 2017. We used a network meta-analysis to adjust inpatient data, MarketScan data from 2010-2016, and MarketScan data from 2000, which used a different sampling methodology than other years, to literature and inpatient data. Tables 4 and 5 show MR-BRT crosswalk adjustment factors. MR-BRT was used to split both-sex data points into sex-specific estimates. This methodology is detailed elsewhere in the appendix. We also split data points where the age range was greater than 25 years.
Age splitting was based on the global sex-specific age pattern from a DisMod model that only used input data from scientific literature with less than a 25-year age range. For other non-rheumatic valve diseases, we estimated non-fatal burden using the cause-of-death heart failure approach. This method is used for most cardiovascular diseases that cause heart failure and is described in detail in the appendix section on heart failure.

In order to estimate non-fatal burden for calcific aortic valve disease and degenerative mitral valve disease, we first determined the sequelae and corresponding health states that result from these conditions. This information, along with the disability weights applied to each health state, is displayed in Table 6. To model the burden due to each of the sequelae above, we first modelled the overall prevalence of combined haemodynamically moderate and severe calcific aortic valve disease and degenerative mitral valve disease. We then estimated the proportion of those with prevalent disease who were haemodynamically moderate, assuming that this would approximate the proportion who were asymptomatic. We next estimated the proportion of those with symptomatic disease (ie, those with haemodynamically severe disease) who were treated. The remaining proportion, those with untreated symptomatic disease, was split into four proportions: 1) controlled, medically managed; 2) mild; 3) moderate; and 4) severe heart failure. All proportions were calculated and converted to population prevalence at the draw level, thus propagating uncertainty from each step through to all subsequent steps. Population prevalence for each severity level is necessary in order to accurately calculate the burden for these diseases. Figure 1 visualises this framework. Each of these modelling steps is outlined in greater detail below.

We separately modelled the overall prevalence of calcific aortic valve disease and degenerative mitral valve disease in DisMod-MR 2.1.
We used cause-specific mortality rates from the fatal modelling process as inputs. These two models estimate the prevalence of these two valve diseases for each age, sex, location, and year. Covariates included in the DisMod models for prevalence of calcific aortic valve and degenerative mitral valve disease are presented in tables 9 and 10. We estimated the proportion of individuals with haemodynamically moderate or severe valve disease who were haemodynamically moderate. As mentioned above, we assumed that individuals with haemodynamically moderate disease were asymptomatic. There were a total of five data sources that reported the proportion of individuals who were haemodynamically moderate. Because of the sparsity of data, we modelled the haemodynamically moderate proportion together for both calcific aortic valve disease and degenerative mitral valve disease. We modelled a proportion with uncertainty that varied by age with the following regression: Where is the proportion of haemodynamically moderate disease, age is the midpoint age for each data point, and is a random effect for each data source. The regression coefficients are reported in Table 11 . The prevalence of those with haemodynamically moderate valve disease and the prevalence of those with haemodynamically severe disease were calculated using the prevalence envelope and the proportion of those with haemodynamically moderate disease for each five-year age group, sex, location, and year. We estimated the proportion of individuals who had haemodynamically severe disease who had been treated. Treatment was defined as valve replacement or repair. We assumed that treatment was not performed on any individuals with only haemodynamically moderate disease. The number of data points are reported in Table 10 . These data were all from relatively high-income geographies, yet it is important that we capture the difference in treatment between high-and low-income locations. 
Because of this challenge, we ran a regression using the Healthcare Access and Quality (HAQ) index predicting the level of treatment and set a prior that the proportion of individuals with a valve replacement or repair was zero where HAQ index was equal to zero. This assumption allowed us to estimate an increasing relationship between HAQ index and proportion treated, where the estimated proportion treated was based on data where HAQ index was high. We used the regression equation: where is the proportion of individuals with haemodynamically severe disease who had a valve replacement or repair, ℎ is the Healthcare Access and Quality index, is the midpoint of the age range for a given data point, and is an indicator variable to adjust for data points where the denominator of the proportion treated included both haemodynamically moderate and haemodynamically severe individuals. The prevalence of those with treated valve disease and the prevalence of those with untreated haemodynamically severe disease were calculated using the prevalence of haemodynamically severe disease and the proportion of those with treated valve disease. The results of this regression are reported in Table 13 and plotted for three ages in Figure 2 . Figure 2 : Results of treatment model for three ages The proportions of 1) controlled, medically managed, 2) mild, 3) moderate and 4) severe heart failure due to valve disease were estimated using the approach described in the heart failure section of the appendix. Prevalence for each of these health states was estimated using the prevalence of haemodynamically severe disease and the corresponding proportion for each severity of heart failure. Burden due to each severity of valve disease was estimated by multiplying the prevalence of each severity by the corresponding disability weight. Our case definition for acute endocarditis was a clinical diagnosis of infective endocarditis. The ICD codes included can be found elsewhere in the appendix. 
We did not include any non-literature-based data types, apart from the hospital and claims data described elsewhere. We excluded all outpatient data, as they were implausibly low when compared with inpatient data from the same locations and claims data. We used hospital data corrected for readmission and primary to any diagnosis based on the correction factors generated by the clinical informatics team. We excluded any inpatient hospital data points that were more than two-fold higher or 0.5-fold lower than the median absolute deviation value for high-income North America, Central Europe, and Western Europe for that age-sex group. No data adjustments were made for acute endocarditis in GBD 2019.

We used the standard GBD approach, which utilises MEPS data to split overall estimates of endocarditis into moderate and severe categories. The table below includes the severity level, lay descriptions, and DWs associated with acute endocarditis.

For GBD 2019, we estimated endocarditis using a DisMod-MR Bayesian meta-regression model, setting a minimum of 11 and maximum of 13 as value priors on remission to establish an average duration of one month. For GBD 2019, we outliered cause-specific mortality rate data from Mali due to implausibly high estimates. Country-level covariates used included the endocarditis summary exposure variable (SEV) on incidence and the Healthcare Access and Quality Index on excess mortality. We evaluated models by comparing model fits with the data and with results from previous GBD estimation cycles. The table below gives the parameters, betas, and exponentiated betas for study-level and country-level covariates used in the model.

For GBD 2019, peripheral arterial disease was defined as having an ankle-brachial index (ABI) < 0.9. Intermittent claudication was defined clinically. Specific ICD codes for claims data included can be found elsewhere in the appendix. The search was conducted from 1/1/2013 to 3/16/2015.
1,658 results were returned, of which six were extracted. A systematic review was also performed for peripheral arterial disease and intermittent claudication for GBD 2013. Search terms can be provided upon request. Apart from the claims data from the United States, we did not include any non-literature-based data types. We did not use inpatient hospital data, as peripheral arterial disease is expected to be rare in inpatient data but common in outpatient data: it is a condition usually managed on an outpatient basis, except for specific surgical interventions. This discrepancy leads to implausible correction factors based on inpatient/outpatient information from claims data (~150X); thus, adjusted data cannot be used. Including uncorrected data in the model is likely to lead to incorrect estimates, as hospitalisation and procedure rates are likely to vary between geographies based on access to and patterns of care.

For GBD 2019, we adjusted prevalence data from claims using the MR-BRT data adjustment procedure described elsewhere in the appendix. Our reference data were from literature in which the prevalence of PAD was based on directly measured ABI values. The coefficients in Table 2 below can be used to calculate adjustment factors for alternative definitions. The formula for computing adjustment factors is given in equation 1 below. We also included a standardised age variable (age scaled) and a sex variable in the crosswalking procedure to adjust for the possibility of bias.

$$prev_{adjusted} = \mathrm{logit}^{-1}\left(\mathrm{logit}(prev_{claims}) - \beta_{claims} - \beta_{age} \times age_{scaled} - \beta_{sex} \times sex\right) \qquad (1)$$

We used the proportion of intermittent claudication to split the overall prevalence of peripheral arterial disease into symptomatic and asymptomatic peripheral vascular disease. The table below illustrates these values:

For GBD 2019, we used DisMod-MR 2.1 to model the overall prevalence of peripheral arterial disease using prevalence data from literature studies and crosswalked claims data.
We included the log-transformed, age-standardised SEV scalar for PAD and log-transformed LDI as fixed-effect, country-level covariates. We set value priors of 0 for incidence from ages 0 to 30. We also set a value prior of 0 for remission for all ages. Additionally, we set a value prior of 0 for excess mortality between ages 0 and 30 as well as a value prior between 0 and 0.05 for excess mortality between ages 30 and 100. The table below illustrates the beta values and exponentiated beta values for the covariates chosen for the overall peripheral vascular disease model.

We used DisMod-MR to model the proportion of peripheral vascular disease with intermittent claudication. We set a value prior of 0 for proportion for ages 0 to 40. We included the Healthcare Access and Quality Index score as a country-level covariate for excess mortality. The table below illustrates the study covariates, parameters, beta, and exponentiated beta values for the proportion model for intermittent claudication.

To obtain final estimates for the sequelae of interest, we multiplied the prevalence model by the proportion model at the draw level to generate the prevalence of symptomatic and asymptomatic peripheral vascular disease. Models were evaluated based on expert review, comparisons with estimates from prior rounds of GBD, and assessment of model fit. There have been no substantive changes from GBD 2017 in terms of modelling strategy for peripheral arterial disease.

High systolic blood pressure
Input data and methodological summary
Exposure
Brachial systolic blood pressure in mmHg.

We utilised data on mean systolic blood pressure from literature and from household survey microdata and reports (e.g. STEPS, NHANES). For GBD 2019, we did not carry out a systematic review of the literature for new data. Counts of the data inputs used for GBD 2019 are shown in Tables 1 and 2 below. Details of inclusion and exclusion criteria and data processing steps follow.
Studies were included if they were population-based and directly measured systolic blood pressure using a sphygmomanometer. We assumed the data were representative if the geography or the population were not selected because it was related to hypertension or hypertensive outcomes. Data were utilised in the modelling process unless an assessment strongly suggested that the source was biased. A candidate source was excluded if the quality of study did not warrant a valid estimate because of selection (non-representative populations) or if the study did not provide methodological details for evaluation. In a small number of cases, a data point was considered to be an outlier candidate if the level was implausibly low or high based on expert judgement and data from other country data. Where possible, individual-level data on blood pressure estimates were extracted from survey microdata. These data points were collapsed across demographic groupings to produce mean estimates in the standard GBD five-year age-sex groups. If microdata were unavailable, information from survey reports or from literature were extracted along with any available measure of uncertainty including standard error, uncertainty interval, and sample size. Standard deviations were also extracted. Where mean systolic blood pressure was reported split out by groups other than age, sex, location, and year (e.g. by hypertensive status), a weighted mean was calculated. Incorporating United States prevalence data Survey reports and literature often report information only about the prevalence, but not the level, of hypertension in the population studied. These sources were not used to model systolic blood pressure, with the exception of data from the Behavioral Risk Factors Surveillance System (BRFSS) because of the availability of a similarly structured exam survey that is representative of the same population (NHANES). BRFSS is a telephone survey conducted in the United States for all US counties. 
It collects self-reported diagnosis of hypertension. These self-reported values of prevalence of raised blood pressure were adjusted for self-report bias and tabulated by age group, sex, US state, and year. These prevalence values were used to predict a mean systolic blood pressure for the same strata with a regression using data from the National Health and Nutrition Examination Survey, a nationally representative health examination survey of the US adult population. The regression was run separately by sex, and was specified as:

$$SBP_{l,a,t,s} = \beta_0 + \beta_1 prev_{l,a,t,s}$$

where $SBP_{l,a,t,s}$ is the location-, age-, time-, and sex-specific mean systolic blood pressure and $prev_{l,a,t,s}$ is the location-, age-, time-, and sex-specific prevalence of raised blood pressure. The coefficients for both models are reported in Table 3. Out-of-sample RMSE was used to quantify the predictive validity of the model. The regression was repeated 10 times for each sex, each time randomly holding out 20% of the data. The RMSEs from each holdout analysis were averaged to get the average out-of-sample RMSE. The results of this holdout analysis are reported in Table 4.

Age and sex splitting
Prior to modelling, data provided in age groups wider than the GBD five-year age groups were processed using the approach outlined in Ng and colleagues (2). Briefly, age-sex patterns were identified using 115 sources of microdata with multiple age-sex groups, and these patterns were applied to estimate age-sex-specific levels of mean systolic blood pressure from aggregated results reported in published literature or survey reports.
In order to incorporate uncertainty into this process and borrow strength across age groups when constructing the age-sex pattern, we used a model with auto-regression on the change in mean SBP over age groups:

$$\mu_a = \mu_{a-1} + \omega_a, \qquad \omega_a \sim N(\omega_{a-1}, \tau^2)$$

where $\mu_a$ is the mean predicted value for age group a, $\mu_{a-1}$ is the mean predicted value for the age group previous to age group a, $\omega_a$ is the difference in mean between age group a and age group a-1, $\omega_{a-1}$ is the difference between age group a-1 and age group a-2, and $\tau$ is a user-input prior on how quickly the mean SBP changes for each unit increase in age. We used a $\tau$ of 1.5 mmHg for this model. Draws of the age-sex pattern were combined with draws of the input data needing to be split in order to calculate the new variance of age-sex split data points.

Exposure estimates were produced from 1980 to 2019 for each national and subnational location, sex, and five-year age group starting from age 25. As in GBD 2017, we used a spatiotemporal Gaussian process regression (ST-GPR) framework to model the mean systolic blood pressure at the location-, year-, age-, sex-level. Details of the ST-GPR method used in GBD 2019 can be found elsewhere in the appendix.

The first step of the ST-GPR framework requires the creation of a linear model for predicting SBP at the location-, year-, age-, sex-level. Covariates for this model were selected in two stages. First, a list of variables with an expected causal relationship with SBP was created based on significant associations found within high-quality prospective cohort studies reported in the published scientific literature. The second stage in covariate selection was to test the predictive validity of every possible combination of covariates in the linear model, given the covariates selected above. This was done separately for each sex. Predictive validity was measured with out-of-sample root-mean-squared error. In GBD 2016, the linear model with the lowest root-mean-squared error for each sex was then used in the ST-GPR model.
Beginning in GBD 2017, we used an ensemble model of the 50 models with the lowest root-mean-squared error for each sex. This allows us to utilise covariate information from many plausible linear mixed-effects models. The 50 models were each used to predict the mean SBP for every age, sex, location, and year, and the inverse-RMSE-weighted average of this set of 50 predictions was used as the linear prior. The relative weight contributed by each covariate is plotted by sex in Figure 1.

Currently, the ST-GPR model only produces an estimate of mean exposure level without standard deviation. Therefore, the standard deviation of systolic blood pressure within a population was estimated for each national and subnational location, sex, and five-year age group starting from age 25 using the standard deviation from person-level and some tabulated data sources. Person-level microdata accounted for 10 375 of the total 12 570 rows of data on standard deviation. The remaining 2195 rows came from tabulated data. Tabulated data were only used to model standard deviation if they were sex-specific and five-year-age-group-specific and reported a population standard deviation of systolic blood pressure. The systolic blood pressure standard deviation function was estimated using a linear regression:

$$\log(SD_{l,a,t,s}) = \beta_0 + \beta_1 \log(mean\_SBP_{l,a,t,s}) + \beta_2\,sex + \sum_k \beta_k I_{A[a]}$$

where $mean\_SBP_{l,a,t,s}$ is the location-, age-, time-, and sex-specific mean SBP estimate from ST-GPR, and $I_{A[a]}$ is a dummy variable for a fixed effect on a given five-year age group.

To account for within-person variation in systolic blood pressure, a "usual blood pressure" adjustment was done. The need for this adjustment has been described elsewhere. 5 Briefly, measurements of a risk factor taken at a single time point may not accurately capture an individual's true long-term exposure to that risk. Blood pressure readings are highly variable over time due to measurement error as well as diurnal, seasonal, or biological variation.
These sources of variation result in an overestimation of the variation in cross-sectional studies of the distribution of SBP. To adjust for this overestimation, we applied a correction factor to each location-, age-, time-, and sex-specific standard deviation. These correction factors were age-specific and represented the proportion of the variation in blood pressure within a population that would be observed if there were no within-person variation across time. Four longitudinal surveys were used to estimate these factors: the China Health and Retirement Longitudinal Survey (CHRLS), the Indonesia Family Life Survey (IFLS), the National Health and Nutrition Examination Survey I Epidemiological Follow-up Study (NHANES I/EFS), and the South Africa National Income Dynamics Survey (NIDS). The sample size and number of blood pressure measurements at each measurement period for each survey are reported in Table 5. For each survey, the following regression was fit for each age group:

$$SBP_{i,a} = \beta_0 + \beta_1\,sex_i + \beta_2\,age_i + \upsilon_i$$

where $SBP_{i,a}$ is the systolic blood pressure of an individual $i$ at age $a$, $sex$ is a dummy variable for the sex of an individual, $age$ is a continuous variable for the age of an individual, and $\upsilon_i$ is a random intercept for each individual. Then, a blood pressure value $\widehat{SBP}_{i,b}$ was predicted for each individual $i$ for his/her age at baseline $b$. The correction factor $cf$ for each age group within each survey was calculated as the variance of these predicted blood pressures divided by the variance of the observed blood pressures at baseline, $SBP_{i,b}$:

$$cf = \frac{var(\widehat{SBP}_{i,b})}{var(SBP_{i,b})}$$

The average of the correction factors was taken over the three surveys to get one set of age-specific correction factors, which were then multiplied by the square of the modelled standard deviations to estimate the standard deviation of the "usual blood pressure" of each age, sex, location, and year. Because of low sample sizes, the correction factors for the 75-79 age group were used for all terminal age groups.
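The variance-ratio adjustment described above can be illustrated with a small Python sketch. This is not the GBD implementation: the function names and the numeric values are hypothetical, and taking the square root of the corrected variance to return to the standard deviation scale is our reading of the text rather than a step it states explicitly.

```python
import numpy as np

def correction_factor(predicted_baseline, observed_baseline):
    """Variance of model-predicted SBP divided by variance of observed SBP."""
    predicted = np.asarray(predicted_baseline, dtype=float)
    observed = np.asarray(observed_baseline, dtype=float)
    return np.var(predicted, ddof=1) / np.var(observed, ddof=1)

def usual_sd(modelled_sd, cf):
    """Scale the modelled variance (SD squared) by cf, then return to the SD scale.

    Assumption: the square root is taken after multiplying cf by SD squared.
    """
    return np.sqrt(cf * np.asarray(modelled_sd, dtype=float) ** 2)

# Hypothetical baseline values (mmHg) for one age group in one survey.
observed = [118.0, 131.0, 142.0, 125.0, 160.0, 109.0]
predicted = [121.0, 129.0, 137.0, 126.0, 149.0, 115.0]

cf = correction_factor(predicted, observed)
print("correction factor:", round(float(cf), 3))
print("usual SD for a modelled SD of 18 mmHg:", round(float(usual_sd(18.0, cf)), 2))
```

Because the predictions smooth out within-person fluctuation, their variance is smaller than that of the single-occasion measurements, so the factor falls below 1 and the adjusted standard deviation is narrower than the modelled one.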
The final correction factors for each age group are reported in Table 6. Figure 2 shows the correction factors by survey and age group ID.

Figure 2: Correction factor by survey and age group ID. The correction factor is equal to the variance of the predictions divided by the variance of the raw dataset. In pink is the average correction factor for each age group, summarised in Table 6.

A visualisation of how the uncorrected blood pressure measurements overestimate the "usual" blood pressure variation is shown in Figure 3. This image shows the density of the distribution of the observed blood pressure values $SBP_{i,b}$ in participants in the Indonesia Family Life Survey in red, and the density of the predicted blood pressure values $\widehat{SBP}_{i,b}$ in blue. The ratio of the variance of the blue distribution to the variance of the red distribution is an example of the scalar adjustment factor being applied to the modelled standard deviations.

Estimating the exposure distribution shape

The shape of the distribution of systolic blood pressure was estimated using all available person-level microdata sources, which were a subset of the input data into the modelling process. The distribution shape modelling framework for GBD 2019 is detailed elsewhere in the appendix. Briefly, an ensemble distribution created from a weighted average of distribution families was fit for each individual microdata source, separately by sex. The weights for the distribution families for each individual source were then averaged and weighted to create a global ensemble distribution for each sex.

No changes have been made to the TMREL used for systolic blood pressure since GBD 2015. We estimated that the TMREL of SBP ranges from 110 to 115 mmHg based on pooled prospective cohort studies that show risk of mortality increases for SBP above that level.
3, 4 Our selection of a TMREL of 110-115 mmHg is consistent with the GBD study approach of estimating all attributable health loss that could be prevented even if current interventions do not exist that can achieve such a change in exposure level, for example a tobacco smoking prevalence of zero percent. To include the uncertainty in the TMREL, we took a random draw from the uniform distribution on the interval between 110 mmHg and 115 mmHg each time the population attributable burden was calculated.

No changes have been made to the relative risk estimates for blood pressure outcomes used since GBD 2016. RRs for chronic kidney disease are from the Renal Risk Collaboration meta-analysis of 2.7 million individuals in 106 cohorts. For other outcomes, we used data from two pooled epidemiological studies: For cardiovascular disease, epidemiological studies have shown that the RR associated with SBP declines with age, with the log (RR) having an approximately linear relationship with age and reaching a value of 1 between the ages of 100 and 120. RRs were reported per 10 mmHg increase in SBP above the TMREL value (115 mmHg), calculated as in the equation below:

$$RR(x) = RR_0^{\,(x - TMREL)/10\ \mathrm{mmHg}}$$

where $RR(x)$ is the RR at exposure level $x$ and $RR_0$ is the increase in RR for each 10 mmHg above the TMREL. We used DisMod-MR 2.1 to pool effect sizes from included studies and generate a dose-response curve for each of the outcomes associated with high SBP. The tool enabled us to incorporate random effects across studies and include data with different age ranges. RRs were used universally for all countries, and the meta-regression only helped to pool the three major sources and produce RRs with uncertainty and covariance across ages, taking into account the uncertainty of the data points.
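As a worked illustration of the two steps described above (drawing the TMREL from a uniform distribution and scaling a per-10-mmHg relative risk to a given exposure level), the following Python sketch may help. The per-10-mmHg RR of 1.3, the exposure of 140 mmHg, the seed, and the function name are all hypothetical, and setting the RR to 1 at or below the TMREL is an assumption of this sketch rather than something the text specifies.

```python
import numpy as np

def relative_risk(sbp, rr_per_10, tmrel):
    """RR at exposure `sbp`, given the RR per 10 mmHg above the TMREL.

    Assumption: exposures at or below the TMREL carry no excess risk (RR = 1).
    """
    excess = np.maximum(np.asarray(sbp, dtype=float) - tmrel, 0.0)
    return rr_per_10 ** (excess / 10.0)

# One TMREL draw per burden calculation, uniform on 110-115 mmHg.
rng = np.random.default_rng(2019)
tmrel_draws = rng.uniform(110.0, 115.0, size=1000)

rr_per_10 = 1.3  # hypothetical RR per 10 mmHg, not a GBD estimate
rr_draws = relative_risk(140.0, rr_per_10, tmrel_draws)

print("mean RR at 140 mmHg across TMREL draws:", round(float(rr_draws.mean()), 3))
```

Drawing the TMREL anew for each calculation means that the spread of `rr_draws` reflects TMREL uncertainty alone; in a full analysis it would be combined with draws of the RR itself.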
Data inputs come from 3 sources:
• Estimates of mean FPG in a representative population
• Individual-level data of fasting plasma glucose measured from surveys
• Estimates of diabetes prevalence in a representative population

Data sources that did not report mean FPG or prevalence of diabetes are excluded from analysis. When a study reported both mean fasting plasma glucose (FPG) and prevalence of diabetes, we use the mean FPG for exposure estimates. Where possible, individual-level data supersede any data described in a study. Individual-level data are aggregated to produce estimates for each 5-year age group, sex, location, and year of a survey.

We perform several processing steps on the data to address sampling and measurement inconsistencies and ensure that the data are comparable. Estimates in a sex and age group with a sample size <30 persons are considered to have a small sample size. In order to avoid small sample size problems that may bias estimates, data are collapsed into the next age group in the same study until the sample size reaches at least 30 persons. The intent of collapsing the data in this way is to preserve as much granularity between age groups as possible. If the entire study sample consists of <30 persons and did not include a population weight, the study is excluded from the modelling process.

We predicted mean FPG from diabetes prevalence using an ensemble distribution. We characterised the distribution of FPG using individual-level data. Details on the ensemble distribution can be found elsewhere in the Appendix. Before predicting mean FPG from prevalence of diabetes, we ensured that the prevalence of diabetes was based on the reference case definition: fasting plasma glucose (FPG) >126 mg/dL (7 mmol/L) or on treatment.
For more details on how the case-definition crosswalk is conducted, please see the diabetes mellitus appendix in Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019.

Exposure estimates are produced for every year between 1980 and 2019 for each national and subnational location, sex, and for each 5-year age group starting from 25 years. As in previous rounds of GBD, we used a spatiotemporal Gaussian process regression (ST-GPR) framework to model the mean fasting plasma glucose at the location-, year-, age-, and sex-level. Updates to the ST-GPR modelling framework for GBD 2019 are detailed elsewhere in the Appendix.

Fasting plasma glucose is frequently tested or reported in surveys aiming at assessing the prevalence of diabetes mellitus. In these surveys, the case definition of diabetes may include both a glucose test and questions about treatment for diabetes. People with a positive history of diabetes treatment may be excluded from the FPG test. Thus, the mean FPG in these surveys would not represent the mean FPG in the entire population. In this event, we estimated the prevalence of diabetes assuming a definition of FPG >126 mg/dL (7 mmol/L), then crosswalked it to our reference case definition, and then predicted mean FPG.

To inform our estimates in data-sparse countries, we systematically tested a range of covariates and selected age-specific prevalence of obesity as a covariate based on direction of the coefficient and significance level. Mean FPG is estimated using a mixed-effects linear regression, run separately by sex, where $p_{c,a,t}$ is the prevalence of overweight, $I_{A[a]}$ is an indicator variable for a fixed effect on a given 5-year age group, and $\alpha_s$, $\alpha_r$, and $\alpha_c$ are random effects at the super-region, region, and country level, respectively.
The estimates were then propagated through the ST-GPR framework to obtain 1000 draws for each location, year, age, and sex.

The theoretical minimum-risk exposure level (TMREL) for FPG is 4.8-5.4 mmol/L. This was calculated by taking the person-year weighted average of the levels of FPG that were associated with the lowest risk of mortality in the pooled analyses of prospective cohort studies. 1

We estimate 15 outcomes due to high fasting plasma glucose (continuous risk) or diabetes (categorical risk). After a review of the chronic kidney disease literature, we determined that only chronic kidney disease due to diabetes type 1 and chronic kidney disease due to diabetes type 2 carry a risk attributable to FPG. Thus, in GBD 2019 we removed chronic kidney disease due to glomerulonephritis, chronic kidney disease due to hypertension, and chronic kidney disease due to other causes as outcomes.

Relative risks (RR) were obtained from dose-response meta-analysis of prospective cohort studies. Please see the citation list for a full list of studies that are utilised. For cardiovascular outcomes, we estimated age-specific RRs using DisMod-MR 2.1 with log (RR) as the dependent variable and median age at event as the independent variable with an intercept at age 110. Morbidity and mortality directly caused by diabetes type 1 and diabetes type 2 are considered directly attributable to FPG. Relative risks were obtained from meta-analysis of cohort studies. Please see the citation list for a full list of studies that are utilised.

Input data and methodological summary

Exposure

In earlier iterations of the GBD study, we estimated burden attributable to total cholesterol. Beginning in GBD 2017, we modelled blood concentration of low-density lipoprotein (LDL) in units of mmol/L. We used data on blood levels of low-density lipoprotein, total cholesterol, triglyceride, and high-density lipoprotein from literature and from household survey microdata and reports.
We adjusted data for total cholesterol, triglycerides, and high-density lipoprotein using the correction approach described in the Lipid Crosswalk section below. Counts of the data inputs used for GBD 2019 are shown in Tables 1 and 2 below. Details of inclusion and exclusion criteria and data processing steps follow.

Studies were included if they were population-based and if measurements of LDL, total cholesterol (TC), high-density lipoprotein (HDL), and/or triglycerides (TG) were available from blood tests, or if LDL was calculated using the Friedewald equation. We assumed the data were representative of the location if the geography or population chosen was not related to the diseases and if the data were not outliers compared to other data in the country or region.

Data were utilised in the modelling process unless an assessment of the data strongly suggested that they were biased. A candidate source was excluded if the quality of the study did not warrant a valid estimate because of selection (non-representative populations) or if the study did not provide methodological details for evaluation. In a small number of cases, a data point was considered to be an outlier candidate if the level was implausibly low or high based on expert judgement and other country data.

Where possible, individual-level data on LDL estimates were extracted from survey microdata, and these were collapsed across demographic groupings to produce mean estimates in the standard GBD five-year age-sex groups. If microdata were unavailable, information from survey reports or from literature was extracted along with any available measure of uncertainty, including standard error, uncertainty intervals, and sample size. Standard deviations were also extracted. Where LDL was reported split out by groups other than age, sex, location, and year (eg, by diabetes status), a weighted mean was calculated.

Total cholesterol consists of three major components: LDL, HDL, and TG.
LDL is often calculated for an individual using the Friedewald equation, shown below (all concentrations in mmol/L):

$$LDL = TC - \left(HDL + \frac{TGL}{2.2}\right)$$

We utilised this relationship at the individual level to impute the mean LDL for a study population when only data on TC, HDL, and TGL were available. Because studies report different combinations of TC, HDL, and TGL, we constructed a single regression to utilise all available data to evaluate the relationship between each lipid and LDL at the population level. We used the following regression:

$$LDL = \beta_{TC} I_{TC}\,TC + \beta_{HDL} I_{HDL}\,HDL + \beta_{TGL} I_{TGL}\,TGL + \sum_k \beta_k I_k$$

where $I_{TC}$, $I_{HDL}$, and $I_{TGL}$ are indicator variables for whether data are available for a given lipid, $I_k$ is an indicator variable for a given set of available lipids $k$, and $\beta_k$ is a unique intercept for each set of available lipid combinations. For example, for sources that only reported TC and HDL, $\beta_{TC,HDL}$ should account for the missing lipid data, ie, TGL. The form of this regression allows us to estimate the betas for each lipid using all available data. As a sensitivity analysis, we also ran separate regressions for each set of available lipids and found that the single regression method had much lower root-mean-squared error. A comparison of the observed versus predicted LDL for each set of available lipids is shown in Figure 1. We found almost no relationship between LDL and HDL or TGL when TC was not available, so only studies that reported TC were adjusted to LDL.

Survey reports and literature often report information only about the prevalence, but not the level, of hypercholesterolemia in the population studied. These sources were not used to model LDL, with the exception of data from the Behavioral Risk Factors Surveillance System (BRFSS) because of the availability of a similarly structured exam survey covering the identical population (NHANES). BRFSS is a telephone survey conducted in the United States for all counties. It collects self-reported diagnosis of hypercholesterolemia.
These self-reported values of prevalence of raised total cholesterol in each age group, sex, US state, and year were used to predict a mean total cholesterol for the same strata with a regression using data from the National Health and Nutrition Examination Survey, a nationally representative health examination survey of the US adult population. The regression was:

$$TC_{l,a,t,s} = \beta_0 + \beta_1\,prev_{l,a,t,s}$$

where $TC_{l,a,t,s}$ is the location-, age-, time-, and sex-specific mean total cholesterol and $prev_{l,a,t,s}$ is the location-, age-, time-, and sex-specific prevalence of raised total cholesterol. The coefficients for both models are reported in Table 1. Out-of-sample RMSE was used to quantify the predictive validity of the model. The regression was repeated 10 times for each sex, each time randomly holding out 20% of the data. The RMSEs from each holdout analysis were averaged to get the average out-of-sample RMSE. The results of this holdout analysis are reported in Table 2. Total cholesterol estimates were crosswalked to LDL using the lipid crosswalk reported above.

Age and sex splitting

Prior to modelling, data provided in age groups wider than the GBD five-year age groups were processed using the approach outlined in Ng and colleagues. 2 Briefly, age-sex patterns were identified using person-level microdata (58 sources), and these patterns were applied to estimate age-sex-specific levels of total cholesterol from aggregated results reported in published literature or survey reports.
In order to incorporate uncertainty into this process and borrow strength across age groups when constructing the age-sex pattern, we used a model with auto-regression on the change in mean LDL over age groups:

$$\mu_a = \mu_{a-1} + \omega_a, \qquad \omega_a \sim N(\omega_{a-1}, \tau)$$

where $\mu_a$ is the mean predicted value for age group $a$, $\mu_{a-1}$ is the mean predicted value for the age group previous to age group $a$, $\omega_a$ is the difference in mean between age group $a$ and age group $a-1$, $\omega_{a-1}$ is the difference between age group $a-1$ and age group $a-2$, and $\tau$ is a user-input prior on how quickly the mean LDL changes for each unit increase in age. We used a $\tau$ of 0.05 mmol/L for this model. Draws of the age-sex pattern were combined with draws of the input data needing to be split in order to calculate the new variance of age-sex-split data points.

Exposure estimates were produced from 1980 to 2019 for each national and subnational location, sex, and for each five-year age group starting from 25. As in GBD 2017, we used a spatiotemporal Gaussian process regression (ST-GPR) framework to model the mean LDL at the location-, year-, age-, and sex-level. Details of the ST-GPR method used in GBD 2019 can be found elsewhere in the appendix.

The first step of the ST-GPR framework requires the creation of a linear model for predicting LDL at the location-, year-, age-, and sex-level. Covariates for this model were selected in two stages. First, a list of variables with an expected causal relationship with LDL was created based on significant associations found within high-quality prospective cohort studies reported in the published scientific literature. The second stage in covariate selection was to test the predictive validity of every possible combination of covariates in the linear model, given the covariates selected above. This was done separately for each sex. Predictive validity was measured with out-of-sample root-mean-squared error. In GBD 2016, the linear model with the lowest root-mean-squared error for each sex was then used in the ST-GPR model.
Beginning in GBD 2017, we used an ensemble model of the 50 models with the lowest root-mean-squared error for each sex. This allows us to utilise covariate information from many plausible linear mixed-effects models. The 50 models were each used to predict the mean LDL for every age, sex, location, and year, and the inverse-RMSE-weighted average of this set of 50 predictions was used as the linear prior. The relative weight contributed by each covariate is plotted by sex in Figure 2.

The standard deviation of LDL within a population was estimated for each national and subnational location, sex, and five-year age group starting from age 25 using the standard deviation from person-level and some tabulated data sources. Person-level microdata accounted for 3009 of the total 4001 rows of data on standard deviation. The remaining 992 rows came from tabulated data. Tabulated data were only used to model standard deviation if they were sex-specific and five-year-age-group-specific and reported a population standard deviation of LDL. The LDL standard deviation function was estimated using a linear regression:

$$\log(SD_{c,a,t,s}) = \beta_0 + \beta_1 \log(mean\_LDL_{c,a,t,s}) + \beta_2\,sex + \sum_k \beta_k I_{A[a]}$$

where $mean\_LDL_{c,a,t,s}$ is the country-, age-, time-, and sex-specific mean LDL estimate from ST-GPR, and $I_{A[a]}$ is a dummy variable for a fixed effect on a given five-year age group.

The shape of the distribution of LDL was estimated using all available person-level microdata sources, which were a subset of the input data into the modelling process. The distribution shape modelling framework for GBD 2019 is detailed elsewhere in the appendix. Briefly, an ensemble distribution created from a weighted average of distribution families was fit for each individual microdata source, separately by sex. The weights for the distribution families for each individual source were then averaged and weighted to create a global ensemble distribution for each sex.

For GBD 2017, we reviewed the literature to select a TMREL for LDL.
A meta-analysis of randomised trials has shown that outcomes can be improved even at low levels of LDL-cholesterol, below 1.3 mmol/L. 3 Recent studies of PCSK-9 inhibitors support these results. 4 We therefore used a TMREL with a uniform distribution between 0.7 and 1.3 mmol/L; this value remained unchanged for GBD 2019.

After a systematic search, we were unable to find relative risks for LDL that were reported by age and level of LDL. Given the evidence that the relative risks for LDL and TC are very similar 5 and the strong linear correlation between TC and LDL at the individual level, we used relative risks reported for TC to approximate the relative risks for LDL. We used DisMod-MR 2.1 to pool effect sizes from included studies and generate a dose-response curve for each of the outcomes associated with LDL. The tool enabled us to incorporate random effects across studies and include data with different age ranges. RRs were used universally for all countries, and the meta-regression produced RRs with uncertainty and covariance across ages, considering the uncertainty of the data points.

As in GBD 2017, RRs for IHD and ischaemic stroke are obtained from meta-regressions of pooled epidemiological studies: the Asia Pacific Cohort Studies Collaboration (APCSC) and the Prospective Studies Collaboration (PSC). 6 RRs for IHD were modelled with log (RR) as the dependent variable and median age at event as the independent variable with an age intercept (RR equals 1) at age 110. For LDL and ischaemic stroke, a similar approach was used, except that there was no age intercept at age 110, because there was no statistically significant relationship between LDL and stroke after age 70, with a mean RR less than one. We assumed that there is not a protective effect of LDL and therefore did not include an RR for ages 80+.

Case definitions

In GBD 2019, new data were added from sources included in the annual GHDx update of known survey series.
We conducted a systematic review in GBD 2017 to identify studies providing nationally or subnationally representative estimates of overweight prevalence, obesity prevalence, or mean body-mass index (BMI). We limited the search to literature published between January 1, 2016, and December 31, 2016, to update the systematic literature search previously performed as part of GBD 2015. The search for adults was conducted on 4 January 2017, using the following terms:

We included representative studies providing data on mean BMI or prevalence of overweight or obesity among adults or children. For adults, studies were included if they defined overweight as BMI ≥25 kg/m² and obesity as BMI ≥30 kg/m², or if estimates using those cutoffs could be back-calculated from reported categories. For children (ages 2-19), studies were included if they used International Obesity Task Force (IOTF) standards to define overweight and obesity thresholds. We only included studies reporting data collected after January 1, 1980. Studies were excluded if they used non-random samples (eg, case-control studies or convenience samples), were conducted among specific subpopulations (eg, pregnant women, racial or ethnic minorities, immigrants, or individuals with specific diseases), used alternative methods to assess adiposity (eg, waist circumference, skin-fold thickness, or hydrodensitometry), had sample sizes of less than 20 per age-sex group, or provided inadequate information on any of the inclusion criteria. We also excluded review articles and non-English-language articles.

Where individual-level survey data were available, we computed mean BMI using weight and height. We then used BMI to determine the prevalence of overweight and obesity.
For individuals aged over 19 years, we considered them to be overweight if their BMI was greater than or equal to 25 kg/m 2 , and obese if their BMI was greater than or equal to 30 kg/m 2 . For individuals aged 2 to 19 years, we used monthly IOTF cutoffs 2 to determine overweight and obese status when age in months was available. When only age in years was available, we used the cutoff for the midpoint of that year. Obese individuals were also considered to be overweight. We excluded studies using the World Health Organization (WHO) standards or country-specific cutoffs to define childhood overweight and obesity. At the individual level, we considered BMI<10 kg/m 2 and BMI>70 kg/m 2 to be biologically implausible and excluded those observations. The rationale for choosing to use the IOTF cutoffs over the WHO standards has been described elsewhere. 1 Briefly, the IOTF cutoffs provide consistent child-specific standards for ages 2-18 derived from surveys covering multiple countries. By contrast, the WHO growth standards apply to children under age 5, and the WHO growth reference applies to children ages 5-19. The WHO growth reference for children ages 5-19 was derived from United States data, which are less representative than the multinational data used by IOTF. Additionally, the switch between references at age 5 can produce artificial discontinuities. Given that we estimate global childhood overweight and obesity for ages 2-19 (with ages 19 using standard adult cutoffs), the IOTF cutoffs were preferable. Additionally, we found that IOTF cutoffs were more commonly used in scientific literature covering childhood obesity. From report and literature data, we extracted data on mean BMI, prevalence of overweight, and prevalence of obesity, measures of uncertainty for each, and sample size, by the most granular age and sex groups available. 
Additionally, we extracted the same study-level covariates as were extracted from microdata (measurement, urbanicity, and representativeness), as well as location and year. In addition to the primary indicators described above, we extracted relevant survey-design variables, including primary sampling unit, strata, and survey weights, which were used to tabulate individual-level microdata and produce accurate measures of uncertainty. We extracted three study-level covariates: 1) whether height and weight data were measured or self-reported; 2) whether the study was predominantly conducted in an urban area, rural area, or both; and 3) the level of representativeness of the study (national or subnational). Finally, we extracted relevant demographic indicators, including location, year, age, and sex. We estimated the standard error of the mean from individual-level data, where available, and used the reported standard error of the mean for published data. When multiple data sources were available for the same country, we included all of them in our analysis. If data from the same data source were available in multiple formats such as individual-level data and tabulated data, we used individual-level data. Age and sex splitting Any report or literature data provided in age groups wider than the standard five-year age groups or as both sexes combined were split using the approach used by Ng and colleagues. 2 Briefly, age-sex patterns were identified using sources with data on multiple age-sex groups and these patterns were applied to split aggregated report and literature data. Uncertainty in the age-sex split was propagated by multiplying the standard error of the data by the square root of the number of splits performed. We did not propagate the uncertainty in the age pattern and sex pattern used to split the data as they seemed to have small effect. We included both measured and self-reported data. 
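The uncertainty inflation applied to age-sex-split data points, described above, amounts to a one-line calculation. The Python sketch below is illustrative only; the function name is hypothetical, and interpreting "number of splits" as the count of finer age-sex groups created from one aggregated data point is an assumption.

```python
import math

def split_standard_error(standard_error, n_splits):
    """Inflate a reported standard error by sqrt(number of splits performed).

    Assumption: n_splits counts the finer age-sex groups produced from one
    aggregated data point; n_splits = 1 leaves the standard error unchanged.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    return standard_error * math.sqrt(n_splits)

# Hypothetical example: a both-sex, 20-year-wide estimate split into
# 4 five-year age groups x 2 sexes = 8 data points.
print(round(split_standard_error(0.012, 8), 4))
```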
We tested for bias in self-report data compared to measured data, which is considered to be the gold standard. There was no clear direction of bias for children ages 2-14, so for these age groups we only included measured data. For individuals ages 15 and above, we adjusted self-reported data for overweight prevalence and obesity prevalence. In GBD 2017, the self-report bias adjustment used a nested hierarchical mixed-effects regression model. This approach was updated in GBD 2019 to use MR-BRT. For both overweight and obesity, we fit sex-specific MR-BRT models on the logit difference between measured and self-reported values with a fixed effect on super-region. The bias coefficients derived from these two models are in Tables 1 and 2.

After adjusting for self-report bias and splitting aggregated data into five-year age-sex groups, we used spatiotemporal Gaussian process regression (ST-GPR) to estimate the prevalence of overweight and obesity. This modelling approach has been described in detail elsewhere. The linear model, which when added to the smoothed residuals forms the mean prior for GPR, is as follows:

$$\mathrm{logit}(overweight)_{c,a,t} = \beta_0 + \beta_1\,energy_{c,t} + \beta_2\,SDI_{c,t} + \beta_3\,vehicles_{c,t} + \beta_4\,agriculture_{c,t} + \sum_k \beta_k I_{A[a]} + \alpha_s + \alpha_r + \alpha_c$$

$$\mathrm{logit}(obesity/overweight)_{c,a,t} = \beta_0 + \beta_1\,energy_{c,t} + \beta_2\,SDI_{c,t} + \beta_3\,vehicles_{c,t} + \sum_k \beta_k I_{A[a]} + \alpha_s + \alpha_r + \alpha_c$$

where energy is ten-year lag-distributed energy consumption per capita, SDI is a composite index of development including lag-distributed income per capita, education, and fertility, vehicles is the number of two- or four-wheel vehicles per capita, and agriculture is the proportion of the population working in agriculture. $I_{A[a]}$ is a dummy variable indicating the specific age group $A$ that the prevalence point captures, and $\alpha_s$, $\alpha_r$, and $\alpha_c$ are super-region, region, and country random intercepts, respectively. Random effects were used in model fitting but were not used in prediction.
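One consequence of modelling obesity as a proportion of overweight is that predicted obesity prevalence can never exceed predicted overweight prevalence. The Python sketch below illustrates this under stated assumptions: the function names and the linear-predictor values are hypothetical, only the fixed-effect part of each linear model is used (matching the statement that random effects were not used in prediction), and multiplying the two back-transformed quantities is our reading of the obesity/overweight outcome, not a step quoted from the source.

```python
import numpy as np

def expit(x):
    """Inverse logit: maps a linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def predict_prevalences(logit_overweight, logit_obesity_given_overweight):
    """Back-transform both linear predictors and combine them.

    Assumption: obesity prevalence = overweight prevalence multiplied by the
    modelled proportion of overweight individuals who are obese.
    """
    overweight = expit(logit_overweight)
    obesity = overweight * expit(logit_obesity_given_overweight)
    return overweight, obesity

# Hypothetical fixed-effect linear predictors for one country-age-year cell.
ow, ob = predict_prevalences(-0.2, -0.8)
print("overweight:", round(float(ow), 3), "obesity:", round(float(ob), 3))
```

Since the second factor is itself bounded between 0 and 1, the ordering obesity ≤ overweight holds for any pair of linear predictors, which would not be guaranteed if the two prevalences were modelled independently.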
We tested all combinations of the following covariates to see which performed best in terms of in-sample AIC for the overweight linear model and the obesity as a proportion of overweight linear model: ten-year lag-distributed energy per capita, proportion of the population living in urban areas, SDI, lag-distributed income per capita, educational attainment (years) per capita, proportion of the population working in agriculture, grams of sugar adjusted for energy per capita, grams of sugar not adjusted for energy per capita, and the number of two- or four-wheeled vehicles per capita. We selected these candidate covariates based on theory as well as reviewing covariates used in other publications. The final linear model was selected based on 1) whether the direction of covariates matched what is expected from theory, 2) whether all the included covariates were significant, and 3) minimising in-sample AIC. The covariate selection process was performed using the dredge function from the MuMIn package in R. To estimate the mean BMI for adults in each country, age, sex, and time period 1980-2019, we first used a nested hierarchical mixed-effects model, fit using restricted maximum likelihood on data from sources containing estimates of all three indicators (prevalence of overweight, prevalence of obesity, and mean BMI), in order to characterise the relationship between overweight, obesity, and mean BMI. In this model, $ow_{c,a,s,t}$ is the prevalence of overweight in country c, age a, sex s, and year t; $ob_{c,a,s,t}$ is the prevalence of obesity in country c, age a, sex s, and year t; sex is a fixed effect on sex; $I_{A[a]}$ is an indicator variable for age; and $\alpha_s$, $\alpha_r$, and $\alpha_c$ are random effects at the super-region, region, and country level, respectively. The model was run in Stata 13. We applied 1000 draws of the regression coefficients to the 1000 draws of overweight prevalence and obesity prevalence produced through ST-GPR to estimate 1000 draws of mean BMI for each country, year, age, and sex.
This approach ensured that overweight prevalence, obesity prevalence, and mean BMI were correlated at the draw level and uncertainty was propagated. We used the ensemble distribution approach described in the manuscript. We fit ensemble weights by source and sex, with source-and sex-specific weights averaged across all sources included to produce the final global weights. The ensemble weights were fit on measured microdata. The final ensemble weights were exponential = 0.002, gamma = 0.028, inverse gamma = 0.085, log-logistic = 0.187, Gumbel = 0.220, Weibull = 0.011, log-normal = 0.058, normal = 0.012, beta = 0.136, mirror gamma = 0.008, and mirror Gumbel = 0.113. One thousand draws of BMI distributions for each location, year, age group, and sex estimated were produced by fitting an ensemble distribution using 1000 draws of estimated mean BMI, 1000 draws of estimated standard deviation, and the ensemble weights. Estimated standard deviation was produced by optimising a standard deviation to fit estimated overweight prevalence draws and estimated obesity prevalence draws. Risk-outcome pairs were defined based on strength of available evidence supporting a causal effect. We performed a systematic review of published meta-analyses, pooled analyses, and systematic reviews available through PubMed using the following search string: ("Body Mass Index"[Mesh] OR "Overweight"[Mesh] OR "Obesity"[Mesh]) AND (Meta-Analysis[ptyp] OR "systematic review"[tiab] OR "pooled analysis"[tiab]). Inclusion criteria are 1) the health outcome is included in GBD, 2) at least one prospective cohort is included, and 3) that the summary effect size is statistically significant. For outcomes meeting inclusion criteria we completed causal criteria tables to evaluate the strength of evidence supporting a causal relationship (see Appendix Table 4 ). 
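The step in which a standard deviation is optimised to fit the estimated overweight and obesity prevalence can be sketched as follows. This is a simplified illustration, not the GBD implementation: it uses only two of the listed distribution families (normal and log-normal) with hypothetical equal weights, moment-matches each family to the same mean and standard deviation, assumes adult thresholds of BMI 25 for overweight and BMI 30 for obesity, and replaces the optimiser with a coarse grid search.

```python
import math
from statistics import NormalDist

# Hypothetical weights for a two-family ensemble (the source uses more families).
WEIGHTS = {"normal": 0.5, "lognormal": 0.5}


def prevalence_above(mean, sd, threshold, weights=WEIGHTS):
    """P(BMI >= threshold) under a weighted mixture moment-matched to (mean, sd)."""
    total = sum(weights.values())
    # Normal component with the requested mean and standard deviation.
    p_normal = 1.0 - NormalDist(mean, sd).cdf(threshold)
    # Log-normal component: choose log-scale parameters that give the same
    # mean and standard deviation on the natural scale.
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    p_lognormal = 1.0 - NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(threshold))
    return (weights["normal"] * p_normal + weights["lognormal"] * p_lognormal) / total


def fit_sd(mean, target_overweight, target_obesity):
    """Grid-search the SD that best reproduces both target prevalences."""
    grid = [1.0 + 0.05 * i for i in range(200)]  # candidate SDs from 1.0 to 10.95

    def loss(sd):
        return ((prevalence_above(mean, sd, 25.0) - target_overweight) ** 2
                + (prevalence_above(mean, sd, 30.0) - target_obesity) ** 2)

    return min(grid, key=loss)
```

In practice this would be repeated for each of the 1000 draws so that mean BMI, overweight prevalence, and obesity prevalence stay consistent at the draw level.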
Gallbladder disease, cataract, multiple myeloma, gout, non-Hodgkin lymphoma, asthma, Alzheimer's disease, and atrial fibrillation were added as new outcomes in GBD 2016, resulting in a total of 38 outcomes. For adults (ages 20+), the theoretical minimum risk exposure level (TMREL) of BMI (20-25 kg/m²) was determined based on the BMI level that was associated with the lowest risk of all-cause mortality in prospective cohort studies. 3 For children (ages 2-19), the TMREL is "normal weight," that is, not overweight or obese, based on IOTF cutoffs. The relative risk per five-unit change in BMI for each disease endpoint was obtained from meta-analyses and, where available, pooled analyses of prospective observational studies. In cases where a relative risk per five-unit change in BMI was not available, we computed our own dose-response meta-analysis using two-step generalised least squares for time trends estimation methods. For childhood outcomes (ages 2-19), we computed categorical relative risks for overweight and obesity using a random effects meta-analysis.

The kidney dysfunction risk factor exposure is divided into four categories of renal function defined by urinary albumin to creatinine ratio (ACR) and estimated glomerular filtration rate (eGFR):
• Albuminuria with preserved eGFR (ACR >30 mg/g and eGFR >=60 ml/min/1.73 m²); this corresponds to stages 1 and 2 chronic kidney disease (CKD) in the Kidney Disease Improving Global Outcomes (KDIGO) classification;
• CKD stage 3 (eGFR of 30-59 ml/min/1.73 m²);
• CKD stage 4 (eGFR of 15-29 ml/min/1.73 m²); and
• CKD stage 5 (eGFR <15 ml/min/1.73 m², not (yet) on renal replacement therapy).

The modelling of renal function prevalence estimates is described in detail in the CKD section of the appendix to the GBD 2019 disease and injury paper. The theoretical minimum-risk exposure level is ACR 30 mg/g or less and eGFR greater than 60 ml/min/1.73 m².
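The exposure categories above amount to a simple classification rule on ACR and eGFR, sketched below. The function and label names are hypothetical, continuous eGFR values are binned at 15, 30, and 60, an eGFR of exactly 60 with low ACR is treated as minimum risk, and renal replacement therapy status is not handled.

```python
def kidney_dysfunction_category(acr_mg_per_g: float, egfr: float) -> str:
    """Assign one of the four exposure categories, or 'tmrel' if none applies.

    egfr is in ml/min/1.73 m^2. People on renal replacement therapy are not
    distinguished here (an assumption of this sketch).
    """
    if egfr < 15:
        return "ckd_stage_5"
    if egfr < 30:
        return "ckd_stage_4"
    if egfr < 60:
        return "ckd_stage_3"
    if acr_mg_per_g > 30:
        return "albuminuria_preserved_egfr"
    return "tmrel"
```

Checking eGFR first reflects the category definitions: stages 3 to 5 are defined by eGFR alone, while albuminuria only defines a category when eGFR is preserved.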
An ACR above 30 mg/g and eGFR below 60 ml/min/1.73 m² have been demonstrated in the literature to be the thresholds at which increased cardiovascular and gout events occur secondary to kidney dysfunction.(1-10) The last systematic review of prevalence of low glomerular filtration rate was conducted for GBD 2016, updating searches done in GBD 2015, GBD 2013, and GBD 2010. Exclusion criteria included surveys that were not population-representative and studies not reporting on CKD by stage.

Input data: exposure
  Source count (total): 98
  Number of countries with data: 35
Input data: relative risk
  Source count (total): 9

We model the proportion of cardiovascular and musculoskeletal diseases attributable to kidney dysfunction. This is performed by 1) running DisMod-MR 2.1 models to estimate the prevalence of albuminuria, stage 3 CKD, stage 4 CKD, and stage 5 CKD; 2) estimating relative risks from available data on cardiovascular outcomes and gout; and 3) calculating the population attributable fraction of those outcomes to impaired kidney function (IKF). The prevalence of exposure to albuminuria and CKD was obtained from the GBD 2019 non-fatal burden of disease analysis. Data on relative risks were contributed by the Chronic Kidney Disease Prognosis Consortium (CKD-PC), a research group composed of investigators representing cohorts from around the world. Investigators share data for the purpose of collaborative meta-analyses to study prognosis in CKD. We estimate burden attributable to kidney dysfunction for cardiovascular diseases, chronic kidney diseases, and gout. In GBD 2017, we relied on a pooled cohort analysis of six cohort studies from the CKD-PC. For GBD 2019, in collaboration with CKD-PC, we obtained data on 38 new cohorts and continued to use the original cohorts from the previous analysis. We ran these new data through MR-BRT meta-regression to determine the relationship between age and outcomes based on exposure to IKF. Estimates were nested within cohorts.
A three-degree spline was placed on age with decreasing monotonicity. All relative risk estimates for stroke and ischaemic heart disease above age 85 were set equal to the risk at age 85 to control for lack of data in older age groups. Gout currently uses GBD 2017 estimations of relative risk. We ran sensitivity analyses with and without controlling for blood pressure, because IKF increases the risk of cardiovascular diseases both directly and through blood pressure, and we wanted to understand how estimates of risk would differ. Generally, the relative risk of cardiovascular disease was lower when controlling for blood pressure. We chose this lower risk, which controlled for hypertension, to give a more conservative estimate. The following plot shows the relative risks for heart disease and stroke by each stage of CKD. As expected, stage 5 and stage 4 CKD have higher risks overall. Risks are also higher at younger ages and lower at the oldest ages, likely reflecting competing risk factors. While the risks themselves dip below zero at the oldest age, we believe this is merely a function of lack of data above age 85. Because of this, our estimates for relative risk above age 85 take the estimate at age 85. We also include two forest plots to show the distribution of risk estimates for heart disease and stroke across our studies. In general, we see an expected pattern, with earlier stages of CKD having lower risks.

The population attributable fraction (PAF) was calculated as:

$$\text{PAF} = \frac{\sum_{i=1}^{n} P_i (RR_i - 1)}{\sum_{i=1}^{n} P_i (RR_i - 1) + 1}$$

where $RR_i$ is the relative risk for exposure level i, $P_i$ is the proportion of the population in that exposure category, and n is the number of exposure categories. (11)

Primary changes between GBD 2017 and GBD 2019

The following are the main changes in the GBD 2019 modelling strategy compared to GBD 2017:

1. In GBD 2019, we used MR-BRT to run a nested meta-regression analysis on the within-study sex ratios to estimate a pooled sex ratio with 95% confidence intervals. In GBD 2017, this was estimated in DisMod-MR 2.1.

3.
In GBD 2017, the RRs were estimated via a pooled cohort meta-regression conducted in R using the metafor package. In GBD 2019, we made use of MR-BRT to run a nested meta-regression analysis that allowed more flexibility in the estimation process. Exposure to ambient particulate matter pollution is defined as the population-weighted annual average mass concentration of particles with an aerodynamic diameter less than 2.5 micrometers (PM2.5) in a cubic meter of air. This measurement is reported in µg/m 3 . The data used to estimate exposure to ambient particulate matter pollution comes from multiple sources, including satellite observations of aerosols in the atmosphere, ground measurements, chemical transport model simulations, population estimates, and land-use data. Table 1 summarizes exposure input data. The following details the updates in methodology and input data used in GBD 2019. Ground measurements used for GBD 2019 include updated measurements from sites included in 2017 and additional measurements from new locations. New and up-to-date data (mainly from the USA, Canada, EU, Bangladesh, China and USA embassies and consulates), were added to the data from the 2018 update of the WHO Global Ambient Air Quality Database used in GBD 2017. The updated data included measurements of concentrations of PM10 and PM2.5 from 10,408 ground monitors from 116 countries from 2010 to 2017. The majority of measurements were recorded in 2016 and 2017 (as there is a lag in reporting measurements, few data from 2018 or newer were available). Annual averages were excluded if they were based on less than 75% coverage within a year. If information on coverage was not available, then data were included unless there were already sufficient data within the same country (monitor density greater than 0.1). For locations measuring only PM10, PM2.5 measurements were estimated from PM10. 
This was performed using a hierarchy of conversion factors (PM2.5/PM10 ratios): (i) for any location a "local" conversion factor was used, constructed as the ratio of the average measurements (of PM2.5 and PM10) from within 50 km of the location of the PM10 measurement, and within the same country, if such measurements were available; (ii) if there was not sufficient local information to construct a conversion factor, then a country-wide conversion factor was used; and (iii) if there was no appropriate information within a country, then a regional factor was used. In each case, to avoid the possible effects of outliers in the measured data (both PM2.5 and PM10), extreme values of the ratios were excluded (defined as being above the 95% quantile or below the 5% quantile of the empirical distributions of conversion factors). As with the GBD 2013, 2015, 2016, and 2017 databases, in addition to values of PM2.5 and whether they were direct measurements or converted from PM10, the database also included additional information, where available, related to the ground measurements, such as monitor geo-coordinates and monitor site type. The global geophysical PM2. The following is a summary of the modelling approach, known as the Data Integration Model for Air Quality (DIMAQ), used in GBD 2015, 2016, 2017, and now in GBD 2019. 3, 4 Before the implementation of DIMAQ (ie, in GBD 2010 and GBD 2013), exposure estimates were obtained using a single global function to calibrate available ground measurements to a "fused" estimate of PM2.5: the mean of satellite-based estimates and those from the TM5 chemical transport model, calculated for each 0.1° × 0.1° grid cell. This was recognised to represent a tradeoff between accuracy and computational efficiency when utilising all the available data sources. In particular, the GBD 2013 exposure estimates were known to underestimate ground measurements in specific locations (see discussion in Brauer and colleagues, 2015).
5 This underestimation was largely due to the use of a single, global calibration function, whereas in reality the relationship between ground measurements and other variables will vary spatially. In GBD 2015 and GBD 2016, coefficients in the calibration model were estimated for each country. Where data were insufficient within a country, information can be "borrowed" from a higher aggregation (region) and, if enough information is still not available, from an even higher level (superregion). Individual country-level estimates were therefore based on a combination of information from the country, its region, and its super-region. This was implemented within a Bayesian hierarchical modelling (BHM) framework. BHMs provide an extremely useful and flexible framework in which to model complex relationships and dependencies in data. Uncertainty can also be propagated through the model, allowing uncertainty arising from different components, both data sources and models, to be incorporated within estimates of uncertainty associated with the final estimates. The results of the modelling comprise a posterior distribution for each grid cell, rather than just a single point estimate, allowing a variety of summaries to be calculated. The primary outputs here are the median and 95% credible intervals for each grid cell. Based on the availability of ground measurement data, modelling and evaluation were focused on the year 2016. The model used in GBD 2017 and GBD 2019 also included within-country calibration variation. 6 The model used for GBD 2019, henceforth referred to as DIMAQ2, provides a number of substantial improvements over the initial formulation of DIMAQ. In DIMAQ, ground measurements from different years were all assumed to have been made in the primary year of interest and then regressed against values from other inputs (eg, satellites, etc.) made in that year. 
In the presence of changes over time, therefore, and particularly in areas where no recent measurements were available, there was the possibility of mismatches between the ground measurements and other variables. In DIMAQ2, ground measurements were matched with other inputs (over time), and the (global-level) coefficients were allowed to vary over time, subject to smoothing that is induced by a first-order random walk process. In addition, the manner in which spatial variation can be incorporated within the model has developed: where there are sufficient data, the calibration equations can now vary (smoothly) both within and between countries, achieved by allowing the coefficients to follow (smooth) Gaussian processes. Where there are insufficient data within a country, to produce accurate equations, as before, information is borrowed from lower down the hierarchy and it is supplemented with information from the wider region. DIMAQ2 as described above is used for all regions except for the north Africa and Middle East and sub-Saharan Africa super-regions, where there are insufficient data across years to allow the extra complexities of the new model to be implemented. In these super-regions, a simplified version of DIMAQ2 is used in which the temporal component is dropped. Model development and comparison was performed using within-and out-of-sample assessment. In the evaluation, cross-validation was performed using 25 combinations of training (80%) and validation (20%) datasets. Validation sets were obtained by taking a stratified random sample, using sampling probabilities based on the cross-tabulation of PM2.5 categories (0-24.9, 25-49.9, 50-74.9, 75-99.9, 100+ µg/m 3 ) and super-regions, resulting in them having the same distribution of PM2.5 concentrations and super-regions as the overall set of sites. 
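The stratified cross-validation design described above can be sketched as follows. This is an illustrative reading rather than the DIMAQ code: sites are represented as hypothetical dictionaries with `pm25` and `super_region` keys, strata are the cross-tabulation of PM2.5 category and super-region, and 20% of each stratum is drawn at random for validation in each of 25 repetitions.

```python
import random
from collections import defaultdict

# PM2.5 categories from the text, expressed as half-open intervals in µg/m3.
PM_BINS = [(0, 25), (25, 50), (50, 75), (75, 100), (100, float("inf"))]


def pm_category(pm25):
    """Return the index of the PM2.5 category containing the value."""
    for index, (low, high) in enumerate(PM_BINS):
        if low <= pm25 < high:
            return index
    raise ValueError("PM2.5 concentration must be non-negative")


def stratified_split(sites, validation_fraction=0.2, seed=0):
    """Split sites into training and validation sets within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for site in sites:
        strata[(pm_category(site["pm25"]), site["super_region"])].append(site)
    training, validation = [], []
    for members in strata.values():
        members = members[:]  # copy so the caller's ordering is untouched
        rng.shuffle(members)
        n_validation = round(len(members) * validation_fraction)
        validation.extend(members[:n_validation])
        training.extend(members[n_validation:])
    return training, validation


def make_combinations(sites, n_combinations=25):
    """Build repeated training/validation combinations with different seeds."""
    return [stratified_split(sites, seed=s) for s in range(n_combinations)]
```

Sampling within strata is what gives each validation set approximately the same mix of concentrations and super-regions as the full set of sites.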
The following metrics were calculated for each training/evaluation set combination: for model fit, R² and the deviance information criterion (DIC, a measure of model fit for Bayesian models); for predictive accuracy, root mean squared error (RMSE) and population-weighted root mean squared error (PwRMSE). The median R² was 0.9, and the median PwRMSE was 10.1 µg/m³. All modelling was performed on the log-scale. The choice of which variables were included in the model was made based on their contribution to model fit and predictive ability. The following is a list of variables and model structures that were included in DIMAQ.

Random effects:
  o Regional temporal (random walk) hierarchical random-effects on the intercept
  o Regional hierarchical random-effects for the coefficient associated with SAT
  o Regional hierarchical random-effects for the coefficient associated with

The TMREL was assigned a uniform distribution with lower/upper bounds given by the average of the minimum and fifth percentiles of outdoor air pollution cohort studies exposure distributions conducted in North America, with the assumption that current evidence was insufficient to precisely characterise the shape of the concentration-response function below the fifth percentile of the exposure distributions. The TMREL was defined as a uniform distribution rather than a fixed value in order to represent the uncertainty regarding the level at which the scientific evidence was consistent with adverse effects of exposure. The specific outdoor air pollution cohort studies selected for this averaging were based on the criterion that their fifth percentiles were less than that of the American Cancer Society Cancer Prevention II (CPSII) cohort's fifth percentile of 8.2 based on Turner and colleagues (2016). 10 This criterion was selected since GBD 2010 used the minimum, 5.8, and fifth percentile solely from the CPS II cohort. The resulting lower/upper bounds of the distribution for GBD 2019 were 2.4 and 5.9.
This has not changed since GBD 2015. We create one set of cause-specific risk curves for both household air pollution and ambient air pollution as two different sources of PM2.5. In GBD 2017, we estimated the particulate matter-attributable burden of disease based on the relation of long-term exposure to PM2.5 with ischaemic heart disease, stroke (ischaemic and haemorrhagic), COPD, lung cancer, acute lower respiratory infection, and type 2 diabetes. In GBD 2019, we added adverse birth outcomes, including low birthweight and short gestation. Because these are already risk factors (and not outcomes) in the GBD, we performed a mediation analysis, in which a proportion of the burden attributable to low birthweight and short gestation was attributed to PM2.5 pollution. For the six non-mediated outcomes, we used results from cohort and case-control studies of ambient PM2.5 pollution; cohort studies, case-control studies, and randomised-controlled trials of household use of solid fuel for cooking; and cohort and case-control studies of secondhand smoke. For the first time in GBD 2019, we no longer use active smoking data in the risk curves. For GBD 2019, we made several important changes to the risk functions. Previously, we used relative risk estimates for active smoking, converting cigarettes-per-day to PM2.5 exposure in order to estimate the PM2.5 relative risk at the highest end of the PM2.5 exposure-response curve. We took this approach because the vast majority of the air pollution epidemiological studies have been performed in low-pollution settings in high-income countries, preventing us from extrapolating the steep relationship at the beginning of the exposure range to locations with high exposure but no relative risk estimates, such as India and China. However, with the recent publication of studies in China and other higher-exposure settings and additional studies of HAP, we have been able to include more estimates at high PM2.5 levels in the model.
11, 12, 13, 14, 15 Furthermore, in contrast to previous cycles of the GBD where the power function used to develop the IER required the inclusion of active smoking data to anchor the risk function, with the current use of splines and their flexibility, it is easier to fit functions to the (ambient, household, and SHS) data without active smoking data. Beginning in GBD 2019, we excluded active smoking studies from the risk curves. Removal of active smoking information removes an important source of uncertainty in our earlier estimates related to differences in dose rates and other aspects of exposure between active smoking and the other PM2.5 sources, including differences in voluntary (active smoking) and involuntary (ambient and household PM2.5, secondhand smoke) exposure. 16, 17 Additionally, in the past, we have built the curves for ischaemic heart disease and stroke based on studies of mortality and used evidence from three studies of both mortality and incidence to scale down the mortality curves to generate estimates of incidence risk. This year we extracted incidence and mortality from all available studies and included this as a covariate in the model. There was no significant difference between estimates of incidence risk and mortality risk, so we included both types of risk estimates in the curve fitting and used the same curve for both incidence and mortality. This is what was done for all other outcomes in the past and in GBD 2019. For cardiovascular diseases, evidence suggests that the relative risk decreases with age. 18 To account for this in our model, we generate unique risk curves for every five-year age group from 25-29 to 95 and older for both ischaemic heart disease and stroke. Because we do not have risk data for every unique age group, we adjust each study based on the median age during follow-up to generate a full adjusted dataset for every curve. 
We calculate the median age of follow-up by taking the median (or mean) age at enrollment and adding one-half of median or mean follow-up time. If follow-up time is not available, we take 70% of total study period based on the observed ratio of follow-up time to total study period for other studies. Once we have a median age during follow-up (a), we extrapolate each study to the full set of ages where the estimated datapoint for age, aj, is calculated with the following equation and accompanying explanatory figure: Previously we have used a fixed functional form to fit the risk curves. 16 In GBD 2019, we used MR-BRT (described in detail elsewhere) splines to fit the risk data with a more flexible shape. While previously we built in the TMREL estimates into the model fitting, this year we have fit the curve beginning at zero exposure and incorporate the TMREL into the relative risk calculation process. This allows others to use our risk curves with whatever counterfactual level is of interest to them. Relative risk curves are available upon request. When fitting the risk curves, we consider the published relative risk over a range of exposure data. For OAP studies, the relative risk informs the curve from the fifth to the 95 th percentile of observed exposure. When this is not available in the published study, we estimate the distribution from the provided information (mean and standard deviation, mean and IQR, etc.). We scale the RR to this range. For HAP studies, we allow each study to inform the curve from the ExpOAP to ExpOAP+ExpHAP, where ExpOAP is the GBD 2017 estimate of the ambient exposure level in the study location and year, and ExpHAP is the GBD 2017 estimate of the excess exposure for those who use solid fuel for cooking in the study location and year. For SHS studies, we updated our strategy of exposure estimation in GBD 2019. For the first time, we are also accounting for outdoor exposure. 
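The median age during follow-up described earlier in this section can be computed with a small helper, sketched below. The function and argument names are hypothetical. The sketch reads the 70% rule as an estimate of the follow-up time (0.7 times the total study period), half of which is then added to the age at enrollment; that reading is an assumption.

```python
def median_age_during_follow_up(age_at_enrollment,
                                follow_up_years=None,
                                study_period_years=None):
    """Median (or mean) enrollment age plus one-half of the follow-up time.

    If follow-up time is not reported, it is approximated as 70% of the
    total study period (interpretation assumed in this sketch).
    """
    if follow_up_years is None:
        if study_period_years is None:
            raise ValueError("need follow-up time or total study period")
        follow_up_years = 0.7 * study_period_years
    return age_at_enrollment + 0.5 * follow_up_years
```

For example, a cohort enrolled at a median age of 55 with 10 years of median follow-up would be assigned a median age during follow-up of 60.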
Similar to the approach used for HAP, we allow each study to inform the curve from the ExpOAP to ExpOAP+ExpSHS, where ExpOAP is the GBD 2017 estimate of the ambient exposure level in the study location and year, and ExpSHS is an estimate of the excess exposure for those who experience secondhand smoke. This is estimated from the number of cigarettes smoked per smoker per day in a given location and year, estimated by the smoking team of GBD, and from a study in Sweden, which measured the PM2.5 exposure in homes of smokers. 19 We fit splines on the datasets including studies of OAP, HAP, and SHS using the following functional form, where X and XCF represent the range of exposure characterised by the effect size: For each of the risk-outcome pairs, we tested various model settings and priors in fitting the MR-BRT splines. The final models used third-order splines with two interior knots and a constraint on the rightmost segment, forcing the fit to be linear rather than cubic. We used an ensemble approach to knot placement, wherein 100 different models were run with randomly placed knots and then combined by weighting based on a measure of fit that penalises excessive changes in the third derivative of the curve. Knots were free to be placed anywhere within the fifth and 95th percentile of the data, as long as a minimum width of 10% of that domain exists between them. We included shape constraints so that the risk curves were concave down and monotonically increasing, the most biologically plausible shape for the PM2.5 risk curve. On the non-linear segments, we included a Gaussian prior on the third derivative of mean 0 and variance 0.01 to prevent over-fitting; on the linear segment, a stronger prior of mean 0 and variance 1e-6 was used to ensure that the risk curves do not continue to increase beyond the range of the data. 
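The random knot placement used for the spline ensemble can be sketched with simple rejection sampling. This is an illustration under stated assumptions, not the MR-BRT implementation: the function names are hypothetical, the minimum-width rule is applied to every adjacent pair of points including the two domain boundaries, and the fit-based weighting of the 100 models is not shown.

```python
import random


def sample_interior_knots(lower, upper, n_interior=2, min_gap_fraction=0.1,
                          rng=None, max_tries=10000):
    """Draw interior knots uniformly in [lower, upper] with a minimum spacing.

    lower and upper stand for the fifth and 95th percentiles of the exposure
    data. A draw is rejected if any adjacent gap is narrower than
    min_gap_fraction of the domain width.
    """
    rng = rng or random.Random()
    min_gap = min_gap_fraction * (upper - lower)
    for _ in range(max_tries):
        knots = sorted(rng.uniform(lower, upper) for _ in range(n_interior))
        points = [lower, *knots, upper]
        if all(b - a >= min_gap for a, b in zip(points, points[1:])):
            return knots
    raise RuntimeError("could not place knots with the requested spacing")


def knot_ensemble(lower, upper, n_models=100, seed=0):
    """Generate one set of interior knots per ensemble member."""
    rng = random.Random(seed)
    return [sample_interior_knots(lower, upper, rng=rng) for _ in range(n_models)]
```

Each knot set would then be used to fit one spline, and the fits combined with weights based on the fit measure described in the text.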
For chronic obstructive pulmonary disease, we used a looser Gaussian prior of mean 0 and variance 1e-4 on the linear segment of the risk function. For this outcome, we have epidemiological evidence from household air pollution that the risk continues to increase at higher levels of PM2.5. Table 2 summarizes relative risk input data for ambient particulate matter pollution and household air pollution. The following figures display risk curves for each outcome. The dashed line depicts the GBD 2017 IER including active smoking data, the dotted line depicts the GBD 2019 IER including active smoking data and updates to the AS and SHS exposure incorporation, and the solid line depicts the GBD 2019 MR-BRT curve without the inclusion of active smoking data. The grey shaded areas represent the 95% CI. The red box represents the TMREL area of the curve. On each page, the first figure depicts the typical range of outdoor exposure, whereas the second plot includes higher levels typical of household air pollution exposure. Each point or number represents one study effect size. Each is plotted at the 95 th percentile of the exposure distribution (OAP), the expected level of exposure for individual using solid fuel (HAP), or the expected level of exposure for individuals experiencing SHS. The relative risk is plotted relative to the predicted relative risk at the fifth percentile of exposure distribution (OAP), the expected (ambient only) level of exposure for individuals not using solid fuel (HAP), or the expected (ambient only) level of exposure for individuals not exposed to SHS. For example, a study predicting a relative risk of 1.5 for an exposure range of 10 to 20 would be plotted at (20, MRBRT(10)*1.5). Arrows represent studies that would have been outside the range of the plot but have been moved to include on the figure. 
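The plotting convention in the worked example above can be written out as a short sketch. The curve used here is a hypothetical stand-in for a fitted risk curve, chosen only so the example runs; it is not the GBD risk function.

```python
import math


def hypothetical_curve(exposure):
    """Stand-in for a fitted risk curve (an assumption for illustration)."""
    return 1.0 + 0.1 * math.log1p(exposure)


def plot_coordinates(curve, exposure_low, exposure_high, study_rr):
    """Return (x, y) for one study effect size.

    x is the upper end of the study's exposure range; y is the study's
    relative risk scaled by the curve's prediction at the lower end.
    """
    return exposure_high, curve(exposure_low) * study_rr


# The example from the text: relative risk 1.5 over an exposure range of 10 to 20.
x, y = plot_coordinates(hypothetical_curve, 10.0, 20.0, 1.5)
```

Scaling by the curve's value at the lower end places each study on the same vertical scale as the curve, so a study whose point lies on the curve agrees with the fit over its exposure range.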
The outcomes of low birthweight and short gestation include mortality due to diarrhoeal diseases, lower respiratory infections, upper respiratory infections, otitis media, meningitis, encephalitis, neonatal preterm birth, neonatal encephalopathy due to birth asphyxia and trauma, neonatal sepsis and other neonatal infections, haemolytic disease and other neonatal jaundice, and other neonatal disorders. We also calculate attributable YLDs for neonatal preterm birth. These are specific to ages 0-6 days and 7-27 days. In partnership with Dr. Rakesh Ghosh at the University of California, San Francisco, we conducted a systematic review of all cohort, case-control, or randomised-controlled trial studies of ambient PM2.5 pollution or household air pollution and birthweight or gestational age outcomes. Outcomes measured included continuous birthweight (bw), continuous gestational age (ga), low birthweight (LBW) (<2500 g), preterm birth (PTB) (<37 weeks), and very preterm birth (VPTB) (<32 weeks). We included any papers published until March 31, 2018. Systematic review PRISMA diagrams are below.

[PRISMA diagrams: ambient particulate matter pollution, low birth weight; ambient particulate matter pollution, preterm birth; household air pollution, all outcomes]

The following plots depict forest and funnel plots for studies of OAP and birthweight, low birthweight, and preterm birth. Note that these plots do not capture the exposure level of these studies but the linear risk or difference in birthweight per 10-unit increase in PM2.5 exposure.

[Forest and funnel plots: birth weight]

For studies of household air pollution, we used the same strategy described above to map them to PM2.5 exposure values. Because birthweight and gestational age are modelled using a continuous joint distribution for the GBD, we were interested in how those distributions changed under the influence of PM2.5 pollution.
We therefore estimated the continuous shift in birthweight (bw, in grams) and gestational age (ga, in weeks) at a given PM2.5 exposure level. When available, we used estimates of continuous shift in bw or ga directly from each study. When that was not available, we used the published OR/RR/HR for LBW, PTB, or VPTB and the following strategy:

1. Extract the OR/RR/HR from the study.
2. Select the GBD 2017 estimated bw-ga joint distribution for the study location and year.
3. Calculate the number of grams or weeks required to shift the distribution such that the proportion of births under the specified threshold (P) is reduced by the study effect size to a counterfactual level (Pcf).
4. Save the resulting shift and 95% CI as the continuous effect.

We then fit an MR-BRT spline to these studies, where the difference in the value of the model at the upper concentration (X) and the value of the model at the counterfactual concentration (XCF) is equal to the published or calculated shift in bw or ga. We fit the same model and priors as the non-mediated outcomes (with the exception of COPD), except that, because the change in birthweight and gestational age was expected to be negative, the shape constraints were monotonically decreasing and concave up. The following figures depict the MR-BRT curves for shift in grams (bw) and weeks (ga). Once we had curves of estimated shifts across the exposure range, we predicted the shift in both birthweight and gestational age for total female particulate matter pollution exposure in each location and year. Because the epidemiological studies mutually controlled for birthweight and gestational age, we assumed these shifts are independent. We then shifted the observed distributions to reflect the expected bw-ga distribution in the absence of particulate matter pollution. These shifted distributions were used as the counterfactual in the PAF calculation equation to calculate the burden attributable to PM2.5 pollution.
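Step 3 of the strategy above, converting a published effect size into a continuous shift, can be sketched as follows. The sketch makes several simplifying assumptions that are not in the source: birthweight is treated as a univariate normal distribution with a hypothetical mean and standard deviation rather than the GBD bw-ga joint distribution, the effect size is treated as a risk ratio so that Pcf = P / RR, and every birth is shifted by the same amount.

```python
from statistics import NormalDist


def required_shift(mean, sd, threshold, effect_size):
    """Shift (in the units of `mean`) that moves P(X < threshold) from P to Pcf.

    Returns (shift, p_observed, p_counterfactual). A positive shift means the
    counterfactual distribution lies above the observed one.
    """
    observed = NormalDist(mean, sd)
    p_observed = observed.cdf(threshold)
    p_counterfactual = p_observed / effect_size
    # Adding `shift` to every value gives P(X + shift < threshold)
    # = F(threshold - shift); set this equal to Pcf and solve for shift.
    shift = threshold - observed.inv_cdf(p_counterfactual)
    return shift, p_observed, p_counterfactual


# Hypothetical inputs: mean 3200 g, SD 500 g, LBW threshold 2500 g, RR 1.2.
shift_g, p_lbw, p_lbw_cf = required_shift(3200.0, 500.0, 2500.0, 1.2)
```

The same calculation applies to gestational age with a threshold of 37 or 32 weeks; repeating it at the bounds of the effect size's confidence interval would give an interval for the shift.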
To calculate PAFs, the distribution is divided into 56 bw-ga categories, each with a unique RR. Let pi be the observed proportion of babies in category i and pi' be the counterfactual proportion of babies in category i if there were no particulate matter pollution. We proportionately split this PAF between ambient and HAP based on exposure, as described below. One important assumption to note is that the shift in bw and ga is assumed to be linear across the bw-ga distribution. For lower respiratory infections, we have directly estimated PAFs attributable to PM2.5 in addition to those mediated through birthweight and gestational age. We would expect that some of the directly estimated PAFs are mediated through bw and ga. Additionally, the directly estimated PAF is based on a summary of relative risks for all children under 5 years, so there is a chance that the mediated PAF, which is more finely resolved, could be greater. To avoid double-counting for these two age groups (0-6 days and 7-27 days), we take the maximum of the two PAF estimates: if the directly estimated PAF is greater than the bw-ga-mediated PAF, we take the direct estimate, and if the mediated PAF is greater, we take the mediated estimate. PTB incidence and mortality are both outcomes measured in the GBD. 100% of the burden for this cause is attributable to short gestation. To calculate the percentage attributable to particulate matter pollution, we estimated the percentage of babies born at less than 37 weeks (pptb) and the percentage of babies that would have been born at less than 37 weeks in the counterfactual scenario of no particulate matter pollution (pptb'). Although in GBD 2019 we have not used active smoking data to estimate the risk curves, we are still using an integrated exposure response approach because we are integrating relative risk estimates across various exposure sources: ambient, SHS, and HAP.
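As an illustration of the mediated PAF steps described above, the sketch below uses the standard categorical PAF form with the counterfactual proportions pi', the maximum rule for lower respiratory infections, and a simple proportional reduction for preterm birth. The exact equations are not reproduced in this text, so these forms are assumptions, and the function names and the three-category example (in place of the 56 bw-ga categories) are hypothetical.

```python
def categorical_paf(p_obs, p_cf, rr):
    """PAF from observed (pi) and counterfactual (pi') category proportions.

    Assumed form: (sum pi*RRi - sum pi'*RRi) / sum pi*RRi.
    """
    observed = sum(p * r for p, r in zip(p_obs, rr))
    counterfactual = sum(p * r for p, r in zip(p_cf, rr))
    return (observed - counterfactual) / observed


def lri_paf(direct_paf, mediated_paf):
    """Avoid double-counting by keeping the larger of the two estimates."""
    return max(direct_paf, mediated_paf)


def ptb_paf(pptb, pptb_cf):
    """Assumed form: share of preterm births that would not occur without PM2.5."""
    return (pptb - pptb_cf) / pptb


# Three illustrative categories, ordered from highest to lowest risk.
p_obs = [0.10, 0.20, 0.70]
p_cf = [0.08, 0.19, 0.73]
rr = [5.0, 2.0, 1.0]
print(round(categorical_paf(p_obs, p_cf, rr), 5))
print(lri_paf(0.12, 0.05625))
print(round(ptb_paf(0.10, 0.09), 3))
```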
The use of various sources to construct a risk curve with PM2.5 as the exposure indicator assumes equitoxicity of particles, despite some evidence suggesting differences in health impact by PM source, size, and chemical composition. However, in the absence of consistent and robust evidence of differential toxicity by source and sufficient estimates of source or composition-specific exposure-response relationships, integrating across OAP, SHS, and HAP studies is the approach most consistent with the current evidence, as reviewed by US EPA and WHO (20, 21). Use of a common risk function may affect the magnitude of risk estimates for HAP and OAP compared to separate risk functions. As more data from higher OAP concentration locations and from HAP studies for non-respiratory outcomes become available, it may be possible to evaluate the strength of evidence for each and to develop separate risk functions.
Proportional PAF approach
Prior to GBD 2017, relative risks for both exposures were obtained from the IER as a function of exposure and relative to the same TMREL. In reality, were a country to reduce only one of these risk factors, the other would remain. We did not consider the joint effects of particulate matter from outdoor exposure and burning solid fuels for cooking. For GBD 2017 we developed a new approach to use the IER for obtaining PAFs for both OAP and HAP. Let Exp_OAP be the ambient PM2.5 exposure level and Exp_HAP be the excess exposure for those who use solid fuel for cooking. Let P_HAP be the proportion of the population using solid fuel for cooking. We calculated PAFs at each 0.1° × 0.1° grid cell. We assumed that the distribution of those using solid fuel for cooking (HAP) was equivalent across all grid cells of the GBD location. For the proportion of the population not exposed to HAP, the relative risk was RR_OAP, the risk curve evaluated at Exp_OAP relative to the TMREL. For those exposed to HAP, the relative risk was RR_HAP, the risk curve evaluated at Exp_OAP + Exp_HAP relative to the TMREL. We then calculate a population level RR and PAF for all particulate matter exposure.
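A minimal sketch of the joint calculation for a single grid cell follows. The risk curve `toy_rr` is a made-up saturating function used only so the example runs; it is not the IER or any GBD curve. The population-level RR as an exposure-weighted average, the PAF form (RR - 1)/RR, and the split in proportion to average exposure are assumptions about the exact equations, and all names are hypothetical.

```python
import math


def toy_rr(z, tmrel=4.0):
    """Hypothetical risk curve: 1 at or below the TMREL, saturating above it."""
    return 1.0 + 0.5 * (1.0 - math.exp(-0.01 * max(z - tmrel, 0.0)))


def joint_pm_paf(exp_oap, exp_hap, p_hap, rr=toy_rr):
    """Return (PAF for all PM, PAF assigned to OAP, PAF assigned to HAP)."""
    rr_oap = rr(exp_oap)                    # people not using solid fuel
    rr_hap = rr(exp_oap + exp_hap)          # solid-fuel users: ambient plus excess
    rr_pm = (1.0 - p_hap) * rr_oap + p_hap * rr_hap
    paf_pm = (rr_pm - 1.0) / rr_pm
    avg_hap = p_hap * exp_hap               # population-average excess exposure
    total = exp_oap + avg_hap
    if total == 0.0:
        return 0.0, 0.0, 0.0
    paf_oap = paf_pm * exp_oap / total
    paf_hap = paf_pm * avg_hap / total
    return paf_pm, paf_oap, paf_hap


paf_pm, paf_oap, paf_hap = joint_pm_paf(exp_oap=30.0, exp_hap=200.0, p_hap=0.4)
print(round(paf_pm, 4), round(paf_oap, 4), round(paf_hap, 4))
```

Because the two shares sum to one, the OAP and HAP fractions always add up to the total particulate matter PAF, which is the property that prevents burden from being counted twice.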
We population weight the grid-cell level particulate matter PAFs to get a country level PAF, and finally, we split this PAF based on the average exposure to each of OAP and HAP: PAF_OAP = PAF_PM × Exp_OAP / (Exp_OAP + P_HAP × Exp_HAP), and PAF_HAP = PAF_PM × P_HAP × Exp_HAP / (Exp_OAP + P_HAP × Exp_HAP), where PAF_PM is the PAF for all particulate matter exposure. With this strategy, PAF_PM = PAF_OAP + PAF_HAP, and no burden is counted twice. Exposure to household air pollution from solid fuels (HAP) is estimated from both the proportion of individuals using solid cooking fuels and the level of PM2.5 air pollution exposure for these individuals. Solid fuels in our analysis include coal, wood, charcoal, dung, and agricultural residues. We extracted information on use of solid fuels from the standard multi-country survey series such as We also excluded sources that did not distinguish specific primary fuel types, estimated fuel used for purposes other than cooking (eg, lighting or heating), failed to report standard error or sample size, had over 15% of households with missing responses, reported fuel use in physical units, or were secondary sources referencing primary analyses. Table 1 summarizes exposure input data. We then apply this coefficient to household-only reports with the following formula: = the proportion of individuals using solid fuel for cooking, and = the proportion of households using solid fuel for cooking. The effect is that the household studies are inflated to account for bias, because larger households are more likely to use solid fuel for cooking. The following figure depicts the 3676 data points that informed the crosswalk model. The red points indicate the 10% of studies that were trimmed as outliers. Household air pollution was modelled at the individual level using a three-step modelling strategy that uses linear regression, spatiotemporal regression, and Gaussian process regression (GPR). The first step is a mixed-effect linear regression of the logit-transformed proportion of individuals using solid cooking fuels.
The linear model contains maternal education and the proportion of population living in urban areas as covariates and has nested random effects by GBD region and GBD super-region. The full ST-GPR process is specified elsewhere in this appendix. No substantial modelling changes were made in this round compared to GBD 2017.
First-stage linear model and coefficients
For cataract, the TMREL is defined as no households using solid cooking fuel. For outcomes related to both ambient and household air pollution, the PAFs are estimated jointly and the TMREL is defined as a uniform distribution between 2.4 and 5.9 µg/m³ PM2.5. In addition to the previously included outcomes of lower respiratory infections (LRI), stroke, ischaemic heart disease (IHD), chronic obstructive pulmonary disease (COPD), lung cancer, type 2 diabetes, and cataract, in GBD 2019 we added low birthweight and short gestation as new outcomes of household air pollution through a mediation analysis. With the exception of cataract, all causes share risk curves and are jointly calculated with ambient PM2.5 air pollution. Table 2 summarizes relative risk input data for ambient particulate matter pollution and household air pollution. Prior to GBD 2019, we utilised the results of an external meta-analysis with a summary relative risk of 2.47 (95% CI 1.63 to 3.73). 1 While this effect estimate was for both sexes, in the past we estimated burden for women only because women are known to have higher HAP exposure than men. In GBD 2019, we performed our own meta-regression analysis of household air pollution and cataracts. We extracted all of the component studies of the above meta-analysis paper but excluded one cross-sectional study, because GBD risk factor analyses typically do not include cross-sectional analyses. In an additional literature search, we found one additional paper describing different fuel types and cataracts. 4 We excluded this study because there was no comparison group without solid fuel use.
Our resulting dataset contained eight estimates from six sources in India and Nepal. On these eight estimates, we ran a MR BRT meta-regression to generate a summary effect size of 2 Studies reported effect sizes for males, females, and/or both sexes. In a sensitivity analysis we included a covariate for sex and found no significant difference in effect size by sex. Therefore, we now estimate cataract as an outcome of household air pollution in both males and females. In GBD 2019, we also made substantial changes to our particulate matter risk curves. These risk curves, utilising splines in MR-BRT, the new mediation analysis with birthweight and gestational age, and the joint-estimation PAF approach are described in the ambient particulate matter appendix. In order to use the particulate matter risk curves, we must estimate the level of exposure to particulate matter with diameter of less than 2.5 micrometers (PM2.5) for individuals using solid fuels for cooking. The Global Household Air Pollution (HAP) Measurements database from WHO contains 196 studies with measurements from 43 countries of various pollution metrics in households using solid fuel for cooking. 2 From this database, we take all measurements of PM2.5 using indoor or personal monitors. In addition to the WHO database, we included eight additional studies from a systematic review conducted in 2015 for GBD. The final dataset included 336 estimates from 75 studies in 43 unique locations. We included 260, 64, nine, and three measurements indoors, on personal monitors for females, children (under 5), and males, respectively. 274 estimates were in households using solid fuels, 47 in households only using clean (gas or electricity) fuels, and 15 in households using a mixture of solid and clean fuels. We use the following model: We also included the Socio-demographic Index (SDI) as a variable to predict a unique value of HAP for each location and year based on development. 
We also included a random effect on study. We weighted each study by its sample size. Before modelling, we calculated the excess particulate matter in households using solid fuel by subtracting off the predicted ambient PM2.5 value in the study location and year based on the GBD 2017 PM2.5 exposure model. Therefore, for females in households using solid fuel, we would expect their long-term mean excess PM2.5 exposure due to the use of solid fuels to be 1522, 117, and 9 μg/m³ in SDI of 0.1, 0.5, and 0.9, respectively. Because there are so few studies of personal monitoring in men and children, rather than directly using the results of the model, we generated ratios using studies that measured at least two of the population groups for any size particulate matter. For PM2.5 we used the predicted ambient PM2.5 value in the study location and year based on the GBD 2017 PM2.5 exposure model as the "outdoor" measurement, and for PM4 and PM10 we used published values in the studies themselves. We first subtracted off this outdoor value from each PM measurement, and then calculated the ratio of male to female and child to female exposure, weighted by sample size.
Smoking
Input data and methodological summary
Definition
Exposure
As in GBD 2017, we estimated the prevalence of current smoking and the prevalence of former smoking using data from cross-sectional nationally representative household surveys. We defined current smokers as individuals who currently use any smoked tobacco product on a daily or occasional basis. We defined former smokers as individuals who quit using all smoked tobacco products for at least six months, where possible, or according to the definition used by the survey. Our extraction method has not changed from GBD 2017. We extracted primary data from individual-level microdata and survey report tabulations.
We extracted data on current, former, and/or ever smoked tobacco use reported as any combination of frequency of use (daily, occasional, and unspecified, which includes both daily and occasional smokers) and type of smoked tobacco used (all smoked tobacco, cigarettes, hookah, and other smoked tobacco products such as cigars or pipes), resulting in 36 possible combinations. Other variants of tobacco products, for example hand-rolled cigarettes, were grouped into the four type categories listed above based on product similarities. For microdata, we extracted relevant demographic information, including age, sex, location, and year, as well as survey metadata, including survey weights, primary sampling units, and strata. This information allowed us to tabulate individual-level data in the standard GBD five-year age-sex groups and produce accurate estimates of uncertainty. For survey report tabulations, we extracted data at the most granular age-sex group provided. Our GBD smoking case definitions were current smoking of any tobacco product and former smoking of any tobacco product. All other data points were adjusted to be consistent with either of these definitions. Some sources contained information on more than one case definition and these sources were used to develop the adjustment coefficient to transform alternative case definitions to the GBD case definition. The adjustment coefficient was the beta value derived from a linear model with one predictor and no intercept. We used the same crosswalk adjustment coefficients as in GBD 2017, and thus we have not included a methods explanation in this appendix, as it has been detailed previously. As in GBD 2017, we split data reported in broader age groups than the GBD 5-year age groups or as both sexes combined by adapting the method reported in Ng et al 1 to split using a sex-geography-timespecific reference age pattern. 
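The adjustment coefficient described above, a beta from a linear model with one predictor and no intercept, can be illustrated with the closed-form least-squares slope through the origin. This is a sketch only: the function name and the paired prevalence values are hypothetical, and the appendix notes that the actual coefficients were carried over from GBD 2017.

```python
def crosswalk_beta(alternative, reference):
    """Least-squares slope through the origin: reference ~ beta * alternative."""
    sxy = sum(a * r for a, r in zip(alternative, reference))
    sxx = sum(a * a for a in alternative)
    return sxy / sxx


# Hypothetical sources reporting both an alternative definition
# and the GBD case definition for the same survey groups.
alternative = [0.10, 0.20, 0.30]
reference = [0.12, 0.24, 0.36]

beta = crosswalk_beta(alternative, reference)
adjusted = [beta * a for a in alternative]   # alternative-definition data on the GBD scale
print(round(beta, 3), [round(x, 3) for x in adjusted])
```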
We separated the data into two sets: a training dataset, with data already falling into GBD sex-specific 5-year age groups, and a split dataset, which reported data in aggregated age or sex groups. We then used spatiotemporal Gaussian process regression (ST-GPR) to estimate sex-geography-time-specific age patterns using data in the training dataset. The estimated age patterns were used to split each source in the split dataset. The ST-GPR model used to estimate the age patterns for age-sex splitting used an age weight parameter value that minimises the effect of any age smoothing. This parameter choice allowed the estimated age pattern to be driven by data, rather than being enforced by any smoothing parameters of the model. Because these age-sex split data points were to be incorporated in the final ST-GPR exposure model, we did not want to doubly enforce a modelled age pattern for a given sex-location-year on a given aggregate data point. We used ST-GPR to model current and former smoking prevalence. The model is nearly identical to that in GBD 2017. Full details on the ST-GPR method are reported elsewhere in the appendix. Briefly, the mean function input to GPR is a complete time series of estimates generated from a mixed effects hierarchical linear model plus weighted residuals smoothed across time, space, and age. The linear model formula for current smoking, fit separately by sex using restricted maximum likelihood in R, is:
$$\mathrm{logit}(p_{g,a,t}) = \beta_0 + \beta_1 CPC_{g,t} + \sum_{k} \beta_k I_{A[a]} + \alpha_s + \alpha_r + \alpha_g + \epsilon_{g,a,t}$$
where $CPC_{g,t}$ is the tobacco consumption covariate by geography $g$ and time $t$, described above, $I_{A[a]}$ is a dummy variable indicating the specific age group $A$ that the prevalence point $p_{g,a,t}$ captures, and $\alpha_s$, $\alpha_r$, and $\alpha_g$ are super-region, region, and geography random intercepts, respectively. Random effects were used in model fitting but not in prediction. $CSP_{A[a],g,t}$ is the current smoking prevalence by specific age group $A$, geography $g$, and time $t$ that point $p_{g,a,t}$ captures, both derived from the current smoking ST-GPR model defined above.
The methods for modelling supply-side-level data were changed substantially from those used in GBD 2017. The raw data were domestic supply (USDA Global Surveillance Database and UN FAO) and retail supply (Euromonitor) of tobacco. Domestic supply was calculated as production + imports - exports. The data went through three rounds of outliering. First, they were age-sex split using daily smoking prevalence to generate number of cigarettes per smoker per day for a given location-age-sex-year. If more than 12 points for a particular source-location-year (equal to over 1/3 of the split points) were above the given thresholds, that source-location-year was outliered. A point would not be outliered if it was (in cigarettes per smoker): under five (10-14 year olds); under 20 (males, 15-19 year olds); under 18 (females, 15-19 year olds); under 38/35 and over three (males/females, 20+ year olds). These thresholds were chosen by visualising histograms of the data for each age-sex group, as well as with expert knowledge about reasonable consumption levels. In the second round of outliering, the mean tobacco per capita value over a 10-year window was calculated. If a point was over 70% of that mean value away from the mean value, it was outliered. The 70% limit was chosen using histograms of these distances. In the third round, some manual outliering was performed to account for edge cases. Finally, data smoothing was performed by taking a three-year rolling mean over each location-year. Next, a simple imputation to fill in missing years was performed for all series to remove compositional bias from our final estimates. Since the data from our main sources covered different time periods, by imputing a complete time series for each data series, we reduced the probability that compositional bias of the sources was leading to biased final estimates.
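The second outliering round and the smoothing step described above can be sketched as below. Several details are assumptions because the text does not specify them: the 10-year window is taken as a block of up to ten consecutive years around each point, the window mean includes the point itself, and the rolling mean is centred and shortened at the ends of the series. Function names and the example series are hypothetical.

```python
def flag_window_outliers(values, window=10, limit=0.7):
    """Flag points more than `limit` * window mean away from the window mean."""
    flags = []
    half = window // 2
    n = len(values)
    for i, v in enumerate(values):
        lo = max(0, i - half)
        hi = min(n, lo + window)
        lo = max(0, hi - window)            # keep the window full near the series end
        window_mean = sum(values[lo:hi]) / (hi - lo)
        flags.append(abs(v - window_mean) > limit * window_mean)
    return flags


def rolling_mean(values, width=3):
    """Centred rolling mean; edge years use the neighbours that exist."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


series = [100.0] * 9 + [400.0]              # tobacco per capita, one value per year
flags = flag_window_outliers(series)
kept = [v for v, bad in zip(series, flags) if not bad]
print(flags)
print(rolling_mean(kept))
```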
To impute the missing years for each series, we modelled the log ratio of each pair of sources as a function of an intercept and nested random effects on super-region, region, and location. The appropriate predicted ratio was multiplied by each source that we did have, and then the predictions were averaged to get the final imputed value. For example, if source A was missing for a particular location-year, but sources B and C were present, then we predicted A twice: once from the modelled ratio of A to B, and again from the modelled ratio of A to C. These two predictions were then averaged. For some locations where there was limited overlap between series, the predicted ratio did not make sense, and a regional ratio was used. Finally, variance was calculated both across series (within a location-year) as well as across years (within a location-source). Additionally, if a location-year had one imputed point, the variance was multiplied by 2. If a location-year had two imputed points, the variance was multiplied by 4. The average estimates in each location-year were the input to an ST-GPR model. For this, we used a simple mixed effects model, which was modelled in log space with nested location random effects. Subnational estimates were then further modelled by splitting the country-level estimates using current smoking prevalence. The theoretical minimum-risk exposure level is 0. Identical to GBD 2017, we estimated exposure among current smokers for two continuous indicators: cigarettes per smoker per day and pack-years. Pack-years incorporates aspects of both duration and amount. One pack-year represents the equivalent of smoking one pack of cigarettes (assuming a 20-cigarette pack) per day for one year. Since the pack-years indicator collapses duration and intensity into a single dimension, one pack-year of exposure can reflect smoking 40 cigarettes per day for six months or smoking 10 cigarettes per day for two years.
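The pack-year arithmetic above can be checked with a short sketch. The function name is hypothetical; the 20-cigarette pack size comes from the definition in the text.

```python
def pack_years(cigarettes_per_day, years, pack_size=20):
    """Packs smoked per day multiplied by the number of years smoked."""
    return cigarettes_per_day / pack_size * years


# The two histories given in the text are both one pack-year.
print(pack_years(40, 0.5))   # 40 cigarettes per day for six months
print(pack_years(10, 2))     # 10 cigarettes per day for two years
```

This equivalence is what is meant by collapsing duration and intensity into one dimension: the two histories are indistinguishable once expressed in pack-years.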
To produce these indicators, we simulated individual smoking histories based on distributions of age of initiation and amount smoked. We informed the simulation with cross-sectional survey data capturing these indicators, modelled at the mean level for all locations, years, ages, and sexes using ST-GPR. We rescaled estimates of cigarettes per smoker per day to an envelope of cigarette consumption based on supply-side data. We estimated pack-years of exposure by summing samples from age-and time-specific distributions of cigarettes per smoker for a birth cohort in order to capture both age trends and time trends and avoid the common assumption that the amount someone currently smokes is the amount they have smoked since they began smoking. All distributions were age-, sex-, and region-specific ensemble distributions, which were found to outperform any single distribution. We estimated exposure among former smokers using years since cessation. We utilised ST-GPR to model mean age of cessation using cross-sectional survey data capturing age of cessation. Using these estimates, we generated ensemble distributions of years since cessation for every location, year, age group, and sex. 
The same risk-outcome pairs from GBD 2017 were used: tuberculosis, lower respiratory tract infections, oesophageal cancer, stomach cancer, bladder cancer, liver cancer, laryngeal cancer, lung cancer, breast cancer, cervical cancer, colorectal cancer, lip and oral cancer, nasopharyngeal cancer, other pharyngeal cancer, pancreatic cancer, kidney cancer, leukaemia, ischaemic heart disease, ischaemic stroke, haemorrhagic stroke, subarachnoid haemorrhage, atrial fibrillation and flutter, aortic aneurysm, peripheral arterial disease, chronic obstructive pulmonary disease, other chronic respiratory diseases, asthma, peptic ulcer disease, gallbladder and biliary tract diseases, Alzheimer disease and other dementias, Parkinson disease (protective), multiple sclerosis, type-II diabetes, rheumatoid arthritis, low back pain, cataracts, macular degeneration, and fracture. Input data for relative risks were nearly the same as in GBD 2017. The only addition was for chronic obstructive pulmonary disease, for which a few additional studies were included. We synthesised effect sizes by cigarettes per smoker per day, pack-years, and years since quitting from cohort and case-control studies to produce nonlinear dose-response curves using a Bayesian meta-regression model. For outcomes with significant differences in effect size by sex or age, we produced sex-or age-specific risk curves. We estimated risk curves of former smokers compared to never smokers taking into account the rate of risk reduction among former smokers seen in the cohort and case-control studies, and the cumulative exposure among former smokers within each age, sex, location, and year group. 
As in GBD 2017, we estimated PAFs based on the following equation:
$$\mathrm{PAF} = \frac{p(n) + p(f)\int \mathrm{exp}(x)\,rr(x)\,dx + p(c)\int \mathrm{exp}(y)\,rr(y)\,dy - 1}{p(n) + p(f)\int \mathrm{exp}(x)\,rr(x)\,dx + p(c)\int \mathrm{exp}(y)\,rr(y)\,dy}$$
where $p(n)$ is the prevalence of never smokers, $p(f)$ is the prevalence of former smokers, $p(c)$ is the prevalence of current smokers, $\mathrm{exp}(x)$ is a distribution of years since quitting among former smokers, $rr(x)$ is the relative risk for years since quitting, $\mathrm{exp}(y)$ is a distribution of cigarettes per smoker per day or pack-years, and $rr(y)$ is the relative risk for cigarettes per smoker per day or pack-years. We used pack-years as the exposure definition for cancers and chronic respiratory diseases, and cigarettes per smoker per day for cardiovascular diseases and all other health outcomes.
We define secondhand smoke exposure as current exposure to secondhand tobacco smoke at home, at work, or in other public places. We use household composition as a proxy for non-occupational secondhand smoke exposure and make the assumption that all persons living with a daily smoker are exposed to tobacco smoke. We use surveys to estimate the proportion of individuals exposed to secondhand smoke at work. We only consider non-smokers to be exposed to secondhand smoke. Non-smokers are defined as all persons who are not daily smokers. Ex-smokers and occasional smokers are considered non-smokers in this analysis. Exposure is evaluated for both children and adults. To calculate the proportion of non-smokers who live with at least one smoker, we used unit record data on household composition, which included the ages and sexes of all persons living in the same household. Our sources included representative major survey series with a household composition module, including the Demographic Health Surveys (DHS), the Multiple Indicator Cluster Surveys (MICS), and the Living Standards Measurement Surveys (LSMS); and national and subnational censuses, which included those captured in the IPUMS project and identified using the Global Health Data Exchange catalog (GHDx).
To calculate the proportion of individuals exposed to secondhand smoke at work, by age and sex, we used cross-sectional surveys that ask respondents about self-reported occupational secondhand smoke exposure. Sources include the Global Adult Tobacco Surveys, Eurobarometer Surveys, and WHO STEPS Surveys. We identified sources using the GHDx. No major changes have been introduced to data inputs since 2016. A new systematic review is planned for the next GBD round. Table 1 summarizes exposure input data (total source count: 721). Given the nature of the data used in our models (microdata), no crosswalk for case definition adjustment and no age- and sex-splitting process was required. Estimates of daily smoking prevalence in each location were also used in our calculations, as described in the modelling strategy section below. Identical to GBD 2017, we estimated the probability that each person is living with a smoker and is also a non-smoker themselves using set theory. First, household composition data were used at the individual level to capture the ages and sexes of each person in the household. Second, we analysed surveys with both household composition data and tobacco use questions and determined that the distribution of household size, mean age of the household members, and the age distribution were not significantly different between households with and without a self-reported smoker. Since we did not find that household composition varied between smokers and non-smokers, we then used the GBD 2019 primary daily smoking prevalence model to calculate the probability that each household member is a daily smoker. Next, we used the probability of the union of sets on each individual household member to calculate the overall probability that at least one of the other household members was a daily smoker.
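The union-of-sets step can be illustrated with a short sketch. It assumes that smoking status is independent across household members, which the text does not state explicitly, and the function name and probabilities are hypothetical.

```python
import math


def prob_any_other_smoker(other_member_probs):
    """P(at least one other household member is a daily smoker).

    Assumption: members smoke independently, so the union probability is
    one minus the probability that none of them smokes.
    """
    return 1.0 - math.prod(1.0 - p for p in other_member_probs)


# A person living with two others whose age-sex-specific daily smoking
# prevalences are 20% and 10%.
print(round(prob_any_other_smoker([0.20, 0.10]), 4))

# Someone living alone has no household exposure.
print(prob_any_other_smoker([]))
```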
As in GBD 2017, we incorporated occupational exposure by modelling prevalence of current exposure to secondhand smoke at work, by age, sex, location, and year, using ST-GPR. In order to avoid double counting we calculated the probability that an individual is exposed through either non-occupational exposure or occupational exposure, given their age, sex, and household composition. Finally, we multiplied this probability of exposure by the probability that the individual is not a smoker themselves (ie, 1 minus primary daily smoking prevalence for that person's location, year, age, and sex). We then collapsed these individual-level probabilities to produce average probabilities of exposure by location, year, age, and sex. These probabilities were modelled in the GBD ST-GPR framework, which generates exposure estimates from a mixed effects hierarchical linear model plus weighted residuals smoothed across time, space, and age. The linear model formula was fit separately by sex using restricted maximum likelihood in R. We used the sex-specific overall daily smoking prevalence for adults (age 15 and older) as a countrylevel covariate in the model. The overall male adult daily smoking prevalence was used as the covariate for females of all ages and for males under age 15. The overall female adult daily smoking prevalence was used as the covariate for males age 15 and older. All input datapoints from the probability calculation had a measure of uncertainty (variance and sample size) coming from the uncertainty of the primary smoking prevalence model and the sample size from the unit record data going into the modelling process. Geographical random effects were used in model fitting but were not used in prediction. The theoretical minimum-risk exposure level for secondhand smoke is zero exposure among nonsmokers, meaning that non-smokers would not live with any primary smokers. The same risk-outcome pairs from GBD 2017 were used. 
For children ages 0-14, we estimated the burden of otitis media attributable to secondhand smoke exposure. For all ages we estimated the burden of lower respiratory infections (LRI), and for adults greater than or equal to 25 years of age we estimated the burden of lung cancer, chronic obstructive pulmonary disease (COPD), ischaemic heart disease, cerebrovascular disease, breast cancer, and type 2 diabetes attributable to secondhand smoke exposure. For lung cancer, ischaemic heart disease, cerebrovascular disease, and LRI, we used country-specific relative risks created using integrated exposure response curves (IER) for PM2.5 air pollution. IER curve calculation was updated with the GBD 2019 cigarettes per smoker estimates. The relative risks for otitis media (1), breast cancer (2), and diabetes (3) are derived from published meta-analyses and are the same as the ones used in the previous GBD cycle. Table 2 summarizes relative risk input data (total source count: 232). We used the standard GBD population attributable fraction (PAF) equation to estimate burden based on exposure and relative risks.
Input data and methodological summary
Definition
Exposure
Current chewing tobacco use is defined as current use (use within the last 30 days where possible, or according to the closest definition available from the survey) of any frequency (any, daily, or less than daily). Chewing tobacco includes local products, such as betel quid with tobacco. As in GBD 2017, we included sources that reported primary chewing tobacco, non-chew smokeless tobacco, and all smokeless tobacco use among respondents over age 10. To be eligible for inclusion, sources had to be representative for their level of estimation (ie, national sources needed to be nationally representative, subnational sources subnationally representative). We included only self-reported use data and excluded data from questions asking about others' tobacco use behaviours.
We extracted primary data from individual-level microdata and survey report tabulations on chewing tobacco, non-chew smokeless tobacco, and all smokeless tobacco use. We extracted data on current, former, and/or ever use as well as frequency of use (daily, occasional, and unspecified, which includes both daily and occasional smokers). Products that do not include tobacco, such as betel quid without tobacco, were excluded or estimated separately as part of the drug use risk factor, if applicable. For microdata, we extracted relevant demographic information, including age, sex, location, and year, as well as survey metadata, including survey weights, primary sampling units, and strata. This information allowed us to tabulate individual-level data in the standard GBD five-year age-sex groups and produce accurate estimates of uncertainty. For survey report tabulations, we extracted data at the most granular age-sex group provided. Age and sex splitting We split data reported in broader age groups than the GBD five-year age groups or as both sexes combined by adapting the method reported in Ng and colleagues (http://jamanetwork.com/journals/jama/fullarticle/1812960) to split using a sex-geography-timespecific reference age pattern. We separated the data into two sets: a training dataset, with data already falling into GBD sex-specific five-year age groups, and a split dataset, which reported data in aggregated age or sex groups. We then used spatiotemporal Gaussian process regression (ST-GPR) to estimate sex-geography-time-specific age patterns using data in the training dataset. The estimated age patterns were then used to split each source in the split dataset. The ST-GPR model used to estimate the age patterns for age-sex splitting used an age weight parameter value that minimises the effect of any age smoothing. This parameter choice allows the estimated age pattern to be driven by data, rather than being enforced by any smoothing parameters of the model. 
Because these age-sex-split datapoints will be incorporated in the final ST-GPR exposure model, we do not want to doubly enforce a modelled age pattern for a given sex-location-year on a given aggregate datapoint. We run three separate ST-GPR models for age-sex splitting, one for each smokeless tobacco category (chew, non-chew, and all smokeless). We used ST-GPR to model chewing tobacco prevalence. Full details on the ST-GPR method are reported elsewhere in the Appendix. Briefly, the mean function input to GPR is a complete time series of estimates generated from a mixed effects hierarchical linear model plus weighted residuals smoothed across time, space, and age. The linear model formula for chewing tobacco, fit separately by sex using restricted maximum likelihood in R, is:
$$\mathrm{logit}(p_{g,a,t}) = \beta_0 + \sum_{k} \beta_k I_{A[a]} + \alpha_s + \alpha_r + \alpha_g + \epsilon_{g,a,t}$$
where $I_{A[a]}$ is a dummy variable indicating the specific age group $A$ that the prevalence point $p_{g,a,t}$ captures, and $\alpha_s$, $\alpha_r$, and $\alpha_g$ are super-region, region, and geography random intercepts, respectively. The hyperparameters are the same as in GBD 2017. We run three ST-GPR models for each prevalence category, one for each smokeless tobacco category (chew, non-chew, and all smokeless).
All smokeless tobacco prevalence adjustment
Using the 1000 draws from each of the prevalence ST-GPR models, we calculated 1000 draws of chewing tobacco prevalence divided by the sum of chewing tobacco and non-chewing tobacco prevalence for each location, age group, sex, and year. The draws were unordered, as we did not want to enforce an assumption about the relationship between the levels of chewing tobacco and non-chewing tobacco prevalence. The draws of this ratio (chewing tobacco as a share of chewing plus non-chewing tobacco) were then multiplied by the draws from the all smokeless tobacco prevalence model to adjust the estimates to chewing tobacco prevalence. These were then averaged to get the mean estimate.
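A minimal sketch of the draw-level adjustment for one location, age group, sex, and year is shown below. It uses two draws instead of 1000 so the arithmetic is easy to follow; the function name and the draw values are hypothetical.

```python
def adjust_all_smokeless(chew_draws, nonchew_draws, all_smokeless_draws):
    """Convert all-smokeless prevalence draws to chewing tobacco prevalence.

    Returns the mean adjusted estimate and the per-draw chewing share.
    """
    ratios = [c / (c + n) for c, n in zip(chew_draws, nonchew_draws)]
    adjusted = [r * a for r, a in zip(ratios, all_smokeless_draws)]
    mean_estimate = sum(adjusted) / len(adjusted)
    return mean_estimate, ratios


chew = [0.06, 0.08]            # draws from the chewing tobacco model
nonchew = [0.02, 0.02]         # draws from the non-chew smokeless model
all_smokeless = [0.10, 0.12]   # draws from the all smokeless model

mean_estimate, ratios = adjust_all_smokeless(chew, nonchew, all_smokeless)
print(round(mean_estimate, 4), [round(r, 2) for r in ratios])
```

Pairing draws by position without sorting them mirrors the choice to leave the draws unordered.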
The variance across the ratios was calculated for each location, year, age, and sex, and was added to the variance from the original all smokeless tobacco draws.

Final chewing tobacco prevalence model

To calculate the final chewing tobacco prevalence, we ran an additional ST-GPR model with both the original chewing tobacco data (post-age-sex splitting) and the adjusted data. These adjusted data add more information to the model, as surveys will often only ask about all smokeless tobacco consumption, while taking into consideration the uncertainty from the ratio calculation.

The theoretical minimum risk exposure level is that everyone in the population has been a lifelong nonuser of chewing tobacco.

As in GBD 2017, we included outcomes based on the strength of available evidence supporting a causal relationship. There was sufficient evidence to include oral cancer and oesophageal cancer as health outcomes caused by chewing tobacco use. Relative risk estimates were derived from prospective cohort studies and population-based case-control studies. We used the same underlying effect size estimates from prospective cohort studies and population-based case-control studies as in GBD 2017. Briefly, we did not include hospital-based case-control studies due to concerns over representativeness. We only included sources that adequately adjusted for major confounders, especially smoking status. Summary effect size estimates were calculated in R, using the 'metafor' package. We performed a random effects meta-analysis using the DerSimonian and Laird method, which does not assume a true effect size but considers each input study as selected from a random sample of all possible sets of studies for the outcome of interest. The random-effects method allows for more variation between the studies, and incorporates this variance into the estimation process. We used an inverse-variance weighting method to determine component study weights.
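For readers unfamiliar with the DerSimonian and Laird estimator, the following is a minimal Python sketch of the standard method-of-moments calculation with inverse-variance weights. It is not the 'metafor' code used in the analysis; the function name and the example inputs (log relative risks and their variances) are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird method.

    effects   : study effect sizes (e.g. log relative risks)
    variances : within-study variances of those effect sizes
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic and the method-of-moments tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


# Hypothetical example: two studies with equal precision
pooled, se, tau2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

When the studies disagree more than their sampling error would predict, tau^2 is positive, the weights become more equal, and the pooled standard error widens.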
We found significantly different relative risks for oral cancer for males and females, and estimated relative risks separately by sex for oral cancer alone.

Diet low in fruit: Average daily consumption (in grams per day) of less than 310-340 grams of fruit, including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits
Diet low in vegetables: Average daily consumption (in grams per day) of less than 280-320 grams of vegetables, including fresh, frozen, cooked, canned, or dried vegetables and excluding legumes and salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables such as potatoes or corn
Diet low in whole grains: Average daily consumption (in grams per day) of less than 140-160 grams of whole grains (bran, germ, and endosperm in their natural proportion) from breakfast cereals, bread, rice, pasta, biscuits, muffins, tortillas, pancakes, and other sources
Diet low in nuts and seeds: Average daily consumption (in grams per day) of less than 10-19 grams of nuts and seeds, including tree nuts and seeds and peanuts
Diet low in fibre: Average daily consumption (in grams per day) of less than 21-22 grams of fibre from all sources including fruits, vegetables, grains, legumes, and pulses
Diet low in omega-3 fatty acids: Average daily consumption (in milligrams per day) of less than 430-470 milligrams of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA)
Diet low in polyunsaturated fatty acids (PUFA): Average daily consumption (in % daily energy) of less than 7-9% total energy intake from polyunsaturated fatty acids
Diet low in calcium: Average daily consumption (in grams per day) of less than 1.06-1.1 grams of calcium from all sources, including milk, yogurt, and cheese
Diet low in milk: Average daily consumption (in grams per day) of less than 360-500 grams of milk including non-fat, low-fat, and full-fat milk, excluding soy milk and other plant derivatives
Diet low in legumes: Average daily consumption (in grams per day) of less than 90-100 grams of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes
Diet high in red meat: Any intake (in grams per day) of red meat including beef, pork, lamb, and goat but excluding poultry, fish, eggs, and all processed meats
Diet high in processed meat: Any intake (in grams per day) of meat preserved by smoking, curing, salting, or addition of chemical preservatives
Diet high in sugar-sweetened beverages (SSBs): Any intake (in grams per day) of beverages with ≥50 kcal per 226.8 gram serving, including carbonated beverages, sodas, energy drinks, fruit drinks, but excluding 100% fruit and vegetable juices
Diet high in trans fatty acids: Any intake (in percent daily energy) of trans fat from all sources, mainly partially hydrogenated vegetable oils and ruminant products
Diet high in sodium: Average 24-hour urinary sodium excretion (in grams per day) greater than 1-5 grams

In GBD 2019, we included new dietary recall sources from a literature search of PubMed and new sources from the IHME GHDx yearly known survey series updates in our models. We also conducted a new systematic review for sodium (Figure 1). As in GBD 2017, the dietary data that we use in the models come from multiple sources, including nationally and subnationally representative nutrition surveys, household budget surveys, accounts of national sales from Euromonitor, and availability data from the United Nations FAO Supply and Utilization Accounts (SUA). Table 1 below provides a summary of data inputs used for dietary risk modeling in GBD 2019.

The availability data for food groups in GBD were previously based on the FAO Food Balance Sheets (FBS), which provide tabulated and processed data of national food supply. In GBD 2019, to more accurately characterise the national availability of various food groups, we used more disaggregated data on food commodities that were included in FAO SUA and recreated the national availability of each food group based on the GBD definition of the food group.
We modelled missing country-year data from FAO using a spatiotemporal Gaussian process regression and lag-distributed country income as the covariate. For nutrient availability, we continued to use data from the Global Nutrient Database.1 For each dietary factor, we estimated the global age pattern of consumption based on nutrition surveys (ie, 24-hour diet recall) and applied that age pattern to the all-age data (availability, sales, and household budget surveys) before the data source bias adjustment. Our gold-standard data source for all dietary risks (except sodium) is 24-hour dietary recall surveys where food and nutrient intake are reported or convertible to grams per person per day; the gold-standard data source for sodium is 24-hour urinary sodium. The other data sources we use (household budget surveys, food frequency questionnaires, sales, and availability) are treated as alternate definitions for dietary intake and crosswalked to the gold-standard definition. In GBD 2016 and GBD 2017, we determined the bias adjustment factors from a mixed effects linear regression. In GBD 2019, we used MR-BRT (a network meta-regression) to determine the adjustment factors for non-gold-standard datapoints. Coefficients for these models can be found in Table 3. We use a spatiotemporal Gaussian process regression (ST-GPR) framework to estimate the mean intake of each dietary factor by age, sex, country, and year. In GBD 2019, we removed lag-distributed income as a covariate from most of our models and added country-level energy availability (Table 2). To characterise the distribution of each dietary factor at the population level, we use an ensemble approach that separately fits 12 distributions to individual-level microdata specific to each data source's sampled population. The respective goodness of fit of each family was assessed, and a weighting scheme was determined to optimise overall fit to the unique distribution of each risk factor.
A global mean of the weights for each risk factor's data sources was created. We then determined the standard deviation of each population's consumption through a linear regression that captured the relationship between the standard deviation and mean of intake in nationally representative nutrition surveys using 24-hour diet recalls. Then we applied the coefficients of this regression to the outputs of our ST-GPR model to calculate the standard deviation of intake by age, sex, year, and country. We also quantified the within-person variation in consumption of each dietary component and adjusted the standard deviations accordingly. The dietary TMRELs were updated for GBD 2019. For harmful dietary risks other than sodium, TMREL was set to zero. For protective dietary risk factors, we first calculated the level of intake associated with the lowest risk of mortality from each disease endpoint based on the 85th percentile of intake across all epidemiological studies included in the meta-analysis of the risk-outcome pair. Then we calculated the TMREL as the weighted average of these numbers using the global number of deaths from each outcome as the weight. For GBD 2019, we performed systematic reviews for each dietary risk and its related outcomes. Using the sources identified during these searches, we incorporated the most recent epidemiological evidence assessing the relationship between each GBD dietary risk factor and related outcomes in our relative risk analysis. After evaluating all available evidence, we found sufficient evidence on the causal relationship for 8 new risk-outcome (R-O) pairs and insufficient evidence for 5 old R-O pairs.
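The TMREL calculation for protective dietary risks described above is a deaths-weighted average, which can be sketched as follows. The function name, the outcome-specific intake levels, and the death counts are hypothetical illustrations, not GBD values.

```python
def weighted_tmrel(intake_levels, deaths):
    """Deaths-weighted average of outcome-specific lowest-risk intake levels.

    intake_levels : intake level associated with lowest risk, one per outcome
    deaths        : global number of deaths from each outcome (the weights)
    """
    total_deaths = sum(deaths)
    return sum(x * d for x, d in zip(intake_levels, deaths)) / total_deaths


# Hypothetical example: two outcomes, the second causing three times as many deaths
tmrel = weighted_tmrel([300.0, 350.0], [1.0, 3.0])
```

Outcomes that cause more deaths pull the TMREL towards their own lowest-risk intake level.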
Based on the results of this evidence review, we updated the R-O pairs used in the GBD dietary risk factor analysis in the following ways:

Removed:
• Diet low in fruit and nasopharynx cancer
• Diet low in fruit and other pharynx cancer
• Diet low in fruit and oesophageal cancer
• Diet low in fruit and larynx cancer
• Diet low in whole grains and haemorrhagic stroke

Added:
• Diet low in whole grains and colon and rectum cancer
• Diet high in red meat and breast cancer
• Diet high in red meat and ischaemic heart disease
• Diet high in red meat and haemorrhagic stroke
• Diet high in red meat and ischaemic stroke
• Diet low in fibre and ischaemic stroke
• Diet low in fibre and haemorrhagic stroke
• Diet low in fibre and diabetes mellitus

Additionally, based on the most recent epidemiological evidence and newly developed GBD 2019 methods for characterising the risk curve, we updated the dose-response curve of relative risks for all dietary risks. For sodium, we continued to estimate its effect on cardiovascular disease based on the effect of sodium on systolic blood pressure. There is a well-documented attenuation of the risk for cardiovascular disease due to metabolic risk factors throughout one's life. To incorporate this age trend in the relative risks, we first identified the median age-at-event across all cohorts and considered that as the reference age group. We then assigned our newly estimated risk curves to this reference age group. Then, we derived the percentage change in relative risks between each age group and the reference age group by averaging percentage changes in relative risks of all metabolic mediators. The three cardiovascular disease outcomes for dietary risks are haemorrhagic stroke (including intracerebral haemorrhage and subarachnoid haemorrhage), ischaemic stroke, and ischaemic heart disease, and the effects of dietary risks on them are mediated through high systolic blood pressure, cholesterol (not included for haemorrhagic stroke), and fasting plasma glucose.
Since the effect of diet is estimated independently of body-mass index (BMI) in the GBD, BMI was not included as a mediator in the RR age trend analysis.

We included surveys of the general adult population that captured self-reported physical activity in all domains of life (leisure/recreation, work/household, and transport), where random sampling was used. Data were primarily derived from two standardised questionnaires: the Global Physical Activity Questionnaire (GPAQ) and the International Physical Activity Questionnaire (IPAQ), although we included other survey instruments that asked about intensity, frequency, and duration of physical activities performed across all activity domains. Due to a lack of a consistent relationship on the individual level between activity performed in each domain and total activity, we were not able to use studies that included only recreational/leisure activities. Physical activity level is categorised by total MET-minutes per week using four categories based on rounded values closest to the quartiles of the global distribution of total MET-minutes/week. The lower limit for the Level 1 category (600 MET-min/week) is the recommended minimum amount of physical activity to get any health benefit. We used four categories with higher thresholds rather than the GPAQ and IPAQ recommended 3 categories to better capture any additional protective effects from higher activity levels.

• Level 0: <600 MET-min/week (inactive)
• Level 1: 600-3999 MET-min/week (low-active)
• Level 2: 4000-7999 MET-min/week (moderately-active)
• Level 3: ≥8000 MET-min/week (highly active)

The GHDx was used to locate all surveys that use the GPAQ or IPAQ questionnaire. Although there were many other surveys that focused specifically on leisure activity, we were unable to use these sources because they did not comprise all three domains (work, transport, and leisure).
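The four activity levels defined above map directly onto a threshold function. The sketch below uses the cut-offs stated in the text; the function name `activity_level` is hypothetical.

```python
def activity_level(met_min_per_week):
    """Return the activity level (0-3) for a total MET-minutes/week value.

    Thresholds follow the four categories listed in the text:
    <600, 600-3999, 4000-7999, and >=8000 MET-min/week.
    """
    if met_min_per_week < 600:
        return 0  # inactive
    if met_min_per_week < 4000:
        return 1  # low-active
    if met_min_per_week < 8000:
        return 2  # moderately active
    return 3      # highly active


# Example: classify a few hypothetical respondents
levels = [activity_level(m) for m in (450, 600, 5200, 9000)]
```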
We also excluded any surveys that did not report frequency, duration, and intensity of activity. For this round of the GBD, we have chosen to use a machine learning crosswalk to predict IPAQ estimates for GPAQ results and GPAQ estimates for IPAQ results, with original and estimated results then being combined to get one comprehensive IPAQ dataset and one comprehensive GPAQ dataset. We then estimated the proportion of each country/year/age/sex subpopulation in each of the above four activity levels using 12 separate DisMod models (one set of six for IPAQ and one set of six for GPAQ). We use six categories of physical activity prevalence rather than four to accommodate the different MET-minute/week cutoffs presented in tabulated data sources where individual unit record data were not available. Since the accepted threshold/definition for inactivity is consistently <600 MET-minutes/week, the vast majority of tabulated data was broken down into proportion inactive (model A) and proportion low, moderate, or highly active (model B).

Model | Category | MET-min/week | Name of sequelae in online visualisation tool
A | inactive | <600 | Physical inactivity and low physical activity, inactive
B | low/moderately/highly active | ≥600 | Physical inactivity and low physical activity, low/moderately/highly active
C | low active | 600-3999 | Physical inactivity and low physical activity, low active
D | moderately/highly active | ≥4000 | Physical inactivity and low physical activity, moderately/highly active
E | moderately active | 4000-7999 | Physical inactivity and low physical activity, moderately active
F | highly active | ≥8000 | Physical inactivity and low physical activity, highly active

These models have mesh points at 0, 15, 25, 35, 45, 55, 65, 75, 85, and 100, and a study-level fixed effect on integrand variance (Z-cov) for whether a study was nationally representative or not, to account for the heterogeneity introduced by studies that are not generalizable to the entire population.
They also have national-level fixed effects on prevalence of obesity. After DisMod, we rescale each of the 6 models specific to each data source so that the proportions sum to one. Since we have the most data for models A and B, we first rescale the proportions from models A and B so that they sum to one. Next we rescale the sum of models C and D to be equal to the rescaled value from model B. Then we rescale the sum of models E and F to be equal to the rescaled value from model D. After these three rescales we are left with a proportion for each of the four categories that together sum to 1. Scaled results for each data source are then hybridised to produce only one set of results for the prevalence of the four categories of physical activity. Similar to the previous round, we have not directly estimated total MET-minutes per week globally. This year, however, we made use of two specific machine learning algorithms (Random Forest and XGBoost) that were trained using data that could characterise the relationship between total MET-mins/week and each of the categorical prevalences of physical activity. This resulted in country-year-age-sex-specific estimates of total physical activity in the form of MET-minutes per week. Utilising microdata on total MET-mins per week from individual-level surveys, we characterised the distribution of activity level at the population level. We then used an ensemble approach to distribution fitting, borrowing characteristics from individual distributions to tailor a unique distribution to fit the data using a weighting scheme. We characterised the standard deviation of each population's activity through a linear regression that captured the relationship between standard deviation and mean activity levels in nationally representative IPAQ surveys. In this regression, Age_i is the youngest age in population i's age group, SR_i is the super-region in which the population lives, and Fem_i is a Boolean value indicating whether the population is female.
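The three-step rescaling of the six model outputs (A to F) described above can be sketched as follows for a single country, year, age group, and sex. The function name and input proportions are hypothetical; the logic follows the sequence stated in the text.

```python
def rescale_activity(a, b, c, d, e, f):
    """Rescale six modelled proportions into four categories that sum to 1.

    a: inactive, b: low/moderately/highly active, c: low active,
    d: moderately/highly active, e: moderately active, f: highly active.
    """
    # Step 1: A and B are rescaled to sum to one
    a_s = a / (a + b)
    b_s = b / (a + b)
    # Step 2: C and D are rescaled to sum to the rescaled B
    c_s = c / (c + d) * b_s
    d_s = d / (c + d) * b_s
    # Step 3: E and F are rescaled to sum to the rescaled D
    e_s = e / (e + f) * d_s
    f_s = f / (e + f) * d_s
    return {"inactive": a_s, "low": c_s, "moderate": e_s, "high": f_s}


# Hypothetical raw model outputs that do not yet sum to one
result = rescale_activity(0.3, 0.6, 0.2, 0.3, 0.1, 0.3)
```

Each step preserves the relative split within a pair of models while forcing consistency with the level above it, so the four final categories sum to one by construction.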
We then applied the coefficients of the standard deviation regression to our estimates of total MET-minutes per week to calculate the standard deviation by country, year, age, and sex. The theoretical minimum-risk exposure level for physical inactivity is 3000-4500 MET-min per week, which was calculated as the exposure at which minimal deaths across outcomes occurred.3 We used a dose-response meta-analysis of prospective cohort studies to estimate the effect size of the change in physical activity level on breast cancer, colon cancer, diabetes, ischemic heart disease, and ischemic stroke.3 There is a well-documented attenuation of the risk for cardiovascular disease and diabetes due to metabolic risk factors throughout one's life. To incorporate this age trend in the relative risks, we first identified the median age-at-event across all cohorts and considered that as the reference age group. We then assigned our risk curves to this reference age group. Then, we derived the percentage change in relative risks between each age group and the reference age group by averaging percentage changes in relative risks of all metabolic mediators.
Long-term exposure to fine particulate matter and incidence of type 2 diabetes mellitus in a cohort study: effects of total and traffic-specific air pollution Assessment and 2-year follow-up of some factors associated with severity of respiratory infections in early childhood Traffic-related air toxics and preterm birth: a population-based case-control study in Satellite-Based Estimates of Long-Term Exposure to Fine Particles and Association with Mortality in Elderly Hong Kong Residents Smoking and other risk factors for lung cancer in women Comparing exposure assessment methods for traffic-related air pollution in an adverse pregnancy outcome study Impact of biomass fuels on pregnancy outcomes in central East India Maternal exposure to carbon monoxide and fine particulate matter during pregnancy in an urban Tanzanian cohort Long-term Fine Particulate Matter Exposure and Nonaccidental and Cause-specific Mortality in a Large National Cohort of Chinese Men Long-term Fine Particulate Matter Exposure and Nonaccidental and Cause-specific Mortality in a Large National Cohort of Chinese Men Association of Solid Fuel Use With Risk of Cardiovascular and All-Cause Mortality in Rural China Association between biofuel exposure and adverse birth outcomes at high altitudes in Peru: a matched case-control study Improved Global Estimates of Fine Particulate Matter Concentrations and Trends Derived from Updated Satellite Retrievals, Modeling Advances, and Additional Ground-Based Monitors Global Estimates of Fine Particulate Matter using a Combined Geophysical-Statistical Method with Information from Satellites, Models, and Monitors Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution Half the world's population are exposed to increasing air pollution. 
Accepted by Nature Climate and Atmospheric Science Ambient Air Pollution Exposure Estimation for the Global Burden of Disease Data integration for the assessment of population exposure to ambient air pollution for global burden of disease assessment Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations Spatio-temporal downscaling for continental-scale estimation of air pollution concentrations Generalized additive models: an introduction with R Long-term ozone exposure and mortality in a large prospective study Long-term Fine Particulate Matter Exposure and Nonaccidental and Cause-specific Mortality in a Large National Cohort of Chinese Men All-cause mortality risk associated with long-term exposure to ambient PM2·5 in China: a cohort study Long term exposure to air pollution and mortality in an elderly cohort in Hong Kong Outdoor fine particulate matter air pollution and cardiovascular disease: Results from 747 communities across 21 countries in the PURE Study Modifiable risk factors, cardiovascular disease and mortality in 155,722 individuals from 21 high-, middle-, and low-income countries (PURE): a prospective cohort study. The Lancet An integrated risk function for estimating the global burden of disease attributable to ambient fine particulate matter exposure Cardiovascular Disease and Fine Particulate Matter: Lessons and Limitations of an Integrated Exposure Response Approach Impact of Aging on the Strength of Cardiovascular Risk Factors: A Longitudinal Study Over 40 Years Fine particulate matter concentrations in smoking households: just how much secondhand smoke do you breathe in if you live with a smoker who smokes indoors? Integrated science assessment (ISA) for particulate matter (Final Report Review of evidence on health aspects of air pollution -REVIHAAP Project technical report. Copenhagen: WHO Regional Office for Europe Millions Dead: How Do We Know and What Does It Mean? 
Methods Used in the Comparative Risk Assessment of Household Air Pollution Global household air pollution database: Kitchen concentrations and personal exposures of particulate matter and carbon monoxide Global estimation of exposure to fine particulate matter (PM2.5) from household air pollution Use of traditional cooking fuels and the risk of young adult cataract in rural Bangladesh: a hospital-based case-control study The Gambia Smoking Prevalence and Cigarette Consumption in 187 Countries Parental smoking and the risk of middle ear disease in children Active and passive smoking and risk of breast cancer: a meta-analysis The association between passive smoking and type 2 diabetes: a meta-analysis The Global Nutrient Database: Availability of Macronutrients and Micronutrients in 195 Countries from 1980 to 2013. The Lancet Planetary Health Guidelines for data processing and analysis of the International Physical Activity Questionnaire (IPAQ)-short and long forms World Health Organization. 
Global Physical Activity Questionnaire (GPAQ) Analysis Guide Physical activity and risk of breast cancer, colon cancer, diabetes, ischemic heart disease, and ischemic stroke events: systematic review and doseresponse meta-analysis for the Global Burden of Disease Study Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980-2017: a systematic analysis for the Global Burden of Disease Study Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study International Classification of Diseases (ICD). 
Primary responsibility for this manuscript focused on applying analytical methods to produce estimates: Gregory A. Roth.

The final ratios were 0.64 (95% CI: 0.45 to 0.91) for males and 0.85 (95% CI: 0.56 to 1.31) for children. We used these results to scale the PM2.5 mapping model for these age and sex groups to input into the PM2.5 risk curves.

We calculated the cardiovascular and gout fatal and non-fatal burden attributable to the categorical exposure to kidney dysfunction using the following equation:

We measure physical activity performed by adults older than 25 years of age, for a duration of at least ten minutes at a time, across all domains of life (leisure/recreation, work/household, and transport). We use the frequency, duration, and intensity of activity to calculate total metabolic equivalent (MET)-minutes per week. A MET is the ratio of the working metabolic rate to the resting metabolic rate.
One MET is equivalent to 1 kcal/kg/hour and is equal to the energy cost of sitting quietly. A MET is also defined as the oxygen uptake in ml/kg/min with one MET equal to the oxygen cost of sitting quietly, around 3.5 ml/kg/min.
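As an illustration of how frequency, duration, and intensity combine into total MET-minutes per week, the following is a minimal sketch and not the GBD implementation. The names `Activity`, `MET_VALUES`, `met_minutes_per_week`, and `kcal_from_met_minutes` are hypothetical, and the intensity weights (3.3, 4.0, and 8.0 METs for walking, moderate, and vigorous activity) are assumed IPAQ-style scoring values rather than values stated in this text. The ten-minute threshold is applied here to daily minutes as a simplification of the per-bout rule.

```python
from dataclasses import dataclass

# Assumed intensity weights in METs (IPAQ-style scoring values; not from this text).
MET_VALUES = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

# Activity shorter than ten minutes at a time is not counted (threshold from the text).
MIN_BOUT_MINUTES = 10


@dataclass
class Activity:
    intensity: str          # key into MET_VALUES
    days_per_week: int      # frequency
    minutes_per_day: float  # duration on each active day


def met_minutes_per_week(activities):
    """Sum intensity (MET) x duration (min/day) x frequency (days/week)."""
    total = 0.0
    for a in activities:
        if a.minutes_per_day < MIN_BOUT_MINUTES:
            continue  # below the ten-minute threshold
        total += MET_VALUES[a.intensity] * a.minutes_per_day * a.days_per_week
    return total


def kcal_from_met_minutes(met_minutes, weight_kg):
    """Convert MET-minutes to kcal using 1 MET = 1 kcal/kg/hour."""
    return met_minutes / 60.0 * weight_kg


week = [
    Activity("walking", days_per_week=5, minutes_per_day=30),
    Activity("vigorous", days_per_week=2, minutes_per_day=20),
    Activity("moderate", days_per_week=3, minutes_per_day=5),  # excluded: under 10 min
]
total = met_minutes_per_week(week)
print(total, kcal_from_met_minutes(total, weight_kg=70))
```

In this example only the walking and vigorous entries contribute, because the moderate entry falls below the ten-minute threshold.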