key: cord-0067825-6pbzgpvg
authors: Lakman, I. A.; Askarov, R. A.; Prudnikov, V. B.; Askarova, Z. F.; Timiryanova, V. M.
title: Predicting Mortality by Causes in the Republic of Bashkortostan Using the Lee–Carter Model
date: 2021-09-23
journal: Stud Russ Econ Dev
DOI: 10.1134/s1075700721050063
sha: a8c2aa70f892844872d3004a969308f641be3a52
doc_id: 67825
cord_uid: 6pbzgpvg

This paper analyzes and predicts age-sex mortality rates by causes in the Republic of Bashkortostan. The following methods of analysis were used: the Lee–Carter model, singular value decomposition, and ARIMA modeling. The forecast results suggest that by 2025 the Republic of Bashkortostan will have lower mortality due to malignant neoplasms in all age groups, except for the 70+ group for women and the 50+ age groups for men; lower mortality due to diseases of the circulatory system in all age groups for men and higher mortality in the 45+ age groups for women; lower mortality due to injuries in all age groups for both sexes; no significant changes in mortality due to respiratory diseases; increased mortality from gastrointestinal diseases for both sexes at all ages, except for children; higher mortality due to infections at ages 20–54 for men and 20–64 for women; and almost half lower mortality from infections in the age group of 0–4 years for both sexes.

Interest in predicting mortality is motivated by the need to reduce it and thereby achieve higher quality-of-life indicators, to maintain the proportion of the active population, and by many other considerations, including the need to develop ways of reducing mortality and to manage pension obligations more efficiently. The problem of obtaining accurate mortality forecasts has been aggravated by significant aging of the population in economically developed countries. Demographic models used to predict mortality are usually based on statistical modeling of historical data and often do not take into account the structure of mortality, that is, its distribution by causes.
At the same time, information on mortality by causes can potentially be used by government agencies implementing national medical programs to increase life expectancy and by insurance companies and pension funds in the management of insurance rates and pension obligations. There are many approaches to predicting the mortality rate of the population; a good review is given in [1]. The most popular mathematical tool for predicting age-specific mortality rates is the model developed by demographers R.D. Lee and L.R. Carter in 1992 to predict mortality in the United States [2]. There are also modifications of this model, for example, the Renshaw-Haberman model [3], the Lee-Miller model [4], or the Kou-modified Lee-Carter model [5]. More recent works combine machine learning tools and the Lee-Carter model, for example, [6]. A good analysis of the applicability of modern modifications of the Lee-Carter model is given in reviews [7, 8], and the algorithms for applying these models are described in detail in [9]. A special feature of mortality forecasting methods based on the Lee-Carter model or its modifications is the ability to take into account the age-sex structure of the population. These methods are widely used to predict mortality rates in various countries: Italy [10], Australia [11], Kazakhstan [12], and Ukraine [13]. It should be noted that the construction of the Lee-Carter model requires a rather long period of observing mortality rates by age groups; for example, in [14] the authors use the interval of 1900-2015 to "train" the mortality model for France. Russian demographers used the Lee-Carter model and its modifications to predict the mortality rate of the population in the country as a whole [15, 16] and in particular regions [17, 18]. The Lee-Carter model is also used to predict mortality in individual age groups [19] or subpopulations [20].
Much less often, the Lee-Carter model and its modifications are used to predict mortality by causes. This is largely because the construction of such models requires rather long time series of mortality rates by causes for each of the age and sex groups, and such data are rarely available. However, in recent years statistical agencies have been digitizing data stored on physical media, which has made it possible to carry out studies on predicting mortality rates by causes for each of the age and sex groups. For example, a 2014 study [21] forecast mortality from coronary heart disease in England and Wales up to 2030. More comprehensive forecasting studies were carried out in 2018 using data on population mortality in 2006-2015 in the Republic of Iran [22], as well as in 2019 based on mortality data for the male population of the US in 1970-2015 for seven major causes of death (cancer, diabetes, external causes, influenza, mental illness, nephritis, and vascular diseases) [23]. Sometimes studies using the Lee-Carter model additionally consider the effect of macroeconomic fluctuations on trends in causes of death [24]. The above review makes it possible to conclude that the application of the Lee-Carter model is a fairly new approach to the study of population mortality by causes. Its advantage is higher accuracy of predictions, because different categories of diseases manifest differently in men and women across age groups. To date, no detailed analysis of the mortality rate by causes using the Lee-Carter model has been carried out for the Russian Federation or separately for its regions. This is partly due to the need to use sufficiently long time series as input data.
Another limitation of this approach is the possible incompatibility of the initial data: over such long periods of time, first, the classifications of diseases changed, and second, it is questionable whether these classifications were applied correctly. The purpose of this study is to analyze and predict mortality rates by causes for men and women, taking into account the age structure of the population, using the Lee-Carter model and data for the Republic of Bashkortostan (RB).

Materials and methods. To analyze and predict mortality rates of the population by causes, we used data of the Territorial Body of the Federal State Statistics Service for the Republic of Bashkortostan collected in the period of 1990-2018 by five-year age-sex groups and causes of death. Because of data gaps and the relatively short length of the interval (only 28 time points for the period from 1990 to 2018), the indicators for the age groups 70-74, 75-79, and 80+ were combined into one group of 70+. The age-specific mortality rates were determined for the following six (j = 1, 2, …, 6) aggregated causes of death (the corresponding ICD-10 chapters are given in parentheses):
- Malignant neoplasms, MN (chapter II).
- Diseases of the circulatory system, DCS (chapter IX).
- Injuries (chapter XIX).
- Diseases of the respiratory system, DRS (chapter X).
- Diseases of the digestive system, DDS (chapters IV and XI).
- Infections (chapter I).

Age-specific mortality rates were calculated for men and women separately in five-year intervals from 0 to 70+ years as follows:

$$m_{x}^{j} = \frac{D_{x}^{j}}{P_{x}} \times 1000,$$

where $P_{x}$ is the average annual number of men/women in age group $x$, and $D_{x}^{j}$ is the absolute number of men/women who died due to cause $j$ in age group $x$.
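The rate calculation and the pooling of the oldest age groups can be illustrated with a minimal Python sketch. The function name `age_specific_rate` and all counts below are hypothetical and are not data from the study; the sketch assumes that pooling the 70+ group means summing deaths and populations before dividing.

```python
import numpy as np

def age_specific_rate(deaths, population, per=1000.0):
    """Deaths from one cause per `per` persons of average annual population."""
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    if deaths.shape != population.shape:
        raise ValueError("deaths and population must have the same shape")
    if np.any(population <= 0):
        raise ValueError("population must be positive in every age group")
    return deaths / population * per

# Hypothetical counts for three age groups (illustration only).
deaths = [12, 30, 95]
population = [40_000, 35_000, 20_000]
rates = age_specific_rate(deaths, population)

# Assumed pooling rule for 70-74, 75-79 and 80+: sum the counts, then divide.
deaths_70_plus = [50, 60, 90]
population_70_plus = [10_000, 6_000, 4_000]
pooled_rate = age_specific_rate(sum(deaths_70_plus), sum(population_70_plus))
```

Summing counts before dividing weights each subgroup by its population, which is not the same as averaging the three subgroup rates.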
As a method for analyzing the dynamics of mortality, we used the Lee-Carter model, in which the logarithm of the observed mortality rate $m_{x,t}^{j}$ due to cause $j$ for age $x$ and year $t$ is represented as a linear combination of the time-independent age component $\alpha_{x}^{j}$ and the time-varying parameter $k_{t}^{j}$ with the coefficient $\beta_{x}^{j}$:

$$\ln m_{x,t}^{j} = \alpha_{x}^{j} + \beta_{x}^{j} k_{t}^{j} + \varepsilon_{x,t}^{j},$$

where $\alpha_{x}^{j}$ is the common component for age-specific mortality rates due to cause $j$, $k_{t}^{j}$ is the index of change in the mortality rate due to cause $j$ in period $t$ (the death index by causes), $\beta_{x}^{j}$ is used to describe the trend of changes in the mortality rate due to cause $j$ at age $x$ as the total mortality rate changes, and $\varepsilon_{x,t}^{j}$ is a random component. Although the Lee-Carter model is essentially a linear regression model, its estimation is complicated by the fact that the regressor $k_{t}^{j}$ is also unknown. To address this problem, Lee and Carter proposed an approach based on singular value decomposition (SVD) of the matrix of log mortality rates centered on the mean age components $\alpha_{x}^{j}$.

It is of practical interest for decision-making to predict the resulting time series of values of the vector $k^{j}$ for each cause of mortality. For this, at the second stage of the study, the ARIMA(p, d, q) model was estimated for the obtained index $k_{t}^{j}$ for each cause of mortality, where p is the order of autoregression, d is the order of differencing of the process, and q is the order of the moving average. At the third, final stage of the study, predictive estimates of the mortality rate for separate causes were calculated for both sexes together and separately for men and women. The ARIMA(p, d, q) model for each time series was constructed in accordance with the five-step procedure of Dolado, Jenkinson, and Sosvilla-Rivero [25]. Data processing and modeling were carried out using the R programming language version 3.5.3 in the RStudio software environment using the "ilc" modeling package for Lee-Carter mortality models [26].
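The SVD step can be sketched in Python with NumPy as follows. This is an illustration of the classical first-singular-triple estimate for one cause and one sex, not the fitting routine of the "ilc" package used in the study. The function name `fit_lee_carter`, the normalization (the $\beta$ values sum to one) and the synthetic input are assumptions made for the example.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Estimate alpha (ages), beta (ages) and k (years) from log rates.

    log_m has shape (n_ages, n_years): one cause of death, one sex.
    """
    log_m = np.asarray(log_m, dtype=float)
    alpha = log_m.mean(axis=1)              # mean log rate of each age group
    centered = log_m - alpha[:, None]       # remove the age profile
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    beta = u[:, 0]                          # age sensitivities (unnormalized)
    k = s[0] * vt[0]                        # time index (unnormalized)
    # Assumed convention: rescale so that beta sums to 1. This fails if the
    # first left singular vector sums to (almost) zero.
    scale = beta.sum()
    return alpha, beta / scale, k * scale

# Synthetic, noise-free input built from known parameters (illustration only).
alpha_true = np.array([-6.0, -5.0, -4.0, -3.0])
beta_true = np.array([0.1, 0.2, 0.3, 0.4])             # sums to 1
k_true = np.array([2.5, 1.5, 0.5, -0.5, -1.5, -2.5])   # sums to 0
log_m = alpha_true[:, None] + np.outer(beta_true, k_true)

alpha_hat, beta_hat, k_hat = fit_lee_carter(log_m)
```

Because each row of the centered matrix sums to zero, the estimated index sums to zero over the years without an extra constraint. With real, noisy data the first singular triple gives only the best rank-one approximation of the centered matrix.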
Based on the collected data, we estimated mortality rates of age groups for the six causes: malignant neoplasms (MN), diseases of the circulatory system (DCS), injuries, diseases of the respiratory system (DRS), diseases of the digestive system (DDS), and infections. Table 1 shows the most interesting results of assessing the Lee-Carter mortality index $k_{t}^{j}$, which is the age-averaged effect of time. In particular, the data in Table 1 show that the rate of mortality due to MN decreases for men and women in the period from 1990 to 2018, but for men the decrease is uneven: in 2010, 2012, and 2018 there is a strong decrease in the indicator relative to a smoothed trend. The rate of mortality due to injuries follows the same pattern for men and women: in the period from 1990 to 1994 it was increasing, then until 1996 it was stable, and since 1997 it has been declining. The mortality rate due to diseases of the circulatory system remained almost constant from 1990 to 2018 for both sexes, although the rate is more volatile for men than for women. From 1990 to 2012, the Lee-Carter rate of mortality due to DRS was decreasing, but in the period of 2013-2018 it was slowly growing. In contrast to other indices, the mortality rate due to diseases of the digestive system was increasing for both sexes between 1990 and 2018, while the rate of mortality due to infections was decreasing for men and increasing for women. Testing the initial time series for the type of process according to the five-step procedure of Dolado, Jenkinson, and Sosvilla-Rivero, which is based on the augmented Dickey-Fuller test, showed that the time series of the rate of mortality due to DCS is stationary for both sexes.
For the dynamics of the rate of mortality due to DRS for men, as well as for both sexes due to MN, injuries, DDS, and infections, stationarity was found for the series of first differences of the initial series, and for mortality in women due to DRS for the series of second differences. Based on the values of $\alpha_{x}^{j}$, $\beta_{x}^{j}$, and $k_{t}^{j}$ determined according to the Lee-Carter model, the age-specific mortality rates were reconstructed for six causes of death. Tables 2-7 show logarithms of the age-specific mortality rates from six causes for men and women separately. The data in Table 2 show that the rate of mortality due to malignant neoplasms generally increases with age; it can also be seen that male mortality due to malignant neoplasms significantly decreased in the past five years. The study of age-specific rates of mortality due to diseases of the circulatory system (Table 3) made it possible to conclude the following: in the period from 1990 to 2018, this cause of death became more common for men in recent years and, on the contrary, less common for women; mortality due to this cause for men starts noticeably increasing in the age group of 55-59 years and sharply increases in the 65+ group; for women, a sharp increase in mortality due to this cause is observed in the age group of 65-69 years. The age-specific mortality rates due to injuries (Table 4) show that injuries as a cause of death became less common in the last decade; among women, this cause of death is five to six times less common than among men; in the age group of 0-4 years, the rate of mortality from injuries is high relative to later childhood; mortality from injuries starts sharply increasing at the age of 15-19 years (twice as high for men as for women); the highest number of deaths due to injuries occurs in the age groups of 45-49 and 55-59 for men and women, respectively.
Analysis of age-specific rates of mortality due to respiratory diseases (Table 5) shows that this cause of death became less common in the last decade; this cause of death is three times less common among women than among men; in the age group of 0-4 years, mortality became noticeably less frequent in the last decade, and for children in the period of life up to 12 months the mortality in the period from 1990 to 2018 decreased tenfold, which is largely due to the results of the program of supporting children in the neonatal and postnatal periods of life; mortality in men starts sharply increasing at the age of 55+ years and in women at the age of 65+ years. Interesting and mostly similar results were shown by the analysis of age-specific rates of mortality due to diseases of the digestive system and as a result of infectious diseases (Tables 6-7): these causes of death for men became two to three times more common in recent decades compared to the 1990s; with age, the mortality rates for these causes increase, and they have the same local maxima: for men, in the age group of 60-64 years; for women, in the age group of 55-59 years.

To predict age-specific rates of mortality due to six causes, the indices $k_{t}^{j}$ were predicted and then substituted into the Lee-Carter model with the previously estimated coefficients $\alpha_{x}^{j}$ and $\beta_{x}^{j}$ for six causes of death. For almost all causes, the ARIMA(1, 1, 0) model was used for both sexes together and for men and women separately. The exceptions were the rates of mortality due to DCS (the ARIMA(0, 0, 1) model for men and the ARIMA(1, 0, 0) model for women were used) and the rates of mortality in women due to DRS (the ARIMA(0, 2, 1) model). As a result, forecasts of age-specific rates of mortality from six causes were obtained up to 2025. Figures 1-6 show the forecasts for 2025 in comparison with the data of 2018 for men and women separately.
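The forecasting step can be illustrated with a short Python sketch for the ARIMA(1, 1, 0) case. It is a simplification and not the estimation routine used in the study: the AR(1) coefficient and an intercept (a drift term, which is an assumption) are fitted to the first differences by least squares, and the function names and all numbers are hypothetical.

```python
import numpy as np

def forecast_arima_110(k, steps):
    """Forecast k by fitting d_t = c + phi * d_(t-1) to its first differences."""
    k = np.asarray(k, dtype=float)
    d = np.diff(k)
    if d.size < 3:
        raise ValueError("need at least four observations of k")
    design = np.column_stack([np.ones(d.size - 1), d[:-1]])
    (c, phi), *_ = np.linalg.lstsq(design, d[1:], rcond=None)
    level, last_diff, path = k[-1], d[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff     # next predicted difference
        level = level + last_diff           # integrate back to the level of k
        path.append(level)
    return np.array(path)

def project_log_rates(alpha, beta, k_future):
    """Substitute forecast k values into ln m = alpha + beta * k."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    k_future = np.asarray(k_future, dtype=float)
    return alpha[:, None] + beta[:, None] * k_future[None, :]

# Hypothetical index whose differences follow the recursion exactly.
c_true, phi_true = -0.5, 0.5
diffs = [-2.0]
for _ in range(8):
    diffs.append(c_true + phi_true * diffs[-1])
series = 10.0 + np.concatenate([[0.0], np.cumsum(diffs)])  # ten values

k_forecast = forecast_arima_110(series[:8], steps=2)
log_rates = project_log_rates([-6.0, -4.0], [0.4, 0.6], k_forecast)
```

Other orders, such as (0, 0, 1) or (0, 2, 1), need a moving-average term and are better estimated with a dedicated time-series library than with this least-squares shortcut.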
It should be noted that the logarithms of mortality rates due to MN for women in 2018 practically do not differ from the corresponding forecasts for 2025 (Fig. 1), and the logarithms of the mortality rates of men from DCS in 2018 likewise do not differ from the forecasts for 2025 (Fig. 2). As seen in Fig. 1, by 2025, a decrease in mortality due to malignant neoplasms is expected in all age groups except for the 70+ group for women and the 50+ age groups for men. Figure 2 shows that the rates of mortality due to diseases of the circulatory system by 2025 will decrease in all age groups for men, except for the 0-1 year group, while for women, on the contrary, mortality from DCS is expected to increase in the 45+ age groups. Mortality due to injuries (Fig. 3) will decrease by 2025 for all age groups; the largest decrease in this indicator is predicted for the age group of 55-59 years for women and 45-49 years for men. By 2025, mortality due to respiratory diseases in the Republic of Bashkortostan (see Fig. 4) will not change significantly for either women or men. A significant change in mortality due to diseases of the digestive system should be expected by 2025 (see Fig. 5). For children, mortality rates are projected to decrease for both sexes; for all other age groups, mortality from gastrointestinal diseases is expected to increase; the highest increase in this indicator is expected at the age of 60-64 years for men and 55-59 years for women. A significant change in mortality due to infectious diseases is also expected for the Republic of Bashkortostan by 2025 for both sexes (see Fig. 6). At the ages of 20-54 years for men and 20-64 years for women, the mortality rate due to this cause is predicted to increase, with the maximum increase (by almost 40%) expected at the ages of 35-39 years. On the contrary, a decrease in mortality from infections for both sexes by almost half should be expected by 2025 in the age group of 0-4 years.
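Because the comparison is made on logarithms of mortality rates, a difference of logarithms translates into a percentage change through the exponential function. The short Python sketch below shows the conversion; it assumes natural logarithms, and the function name and input values are hypothetical rather than taken from the tables or figures.

```python
import math

def percent_change(log_rate_base, log_rate_forecast):
    """Percentage change in the rate implied by two natural-log rates."""
    return (math.exp(log_rate_forecast - log_rate_base) - 1.0) * 100.0

# Hypothetical rates per 1000 (illustration only).
increase = percent_change(math.log(2.0), math.log(2.8))   # rate rises 2.0 -> 2.8
decrease = percent_change(math.log(1.0), math.log(0.5))   # rate halves
```

In this example a rise from 2.0 to 2.8 per 1000 is an increase of 40%, and a halving of the rate is a change of -50%, whatever the starting level.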
Conclusions. First, the variety of diseases that individuals encounter during their life, second, the degree of readiness of the body to fight these diseases in different periods of life, and third, the continuous improvement of treatment methods determine the rel-

This study made it possible to determine significant changes in the structure of mortality in the Republic of Bashkortostan due to the main causes: a decrease in mortality from malignant neoplasms in the last five years and from injuries and respiratory diseases in the last ten years, an increase in mortality from diseases of the digestive system and infections (mainly due to the higher incidence of chronic hepatitis B and C [27]), and opposite changes in mortality from diseases of the circulatory system in men and women. Thus, the construction of Lee-Carter models makes it possible to carry out more meaningful analysis of mortality, take into account opposite trends in morbidity at different ages, and obtain more accurate forecasts of changes in life expectancy. It should be noted that the forecast estimates were obtained using the data of the "pre-COVID" period. The COVID-19 pandemic is likely to significantly change the structure of mortality rates. In this case, the obtained forecast results can be used to assess the so-called "leverage" in the structure of mortality: the difference between what has happened and "what would have happened if there were no COVID-19 pandemic."

[1] Mortality modeling perspectives
[2] Modeling and forecasting U.S. mortality
[3] On the forecasting of mortality reduction factors
[4] Evaluating the performance of the Lee-Carter method for forecasting mortality
[5] A jump-diffusion model for option pricing
[6] Application of machine learning to mortality modeling and forecasting
[7] Mortality and life expectancy forecast for (comparatively) high mortality countries
[8] Lee-Carter mortality forecasting: A multi-country comparison of variants and extensions
[9] General procedure for constructing mortality models
[10] The mortality of the Italian population: Smoothing techniques on the Lee-Carter Model
[11] Applying Lee-Carter under conditions of variable mortality decline
[12] Forecasting mortality for Kazakhstan using the Lee-Carter model
[13] Application of formal methods to predicting population mortality
[14] Evaluation of the Kou-Modified Lee-Carter model in mortality forecasting: Evidence from French male mortality data
[15] Predicting mortality in Russia using the Renshaw-Haberman actuarial stochastic model
[16] Stochastic models for smoothing and predicting mortality rates
[17] Forecasting age-specific mortality rates by the Lee-Carter method
[18] The Lee-Carter method for mortality forecasting: The case of the Republic of Bashkortostan
[19] Age distribution, trends, and forecasts of under-5 mortality in 31 sub-Saharan African countries: A modeling study
[20] Coherent forecasts of mortality with compositional data analysis
[21] Future declines of coronary heart disease mortality in England and Wales could counter the burden of population ageing
[22] Trend forecasting of main groups of causes-of-death in Iran using the Lee-Carter model
[23] A forecast reconciliation approach to cause-of-death mortality modeling
[24] Explaining mortality dynamics: The role of macroeconomic fluctuations and cause of death trends
[25] Cointegration and unit roots
[26] ilc: Lee-Carter Mortality Models Using Iterative Fitting Algorithms. R Package Version
[27] Forecast of the socio-economic burden of chronic viral hepatitis C (genotype 1) in the implementation of various scenario forecasts of its spread in the Republic of Bashkortostan

This study was carried out within the framework of the state assignment of the Ministry of Science and Higher Education of the Russian Federation, research topic code FZWU-2020-0027.