key: cord-0638799-23yg87iu
authors: Bezrukov, Alexander V.
title: Analysis of Regional Cluster Structure By Principal Components Modelling in Russian Federation
date: 2020-10-20
journal: nan
DOI: nan
sha: 479d89ded72ec038cd8cece9d974606187f32eca
doc_id: 638799
cord_uid: 23yg87iu

This paper demonstrates that applying principal components analysis to regional cluster modelling and analysis is essential when there is significant multicollinearity among several parameters, especially when the dimensionality of the regional data is measured in tens. The proposed principal components model represents the clustering of regions with the same quality. In fact, the clusters become more distinctive, and the apparent outliers either become more pronounced under the component model clustering or are alleviated within the respective hierarchical cluster. A five-component model was obtained and validated on 85 regions of the Russian Federation and 19 socio-economic parameters. The principal components describe approximately 75 percent of the variation of the initial parameters and enable further simulations on the studied variables. Cluster analysis on the principal components model better exposed the regional structure and the disparity in economic development in the Russian Federation, which consists of four main clusters: the few highest-development regions, the clusters with mid-to-high and low economic development, and the "poorest" regions. Development in most regions relies on the resource economy, and the industrial potential as well as the inter-regional infrastructural potential are not fully realized: only the wealthiest regions show a highly developed economy, while industry in the other regions shows signs of stagnation, which is aggravated by the conditions entailed by economic sanctions and the recent Covid-19 pandemic.
Most Russian regions need additional public support and industrial development, as their capital assets potential is hampered and, although they have sufficient labor resources, their donorship will increase.

Social differentiation and disparity, as well as interregional disparity, are among the central problems of developed modern society. Among the development goals set by the United Nations to be achieved by 2030, we should point out the "Reduction of inequality within and between countries" [1]. It is hardly disputable that socio-economic disparity is one of the sources of instability in the territorial development of a country and one of the obstacles to the sustainable development of its economy. The President of Russia, V.V. Putin, stated in his message to the Federal Assembly in 2018: "a person, his present and future, is the main meaning and goal of our development", which subsequently resulted in the May Decree to preserve and increase human capital. The Decree sets the goal of halving the poverty level in the Russian Federation, which, in the opinion of V.V. Putin, could enable a breakthrough in the development of the country and improve the quality of life of its citizens. It therefore becomes necessary to explore regional data in depth in order to expose possible problems in socio-economic development and disparities in regional development, and to improve decisions at both the local and federal levels.

Regional modelling and forecasting is, evidently, one of the primary foci of efficient modern governance and decision-making for public organizations and the government overall. It is essential for the government to provide integral infrastructure and innovative development with collaborating regions, to put geographic adjacency to good use in economic and financial development, and to be able to adequately assess and forecast the economic situation among regions.
Considering the territorial diversity and spatial extent of the Russian Federation, a whole variety of economic characteristics would have to be included in a model that attempts to simulate regional development as a system, to highlight and play to its strengths, and to determine its points of growth. In the author's opinion, this is where dimensionality reduction methods become most relevant, as the large variety of economic characteristics of regional development, together with the number and diversity of Russian regions, would produce mixed results in clustering and cluster modelling methods of analysis. Principal component modelling appears to be a very appropriate method to simulate regional economic development, as the principal components serve the following theoretical and practical purposes:
- reveal the cluster structure of the regions in terms of economic development and the disparity existing among the regions;
- highlight the main factors influencing the interregional variation of sustainable growth characteristics;
- obtain a model of the behavior of the regional characteristics of socio-economic development in order to forecast their development;
- gain insight into the causes of disparity among the regions in terms of sustainable economic growth and suggest which aspects require tackling.

The present paper focuses on obtaining a component model of Russian regional development based on a system of 19 economic indicators of activity for 85 regions. The initial data were aggregated from official Rosstat sources; missing values for certain regions were replaced by the mean level of the corresponding variable, and the dataset was subsequently normalized for principal component modelling by means of the R package. In turn, principal component modelling can be considered in the following stages:
1) Understanding the structure of the data and the variables;
2) Obtaining the component model and determining its parameters;
3) Evaluating the quality of the model and interpreting it.

The parallel coordinates plot (Fig. 1) and the heatmap (Fig. 2) allow the following findings to be stated.

The subsequent stage is to perform the principal components modelling. Recall that the essence of principal components is to reduce the dimensionality of the initial dataset to a new set of variables, each being a linear function of the initial parameters, with the following properties, which are essential to the present task:
- the principal components are uncorrelated and represent a coordinate system of orthogonal axes;
- the principal components describe the behavior of the initial variables, rather than individual observations;
- the factor loadings of the principal components are their correlations with the initial variables, which are considered as the effect of these higher-order factors; thus, principal components are interpreted through the factor loadings.

After the principal component analysis was performed, the number of principal components to retain had to be decided, which was done using the scree criterion (Fig. 4) and the eigenvalues and explained variance percentages (Table 2). For principal components modelling, the general guideline is to select the components with eigenvalues higher than 1, while also considering the slope of the scree plot. Thus, five principal components were selected, describing 75.18% of the variation of the initial variables (see Table 2). The coefficients for the principal axes are presented in Table 3 and represent the factor loadings of the component model.

In order to validate the principal components model, a comparative hierarchical clustering procedure was performed on the initial dataset and on the factor coordinates of the observations (regions) in the obtained system of principal axes.
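The workflow described above (mean imputation and normalization, extraction of components, retention of those with eigenvalues above 1, and hierarchical clustering of both the initial data and the component scores) can be sketched as follows. This is a minimal illustration in plain Python, not the R code used for the paper. The function names, the toy data, the choice of the population standard deviation for normalization, and the Jacobi eigenvalue routine are all assumptions introduced for this example. Only the eigenvalue criterion is applied here (the scree plot is not inspected), and the clustering uses complete linkage with Euclidean distance, as reported in the paper.

```python
from math import atan2, cos, dist, pi, sin
from statistics import mean, pstdev


def standardize(rows):
    """Fill missing values (None) with the column mean, then z-score each column."""
    cols = []
    for col in zip(*rows):
        m = mean([v for v in col if v is not None])
        filled = [m if v is None else v for v in col]
        s = pstdev(filled)  # assumption: population SD; column must not be constant
        cols.append([(v - m) / s for v in filled])
    return [list(r) for r in zip(*cols)]


def jacobi_eigen(sym, sweeps=50):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    p = len(sym)
    a = [row[:] for row in sym]
    vec = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[i][j], kept within [-pi/4, pi/4].
                t = 0.5 * atan2(2 * a[i][j], a[j][j] - a[i][i])
                if t > pi / 4:
                    t -= pi / 2
                elif t < -pi / 4:
                    t += pi / 2
                c, s = cos(t), sin(t)
                for m in (a, vec):  # rotate columns i and j
                    for k in range(p):
                        x, y = m[k][i], m[k][j]
                        m[k][i], m[k][j] = c * x - s * y, s * x + c * y
                for k in range(p):  # rotate rows i and j of a
                    x, y = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * x - s * y, s * x + c * y
    order = sorted(range(p), key=lambda k: -a[k][k])
    return [a[k][k] for k in order], [[vec[r][k] for r in range(p)] for k in order]


def pca_scores(z):
    """Scores on the components with eigenvalue > 1, plus their explained share."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    values, vectors = jacobi_eigen(corr)
    kept = [k for k in range(p) if values[k] > 1.0]
    explained = sum(values[k] for k in kept) / p  # eigenvalues of corr sum to p
    scores = [[sum(x * w for x, w in zip(row, vectors[k])) for k in kept] for row in z]
    return scores, explained


def complete_linkage(points, n_clusters):
    """Merge clusters whose farthest members are closest (Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: 6 "regions" x 3 indicators, one missing value.
regions = [[1.0, 2.0, 1.0], [2.0, 1.0, None], [1.0, 1.0, 1.0],
           [9.0, 8.0, 9.0], [8.0, 9.0, 8.0], [9.0, 9.0, 9.0]]
z = standardize(regions)
scores, explained = pca_scores(z)
print("components kept:", len(scores[0]), "explained share:", round(explained, 3))
print("clusters on initial data:", complete_linkage(z, 2))
print("clusters on component scores:", complete_linkage(scores, 2))
```

Comparing the two label lists mirrors the validation idea: if the component model preserves the cluster structure, the partitions obtained from the standardized data and from the component scores should largely agree. On data of realistic size, optimized library routines would normally replace the simple loops used here.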
The complete linkage algorithm was applied with the ordinary Euclidean distance as the metric (Fig. 7). It is readily observable that the cluster structure of the dataset is preserved in the component model with minor differences, the clusters being somewhat more pronounced and the outliers alleviated. The principal component axes can therefore be tentatively interpreted as follows:
- f1: capital investments potential;
- f2: demographic load;
- f3: economic ability of population;
- f4: demographic capacity;
- f5: labor resources and labor reward (negative).

The cluster analysis identifies four clusters among the regions of the Russian Federation. Following the clustering of the regions, a structural and descriptive statistical analysis was performed.

The first cluster is composed of the regions of Russia with the worst economic development and the least provided population: the Republic of Dagestan, the Republic of Ingushetia, the Kabardino-Balkar Republic, the Karachay-Cherkess Republic, the Chechen Republic, the Altai Republic, and the Tyva Republic. For these regions, the average per-capita cash income was 19864.86 rub., which is 42% below the country average. These regions are also characterized by a strikingly small Gross Regional Product per capita and volume of investments in fixed assets, which are 93.9% and 90.6% below the country average, respectively, and by the highest proportion of population with cash incomes below the subsistence level (24.44%). In terms of principal components analysis, these regions have the lowest capital investments potential (1.47 times below the country average) and the lowest economic status of the population (1.2 times below the country average), as well as a demographic situation and production capacity that are 2.96 and 2.4 times worse, respectively. These regions do display a certain availability and supply of labor resources but are unable to apply them effectively, as the unemployment rate in these regions is 3 to 5 times greater than in the other clusters (Table 4).
The second cluster is composed of the regions of Russia with the highest economic development: Moscow city, the Tyumen region, the Khanty-Mansi Autonomous District, and the Yamalo-Nenets Autonomous District. These regions have ample economic potential from resource extraction (such as gold, coal, oil and gas), while Moscow city is the capital of the Russian Federation. These regions are characterized by a GRP per capita 2.18 times and investments in fixed assets 2.21 times greater than the country level; the average cash income per capita among these regions is 56915.75 rub., which is 64% above the country average and the highest among the four clusters. These regions also share the highest Gini coefficients in the country, the cluster average being 41%. The proportion of population below the subsistence level in these regions is relatively small, amounting to 9.12%, which is 39% below the country average. These regions are characterized by the highest production capacity and the highest labor reward (Table 5).

The third cluster consists of the regions with high and mid-to-high economic development and includes, notably, the Moscow Region, the St. Petersburg city and region, the Republic of Tatarstan, the Rostov region, the Samara region, the Republic of Sakha and some others. These regions have a Gini coefficient of 39%, and their average cash income per capita is 35974.97 rub., which is approximately at the country average level (4% higher). These regions, however, share a smaller GRP per capita and smaller fixed capital investments (below approximately 50% of the country average). In terms of principal factors analysis, the economic ability among these regions is therefore better than average, but their demographic capacity and labor rewards are lower (the values of these principal factors are about 3 times lower than the country level).
It is worth noting that the regions in this cluster also share the highest skewness of fixed capital investments and of the average cash income, as most people in these regions have income levels below the cluster average.

The fourth cluster comprises the regions with mid and low-to-mid economic development, including the Bryansk Region, Vladimir Region, Ivanovo Region, Kostroma Region, Kursk Region, Oryol Region, Ryazan Region, Smolensk Region, Tver Region, Tula Region and others. The regions in this cluster are characterized by a personal cash income 25% below the country average (25915.36 rub.) and by a 15% share of population with cash income below the subsistence level, which corresponds to the country average. The GRP per capita in these regions and the volume of investments in capital assets are 75% and 80% lower than the country average, respectively, and the principal factors of demographic capacity and labor resources capacity are lower by 3.2 and 3.4 times. Within this cluster, the population income distribution is similar across regions, as the Gini coefficient equals 36% among them and is slightly below the country average level (Table 7).

The results of the analysis support the following statements. The regions of the Russian Federation show distinctly non-uniform socio-economic development, and while the cluster structure is blurred by the high variation of the 'raw' parameters of the territorial economic level, it is exposed better by the principal components analysis, which maintains the same cluster structure. The wealthiest regions of the Russian Federation remain its capital and central ones, as well as the regions with core resource extraction industries (such as Moscow, St. Petersburg, Yamalo-Nenets, Tyumen, Chukotka Autonomous Districts). These regions share above-average population income and high capital investments; labor market competition is high within their territories, and these regions are the general 'acceptors' of the labor force.
The results of principal component modelling of the regions of the Russian Federation on a set of 19 socio-economic parameters made the cluster structure more prominent while leaving room for some variation in the parameters of regional economic activity. In conclusion, the proposed principal component model, and the PCA procedure in general, can be applied as a routine for cluster modelling of regional development.

Generalizing the results of the principal components modelling additionally makes it possible to formulate the following conclusions. The clusters other than the aforementioned ones are distinctly different from the highest-performing ones: they lack capital investment potential, and their industrial growth points do not appear to have recovered from the economic impacts that started between 2008 and 2015, including the sanctions that followed in 2014 and 2015. While these clusters are competitive in terms of labor resources, their economic potential shows signs of stagnation, and economic improvement within them is unlikely until 2022-2025. The Covid-19 pandemic has further affected the economic disparity existing within and among the regions of the Russian Federation, which poses the government the further task of alleviating its effects as well. Income differentiation in the wealthiest regions is also likely to keep increasing, as these regions are accumulating more wealth and concentrating monetary and financial assets. In addition, the results of the cluster analysis and the principal components analysis show that Russian regions lack inter-territorial infrastructural integrity, as neighbouring regions do not tend to achieve any synergistic effect from being co-located and are diverse in terms of industrial areas.
Cluster analysis and regression modelling depend heavily on exact similarities and differences among the observations across several parameters, which in certain cases may produce ambiguous or uncertain results when the regional cluster structure is derived. This problem is even more evident in the presence of multicollinearity, when the initial parameters are numerous and their variation is distributed evenly among the studied regions.

Reduce inequality within and between countries. UN SDGs
Provision of affordable and comfortable housing and utilities for citizens of the Russian Federation
The national project "Housing and Urban Environment"
Ensuring sustainable reduction of housing unsuitable for living
Comprehensive observation of living conditions of the population
Federal statistical observations on socio-demographic problems
Statistical studies of Russia's socioeconomic development and prospects for sustainable growth: materials and reports
State Corporation - Fund for Assistance to the Reform of Housing and Communal Services
Analysis of the development of the regions of the Russian Federation according to the main indicators of the socio-demographic situation
Foreign experience in the overhaul of apartment buildings using innovative mechanisms
Studying Cross-Country Differences in Innovation Potential
Housing and communal services Control. Teaching aids. Housing Education
Statistical studies of the socio-economic development of Russia and prospects for sustainable growth: materials and reports
Statistical analysis of a generalized integral indicator of the socio-economic situation of the constituent entities of the Russian Federation
Results of operational monitoring of socio-economic development of Russia and the regions of Russian Federation. Analytical notes.
Results of operational monitoring of socio-economic development of Russia and the regions of Russian Federation

This research was performed in the framework of the state task in the field of scientific activity of the Ministry of Science and Higher Education of the Russian Federation, project "Development of the methodology and a software platform for the construction of digital twins, intellectual analysis and forecast of complex economic systems", grant no. FSSW-2020-0008.