key: cord-0135991-sfwzi741
authors: Reisch, Tobias; Heiler, Georg; Diem, Christian; Thurner, Stefan
title: Inferring supply networks from mobile phone data to estimate the resilience of a national economy
date: 2021-10-11
journal: nan
DOI: nan
sha: 95bf1ae4e84ac4c239326595990754558689a852
doc_id: 135991
cord_uid: sfwzi741

National economies rest on networks of millions of customer-supplier relations. Some companies -- in the case of their default -- can trigger significant cascades of shock in the supply-chain network and are thus systemically risky. Up to now, systemic risk of individual companies was practically not quantifiable, due to the unavailability of firm-level transaction data. So far, economic shocks are typically studied in the framework of input-output analysis on the industry-level that can't relate risk to individual firms. Exact firm-level supply networks based on tax or payment data exist only for very few countries. Here we explore to what extent telecommunication data can be used as an inexpensive, easily available, and real-time alternative to reconstruct national supply networks on the firm-level. We find that the conditional probability of correctly identifying a true customer-supplier link -- given a communication link exists -- is about 90%. This quality level allows us to reliably estimate a systemic risk profile of an entire country that serves as a proxy for the resilience of its economy. In particular, we are able to identify the high systemic risk companies. We find that 65 firms have the potential to trigger large cascades of disruption in production chains that could cause severe damages in the economy. We verify that the topological features of the inter-firm communication network are highly similar to national production networks with exact firm-level interactions.

networks. Only for very few countries buyer-supplier relations are known on a granular level of individual companies from which the supply-chain networks can be constructed. For Hungary value-added tax (VAT) data exists that specifies which company pays VAT to another. From this the exact national supply-chain network has been reconstructed [31] , containing more than 89,000 companies and 235,000 business relations (links). Using estimations for production functions for these companies makes it possible to obtain the national production network. Using this as an input, firm level systemic risk for all individual companies were computed by using an appropriately designed SR measure, the Economic Systemic Risk Index (ESRI) [18] . It is a network-based measure to estimate the fraction of the total production output (goods and services) of the economy that is affected by a firm's (short-term) failure.

However, the Hungarian data is an exception. Granular and exhaustive datasets on the supply network of an entire nation are notoriously hard to obtain. Data exists only for a handful of countries, Japan [1] , Belgium [2] , Brasil [32] , and Hungary [31] . Customer-supplier relations are inferred either from surveys and business intelligence [1] , payment system data [32] , or VAT data [2, 31] . Survey data is typically very costly to collect and suffers from being outdated, highly incomplete, unweighted, and hard to verify [30] ; on the other hand, payment system and tax data -in countries where it is collected-is sensitive and access is highly restrictive.

In this work we propose an alternative approach to reconstruct the supply-chain network by using the multilayer network structure of firm-to-firm relations. We assume that companies that communicate with each other also entertain customer-supplier relations. We thus focus on two network layers, the flow of goods and services that constitute supply relations and the mobile phone communication between companies. Figure 1a schematically depicts the two-layer network. The communication layer (blue) shows the mobile devices belonging to one firm, calling devices in other firms. The supply layer (orange) represents the flow of intermediate products (or services) between firms. In Fig. 1b we show the same situation by showing a communication link c ij (blue) between firm i and j if they had at least one phone call within a certain time period and a supply link s ij (orange) if goods or services flow from firm i to j. Note that communication links are undirected, supply links are directed.

The coordination of a customer-supplier relation, such as ordering, negotiating prices, or organizing shipping, requires communication between firms and has been studied intensively [33, 34] . We thus expect the existence of strong link-correlations between the communication and supply layers. From the multilayer network in Fig. 1b we define the conditional probability, p(s ij |c ij ), to find a supply link, s ij , between firms i and j given that a communication link, c ij , exists, and vice versa, the conditional probability, p(c ij |s ij ), to observe a communication link given that a supply link exists, see Fig.  1c .

Albeit strong legal regulation telecommunication data has been accessible to researchers since more than a decade. Mobile phone data in the form of call detail records (CDRs) that are collected by mobile phone operators for billing purposes have been used to study communication networks and the behavior of millions of people [35] , leading to spectacular insights into the structure of human communication and organization [36, 37] , human behavior in emergency situations [38] , the spread of infectious diseases [39, 40] and the principles of human mobility [41] [42] [43] . CDRs allow for population-wide coverage, granular resolution of interactions on the person level, and the possibility to be combined with information, such as age and gender. Even though possible, inter-firm or organisational networks have so far not been studied systematically with mobile phone data.

Probability p(s|c) to find a supply link, sij, given that there exists a communication link, cij, between firms i and j for communication links exceeding a given call duration, dij. Error bars denote the quartiles of a bootstrap simulation described in SI Text 1. (b) Cumulative distribution function p(ki > k) for the degree k of the RSN (blue dots), HSN (orange x's) and HCN (green pluses). The degree distribution of the HSN is much more similar to the RSN than the HCN. Errorbars denote the quartiles of a bootstrap simulation described in SI Text 1.

Through a cooperation with a large mobile phone provider we have access to a dataset of CDRs in a medium-sized European country that allows us to identify groups of phones that are associated with a company through anonymized billing information, for details, see Materials and Methods. The dataset contains additional information on the firm's primary industry classification and balance sheet information. In Fig. 1d we show the corresponding firm-to-firm communication network (FCN) as obtained from our data. Firm locations are shifted by random distances (on average 30km) to ensure the anonymity of companies. Arcs in the figure represent communication links between firms. We find many shortrange interactions within one city or economic region and few long-range interactions. We are intentionally vague with regards to information concerning the mobile phone provider because we are contractually bound to ensure its anonymity, as well as to protect sensitive business information such as the exact market share in the businessto-business market.

Here we demonstrate that phone data can indeed be used to reasonably reconstruct supply networks that allow for a meaningful estimation of firm-level economic systemic risk of an economy. The method is an efficient alternative to survey, tax, or bank transactions estimates. It uniquely allows us to study supply networks and monitor economic systemic risk in real time and provides a nearly complete overview of a nation's production network.

Conditional supply-link probability. We determine the conditional supply-link probability p(s|c) by comparing the firm communication network, shown in Fig. 1d , with ground-truth information on the real customer-supplier relations, obtained from a nation-wide survey in April 2020. In the online survey more than 100,000 companies and businesses were asked to share their ten most critical suppliers and customers, respectively. More than 5,900 firms declared at least one supplier or customer with a total of more than 17,000 customer-supplier relations reported. For details on the survey, see SI Text 1. We obtain the overall probability that a supply link exists between two companies, given that they had at least one conversation event in the observed time period of approximately 150 days, is p(s|c) = 0.19. For the conditional communication probability we get p(c|s) = 0.27. For comparison, the respective marginal probability from the firm communication network directly is p(c) = 0.002. For the linking probability --using Hungarian data--we get p(s) = 0.00005. Since both values are orders of magnitude smaller than the conditional probabilities, highly significant link correlations between the supply and communication layers are indicated.

The conditional link probability increases with the intensity of the firm-firm communication. As a proxy for the latter we use the average daily call duration,d ij , in seconds per day. In Fig. 2a p(s|c) is shown as a function ofd ij (red). The number of links used to calculate the overlap is shown in blue. p(s|c) rises from 19% to values around 70% ford ij = 30s/d and around 90% for 60s/d. The number of links reduces from 75 to 14 links asd ij increases. Note that errors do not increase, because a higher probability is associated with a smaller error. For details of the computation and errorbars, see SI Text 1.

For the supply network (p(c|s)) the best proxy for tie strength would be the amount of traded goods. However, this information is not available, so we estimate the link weight as the product of the firm's sizes. Here, to stay consistent on the communication data, we proxy the firm size with the number of devices associated with a firm. Supplementary figure 6 shows p(c|s) for the networks thresholded by the number of devices per firm in red and the number of links in the underlying sample in blue. We find an increase from 27% to around 60% for the network of firms with 4 or more devices. For thresholds larger than 4, the curve levels off and stabilizes around 70% for thresholds of 6 or more devices. The number of links drops as in Fig. 2a , but again, the error-bars are still sufficiently small.

Reconstructing the supply network. For obtain-ing an estimate of the supply network, based on the FCN, we chosed ij = 30s/d, with the aim to balance the loss of information due to ignored supply links and increasing link correlations due to the thresholds. This particular threshold is the result of a minimization of the Kullback-Leibler divergence for degree distributions of the HSN and thresholded FCNs, described in SI Text 2. We arrive at an unweighted and undirected reconstructed supply network (RSN). To get an estimate for the link directions (firm i supplies j or vice versa), we use classical inputoutput tables of the national statistical office. They contain information on the volume of trade between economic sectors in the economy. An element of the inputoutput table, G ab , describes the flow of goods (in Euro) from sector a to sector b. We denote the number of links (firm-firm supply relations) from sector a to sector b by L ab and assume that the ratio of links from one sector to the other is proportional to the ratio of goods flowing between these sectors, L ab /L ba ≈ G ab /G ba . For example, the flow between the agricultural sector (a) and the food industry (f ) is G af ≈ 3, 400me, while the food industry sold goods for G f a ≈ 450me to the agricultural sector. We now assume that it is 3, 400/450 ≈ 7.6 times more likely that a supply link points from a firm a to one in f . We now consider every link from firm i in sector a to firm j in sector b in the RSN and assign it a direction according to the probability

Since we perform this assignment stochastically, we should think in ensembles of RSNs. Finally, we estimate a supply-link weight for every link in the RSN. We use the companies' total assets, calculated from the balance sheets, as size information, s i ; it is obtained from a commercially available business intelligence database, see Materials and Methods. Following the philosophy of "gravity models" in economics, we assume that large and small firms typically trade large and small volumes, respectively [44] . Therefore we obtain a link weight estimate between firms i and j as the product of firm sizes, W ij = s i s j . We will use only relative weights in the following.

Comparing network topologies of supplychains, firm-firm communication, and human communication. It is enlightening to compare the network topology of the so-obtained RSN (blue) with the topologies of the Hungarian supply network (HSN) (orange) (for which exact topology is known [18] ) and the private communication network between individual people (green) (i.e. not between companies). Figure  2b shows the degree distribution of the RSN (blue) in comparison to the exact Hungarian supply network (HSN) derived from VAT data [31] . Both networks are similar and fat tailed, in contrast to the human communication network (HCN) that was obtained from the mobile phone data set. The RSN has an average Toy example of a production network with eight firms of the same sector and size. After the default of an initial node (red X) its customers (suppliers) have to reduce their production level according to the share of inputs (supply) they lost. This logic is iterated until a stable configuration is reached, the relative share of economic activity (production) lost is the initial node's ESRI. Economic Systemic Risk. With a reasonable re-construction of the supply network, RSN, we turn to the quantification of economic systemic risk in the national production network. For quantification we use the economic systemic risk index (ESRI) as developed in [18] . The algorithm of the ESRI is sketched in Fig. 3a in an example with seven firms of equal size within the same industrial sector. The ESRI of firm i, assumes that if firm i (red cross) cannot operate for some time (e.g. defaults) it neither supplies nor demands inputs. The customers and suppliers of i then reduce their production accordingly, causing a successive reduction of production in their customers and suppliers. This recursive reduction converges to a state where all firms have reduced their level of production in response to i's default. Figure 3a shows the relative reduction for every firm. The fraction of total economic activity lost is the ESRI i of firm i. We use the ESRI algorithm with a generalized Leontief production function that captures the different nature of the production process for companies producing physical goods and companies providing services. Firms with a NACE classification up to F43 are considered to having a physical production process and, hence, are more susceptible to production stops following from input shortages. For a detailed explanation of the use of generalized Leontief production functions in the ESRI definition, we refer to SI Text 4 and [18] .

We compute ESRI for every firm in the network and plot their values according to their rank, from highest ESRI to lowest, in Fig. 3b . This is called the systemic risk profile of the production network. The ESRI for the defaulting firm in panel a is highlighted as the red bar. Performing the same steps for all firms in one realization of the RSN yields the systemic risk profile shown in Fig. 3c , where we show the 200 riskiest firms. The profile shows similar characteristics to what has been reported for the exact production network of Hungary [18] , namely, a plateau containing the 65 most risky firms, which all, except for a few extremely risky firms, have a similar risk of around ESRI ≈ 0.21, followed by a sharp decline for firms that are not part of the plateau. The inset in Fig. 3c shows the cumulative distribution (CDF) p(ESRI > x) of the ESRI in log-log scale.

To take the stochastic nature of the RSN into account we repeat the ESRI calculation. We consider five realizations of the RSN to calculate their mean ESRI. Subsequently, due to computational challenges, we focus on the 1000 most risky firms only, after ranking them according to their mean ESRI. For those we repeat the ESRI calculation 100 times. For each node we get a distribution of ESRI values. Figure 3d shows the median ESRI for every firm as a solid line; the 25% and 75% quantiles are indicated by the errorbars. An alternative way to investigate the ESRI profile of the RSN is to plot the maximal systemic risk of every node. This method yields similar results and is shown in SI Fig. 9 in SI Text 5. The median ESRI per node profile in Fig. 3d shows the same characteristics as the single run in Fig. 3c , a plateau of high-risk firms and a rapid decline of ESRI outside of the Economic systemic risk vs. firm size measured as total assets in log-log scale. Marker size represents total assets; companies in the "plateau" of Fig. 3c are marked red and highlighted by red shading. For a given firm-size there is an obvious lower bound for the ESRI that corresponds to the firm-size (here no firm can loose less than all assets when it defaults). Although the correlation between size and ESRI is high, we find small and large firms in the "plateau", suggesting that firm-size is not a good tool to identify highsystemic risk firms.

plateau. In contrast to the single run ESRI profile, the plateau consists of only around 50 firms. The spread of the ESRI distributions for individual nodes is small for high-as well as low-risk nodes, indicating that the results are remarkably stable and robust. For the intermediate risk firms error-bars become large, indicating that their ESRI depends on the direction of one or few links. It is a well-known feature of systemic risk and the ESRI that single links or link directions can have a large influence [11, 18] . Ref. [18] explains that some nodes "inherit" systemic risk by being a crucial supplier to a firm that is inherently risky due to e.g. its size. Therefore flipping a link-direction can turn a node from a crucial supplier of a central firm to a buyer of that firm, which strongly reduces its inherited systemic risk.

To understand which firms are in the plateau of Fig.  3c , in Fig. 4 , we plot the firm-size, approximated by the firms' total assets against the ESRI. It is evident that the high systemic risk plateau (highlighted in red) contains large and small firms, with their total assets spanning more than 4 orders of magnitude. Although firm size correlates well with ESRI (Spearman's ρ = 0.87), it is not a good predictor for systemic risk since for a given firmsize the ESRI can vary by several orders of magnitude. A similar situation is described in [18] .

The 65 firms found in the high systemic risk plateau mainly belong to the manufacturing sector (NACE lvl. 1 category C, 77%), followed by companies in the electricity, gas stream and air conditioning supply (D, 8%) and financial and insurance activities (K, 6%) sectors. The full composition is listed in SI Tab. II. In contrast to the exact Hungarian production network [18] , several companies from non-manufacturing sectors (NACE ≥ 45) are found in the plateau. This is somewhat unexpected since they are associated with linear production functions (see SI Text 4), which causes their shock spreading behavior to be less extreme than for Leontief producers.

Robustness of results. Our study is subject to several limitations, in particular (i) the imperfect overlap of the two communication and supply-link layers, limiting the possible accuracy, (ii) the limited market coverage of the phone provider (resulting in limited agreement even if p(s|c) = 1), see SI Text 6, and (iii) errors originating from the network reconstruction uncertainties in the estimations of directions and weights.

To estimate the biases and errors introduced by these weaknesses, we perform several simulation studies. First, we generate a synthetic communication network based on the HSN and the probabilities to find a communication link, where a supply-link is present p(c|s), and where no supply-link is found p(c|¬s). From this synthetic communication network we then take a sample of nodes according to an estimated market share m of the data provider and calculate the induced subgraph comprised by links only between the sampled nodes.

Finally, following the procedure used on the empirical data, we reconstruct a supply network from this synthetic communication network and calculate the ESRI. We calculate Spearman's rank correlation coefficient, ρ, between the ESRI as calculated on the full, real HSN and on the reconstructed subgraph. After repeating these steps for 100 times with m = 1/3, p(c|s) = 0.21 and p(c|¬s) = 9.3 × 10 −5 , we find an average Spearman correlation of ρ(ESRI HSN , ESRI reconstr ) = 0.563 (6) . In SI Text 6 we address the shortcomings mentioned above one by one and discuss the expected magnitude of the introduced errors. We find that the most relevant effect is caused by the limited market share with a drop of correlation of ∆ ρ = 0.31, followed by the limited overlap, adding another, ∆ ρ = 0.13. The effects from network reconstruction reduce the correlation by only ∆ ρ = 0.0004, which is remarkably small. We calculate the probability that a node that is among the 0.1% riskiest nodes of the subsample is also among the riskiest 0.1% of all nodes and find 32.9(82)%. The probability that one of the top 0.1% of the subsample nodes is among the top 1% of the full network is 47.7(99)%. DISCUSSION We show that mobile phone metadata can be used to reasonably reconstruct the flow of goods between firms in an economy, i.e., the supply network. The reconstruction is possible because of the similarity of the communication-and the supply layer of the inter-firm network. This method is one of the very few alternatives to obtain a comprehensive view on national supply network, when there is no VAT or payment system data.

Based on the supply network we calculate economic systemic risk and find that a small core of about 65 high systemic risk firms have the potential to affect large parts of the economic activity. Apart from these core firms systemic risk of companies is generally small. These results agree well with the previous results for Hungary, where a core of 32 high systemic risk firms was found to contribute to 45% of the overall systemic risk [18] . With a series of robustness checks we demonstrate the reliability of the results.

Using a large-scale survey on the actual customersupplier relationships between companies, we find the probability of a supply link to exist, given an existing communication link as p(s ij |c ij ) ≈ 0.19. When thresholding for higher interaction strength of the communication relation p(s ij |c ij ) the conditional probability increases strongly to 92%. Note that the survey asked for the firms' most critical suppliers. It is almost certain that in the FCN we observe connections to suppliers that are perhaps important but were not classified as critical in the survey, causing p(s ij |c ij ) to be underestimated. Landline phones are still common practice in many firms; these communication links are not covered, thus further underestimating the overlap of communication and supply links.

We find that the degree exponents of the reconstructed supply network, α RSN k ≈ 2.18, and the exact Hungarian supply network, α HSN k ≈ 2.40, are similar; the degree exponent of the human-human communication network is much larger, α CN k ≈ 4.89. Also for the average nearest neighbor degree and the local clustering coefficient the topology of the RSN is more similar to the topology of the exact HSN than to the HCN.

We showed that the FCN and the HSN are most similar when thresholding communication strength to d ij > 30s/d. We sample supplier directions using external information on companies' industry sectors and from input-output tables. Link weights are estimated by the product of firm sizes. Future improvement of the reconstruction method could be reached by using additional information contained in the FCN, such as asymmetries in the calling behavior, temporal patterns in the sequence of calls, as well as using dependencies of supply link weights on communication intensity.

Because the reconstruction process is stochastic, we calculate an ensemble of systemic risk profiles and investigate the inter-quartile ranges. For high-and low-systemic risk firms the inter-quartile ranges are small, indicating that results are stable. However, for firms of intermediate systemic risk the inter-quartile ranges are relatively large, indicating that risk changes strongly with the direction of one or a few links. This agrees well with previous results for the Hungarian supply network, where approximately a third of the riskiest firms were found to constitute the periphery of the high systemic risk core. These firms 'inherit' the high systemic risk status of important firms by being critical suppliers to these firms [18] . Also in banking networks it is well known that individual links may dramatically increase systemic risk [11, 29] . Although the ESRI correlates strongly with firm size, the high systemic risk core is not predicted well by firm size.

The method has several limitations. We systematically investigate the error introduced by the imperfect overlap of the communication-and supply layers, the limited market share of the mobile phone provider, and the reconstruction of the link directions. In a simulation study we find an average rank correlation between the true ESRI in the HSN and the ESRI on a carefully simulated synthetic firm communication network of ρ(ESRI HSN , ESRI reconstr ) = 0.563. The limited market coverage and the imperfect link overlaps contribute most of the effect. We expect ρ(ESRI HSN , ESRI reconstr ) to be higher in reality since it is based on the estimate for p(s|c) that is a lower bound. Further, despite the limited correlation, our method allows us to capture heterogeneity in shock spreading well and uncovers the localized effects of upand downstream cascades on the firm level that traditional methods such as input-output models cannot describe.

There are also three limitations that could not be addressed explicitly. First, firms use many more communication channels than mobile phones such as landlines, e-mail or physical mail, and a growing number of new interaction channels, such as social media or online portals. Nevertheless, we assume that, if the supply relation is sufficiently strong, firms become more and more likely to use mobile phones to arrange spontaneous meetings, inform partners about delays, coordinate the quality, quantity and timing of deliveries, fix dates, provide support, etc.

Second, due to the anonymity of the telecommunication data it is not possible to perform targeted surveys on the customers of the phone provider. To reach significant overlap of the survey respondents and the customers of the phone provider, untargeted surveys need a response rate of a considerable fraction of firms within a country.

Third, another consequence of the anonymity of the data is that -by definition-firms cannot be identified and concrete policy statements can only be made on the level of the network. However, within the anonymity constraints, the effect of heterogeneous shocks in relation to economic sectors and geography can still be investigated. This is important since recent work has shown that heterogeneity in the initial economic shocks can cause dramatically different economic outcomes [17, 45] .

Since mobile phone data is easily available, the presented method to reconstruct a national production network is cheap, scalable, and easily implemented. The method also captures international links which allow us to identify economic exposures to specific countries. Maybe one of the most interesting features of the method is its temporal resolution, supply relations can be monitored in real-time. This offers the possibility to study how firm-ties form and rewire on the network-level. Observing the restructuring processes of the economy during natural disasters or economic crises take place might become relevant for acute crisis management.

Data. The anonymized (but fine-grained, devicelevel) call detail record (CDR) data is mapped to an anonymized ID for each company. The observation period is approximately 125 days in autumn 2020 between two lockdowns. The obtained edge list is aggregated for the whole observation period, grouped by each source/destination, anonymized firm ID tuple and the call duration (in seconds) for each arc is summed up. Further, node-level statistics i.e. the number of devices is aggregated. We also calculate a rough location as the centroid of the night-locations of the individual devices. The night-location was previously calculated as described in [46] for each device.

The firm communication dataset is merged with a commercially available business intelligence database that includes balance sheet information the industry classification in the NACE 2008 system [47] ). For details on the anonymization procedure see SI Text 7. For analysis, we drop NACE J61, J62, M70, and N82 to exclude businesses such as call-centers that have telephone activity at the center of their business and would confound the study with exceptionally high numbers of calls.

To compare the reconstructed supply network (RSN) with a real supply network we use a dataset based on granular VAT reporting in Hungary (HSN), described in [18, 31] . It contains a link between two firms only if at least two transactions occur in two different quarters. We use the data from 2017, where only transactions with a tax content larger than 1,000,000 Forint (approx. 3000€) are included. Hungarian VAT rates range from a 27% base rate to a 18% and 5% reduced tax rate for certain foods, pharmaceuticals, etc. and a 0% rate for public transport [48] . The calculations presented here are based on an unweighted version of the Hungarian production network.

We further compare the topology of the FCN with a human-to-human communication network (HCN) . To this end we use a dataset provided by the same phone provider. It contains CDRs of calls between individual mobile phones which are anonymized with a new key every 24 hours. For this reason we can only analyze the HCN of one day. We choose September 17, 2020, a Thursday during the observation period outside of the holiday season and before the winter lock-downs. On that day we find 144,516 active devices and 154,557 calls.

We use input-output tables containing information on how many intermediate goods or services were used for the overall production of a certain good in a national economy in a given year. We use the input-output table of 2017, it is the latest available of the country studied. It contains 64 sectors in the CPA classification (Classification of Products by Activity [49] ), which is harmonized with NACE 2008 on level 2.

Systemic risk. We define the relative output level of firm i at time t as h i (t) = xi(t) xi(0) , where x(t) is firm i's output at time t. Let an initial firm i default by setting h i (0) = 0. Subsequently the shock from firm i's default propagates downstream along the out-links by updating all other firms' output according to their production function

and upstream along the in-links by updating

At time T , the algorithm has converged and we define the vector h j (T ) = min(h d j (T ), h u j (T )) to calculate the economic systemic risk index as

where s i denotes the size of firm i. For more details on the algorithm and the definition of the production function, see SI Text 4.

We thank P. Klimek for inspiring discussions in the initial phase of the work. We thank the anonymous mobile phone operator for providing the communication data, the Hungarian National Bank for providing the Hungarian data. The project was supported by Austrian Science Fund FWF under I 3073-N32, Austrian Science Promotion Agency FFG under 857136, and Hochschuljubiläumsstiftung of the Austrian National Bank OeNB under P17795 2018-2021 To calculate conditional probabilities describing the overlap of the communication and supply layer, we use data from a survey conducted in the country of the FCN that was conducted between April 8 and 20, 2020. The survey asked questions on the general business results of the past half year, the outlook for the near future and how the firms expected to be affected by the COVID-19 crisis in the near future. Additionally, the survey contained a part asking for the ten most critical suppliers and ten most critical customers. From this survey we construct a supply network, that we can compare with the FCN. The survey was sent out to 102,386 companies; more than 5,955 firms replied to the supply network part, declaring more than 17,393 customer-supplier relations. To keep the privacy of the companies, the data is co-anonymized. This means that the metadata is made available to all parties prior to the data collection process and, subsequently, only anonymized data is made available to the researchers.

We quantify the overlap to find a link x ij , x ∈ {c, s}, in one layer when a link y ij , y ∈ {s, c}, in the other layer is present as the conditional probability p(x|y),

When comparing the results of the supply chain survey and the firm communication network there is an additional distinction between firms reporting in the survey and firms reported in the survey. Let's denote the buyer-supplier network as the set S containing all links s ij from reporting firm i to reported firm j. This means

where R is the set of reporting nodes, M the set of reported (mentioned) nodes and S the set of all nodes in the network. Please note that the sets R and M are not mutually exclusive; a reporting node can also be mentioned by another firm and, hence, be part of both sets. Let's denote the communication network as the set C of links c ij from firm i to firm j, where i, j ∈ C, C being the set of firms in the communication network.

We are interested in the conditional probability of finding a buyer-supplier relationship where there is a communication link. Formally p(s ij |c ij ). To estimate this, we need to perform a fair comparison and stratify for the fact that only a subset of all buyer-supplier relationships is known. Figure SI Fig. 5 illustrates the problem. The central (black) node has reported in the survey and mentioned 6 other nodes (grey). We can only observe links between black and grey nodes, not between two grey nodes, so to perform a fair comparison communication links between two grey nodes need to be excluded from the analysis. We defineC in analogy to the buyer-supplier network as the set of communication links from reporting nodes to mentioned nodes, thereby excluding links between i.e. mentioned nodes

Now, following Eq. (5) we can write

To establish indicators for the error of the survey we perform a bootstrap-like simulation. Our simulations corrects for the fact that the distribution of call durations d ij for the subsample in the survey is not representative of the full network. We start by sampling a synthetic supply network based on the empirical communication network using p(s|c) as found in the survey. We use averages of p(s|c) on bins of d ij . Then we draw subsamples of the sample size of the survey (N ∝ 200) . After repeating the process 1500 times, we calculate mean and quartiles and report them in main text Fig. 2b .

For p(c|s) we lack the true distribution of supply link strengths, so we perform a classic bootstrap. For a given conditional probability bin we draw samples of the same size with replacement. We repeat the process 1500 times, calculate mean and quartiles and report them in SI Fig. 6 . 

The overlap probabilities p(s|c) and p(c|s) increase when thresholded for call duration and/or firm sizes. Of course on the one hand, when thresholding, the information contained in low-intensity contacts (low call duration, small trade volumes) is lost, while on the other hand, link correlations between the communication and supply layers increase. To balance these two effects, we choose the threshold combination where the topology of the thresholded communication network is most similar to a real production network.

To determine when the topology of the thresholded FCN most resembles a real production network, we calculate the Kullback-Leibler divergence between the thresholded FCN and the HSN,

We systematically try threshold combinations for the average call duration per week d ij and the number of devices per firm N i , (d ij , N i ). As shown in SI Fig. 7a and 7b , the minimal Kullback-Leibler divergence is found for (30s/d, 0). 

Here we compare the network topologies of the FCN, HSN and HCN we discuss the behavior of p(k), k nn (k) and c(k) in greater detail. We begin by characterizing the topology of the firm communication network (FCN) . For reference we compare the results to a more directly observed supply network and to a human communication network and show that the topology of the FCN is similar to the known supply network and dissimilar to the social network.

Here we describe the network thresholded to only links between firms that have an average interaction duration of more than 30 seconds per day. For the comparison with a real supply network, we compare with the national supply network of Hungary that is obtained through VAT data for 2017 [18, 31] (henceforth HSN, short for "Hungarian supply network"). The network shows a link if the tax content of the goods exchanged between two firms exceeds 1,000,000 Forint (approx. 3,000 Euro) and if the link occurs in at least two distinct quarters. For details on the supply network data, see Materials and Methods and [31] . To investigate the difference between the inter-firm communication network and a social communication network (HCN) between humans we use a dataset on calls of individual devices by the same mobile phone provider. The data is for one day during the studied period because the IDs of individual devices are re-anonymized daily, preventing us from studying the communication network for longer than 24h. For details on the HCN we refer to Materials and Methods. In Fig. 2a we show the degree distribution p(k) of the three networks. In complex networks the degree distribution is often skewed to the right with a fat tail, corresponding to the fact that a few hubs are interacting with many nodes, while the majority of nodes interacts with only a few nodes. The FCN has an average degree of k F CN = 4.79. Its degree distribution has a maximum at k F CN = 2 and has a fat tail that can asymptotically be described by a power-law

which we fit with a maximum likelihood estimator using Python's powerlaw package [50] . We find α F CN k = 2.18(12) for k F CN > 30. The PNW does not show the increase for small k but also exhibits a fat tail that can be well fitted by the power law from Eq. 10 with α HSN The mixing patterns of a network have a strong influence on its structure and function. If high-degree nodes tend to interact with other high-degree nodes, the network is called assortative, if high-degree nodes are more likely to interact with low-degree nodes, the network is called disassortative. For supply networks disassortative mixing has been reported [1] ; for social networks we expect assortative mixing [51, 52] . To investigate the degree-degree correlations we plot the average nearest neighbor degree

where N i is the set of neighbor nodes of i. In Fig. 2b we plot k nn as a function of k for the FCN (orange), PNW (blue) and HCN (green). For the FCN k nn (k) shows an increase for small values below k < 10 and then shows decreasing trend for larger k. For very small values the network is assortative, for intermediate and large k the firm communication network is disassortative. For the PNW we find k nn to be relatively flat for small k < 10 and then decrease for large k, thereby showing that the PNW is disassortative for k > 10. This result agrees well with previous studies on the Japanese supply network, which was also shown to be disassortative. The HCN increases to values around k ≈ 30 and then decreases quickly, suggesting the existence of two regimes; a low-degree regime, where assortative mixing patterns dominate, and a high-degree regime, where disassortative mixing is predominant. For the mobile phone communication network of Shanghai very similar results were found. There the authors associate the assortative mixing for nodes with 'reasonable degree' to the social network of calls and the disassortative mixing of high-degree nodes with hotlines or robots [51] . This suggests, that for 'human' callers, the network is assortative. The local cohesiveness around one node is typically measured by the local clustering coefficient c i . It is defined as the number of closed triangles of node i with its neighbors, t i , divided by the number of possible triangles

.

The average local clustering coefficients c , along with their expected value for a random network p is shown in Tab. I. All clustering coefficients are comparably small, but still large compared to random networks. Figure 2c shows average c i as a function of degree c(k). For the FCN (orange) and the HSN (blue) c(k) shows a very similar decay. The HCN (green) does not show such a decay, but a decrease for values below k ≈ 20 and an increase for values between k ≈ 20 and k ≈ 40. This weak, peaked dependence of c on k for a mobile phone network was also reported in [51] . Here we describe how the ESRI is calculated. We keep our notation closely to the one of [18] . Given a supply network W , where W ij describes the value of products, of type p i , delivered from firm i to firm j. The vector p with element p i ∈ {1, 2, . . . , m} indicates the product type produced by firm i. We identify the product p i with a firm's industry affiliation. The amount of input k firm j uses is Π jk = n i=1 W ij δ pi,k . A firm's production function is an increasing function f i (Π i1 , . . . , Π im ) that describes how much firm i can produce with a given set of inputs Π i1 , . . . , Π im . Conversely it also allows to assess how much production drops if the amount Π ik of input type k is reduced.

The generalized Leontief production function is a generalization of the regular Leontief production functionwith functional form x i = min 1 αi1 Π i1 , 1 αi2 Π i2 -and a linear production function -with functional form

-and is defined as

where the set, I es i , denotes all input types k, that are deemed essential for production of firm i and thus entering the production in a Leontief way, the set I ne i denotes all input types k that enter the production of firm i in a linear way. The parameter α ik = n j=1 Wjiδ p j ,k n l=1 W il is the fraction of firm i's output that it spends on the input type k, the parameter α i = n j=1 Wji n l=1 W il is the overall fraction of output that is spend on all inputs and β i is another parameter inferred from the supply network and defined as the attainable production level if only essential inputs k ∈ I es i are available, i.e.,

We list the necessary equations to compute the ESRI. First, the following objects have to be defined. The downstream impact matrix Λ d defined by

and the elements of Λ d1 and Λ d2 are defined as

Similarly, the upstream impact matrix is defined by

Second, to calculate the ESRI of firm i the initial exogenous shock parameter is chosen to be ψ i = 0 and ψ j = 1 for the other firms j = i.

Third, the following equations are iteratively computed to update the downstream and upstream impeded production levels of firms at time point t.

1. Compute the dynamic intraindustry market share for each firm j σ j (t) = min s out

2. Compute, for essential inputs k ∈ I es i of firm i the fraction available as

3. Compute, for non-essential inputs, k ∈ I ne i , of firm i the fraction available as

4. Update for each firm i the relative production level reduced by downstream shocks

5. Update for each firm i the relative production level reduced by upstream shocks

The iteration continues until the algorithm reaches a stable state at time

with = 10 −2 as convergence threshold. Then the ESRI i of firm i is computed as

The quantity can be interpreted as the fraction of production in the network that is (temporarily) impeded if firm i fails (temporarily). For details of the derivation see [18] Appendix G. Note that the calculation is computationally intensive and scales badly, because with growing network the number of ESRI i to calculate grows linearly and the convergence times grows by 2 times matrix multiplication costs. For this reason we only show the 1000 most risky firms in Figure 3 (d).

We are not only interested in the median damage a company can do, but also in the scenario where the damage is largest. In SI Fig. 9 we plot the maximal ESRI per node of 100 reconstructed supply networks. Reconstructing 100 supply networks and calculating ESRI yields a distribution for every node. Supplementary Figure 9 shows the maximum of each distribution. Note that the maxima are not all from the same configuration. The maximal damage is max(ESRI) = 0.53 and the high systemic risk core consists of around 100 firms, with the majority of nodes having an ESRI around ESRI ≈ 0.25. Our study is subject to several limitations, in particular (i) the error due to faulty direction/weight estimation, (ii) the limited market coverage of the phone provider (resulting in limited agreement even if p(s|c) = p(c|s) = 1) and (iii) the imperfect overlap of the two networks limiting the possible accuracy. In the following, we address all of these shortcomings one by one and discuss the size of the introduced biases and errors. We end with a simulation to estimate the error introduced by all shortcomings combined.

We study the error introduced by sampling the link directions by simulating the reconstruction process on a real supply network topology and then validating it by comparison with the true network. We use the Hungarian supply network and start by aggregating the trade volumes to an input-output table. Then we remove the direction information from the network and sample new link directions according to the probabilities obtained from the inputoutput table and Eq. (1). We compare the true and simulated link directions and calculate which fraction was guessed correctly. Figure 10a shows results of repeating this experiment 1000 times. The mean overlap is 51.5% with a standard deviation of 0.1%. This result, although significantly better than what would be expected from random chance (red line in Fig. 10b) , is surprisingly low. It can be explained by investigating the probabilities associated with the links in Hungary. If most links were between sectors with a direction as polarized as the relationship between agriculture and the food industry, more links would be guessed correctly. However, as shown in SI Fig. 10b the majority of links has probabilities between 42% and 60% (the lower and upper quartile, shown as red lines). Nevertheless, even though many direct links are guessed incorrectly, on the sector level the proportion of links from sector i to sector j p ij = L ij /(L ij + L ji ) are captured well. We find an average Pearson correlation of the true, empirical sector wise link directions p emp ij and the simulated sector wise link directions p sim ij of r(p emp ij , p sim ij ) = 0.91(1), see also SI Fig. 10c . As in most countries, there is more than one mobile phone provider in the country where the mobile phone data is from. This results in a market share m less than one. As is schematically shown in Fig. 11a , this leads to a large fraction of links that are not accounted for, since we only consider calls between companies who are customers of the mobile phone provider. The graph containing only the links between a set of nodes in a network is called the induced subgraph of the respective set of nodes. To quantify the error introduced by limited coverage we use the real supply network of Hungary and compare the systemic risk as calculated on the full network with the systemic risk calculated on an induced subgraph. We investigate the effect of a market coverage of m = 1/4, m = 1/3 and m = 1/2 using the following steps. link where a supply link is present p(c|s) and where no supply link is present p(c|¬s). It is not possible to measure p(c|¬s) directly. Nevertheless we can compute it from known quantities p(c|¬s) = p(c) − p(c|s)p(s) 1 − p(s) .

We are interested in the effect on top of the error introduced by the incomplete market coverage, therefore we sample nodes according to a market share of m = 1/3 and calculate the induced subgraph. To isolate the effect, however, we keep the directions from the HSN and investigate their effect in the next step. The modified algorithm works as follows: Figure 11 show the results of 100 iterations assuming m = 1/3 and p(c|s) = 0.21. The mean Pearson correlation is ρ p(s|c) = 0.564(5), demonstrating a substantial shift of ∆ ρ = 0.13 compared to the result not including p(s|c) < 1.

Finally, we study the combined effect of the limitations described above, as shown in Fig. 11c we calculate the effects of a limited market share, imperfect link correlations and an inaccurate link direction estimation. We use the empirical network topology of Hungary, simulate a mobile phone network and then estimate the link directions. The process follows the steps below.

Financial stability and interacting networks of financial institutions and market infrastructures

The multi-layer network nature of systemic risk and its implications for the costs of financial crises

Systemic riskefficient asset allocations: Minimization of systemic risk as a network optimization problem

Monitoring indirect contagion

Debtrank analysis of financial distress propagation on a production network in

Firm-level propagation of shocks through supply-chain networks

Quantifying firm-level economic systemic risk from nation-wide supply networks

Measuring Systemic Risk, The Review of Financial Studies

May's instability in large economies

Rethinking the financial network, speech delivered at the Financial Student Association

The subprime credit crisis and contagion in financial markets

Teleconnected food supply shocks

Challenges and solutions for addressing critical shortage of supply chain for personal and protective equipment (PPE) arising from Coronavirus disease (COVID19) pandemic-Case study from the Republic of Ireland

COVID-19 and Supply Chain Disruption: Evidence from Food Markets in India

Elimination of systemic risk in financial networks by means of a systemic risk transaction tax

Incentivizing resilience in financial networks

What is the minimal systemic risk in financial exposure networks?

Supply network science: Emergence of a new perspective on a classical field

Unfolding the hidden structure of the Hungarian multi-layer firm network

Modeling Economic Networks with Firm-to-Firm Wire Transfers (2020)

The impact of supplier development on buyer-supplier performance

Interorganizational communication as a relational competency: Antecedents and performance outcomes in collaborative buyer-supplier relationships

A survey of results on mobile phone datasets analysis

Structure and tie strengths in mobile communication networks

Network diversity and economic development

Collective response of human populations to large-scale emergencies

Unveiling spatial epidemiology of hiv with mobile phone data

Christakis, Population flow drives spatio-temporal distribution of COVID-19 in China

Understanding individual human mobility patterns

Limits of predictability in human mobility

The universal visitation law of human mobility

The Gravity Model

Heterogeneity in initial shocks causes heterogeneity in outcomes

Country-wide Mobility Changes Observed Using Mobile Phone Data During COVID-19 Pandemic

establishing the statistical classification of economic activities NACE Revision 2 and amending Council Regulation (EEC) No 3037/90 as well as certain EC Regulations on specific statistical domains Text with EEA relevance

Taxation and Investment in Hungary (rates are updated to 2017) (PDF)

establishing a new statistical classification of products by activity (CPA) and repealing Council Regulation (EEC) No 3696/93 (Text with EEA relevance

powerlaw: a Python package for analysis of heavy-tailed distributions

A comparative analysis of the statistical properties of large mobile phone calling networks

Analysis of a large-scale weighted network of one-to-one human communication

Calculate ESRI for the full Hungarian network

Draw a sample of nodes according to market share m, calculate the induced subgraph

Correlate ESRI of induced subgraph with "true" ESRI of these nodes

Repeat from 2. and calculate the histogram of correlation coefficients

The next step is to quantify the error introduced by the imperfect link correlations, p(s|c) < 1 and p(c|s) < 1. Figure 11b illustrates the imperfect overlaps of the communication (broken blue lines) and the supply (solid orange lines) layers. We generate a fake communication network

Generate "fake" mobile phone network for Hun using p(c|s) and p(c|¬s)

Calculate ESRI on the simulated mobile phone network

Correlate ESRI of the simulated network with the "true" ESRI. 5. Repeat from 2. and make histogram of correlation coefficients

Calculate ESRI for the full Hungarian network

Generate"fake" mobile phone network for Hun using p(c|s) and p(c|¬s)

Draw a sample of nodes according to market share m, calculate induced subgraph

Reconstruct the directions using input-output tables

Calculate ESRI on induced subgraph of the simulated phone network

Correlate ESRI of induced subgraph with "true" ESRI of these nodes. 7. Repeat from 2. and make histogram of correlation coefficients

Compared to the previous simulation the estimation of the link directions adds only a small error δ r = 0.0004 to the final result

The firm communication dataset is merged with a commercially available business intelligence database that was made available to the mobile phone provider. This database includes balance sheet information from which we proxy the firm's sizes by their total assets, and on their industry classification in the NACE 2008 system [47] ). Because this information would potentially make it possible to identify individual firms, the merged data does not leave the premises of the phone company. All calculations were executed there and then anonymized. In this form the data was handed to us and was destroyed at the phone company after the project.To calculate conditional probabilities describing the overlap of the communication and supply layer, we perform a large survey to obtain ground truth data, for details see SI Text 1. To keep the privacy of the firms, the data is co-anonymized. This means that metadata is exchanged between the researchers and the mobile phone provider and a shared anonymization procedure is employed. Finally, only fully anonymized data is made available to the researchers.