key: cord-0669787-2idqtpta authors: Raimbault, Juste; Zdanowska, Natalia; Arcaute, Elsa title: Modeling growth of urban firm networks date: 2020-09-11 journal: nan DOI: nan sha: 5519e4fba32c09f03a2d6a7f2f2bc0b18a459a63 doc_id: 669787 cord_uid: 2idqtpta The emergence of interconnected urban networks is a crucial feature of globalisation processes. Understanding the drivers behind the growth of such networks - in particular urban firm networks -, is essential for the economic resilience of urban systems. We introduce in this paper a generative network model for firm networks at the urban area level including several complementary processes: the economic size of urban areas at origin and destination, industrial sector proximity between firms, the strength of links from the past, as well as the geographical and socio-cultural distance. An empirical network analysis on European firm ownership data confirms the relevance of each of these factors. We then simulate network growth for synthetic systems of cities, unveiling stylized facts such as a transition from a local to a global regime or a maximal integration achieved at an intermediate interaction range. We calibrate the model on the European network, outperforming statistical models and showing a strong role of path-dependency. Potential applications of the model include the study of mitigation policies to deal with exogenous shocks such as economic crisis or potential lockdowns of countries, which we illustrate with an application on stylized scenarios. Networks constitute the new social morphology of our societies, since the intensification of globalisation of the economy in the late 20th century [1] . A world-city system has since then emerged [2] , characterized by highly interconnected urban centres at the global level [3] . These interactions can be of different nature and determined by economic, socio-cultural, or geopolitical proximity inducing spatial and non-spatial patterns [4] . Interactions of economic nature between transnational firms [5] , are together with international trade patterns among the most representative of the current economic trends [2] . Understanding the drivers of growth and geographical properties of such an urban network, can be used to foster innovation in specific cities and to shape public policies for local industrial clusters [6] . In addition, straightening these city-interactions can be crucial for revitalising certain geographical areas [7] , by developing strategic industries for global integration of regions [8] , cities [9] and countries [10] . Exploring city/firm interactions [11] and the position of cities within these networks [9] can also permit to infer the consequences of future exogenous shocks such as an economic crisis, a post-Brexit scenario for the European Union or a single market collapse in the post-COVID19 future. The use of network models to understand processes underlying the emergence and dynamics of such networks has already been tackled in the literature, but still not extensively and never explicitly at city level [2] . In fact generative urban network models combining geographical factors with network topological properties exist at the regional scale [12] . Trade networks have been extensively studied from the complex networks perspective, but these are generally driven at the country-level [13] because of a lack of city-level data. Input-output models, mostly used in regional science to represent flows between geographical areas, are considered in some cases as a type of urban generative model [14] . In this stream of research, spatial interaction models [15] can form the basis of urban network models for different types of flows [16] or physical infrastructure [17] . In this paper we choose to focus on urban networks of firms to question and capture geographical and economic processes as internationalisation, metropolisation, regionalisation and specialisation of cities within a generative model. The specific question we tackle is the complementarity of different processes to generate such urban networks. The originality of the present paper is to examine these issues within the system of cities framework [18] . In this approach the position and dynamics of cities in the socio-economic world-wide space can be considered by their the data-driven setting. In this simple version of the model we assume an absence of evolution of economic size, which means that the adjusted parameters are valid on a restricted time period. When setting the model, it is important to consider that different initial conditions (or parametrisation on real data), given by the spatial distribution, the sizes and industrial sector compositions, are going to affect the behaviour of the model. One of the authors [24] has already shown that the initial spatial configuration can have significant impacts on model outcomes. We therefore consider two versions: one with a synthetic system of cities which can be stochastically repeated, and one other based on real data. Synthetic setup The synthetic system of cities is generated to be at the scale of a continent with properties similar to Europe: (i) N = 700 cities with an empirical power-law from the Global Human Settlements database for economic sizes (α = 1.1), distributed randomly into a uniform and isotropic square space of width d max = 3000km; (ii) clustered into C = 20 countries using a k-means algorithm (this remains consistent with non-correlated city sizes at the scale of the countries as our cities are randomly distributed in space [25] ); (iii) with synthetic distribution of sectors following log-normal distributions adjusted in a way that large cities have more high-value activities and are more diverse. More precisely for this last point, assuming sectors are distributed continuously along a one-dimensional axis, cities are assumed to have a sector distribution with a log-normal density function. The largest city is constrained to have a standard deviation and a mode of 1/2 and the smallest 1/K, with a linear interpolation between the two for other cities. Parameters of the log-normal law are obtained by solving the corresponding two equations system for each city, and each distribution is discretised and renormalised to be a vector of probabilities. See S2 Text for all details of this synthetic setup. In synthetic experiments, the network is generated from an initial complete network with dummy links of weight 1 until reaching t f = 1500, with a unit link volume of w 0 = 1. The real setup is done using the empirical network of ownership links. Urban areas are distributed according to their positions in the actual geographical space, and economic size and sector composition attributes are used. The socio-economic distance is constructed using information from the statistical modeling, by considering fixed-effect coefficients estimated for each pair of countries. See S2 Text for more details on the construction of the socio-economic distance matrix. The initial network is the same than in the synthetic setup, and the real links are taken as targets for the generated network. We constrain the total volume of flows by setting w 0 = ij w (obs) ij /t f when the number of time steps t f is given as a parameter. The model parameters are the Cobb-Douglas weights γ O , γ D , γ W , γ S and distance decay parameters d 0 , c 0 . As the model also includes path-dependencies by integrating the influence of previous links, it cannot be reduced to a simple statistical model. Finally, the role of space introduces even further complexity, yielding a generative model, which has to be simulated. We consider two main macroscopic indicators to quantify the generated network. Firstly, geographical structures are captured by internationalisation (modularity of countries in the network). The situation of a minimised modularity corresponds to dense connections between cities of different countries in Europe and sparse connections between cities of the same European country. In practice, modularity is computed as follows where d i is the weighted degree of city i in country c i . Secondly, we consider the metropolisation indicator as the correlation between the weighted degree and the GDP of the city. With this indicator we capture the effect of large cities or metropolises of regional importance attracting the biggest amount of links and involved in high interactions with other cities. It is computed with a Pearson correlation estimator as In this work, we do not consider the relationship between countries given their economic complexity, such as the framework developed by [26] , where the diversity of products produced plays an essential role in determining co-dependencies and resilience. What we investigate instead, is the role of linkages given by geography and history, and hence, the sector structure of cities does not evolve within the model. Companies from AMADEUS are georeferenced using the Geonames database (using postcode or address depending on the information available). We then join this data with the Global Human Settlement dataset GHS-FUA for Functional Urban Areas [27] -which delineate the commuting areas for urban centres. Starting from a firmlevel dataset of 1,562,862 nodes and 1,866,936 links, we obtain a directed network of 573 urban areas and 9323 ownership links. The weight of links is obtained by computing the owned share of turnover at destination, i.e. w ij = k∈i,l∈j p kl · T l where p kl is the ownership share of company l by company k and T l is the turnover of company l. Node attributes include the economic size of urban areas, that we compute as the sum of turnovers of companies within the area. This quantity is highly correlated with the GDP of the area-level included in the GHS database (Spearman correlation ρ = 0.71) and also with population (ρ = 0.60). We then expect to capture size effects while keeping the consistence of a single main data source. Another attribute of the nodes is the industrial sector composition, expressed as a proportion of turnover in the area associated to a given industrial sector. Detailed sectors of firms are available up to the four digits of the NACE classification in the raw data [28] . We consider the first level classification (21 categories) and compute sector proportions accordingly. This allows us to define an industrial proximity matrix between urban areas, taking a cosine similarity between vectors as s ij = K k=1 S ik · S jk where S ik are sector proportions. The urban network has heavy tail weighted degree and edge weight distributions, as shown in Fig. 1 . We fit power law and log-normal distributions, including minimal value cutoff following [29] , using the powerlaw R package [30] . For both, log-normal distributions appear to be a better fit (µ = 18.8, σ = 2.3 for weighted degree; µ = 13.8, σ = 2.8 for edge weight), and include a much broader range of empirical distributions with lower estimated cut-offs. These heavy tail properties suggest the relevance of a generative model including self-reinforcing processes, which are known to produce such distributions. When focusing on the community structure of the network, different modularity maximisation algorithms (ran on the undirected corresponding network) result in the same undirected modularity of 0.38, a directed modularity [31] of 0.34 for the greedy algorithm of [32] and of 0.36 for the Louvain algorithm [33] , corresponding to 36 communities Figure 2 : Map of network communities obtained by modularity maximisation. The area of circles gives the turnover within the Functional Urban Area, and color the community it belongs to. Urban Areas in communities of size smaller than 3 were not colored for readability. (resp. 15). In comparison, when taking into account 29 countries, the classification of urban areas by country has a modularity of 0.32. This indicates that the structure of flows present certain regional patterns. Indeed, a null model assigning random labels with 30 communities, bootstrap 1000 times, yields an average modularity of 0.049 ± 0.002, confirming the statistical significance of these communities. We show in Fig. 2 a map of the communities obtained. The communities reflect characteristic socio-cultural structures and cross-border economic ties throughout Europe. Cities in France, Belgium and Luxembourg belong to the same cluster as they share French as a common business language, but also cross-border tax arrangements creating easier economic interactions [34] . The same socio-cultural proximity effect is seen in cities of the United Kingdom and Ireland, and those of Sweden, Finland, Estonia, Latvia and Lithuania. The cluster composed of Dutch and Polish cities is justified by the fact that firms from the Netherlands are among the first foreign owners of polish firms since 2010 [20] . The cluster including cities in Germany, Czech Republic, Austria, Slovakia, Hungary and partly in Romania, find its origin in the dependence of the Central and Eastern European economies on German and Austrian large companies mostly from the automotive sector (i.e. Audi, Volkswagen). In addition, all cities of the later cluster have in common to be at the intersection of a very important freight transport route in Europe from Germany to Turkey [35] . In order to understand the most determining geo-economic factors influencing the creation of links between cities, we run a statistical analysis as applied in the case of unconstrained spatial interaction models [36] . We test five different statistical models, which progressively include the processes for the generative network model. The most general statistical one is: where α ci,cj is a fixed effect term for the coupling of countries c i , c j , and T i , T j correspond to the turnovers of cities i and j respectively. This multiplicative form linearly transformed through logarithms is common practice for spatial interaction models. Note that no constraints are considered, since links include information about the turnover and a share of ownership at the same time. Estimation results are given in Table 1 . We also fit a Poisson generalized linear model with a logarithmic link on the rounded weights [37] . The findings show that when distance only is taken into account (model 1), it has a significant negative effect on link creation, but with an explained variance close to zero. Model 2 adds fixed effects by pair of countries, for which around a third are significant, and with a better R 2 of 0.17. Adding the economic size at origin and destination (turnovers T i and T , model 3) slightly improves the explanatory power (model 3). The best model among the linear ones is the model 4, with all factors and fixed effects, both in terms of explained variance (R 2 = 0.31) and Akaike information criterion. Model 4 also indicates the absence of overfitting. Model 5, which is a Poisson model with all variables is much a better fit regarding pseudo-R 2 and provides much more significant estimates (all fixed effects significant and much smaller standard deviations). Across all models, consistent qualitative stylized facts regarding link creation between cities are revealed: (i) distance has a negative influence; (ii) economic sizes have a positive effect, with the size of the origin city being more important (consistent in the ownership links context, as strongly driven by decision at the headquarter/owner level); (iii) industrial proximity has a positive influence, i.e. similarity is more important than complementarity for this dimension; (iv) a large part of country fixed-effects are statistically significant; and (v) the order of magnitude of the influence of each factor is roughly the same, which means that no process totally dominates the others. This confirms the relevance of including all these different dimensions in the generative network model, since they have a comparable explanatory role in statistical models. Table 1 : Statistical models. Ordinary Least Square estimations of statistical models explaining log(w ij ) (standard errors in parenthesis). ***, ** and * respectively indicate 0.01, 0.05 and 0.1 levels of significance. For the fixed effect by origin-destination pairs of countries, the proportion of significant coefficients is attributed (where p < 0.1). Models (1) to (4) are ordinary least square models, while model (5) is a Poisson generalized linear model with logarithmic link. We compare model performance using R 2 , Mean Square Error, and Akaike Information Criterion. For the Poisson model (model 5), AIC is not given as it is not comparable. (1) The model is implemented in NetLogo, which provides a good compromise between performance and interactive prototyping and exploration [38] . Numerical experiments are done by integrating the model into the OpenMOLE software for model exploration [39] , which provides sensitivity analysis and calibration methods and a transparent access to high-performance computing infrastructures. The model source code, the aggregated network data (the raw initial firm network cannot be made available for data ownership reasons), and the results are available on the open Git repository of the project at https://github.com/JusteRaimbault/ABMCitiesFirms. Large simulation result files are available on the dataverse repository at https://doi.org/10.7910/DVN/UPX23S. We show in Fig. 3 examples of generated networks, with very contrasted patterns obtained with different spatial interaction ranges. Low values of the gravity decay parameter yield local networks around main cities, while long interaction ranges yield more integrated networks. We also show networks simulated in the real setting. We first study extensively the behavior of the model on synthetic systems of cities, to isolate its intrinsic behavior independently of a contingent geographical situation. Sensitivity analysis A global sensitivity analysis first unveils an aggregated influence of parameters on model indicators. We apply Saltelli's Global Sensitivity method introduced by [40] . This produces values for each parameter and each output of a first order sensitivity index, defined as V ar , for parameter X i and output Y j , X ∼i being all other parameters. This captures the total influence of X i all other things being equal. The total order index is given by and captures non-homogeneity in behavior and interaction between parameters. Indices were estimated with a Sobol sequence sampling of 20,000 model runs [41] . We give in S2 Text the full estimated results for sensitivity indices. To summarize, we find that indicators related to modularity (internationalisation and optimal modularity) are mostly influenced by geographical parameters. On the contrary, indicators of network structure are driven by origin and destination sizes. The influence of sectors and historical links has a secondary but not negligible role. Altogether, the analysis witnesses of strong interaction effects between parameters, since for several indicators the total order index is one order of magnitude larger than the first order index. This means that the generative model captures some complex behavior of the system. One factor sampling A first numerical experiment of one-factor sampling on all Cobb-Douglas parameters and 100 stochastic repetitions confirms a good statistical convergence (average Sharpe ratios for indicators all larger than 5). We use thus 20 repetitions in following larger experiments (in the case of a normal distribution, a 95% confidence interval on the average is of size 2σ · 1.96/ √ n, what gives n = 16 repetitions for a CI of width of the standard deviation). Behavior of some indicators are shown in Fig. 4 . For example, internationalisation and metropolisation both show a transition as a function of d o , from a local to a global regime. The sector proximity parameter γ S influences the internationalization linearly, but induces a maximum for the correlation between size and degree, which can be interpreted as a setting where size and the total volume in a city are the most correlated. The model behaviour is then studied with a grid sampling for parameters and 20 repetitions (the computations are run on the European Grid Infrastructure, and are equivalent to 2.5 years of CPU time). Distance decay and sector parameters are varied with 10 steps, other parameters with three. The influence of gravity decay parameters is confirmed when plotting the internationalization index against the distance, which exhibits an exponential decay as shown in Fig. 5 . It interacts with the role of sectors and the size of the origin, with a more significant effect of sector proximity, when size has the largest influence (third panel). Other indicators exhibit non-trivial patterns. For example, when considering average community size of the final network as shown in Fig. 6 , we obtain a maximal integration in terms of communities at an intermediate value of the gravity decay. This can be interpreted as the emergence of a regional regime. The size of communities is largely influenced by the value of the elasticities for the similarity function and the ratio of the economic output of the area. In particular, we observe a qualitative inversion of the role of γ S when introducing an effect of the origin (switching γ O from 0 to 1, first panel to second panel). The maximal community size disappears when γ O = 2, implying a regime with small local communities where large cities mostly connect with their hinterland. These experiments reveal a complex interplay between processes and how the model produces diverse stylised facts. We run a specific experiment to study the impact of urban hierarchy on firms dynamics. Rank-size law in cities is a central stylized fact within urban theories, despite the discussions regarding the estimation of the power-law exponent, its actual values and its variation depending on the definition of cities considered [42, 43] . In this particular context, understanding how this parameter can impact urban dynamics within artificial systems of cities -in which it can be modified to extreme values non observable in real systems-, should bring evidence on underlying processes [24] . We change the initial hierarchy α from 0.5 to 2.0 with a step of 0.1, and also vary the interaction range d 0 . The behavior of internationalisation and metropolisation indicators are Figure 3 : Example of simulated networks at t = 1500, for a synthetic system of cities (Top row) and the European urban network (Bottom row), with a high (resp. low) gravity decay parameter for the left column (resp. right). shown in Fig. 7 . As expected, the hierarchy of the urban system has an important impact on model outcomes. We find that more hierarchical urban systems tend to produce more international networks (internationalisation indicator is a modularity which is minimized when networks span more accross different countries), what is consistent with the existence of globalized metropolis [44] . This effect is mitigated by lower interaction ranges. Regarding metropolisation, we observe a maximum values as a function of initial hierarchy consistent across different interaction ranges d 0 . This means that a too high hierarchy is detrimental to the global metropolisation when considering all cities, what may be due to the fact that in such a case very large cities are capturing all flows, leaving nothing to intermediate and medium-sized cities. Interestingly, the optimal value α 1.3 is not far from observed values (which usually range between 0.9 and 1.1 but can take larger values such as 1.5 when extending the urban system [45] ), what may imply that this optimality property is endogenous to the emergence of urban systems and linked to their intrinsic hierarchy. We then apply the model to a real system of cities by calibrating it on the European ownership network. Model setup is done following the real setup described above. The number of time steps t f is left as an additional parameter, which in a sense determines the granularity of the cumulative network generation process. The objective functions for calibration are the average mean squared error on logarithms of weights given by ε L = 1 ifŵ ij are the simulated weights, and the logarithm of the mean squared error ε M = log 1 showed that these two are complementary to calibrate urban systems models. Calibration is done using a genetic NSGA2 algorithm [46] , which is particularly suited for such a bi-objective calibration of a simulation model. Stochasticity is taken into account by adding the estimation confidence as an additional objective, and we filter the final population to have at least 20 samples. We use a population of 200 individuals and run the algorithm for 50,000 generations, on 1000 islands in parallel. We show in Fig. 8 the calibration result, as a Pareto front of compromise points between the two objectives. As the mean squared error on logarithm is the same than the one for statistical models, it can be directly compared. The other objective would correspond to statistical models directly on variables: therefore, our model covers in a sense a set of intermediate models which make a compromise between the two objectives, and is therefore much more flexible. The values for ε L are for a large proportion of points lower than for statistical models, our model outperforms these in this predictive sense. As shown in Fig. 8 , increasing values for γ O lead to better performance regarding ε L : increasing the importance of the origin will provide a better fit on all order of magnitudes of link weights (while ε M favors mostly the largest links given the fat-tail distribution). When considering optimal points with ε L < 5.0, the adjusted parameters are: long range gravity interactions and socio-economic interactions given by d 0 = 5972 ± 2148 and c 0 = 113 ± 33 (the maximal c ij is 11.8); a stronger influence of destination than origin as γ O = 1.46 ± 0.10 and γ D = 2.0 ± 0.05; a comparable influence of sectors as γ S = 1.76 ± 0.21; a fine time granularity as t f = 4951 ± 124; and a very strong influence of previous links as γ W = 6.43 ± 4.11. This last point furthermore shows in particular a role of the path-dependency parameter, confirming the relevance of this complex generative model including path-dependency. The interaction of this process with others implies an inversion of the relative importance of origin and destination (on the contrary to statistical models). The fact that calibrated interaction distances are large is consistent with long range economic interactions within Europe. Besides these stylized facts, these calibration results altogether demonstrate the feasibility of model application to real data. We finally design scenarios to evaluate the impact of economic shocks. These can be the consequence of diverse factors, including socio-economic, geopolitical factors (example of Brexit), or other unpredicted human disasters such as the COVID-19 pandemic. We simulate border closures by rescaling c 0 -the range parameter for socio-economic interactions -, and intra-country lock-downs by rescaling d 0 -the range parameter for direct interactions -during the simulation, with other parameters being fixed to their default values and the model setup in the real configuration. More precisely, with t f = 5000, the model is run for t f /2 steps. Then, following two new rate parameters κ C ∈ [0; 1] and κ G ∈ [0; 1] defining the shock, model parameters are updated as d 0 = κ G · d 0 and c 0 = κ C · c 0 . The model is then run for the remaining time steps until t f . We study the temporal trajectory of indicators to see the impact of the shock in time. The experiment is run for (κ G , κ C ) ranging between 0.2 (strong restrictions) to 1 (reference trajectory), with 100 model repetitions. Results are shown in Fig. 9 . We find an important impact of the shock on internationalisation, but less on the average community size. The impact of changing d 0 is more important than c 0 , but a combination of both yield the stronger effect. For minimal values of κ G and κ C , internationalisation is more than doubled, confirming a reconfiguration of urban networks within countries. We validate the statistical significance of effects observed in Fig. 9 by comparing distributions with Kolmogorov-Smirnov (KS) tests. More precisely, we proceed for each indicator and value of (κ G , κ C ) to a two-sided KS test between the statistical distribution of the indicator over repetitions at final time with the distribution in the reference configuration. For internationalisation, p-values of the test are all smaller than 0.002 for κ D < 0.6 but larger than 0.15 for κ D ≥ 0.6. This means that the shock has a statistically significant impact only when interaction range is more than halved. For community size, significance (p-value smaller than 0.01) is reached only for the smallest value κ D = 0.2. Thus, the trajectory of the urban network is rather resilient to moderate shocks. This application shows how the model could be applied to practical policy issues. This paper aimed at presenting a generative model for urban networks defined by interactions between firms based on synthetic data, simulated via the OpenMole platform and calibrated on real data on ownership linkages of firms for Europe from the Amadeus dataset. The simulation on a synthetic system of cities provides a broad knowledge on model behavior in its parameter space. Even in such a simple model (close to directly tractable stationary state) the behaviour is highly non-linear in many dimensions. Our study shows how crucial model exploration is to overcome hidden parameters (deactivated mechanisms or default parameter values). This paper emphasis on the fact that exploration of intrinsic dynamics on synthetic data is of great importance and should be driven before an application on real data so as to put aside the geographical effects from model dynamics. The calibration of model suggests its potential application to policy issues and can therefore have practical applications in the future. For example the effect of exogenous shocks in the socio-economic structure can be tested as we did in the stylized application example above. Such shocks can for example be the upcoming impact of Brexit on the European market or the consequence of a long-term closure of the European Union members' borders due to further pandemic outbreaks on the single market. In this sense the UK economic sectors mostly dependent on foreign capital and generating the highest turnovers in 2018, that would be the most affected and worth examining in details are: extraction of crude petroleum, manufacture of tobacco products and of basic pharmaceutical goods (in terms of out-going links from the UK) ; activities of head offices and holding companies, sale of cars and manufacture of Figure 9 : Application of the calibrated model to an economic shock scenario. All plots show the trajectory of indicators (internationalisation for the top row and average community size for the bottom row) in time, for different values of interaction decay rescaling κ G (color) and socio-economic distance rescaling κ C (panels). The vertical dotted red line shows the moment the economic shock is introduced and d 0 (respectively c 0 ) is set to κ G · d 0 (resp. κ C · c 0 ). motor vehicles (in terms of in-going links to the UK). We showed in our experiment that although the system is resilient to moderate shocks, strong restrictions have an important impact on internationalization and thus can be expected to have a detrimental effect on UK economy due to these foreign ownership links (see also S1 Text). Several possible extensions of the current work can be undertaken in the future as a co-evolution model with an evolution of city sizes or a model with firm agents leading to a multi-scale agent-based model. A more precise study of the role of path dependency can also be undertaken in order to understand the creation of the future links. Other formulations of the model might be taken into consideration as other formulation of the combination of factors or multi-objective optimisation depending on sectors using Pareto fronts. Supporting information: S1 Text UK's economic sectors dependent on international links. Summary of links between UK and the rest of the EU, by economic sector. Based on AMADEUS dataset, the most vulnerable economic sectors in the United Kingdom have been highlighted. The later are the ones due to be mostly affected by eventual future lock-downs of the international economy (international ownership links). When considering the UK sectors generating the highest turnover in 2018 mostly dependent on foreign capital they own (out-going links from the UK -see below Table 2 ) these are extraction of crude petroleum, manufacture of tobacco products and of basic pharmaceutical goods. In terms of in-going links to the UK (see below Table 3 ), activities of head offices and holding companies, sale of cars and manufacture of motor vehicles are sectors mostly dependent on foreign ownership coming from abroad. When expressed in total number of links (Table 4 and Table 5 ), activities mostly dependent on out-going links from the UK are activities of holding companies, of head offices and fund management activities. In terms of in-going links these are business and consultancy activities and renting and operating of own and leased real estate. Other business support service activities n.e.c. 567 3 Activities of head offices 503 4 Fund management activities 407 5 Business and other management consultancy activities 398 6 Extraction of crude petroleum 366 7 Other activities auxiliary to financial services, except insurance and pension funding 240 8 Development of building projects 221 9 Manufacture of basic pharmaceutical products 212 10 Other information technology and computer service activities 181 Other business support service activities n.e.c. 16,140 2 Renting and operating of own or leased real estate 6,320 3 Business and other management consultancy activities 6,094 4 Activities of head offices 5,628 5 Computer consultancy activities 5,047 6 Activities of holding companies 4,728 7 Other personal service activities n.e.c. 3,996 8 Development of building projects 3,946 9 Other information technology and computer service activities 3,757 10 Other financial service activities, except insurance and pension funding 3,321 Model setup and sensitivity analysis. Details on the simulation model: summary statistics for the socioeconomic distance matrix, procedure for the synthetic model setup and detailed Global Sensitivity Analysis. In the real model setup, socio-economic distance between countries c ij is constructed using fixed effects coefficients from the statistical model. In Table 1 in main text, we estimated several statistical models, including some with fixed effects by pairs of countries. The socio-cultural distance between two countries is then taken as the opposite of the fixed effect coefficient for this pair, for the full statistical model. Summary statistics for the 29x29 matrix (28 EU countries and Jersey which is in the database a distinct country from UK but has an important total turnover since many British companies are located there for fiscal advantage reasons) are shown below in Table 6 . We observe that distances are rather localized but with large outliers, and an important proportion are not defined (339 out of 841), corresponding to couple of countries having no exchange at all in the dataset. The procedure for the synthetic sector setup is the following: • Sectors distributions follow log-normal densities with most mass in [0; 1] • Large cities are more innovative and more diverse. Assuming that sectors are ordered by innovativity (the larger the sector index the larger the innovativity), this assumption is translated by taking a mode and variance of 0.5 for the largest city and a mode and variance of 1/K for the smallest, and a linear interpolation between the two. For each city, we define log(e i ) = (log(E i ) − log(E imin )) (log(E imax ) − log(E imin )) * (1/2 − 1/K) + 1/K • Log-normal parameters (µ i , σ i ) for each city are then fixed by µ i − σ 2 i = log(e i ) and −3σ 2 i − 2 log(exp(σ 2 i ) − 1) = log(e i ) The table 7 gives the full results for the Global Sensitivity Analysis, for all model indicators and free parameters. We give the first order indices and the total indices. Table 7 : Saltelli sensitivity indices, for indicators in rows and parameters in columns. We give for each pair the first order index (F) and the total order index (T). T F T F T F T F T F T Internationalisation 0 The Rise of the Network Society Specification of the world city network Global city clusters: theorizing spatial and non-spatial proximity in inter-urban firm networks Central flow theory: Comparative connectivities in the world-city network Structure and evolution of global cluster networks: evidence from the aerospace industry Connecting cities, revitalizing regions: the centrality of cities to regional development Creating strategic couplings in global production networks: regional institutions and lead firm investment in the Humber region Relational upgrading in global value networks The brokerage role of small states and territories in global corporate networks Introducing cluster heatmaps to explore city/firm interactions in world cities. Computers, Environment and Urban Systems Simulating infrastructure networks in the Yangtze River Delta (China) using generative urban network models. Belgeo Revue belge de géographie Structure and evolution of the world trade network Generation of integrated multispatial input-output models of cities (GIMIMoC) I: Initial stage A multilevel spatial interaction modelling framework for estimating interregional migration in Europe Generative network models for simulating urban networks, the case of inter-city transport network in Southeast Asia Environment and Planning B: Urban Analytics and City Science Cities as systems within systems of cities. Papers in regional science An Evolutionary Theory of Urban Systems Exploring cities of Central and Eastern Europe within transnational company networks: the core-periphery effect Van Dijk Amadeus database: extraction for Europe. University College London Library Service Databases The nested structure of urban business clusters A geometric perspective on the generalized Cobb-Douglas production functions Space Matters: Extending Sensitivity Analysis to Initial Spatial Conditions in Geosimulation Models Testing Heaps law for cities using administrative and gridded population data sets The product space conditions the development of nations GHS Urban Centre Database 2015, multitemporal and multidimensional attributes, R2019A. European Commission 2 Statistical classification of economic activities in the European community Power-law distributions in empirical data Fitting Heavy Tailed Distributions: The poweRlaw Package Extending the definition of modularity to directed graphs with overlapping communities Finding community structure in very large networks Fast unfolding of communities in large networks Exploring cross-border integration in Europe: How do populations cross borders and perceive their neighbours? Distribution of Foreign Direct Investment Across the National Urban Systems in Countries of Central and Eastern Europe in 2013 Some new forms of spatial interaction model: A review Fitting constrained Poisson regression models to interurban migration flows Improving execution speed of models implemented in NetLogo a workflow engine specifically tailored for the distributed exploration of simulation models Global sensitivity analysis: the primer Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index A dynamic meta-analysis of city size distributions. PloS one Truncated lognormal distributions and scaling in the size of naturally defined population clusters The global city Empowering Urban Governance through Urban Science: Multi-Scale Dynamics of Urban Systems Worldwide A fast and elitist multiobjective genetic algorithm: NSGA-II Results obtained in this paper were computed on the vo.complex-system.eu virtual organization of the European Grid Infrastructure (http://www.egi.eu). We thank the European Grid Infrastructure and its supporting National Grid Initiatives (France-Grilles in particular) for providing the technical support and infrastructure. We also acknowledge the funding of the EPSRC grant EP/M023583/1.