key: cord-0649587-dzpnj0m6 authors: Bailey, Michael; Johnston, Drew; Kuchler, Theresa; Russel, Dominic; State, Bogdan; Stroebel, Johannes title: Online Appendix & Additional Results for The Determinants of Social Connectedness in Europe date: 2020-07-23 sha: a33148e4321e94addefff90f4448f41acd794840 doc_id: 649587 cord_uid: dzpnj0m6

In this online appendix we provide additional information and analyses to support "The Determinants of Social Connectedness in Europe." We include a number of case studies illustrating how language, history, and other factors have shaped European social networks. We also look at the effects of social connectedness. Our results provide empirical support for theoretical models that suggest social networks play an important role in individuals' travel decisions. We study variation in the degree of connectedness of regions to other European countries, finding a negative correlation between Euroscepticism and greater levels of international connection.

The previous examples suggest that social connections are substantially stronger within countries than between countries, but that there is also substantial within-country variation based on certain regional characteristics. We next seek to understand how these patterns would be reflected if we created communities of regions with strong connections to each other. To do so, we create clusters that maximize within-cluster pairwise social connectedness using hierarchical agglomerative linkage clustering (footnote 1). Figure 6 shows the results when we use this algorithm to group Europe into 20 and 50 communities, instead of the 37 existing countries. In Panel A, the 20-unit map, nearly all of the community borders (denoted by a change in area color) line up with country borders (denoted by thick black lines), consistent with strong intra-country social connectedness.
The only exception is in the United Kingdom, where one region, Outer London West and North West, is grouped together with Romania. This area of London has welcomed a large number of Romanian immigrants in recent years and includes Burnt Oak, a community nicknamed "Little Romania" (Harrison, 2018). Furthermore, the borders of cross-country communities mostly line up with historical political borders. For example, every region for which we have data in the countries that made up Yugoslavia until the early 1990s (Slovenia, Croatia, Serbia, North Macedonia, and Montenegro) is grouped into a single community. The same is true of the regions in the Czech Republic and Slovakia, which were united as Czechoslovakia until 1993. Other cross-country communities line up with even older historical borders: the United Kingdom and Ireland, Denmark and Iceland, Sweden and Norway, and Austria and Hungary all split from political unions in the first half of the 20th century but remain grouped together by present-day social connectedness. While much of the alignment between country and community borders persists in the 50-unit map in Panel B of Figure 6, countries begin to break apart internally. Most of the resulting sub-country communities are spatially contiguous, consistent with distance playing a strong role in social connectedness within countries. One notable exception is Île-de-France, which is home to Paris and accounts for nearly 30% of French GDP. The region is grouped with France's southeastern coast (often referred to as the French Riviera) and Corsica, popular vacation destinations for the well-heeled. We also see linguistic communities form: Belgium splits into a French-speaking and a Dutch-speaking community, and Catalan and Andalusian communities emerge in Spain. The southeastern corner of Turkey, which has a much higher concentration of Kurds than the rest of the country, also forms its own community.
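A minimal pure-Python sketch of average-linkage agglomerative clustering may help make the community construction concrete. It follows the description in footnote 1 (each region starts as its own community, the distance between two regions is $1/\text{SocialConnectedness}_{ij}$, and the distance between two communities is the average of the pairwise region distances), but it is not the authors' implementation: the function name `average_linkage_communities`, the input format (a dictionary keyed by unordered region pairs), and the toy regions A to D with their connectedness values are all hypothetical. The naive search below is far too slow for hundreds of regions; at scale, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the usual choice.

```python
from itertools import combinations


def average_linkage_communities(sci, n_communities):
    """Group regions into communities by average-linkage agglomeration.

    sci: dict mapping frozenset({i, j}) -> social connectedness between
         regions i and j (hypothetical format; must be positive for every pair).
    n_communities: stop merging once this many communities remain.
    """
    regions = sorted({r for pair in sci for r in pair})
    # Every region starts as its own community of size one.
    communities = [frozenset([r]) for r in regions]

    def region_distance(a, b):
        # Distance between two regions is 1 / SocialConnectedness_ij.
        return 1.0 / sci[frozenset((a, b))]

    def community_distance(c1, c2):
        # Average linkage: mean pairwise region distance across the two communities.
        total = sum(region_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(communities) > n_communities:
        # Find and merge the two "closest" communities.
        i, j = min(
            combinations(range(len(communities)), 2),
            key=lambda p: community_distance(communities[p[0]], communities[p[1]]),
        )
        merged = communities[i] | communities[j]
        communities = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities.append(merged)
    return communities


# Toy example with hypothetical regions and made-up connectedness values.
toy_sci = {
    frozenset(("A", "B")): 100.0,
    frozenset(("C", "D")): 80.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("A", "D")): 2.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("B", "D")): 1.0,
}
print(sorted(sorted(c) for c in average_linkage_communities(toy_sci, 2)))
# prints [['A', 'B'], ['C', 'D']]
```

Because merging stops at a target number of communities, running the same procedure with targets of 20 and 50 would correspond to the two panels of Figure 6.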
Finally, historical country borders remain influential: the regions of former Czechoslovakia, former Yugoslavia, and Denmark and Iceland still form single communities, while former East Germany forms its own community.

Footnote 1: There are a number of possible algorithms that could construct such clusters. Conceptually, the agglomerative clustering algorithm starts by considering each of the N regions as a separate community of size one. It then iteratively combines the clusters that are "closest" together. Here, our measure of distance between regions i and j is $1/\text{SocialConnectedness}_{ij}$. The distance between two clusters is the average of the pairwise distances between the regions in those clusters.

Social connectedness between two regions may be related to certain economic and social interactions. For example, Bailey et al. (2018b) document correlations between social connectedness, trade flows, patent citations, and migration patterns between U.S. counties. Bailey et al. (2019a) look more specifically at transportation, highlighting the relationship between transportation infrastructure and urban social networks. In this section, we look at the relationship between social connectedness and European passenger train flows. Our unique data allow us to add a continental-level empirical analysis to an existing literature of theoretical and survey-based studies that explore the influence of social networks on travel decisions (for an overview, see Kim et al., 2018).

Regional Data on European Travel. For our dependent variable, we use the number of passenger train trips between each regional pair i and j. We discuss the availability of these data, as well as our procedure for cleaning and standardization, in Appendix 5.4. We also use information on rail and drive travel times between the geographic centers of the regions. Rail travel times come from the European Transport Information System 2010 "observed" data. Drive times were generated using the Open Source Routing Machine, an OpenStreetMap-based routing service.
As the outcome variable is non-negative and contains many zeros, we follow Correia et al. (2019) and estimate a Poisson Pseudo-Maximum Likelihood (PPML) regression model. Specifically, our equation of interest is:

$$\text{PassengerTrain}_{ij} = \exp\left(\beta_1 \log(\text{SocialConnectedness}_{ij}) + D_{ij}\beta_2 + X_{ij}\beta_3 + \psi_i + \psi_j\right)\epsilon_{ij} \qquad (1)$$

Here, the definitions of $\log(\text{SocialConnectedness}_{ij})$, $X_{ij}$, $\psi_i$, and $\psi_j$ remain unchanged from Equation 2 in the primary paper. The variable $\text{PassengerTrain}_{ij}$ is the number of rail passengers who travel from region i to region j in a given year. The vector of "distance" measures, $D_{ij}$, includes the geographic distance as well as the rail and driving travel times in minutes between the central points of the regions.

Table 1 shows the results from Regression 1. Because data availability differs across years, we present results using the most recent year for which passenger train flows for a given i, j pair are available (columns 1-5) and, separately, results using 2015, 2010, and 2005 (columns 6, 7, and 8). Column 1 includes the log of $\text{SocialConnectedness}_{ij}$ and the NUTS2 region fixed effects as the only explanatory variables. We find that a 10% increase in connectedness between a pair of regions is associated with a 17% increase in passenger rail traffic between the two. Column 2 adds the log of the geographic distance between the regions, which, intuitively, has a negative relationship with the number of passenger train trips. In column 3, we add controls for the travel time, separately by train and car, between the central points of the regions. In general, we find that these travel times have a stronger negative relationship with passenger rail traffic than distance alone. Column 4 adds all country-pair fixed effects; column 5 adds all the demographic/socioeconomic controls from Table 1 in the primary paper; and columns 6, 7, and 8 repeat these analyses for the years 2015, 2010, and 2005, respectively. Adjusting for geographic distance, travel time by car and train, and country-pair fixed effects, we find that, on average, a pair of regions with 10% higher social connectedness will have between 11.7% and 13.6% more rail passengers traveling between them. Overall, our results provide large-scale empirical evidence to support existing models that suggest social networks play an important role in individuals' travel decisions (see e.g., Axhausen, 2008; Carrasco and Miller, 2009; Páez and Scott, 2007).

Note: Table 1 shows results from Regression 1. The unit of observation is a NUTS2 region pair. The dependent variable in all columns is the number of passenger rail trips in 2015, 2010, or 2005 from region i to region j. In columns 1-5, we use the most recent year for which these data are available for a given i, j pair. In columns 6, 7, and 8, we use only the 2015, 2010, and 2005 data, respectively. Column 1 shows the results from using $\log(\text{SocialConnectedness}_{ij})$ and NUTS2 region fixed effects as the only explanatory variables. Column 2 adds the log of the geographic distance between the regions. Column 3 adds the log of the travel time, by train and car, between the central points of the regions. Column 4 adds all country-pair fixed effects. Columns 5-8 add the demographic/socioeconomic controls from Table 1 in the primary paper. Observations that are fully explained by fixed effects are dropped before the PPML estimation. Standard errors are double clustered by region i and region j in a region pair. Significance levels: *(p<0.10), **(p<0.05), ***(p<0.01).

A central goal of the European Union is to enhance cohesion and solidarity across European countries, and a variety of programs, such as the Erasmus exchange student program, explicitly exist to foster this connectivity (European Union; European Commission).
However, in recent years there has been a decline in trust in the EU and a rise in anti-EU voting, which have led to, for example, the United Kingdom's 2016 vote to exit the European Union. A number of studies explore correlates with, and potential origins of, support for Eurosceptic political parties (for example Algan et al., 2017; Becker et al., 2017; Colantone and Stanig, 2018; Inglehart and Norris, 2016). While much of this research emphasizes either economic insecurity or a cultural backlash against multiculturalism/progressive values, a related strand of research explores the role of personal connections in shaping political preferences. Early evidence was provided by Lazarsfeld et al. (1944), who document the influence of friends on U.S. voters. More recently, McLaren (2003) [...]

To systematically explore the relationship between regional views on the European Union and international friendships, we use the equation:

$$\text{Share\_EU\_View}_i = \beta_0 + \beta_1 \text{Share\_Connections\_Foreign}_i + X_i\beta_2 + \zeta_{c(i)} + \epsilon_i \qquad (2)$$

Here, $\text{Share\_EU\_View}_i$ is either the share of individuals responding that they trust the European Union in the Eurobarometer survey or the average share of the electorate that voted for anti-EU parties between 2009 and 2017. Our unit of observation for this regression is the level at which we observe these two outcomes of interest; for both data sets, this is a mixture of NUTS1 and NUTS2 regions. The first explanatory variable, $\text{Share\_Connections\_Foreign}_i$, is the share of the region's European connections that are to individuals in foreign countries. $X_i$ is a set of regional socioeconomic characteristics: average income, unemployment rate, and the shares of employment in manufacturing, construction, and professional sectors. In some specifications, it also includes the share of residents living in the region who were born in other European countries. The controls enter as indicators based on the deciles of each measure, to capture any non-linear relationships. Some specifications include country fixed effects, denoted here by $\zeta_{c(i)}$.

Note: Table 2 shows results from Regression 2. The dependent variable in columns 1-4 is the share of individuals responding that they trust the European Union in a survey conducted for the European Commission. The dependent variable in columns 5-8 is the average share of the electorate that voted for "Anti-EU" parties between 2009 and 2017. The unit of observation is the level at which we observe each dependent variable (either a NUTS2 or NUTS1 region). Columns 1 and 5 include only one explanatory variable: the share of the region's European connections that are to individuals in foreign countries. Columns 2 and 6 add a set of demographic and economic controls. Columns 3 and 7 add controls for the share of the region's population that was born in a different country. Columns 4 and 8 add country-level fixed effects. Significance levels: *(p<0.10), **(p<0.05), ***(p<0.01).

Table 2 presents the results. [...] Columns 2 and 6 show that when adding the socioeconomic controls listed above, the magnitude of the relationship with trust in the EU decreases and becomes insignificant, but the magnitude of the relationship with anti-EU voting slightly increases. When we also add the share of the population that was born in other European countries (columns 3 and 7), the magnitude of the relationship with trust increases and the magnitude of the relationship with voting decreases. The relationship between demographic and socioeconomic factors and views on the European Union is complicated; however, the directionally persistent relationship between Euroscepticism and international social connectedness at the regional level suggests that foreign connections might play a role in shaping views on the EU, a result that would be consistent with the existing literature on personal connections affecting political preferences.
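The two regional inputs to Regression 2 can be illustrated with a short sketch: the foreign share of a region's European connections, and the decile-indicator coding of the controls in $X_i$. The functions `share_connections_foreign` and `decile_indicators`, the assumption that connections are available as counts of friendship links between region pairs, and the region codes in the usage example are hypothetical. The text does not say how ties at decile boundaries are handled, so this sketch simply breaks them by input order.

```python
def share_connections_foreign(friend_links, country_of, region):
    """Share of a region's European friendship links that are to other countries.

    friend_links: dict mapping (region_i, region_j) -> number of links from i to j
                  (hypothetical input format; j may equal i).
    country_of: dict mapping each region code to its country code.
    """
    total = foreign = 0
    for (i, j), n_links in friend_links.items():
        if i != region:
            continue
        total += n_links
        if country_of[j] != country_of[region]:
            foreign += n_links
    return foreign / total if total else float("nan")


def decile_indicators(values):
    """Return, for each observation, ten 0/1 indicators for its decile.

    Decile 1 holds the lowest values and decile 10 the highest; ties are
    broken by input order in this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    decile = [0] * n
    for rank, k in enumerate(order):
        decile[k] = rank * 10 // n + 1
    return [[int(decile[k] == d) for d in range(1, 11)] for k in range(n)]


# Hypothetical region: 20 of its 100 links point to another country.
links = {("DE11", "DE11"): 50, ("DE11", "DE12"): 30, ("DE11", "FR10"): 20}
countries = {"DE11": "DE", "DE12": "DE", "FR10": "FR"}
print(share_connections_foreign(links, countries, "DE11"))  # prints 0.2
```

When these indicators enter a regression that also has an intercept, one decile indicator per measure has to be omitted to avoid perfect collinearity.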
Columns 4 and 8 show that, after adding country-level fixed effects, increases in a region's foreign connections continue to be associated with a decrease in the share of residents who trust the EU, while the effect on the share that vote for anti-EU parties becomes insignificant. This suggests that country-specific factors (e.g., the organizational resources of anti-EU political parties within the country) dominate possible effects of international connections in shaping Eurosceptic voting patterns, but not in shaping individuals' views.

Establishing a Facebook friendship link requires the consent of both individuals, and the total number of friends for a person is limited to 5,000. As a result, Facebook connections are primarily between real-world acquaintances. Indeed, one independent survey of American Facebook users found that only 39% of users reported being Facebook friends with someone they have never met in person (Duggan et al., 2015). In contrast, Facebook users reported that they were generally Facebook friends with individuals with whom they have real-world connections: 93% said they were Facebook friends with family other than parents or children, 91% said they were connected with current friends, 87% said they were connected to past friends (such as former classmates), and 58% said they were connected to work colleagues (Duggan et al., 2015). As a result, networks formed on Facebook more closely resemble real-world social networks than those on other online platforms, such as Twitter, where uni-directional links to non-acquaintances, such as celebrities, are common. In prior work, Facebook friendships have been shown to be useful for describing real-world networks. For example, social connectedness as measured through Facebook friendship links is strongly related to patterns of COVID-19 spread (Kuchler et al., 2020b), international trade (Bailey et al., 2020), and investment decisions (Kuchler et al., 2020a). See Bailey et al. (2018a,b,c, 2019a) for additional discussion of the evidence that friendships observed on Facebook serve as a good proxy for real-world social connections.

As one way of better understanding the connections underlying our measure of social connectedness in the primary paper, we compare it to similar measures constructed using restricted sets of friendships. Table 3 presents the cross-correlation of measures limited to connections made during certain periods of time (e.g., recent friendships vs. old friendships) and to friendships between individuals with certain shared characteristics (e.g., ages less than 5 years apart). Each of these measures is highly correlated with the others. This provides further evidence that Facebook connections resemble full real-world networks and not, for example, primarily recently formed online-only connections.

Note: Table 3 presents correlations between social connectedness and similar measures constructed from restricted sets of connections. Rows and columns 2-5 limit to connections made less than a year ago, between 1 and 5 years ago, less than 5 years ago, and more than 5 years ago. Rows and columns 6-8 limit to connections between females, between males, and between individuals with ages 5 or fewer years apart.

Information on demographic and socioeconomic characteristics of each region, such as educational attainment, median age, average income, and unemployment rate, is available from Eurostat. We calculate a measure of region-to-region industrial composition similarity using employment data from Eurostat's Structural Business Statistics series. For regional data on language and religion, we use the European Social Survey. In particular, the survey asks respondents which language they speak most often at home and, if the respondent considers him- or herself religious, their religious affiliation (European Social Survey; see footnote 6). Our analysis focuses on the regional pairs within the smaller set of countries for which we have a full set of control data (footnote 7).
Footnote 6: [...] Denmark (Wave 7, 2014), Albania, Bulgaria, Cyprus, and Slovakia (Wave 6, 2012), Croatia and Greece (Wave 5, 2010), and Latvia and Romania (Wave 4, 2008). In addition, Malta was not surveyed but consists of a single NUTS2 region. According to other survey data, 97% of the population considers Maltese their "mother tongue" (European Commission, a) and 95% identifies as Roman Catholic (Sansone, 2018). We include these as the most common language and religion.

Footnote 7: We do not include Albania, Iceland, Liechtenstein, Luxembourg, Montenegro, North Macedonia, Serbia, Switzerland, and Turkey; most of these countries are excluded because they are not part of the European Union, and therefore do not participate in many of the data collection efforts we use to construct our data.

Our analyses in Section 3 use information on region-to-region passenger train travel made available by Eurostat. We take a number of steps to clean the data to prepare it for our analyses. This process was informed by both the "Reference Manual on Rail transport statistics" (Eurostat) and correspondence with the Eurostat data providers. We first restrict our data to observations at the NUTS2 level, removing any country-level observations (we do, however, keep country-level data for countries that have only a single NUTS2 region). We also exclude all pairs that include a region with the unknown indicator "XX" or the extra-regio territory indicator "ZZ." These pairs make up around 1.3% of passenger trips in the data. From here, we are faced with four challenges:

1) As confirmed by the authors' correspondence with Eurostat, when the data appear as "non-available" in a particular row, this could mean either that there was no traffic or that the relevant country did not provide the data (footnote 8).
2) A number of possible regional pairs are missing from the data, even between countries that did report data elsewhere.
3) For some international regional pairs, both countries report data on the same train flows, and the reported numbers of passengers often do not match.
4) NUTS2 classifications changed in 2006, 2010, and 2013. Each year of data is reported using the NUTS2 classification that was in effect at that time.

With respect to challenges 1 and 2, we found that each country reports data to Eurostat in two intermediate data sets: one for domestic passenger travel and another for international passenger travel. To identify countries that submitted a particular set of data in a particular year, we group the data by the reporting country, year, and whether the region pair is international or domestic. We then generate a list of countries that had at least one non-missing entry in each year/domestic-international group. These lists are provided in Table 4. When "non-available" values are reported by a country that did not report data elsewhere in the year/domestic-international group, we treat these values as missing and exclude them. When "non-available" values are reported by a country that did report data elsewhere in the group, we treat them as zeros (no traffic). Additionally, for countries that reported data in a particular group, we fill any missing regional pairs (i.e., pairs that are not in the data) in the group with zeros. Together, these assumptions handle challenges 1 and 2.

For each international regional pair, there still remain two possible reports: one from each of the home countries of the regions in the pair. When only one country reports the data, we take the non-missing value from the reporting country. However, in a number of instances both countries report data for the same international regional pair (challenge 3). In these instances, we take the average of the two reports.
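The rules for challenges 1 to 3 can be summarized in a short sketch for a single year of data. It is a hedged illustration rather than the authors' cleaning code: the record layout `(reporter, origin, dest, passengers)`, the use of `None` for "non-available", the assumption that a pair can only be reported by the home country of one of its regions, and the choice to count zeros filled in under challenge 2 as reports when averaging under challenge 3 are all assumptions, as are the function names and the toy region codes. The earlier filters (NUTS2 level only, dropping "XX" and "ZZ") are omitted.

```python
from collections import defaultdict

NA = None  # stands in for a "non-available" entry in the raw data


def country(region):
    """NUTS codes start with the two-letter country prefix."""
    return region[:2]


def scope(origin, dest):
    return "domestic" if country(origin) == country(dest) else "international"


def clean_year(reports, candidate_pairs):
    """Apply the challenge 1-3 rules to one year of rail passenger reports.

    reports: list of (reporter, origin, dest, passengers); passengers may be NA.
    candidate_pairs: every (origin, dest) NUTS2 pair that could appear.
    Returns {(origin, dest): passengers} for pairs with usable data.
    """
    # Year/domestic-international groups in which each reporter has at least
    # one non-missing entry (the availability lists summarized in Table 4).
    reported = {(rep, scope(o, d)) for rep, o, d, v in reports if v is not NA}

    # Challenge 1: "non-available" counts as zero traffic only if the reporter
    # reported elsewhere in the same group; otherwise it is dropped as missing.
    by_reporter = {}
    for rep, o, d, v in reports:
        if v is not NA:
            by_reporter[(rep, (o, d))] = float(v)
        elif (rep, scope(o, d)) in reported:
            by_reporter[(rep, (o, d))] = 0.0

    # Challenge 2: a reporting country gets zeros for pairs absent from the data.
    for o, d in candidate_pairs:
        for rep in {country(o), country(d)}:
            key = (rep, (o, d))
            if (rep, scope(o, d)) in reported and key not in by_reporter:
                by_reporter[key] = 0.0

    # Challenge 3: average the reports when both countries report a pair
    # (assumption: zeros filled in under challenge 2 count as reports).
    values = defaultdict(list)
    for (_, pair), v in by_reporter.items():
        values[pair].append(v)
    return {pair: sum(vs) / len(vs) for pair, vs in values.items()}


reports = [
    ("AT", "AT11", "AT12", 500),  # domestic report
    ("AT", "AT11", "AT13", NA),   # AT reported other domestic data -> zero
    ("DE", "DE11", "DE12", NA),   # DE reported no domestic data -> missing
    ("AT", "AT11", "DE11", 100),  # both countries report this pair...
    ("DE", "AT11", "DE11", 140),  # ...so the two values are averaged
]
pairs = [("AT11", "AT12"), ("AT11", "AT13"), ("AT12", "AT13"),
         ("DE11", "DE12"), ("AT11", "DE11")]
print(clean_year(reports, pairs))
```

In the toy data, the Austrian domestic pair that never appears in the data is filled with zero because Austria reported other domestic flows that year, while the German domestic pair is left out entirely.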
Finally, to update the data to the 2016 NUTS2 regions (challenge 4), we build a crosswalk using the history of NUTS information provided by Eurostat (footnote 9). When an older NUTS2 region split into multiple new regions, we set the number of passengers in each row that includes a new region equal to the corresponding old row's number of passengers, multiplied by the new region's share of the old region's population (i.e., we assume that passenger train travel in each of these regions is proportional to population).

Footnote 8: In some instances, countries report the data to Eurostat but flag them as confidential so that they are not included in the public release. We always treat these data as missing in our final analysis.

Footnote 9: Available at: https://ec.europa.eu/eurostat/web/nuts/history

Note: Table 4 shows the regional passenger train travel data availability by reporting country, year, and whether the travel is domestic or international. 0s indicate the data were not available and 1s indicate the data were available. The reporting country is given by the two-letter prefix of each country's NUTS codes. The table only shows whether any data from a particular reporter were available, not whether any regions from this country are included in the final analysis. For example, although Austria did not report international data in 2015, pairs that include an Austrian region and a region in a country that did report international data in 2015 are included.

Analyses in our primary paper use information on the country that each modern NUTS2 region was a part of in the years 1900, 1930, 1960, and 1990. The maps in this appendix show these country classifications for each year. The data largely come from files provided by the Max Planck Institute for Demographic Research Population History GIS Collection (MPIDR and CGG). In cases when a modern region spans two historical countries, we classify the region as part of the country with which it has the greater land-area overlap.
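Both the crosswalk for challenge 4 and the historical-country classification amount to reassigning data across boundary vintages, and a brief sketch may make the weighting explicit. The functions, the input layouts, and the region codes below are hypothetical. The text describes only the case in which one region of a pair splits, so multiplying both population shares when both regions of a pair split is an assumption of this sketch.

```python
def population_shares(splits, population):
    """Share of each old region's population living in each successor region.

    splits: {old_code: [new_code, ...]} for regions that were split.
    population: {new_code: population}.
    """
    shares = {}
    for old, successors in splits.items():
        total = sum(population[n] for n in successors)
        for n in successors:
            shares[(old, n)] = population[n] / total
    return shares


def crosswalk_flows(old_flows, splits, population):
    """Re-express origin-destination flows on updated NUTS2 codes.

    Each new row receives the old row's passengers times the population
    share of the new region(s); regions that did not split keep weight 1.
    """
    shares = population_shares(splits, population)
    new_flows = {}
    for (oi, oj), passengers in old_flows.items():
        for ni in splits.get(oi, [oi]):
            for nj in splits.get(oj, [oj]):
                weight = shares.get((oi, ni), 1.0) * shares.get((oj, nj), 1.0)
                new_flows[(ni, nj)] = new_flows.get((ni, nj), 0.0) + passengers * weight
    return new_flows


def assign_historical_country(overlap_area):
    """Pick the historical country with the largest land-area overlap.

    overlap_area: {historical_country: area of the modern region inside it}.
    """
    return max(overlap_area, key=overlap_area.get)


old = {("AA10", "BB10"): 1000.0}
new = crosswalk_flows(old, {"AA10": ["AA11", "AA12"]}, {"AA11": 300, "AA12": 100})
print(new)  # {('AA11', 'BB10'): 750.0, ('AA12', 'BB10'): 250.0}
```

Because the population shares of each split region sum to one, the crosswalk preserves the total number of passengers (up to floating-point rounding).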
References
The European trust crisis and the rise of populism
Friendship networks and political opinions: A natural experiment among future French politicians
Social networks, mobility biographies, and travel: Survey challenges. Environment and Planning B: Planning and Design
The Germany-Turkey migration corridor: Refitting policies for a transnational age
The economic effects of social networks: Evidence from the housing market
Social connectedness: Measurements, determinants, and effects
House price beliefs and mortgage leverage choice. The Review of Economic Studies
Peer effects in product adoption
International trade and social connectedness
Who voted for Brexit? A comprehensive district-level analysis
The social dimension in action: A multilevel, personal networks model of social activity frequency between individuals
Global competition and Brexit
ppmlhdfe: Fast Poisson estimation with high-dimensional fixed effects
Social media update
Europeans and their languages
Flash Eurobarometer 472: Public opinion in the EU regions
Data file edition 1.0. NSD - Norwegian Centre for Research Data, Norway - Data Archive and distributor of ESS data for ESS ERIC
The EU in brief: Goals and values of the EU
Reference manual on rail transport statistics, version 10
Inside Little Bucharest: How Romanian immigration changed this small London suburb beyond recognition
Trump, Brexit, and the rise of populism: Economic have-nots and cultural backlash
Social networks, social influence and activity-travel behaviour: A review of models and empirical evidence
Social proximity to capital: Implications for investors and firms
The geographic spread of COVID-19 correlates with structure of social networks as measured by Facebook
The People's Choice: How the Voter Makes up His Mind in a Presidential Campaign
Anti-immigrant prejudice in Europe: Contact, threat perception, and preferences for the exclusion of migrants
Social influence on travel behavior: A simulation example of the decision to telecommute
MaltaToday survey: Maltese identity still very much rooted in Catholicism
Why are Croatians moving to Ireland? Croatia Week
Number of Croatians moving to Ireland increase tenfold. The Dubrovnik Times