key: cord-0324724-0pfks19m authors: Goglia, D.; Pollacci, L.; Department, A. Sirbu Computer Science; Pisa, University of; Pisa,; Italy, title: Dataset of Multi-aspect Integrated Migration Indicators date: 2022-04-26 journal: nan DOI: 10.5281/zenodo.6500885 sha: 682a3c27360a5b8d076f1d1d3bac1133ad5ff7a6 doc_id: 324724 cord_uid: 0pfks19m

New branches of research are proposing the use of non-traditional data sources for the study of migration trends, in order to find original methodologies for answering open questions about the human mobility framework. In this context we present the Multi-aspect Integrated Migration Indicators (MIMI) dataset, a new dataset of migration drivers that results from acquiring, transforming and merging both official data on international flows and stocks and original indicators not typically used in migration studies, such as online social networks. This work describes the process of gathering, embedding and merging traditional and novel features into this new multidisciplinary dataset, which we believe could contribute significantly to nowcasting and forecasting present and future bilateral migration trends.

In recent years, the pursuit of original drivers and measures has become an increasing requirement in migration studies, given the new methods and technologies used to characterize and understand the human migration phenomenon. Many researchers [35, 7, 2, 10] have proposed employing non-traditional data sources to study migration trends, including so-called social Big Data such as online social networks. The usefulness of exploiting unconventional data sources to better understand migration patterns, as well as the benefits of merging knowledge from both traditional and novel datasets, have already been demonstrated [35]. This unconventional approach is intended to find an alternative methodology to ultimately answer open questions about the human mobility framework (i.e.
nowcasting flows and stocks, studying integration of multiple sources and knowledge, and investigating migration drivers). Nevertheless, in this context of meaningful combination of the conventional and the original, many types of data exist, and they are still very scattered and heterogeneous: given this variety, integration is not straightforward. For this purpose we propose a tool to be exploited in migration studies as a concrete example of this new integration-oriented approach: the Multi-aspect Integrated Migration Indicators (MIMI) dataset. It combines official data about bidirectional human mobility (traditional flow and stock data) with multidisciplinary features and original indicators, including the Facebook Social Connectedness Index (SCI), which measures the relative probability that two individuals across two countries are friends with each other on Facebook. The inclusion of SCI in the dataset enables it to be exploited as a non-traditional way to describe, understand and nowcast international migration. The combination of this index with socioeconomic variables measuring the similarity of two locations (such as per capita income, religiosity and language) already appeared in [4, 5], where it has been shown that pairs of locations that are more similar on these dimensions share

MIMI is an open dataset that provides multidimensional information about several traditional and non-traditional aspects of the human mobility phenomenon. Thanks to this variety of knowledge, experts from several research fields (demographers, sociologists, economists) could exploit MIMI to investigate the behavior of many drivers and relate it to migration trends, so as to build a comprehensive overview and understanding of them. As an example, it could be possible to assess existing correlations between original sources of data and traditional migration measures, explore and investigate them, and try to identify any possible causal relationship.
Moreover, it could be possible to develop complex models able to assess the human mobility framework by evaluating related interdisciplinary drivers, as well as models able to nowcast and predict traditional migration indicators from original features, such as the strength of social connectivity. By means of these algorithms, companies and researchers could find an alternative methodology to answer open questions about emerging mobility trends.

Human migration is a complex phenomenon characterized by several related factors. It is also as old as human history, and it has been widely studied, explored and described over time. However, the technological advancements and the rapid and drastic changes that society has faced in the 21st century have affected human mobility, which has consequently undergone radical modifications. We believe that taking into account this same information about societal change and technological progress (such as economic, cultural and social big data) can be an effective strategy to detect new trends in bilateral migration and to better understand and nowcast it. The motivation for building and releasing the MIMI dataset lies precisely in this need for new perspectives, methods and analyses that can no longer ignore a variety of new factors. The heterogeneous and multidimensional sets of data present in MIMI offer a broad overview of the characteristics of international human mobility, enabling a better understanding and an original exploration of the relationships between migration and non-traditional sources of data.

The MIMI [21] dataset version 1 (March 15, 2022) was released under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0) and is publicly available on Zenodo (10.5281/zenodo.6360651). It consists of a single file containing more than 28,000 entries (records) and 480 different features.
In this section we provide all the dataset specifications and describe the structure of the CSV file in detail, as well as how each feature was built. The MIMI dataset is made up of one single CSV file that includes 28,725 rows and 485 columns. The index consists of uniquely identified pairs of countries, built by joining the ISO-3166 alpha-2 codes of the origin and destination countries, respectively. The main features of the dataset are country-to-country bilateral migration flows and stocks, together with the strength of Facebook connectedness of each pair. The dataset comprises migration features and the strength of Facebook social connectedness for 254 different countries belonging to the following macro-areas: North America, South America, Europe, Asia, Africa, Oceania, Antarctica.

Since our work does not focus on the study of the migration phenomenon per se but on its possible relationship with social networks, in particular with the use of Facebook, the time range was chosen accordingly. Therefore, the initial decision was not to select migration data prior to 2004. However, our intention was to make available a tool that could also be useful for studying the differences between contemporary and past trends (e.g. alterations of some phenomena, consistent changes of values compared to the past, consequences of previous data on the last few years, etc.): for this reason some features have been selected starting from 2000. Data selection according to predetermined temporal ranges always depends on the availability of sources: for example, during our data collection phase, Eurostat was not providing information about the population density of countries before 2008. Table 1 provides a detailed temporal coverage of each time-related feature, apart from SCI, for which we included the only version made available (the latest, which refers to October 13, 2021, updated on December 15, 2021).
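The pair index described above can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from the MIMI code base; the sketch only assumes that a key is formed as "origin code, hyphen, destination code" and that a pair with identical codes denotes the "returners" case.

```python
from typing import Tuple

# Illustrative helpers only: the function names are assumptions, not part of MIMI.


def make_pair_index(origin_iso2: str, destination_iso2: str) -> str:
    """Join two ISO 3166-1 alpha-2 codes into an 'ORIGIN-DESTINATION' key."""
    return f"{origin_iso2.upper()}-{destination_iso2.upper()}"


def split_pair_index(pair: str) -> Tuple[str, str]:
    """Recover the origin and destination codes from a pair key."""
    origin, destination = pair.split("-", 1)
    return origin, destination


def is_returner_pair(pair: str) -> bool:
    """True when origin and destination coincide (e.g. 'BE-BE')."""
    origin, destination = split_pair_index(pair)
    return origin == destination


print(make_pair_index("al", "fi"))   # AL-FI
print(is_returner_pair("BE-BE"))     # True
```

Keeping the key as a plain string makes it usable directly as a row index, at the cost of having to split it again whenever single-country features need to be matched.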
In this section we list all the indicators included in the MIMI dataset; we describe them in detail in the following section. Table 2 contains a complete declaration of all drivers, grouped and categorized by context ("feature area"). The column "Name" contains the identifier of each feature: since it would not be possible to list all features, a more compact replacement rule is presented in order to include them all in the table. From this simple rule it is possible to derive the exact name of each single indicator. The column "Name" should be read as follows: the invariant part of the identifier is static, while the interchangeable part must be substituted as explained below in order to obtain the exact name of the feature.

• country should be replaced with origin or destination.
• year and start-end should be replaced, respectively, with the reference year (in case of an annual feature) or the reference year range (for NET migration and NET migration rate features). Substituted values should be consistent with the temporal coverage available for each indicator, which can be found in Table 1.
• source allows UN and ESTAT as replacement values.
• sex should be substituted with F, M or T (respectively, female, male or both).
• age allows only T as replacement value for data obtained from UN (both flows and stocks), while it can take four different values for ESTAT flows: T (total), <15 (less than 15 years), 15-64 (from 15 to 64 years), >65 (65 years or over).

Some examples are provided in the Table 2 footnotes.

In this section we describe in detail each single feature listed in the previous section, also reporting all the data sources: some indicators have multiple sources, since these were necessary to fill in missing values.
As stated in Section 2, the purpose of integrating all these different drivers in the MIMI dataset is to allow the exploration of any of their possible connections with the international migration phenomenon, and eventually to exploit them to better understand and nowcast it.

• Index (feature 1 in Table 2). The index consists of uniquely identified pairs of countries, built as follows: ISO2 code of origin country - ISO2 code of destination country (e.g. the AL-FI index indicates records related to migration from Albania to Finland). Pairs having the same country code for origin and destination indicate the so-called "returners" (e.g. the BE-BE record represents people who were born in Belgium or have Belgian citizenship and who moved their residence to Belgium in the reference year).

• Facebook data (feature 2 in Table 2). This indicator is one of the most non-traditional features (i.e. social media data) that we included within the context of migration studies. It consists of the so-called Facebook Social Connectedness Index (bit.ly/Facebook_SCI), publicly provided by the "Data for Good at Meta" organisation on the "Humanitarian Data Exchange, Data for Good" platform. Country-to-country values of SCI are available in TSV format for more than 34,000 pairs, updated to December 2021 [32]. This indicator uses anonymized insights of active Facebook users and their friendship networks to measure the intensity of connectedness between locations [4]. In this way, the resulting formulation in Equation 1 is a measure of the social connectedness between the two locations i and j that is representative of the relative probability that two individuals across the two locations are friends with each other on Facebook: if SocialConnectednessIndex_{i,j} is twice as large, a Facebook user in country i is about twice as likely to be connected with a given Facebook user in country j.
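For reference, the SCI definition published with the index [4] has, to our understanding, the following form. The symbol names FB_Connections and FB_Users follow that documentation and are not otherwise defined in this text, so they should be read as assumptions here.

```latex
\begin{equation}
\mathrm{SocialConnectednessIndex}_{i,j}
  = \frac{\mathrm{FB\_Connections}_{i,j}}
         {\mathrm{FB\_Users}_{i} \times \mathrm{FB\_Users}_{j}}
\end{equation}
```

Here FB_Users_i and FB_Users_j denote the number of Facebook users in locations i and j, and FB_Connections_{i,j} the number of friendship links between the two locations. Dividing by the product of the user counts is what makes the index a relative probability rather than a raw count of links.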
Specifically, in this work the concept of "locations" coincides with NUTS0 areas, since our dataset only focuses on country-to-country bilateral migration. Nevertheless, SCI is also provided at narrower geographical granularities (e.g. NUTS2, NUTS3): we do not exclude future works focused on the study of migration trends at a smaller resolution (country-to-county, or county-to-county). The SCI has a symmetric structure by definition of the concept of "friendship" and has been re-scaled to have a maximum value of 1,000,000,000 and a minimum value of 1. In our dataset, the minimum possible value was originally 0 (indicating pairs of countries for which the index was not available); it was subsequently replaced with an arbitrarily small value (chosen as half of the minimum available) in order to avoid problems when computing the Pearson correlation of the logarithmic SCI.

• Geographic features (features 3-15 in Table 2). These features portray and contextualize both origin and destination countries at the geographical level, providing all the necessary information to describe them, from the official codes and names up to their land extent and how far apart they are. Specifically:
- features 5, 6, 7, 8 are ISO-3166 standard nomenclatures for country identification, retrieved from the PyCountry Python module and ISAN (International Standard Audiovisual Number) [23].
- features 9, 10 identify continents as follows: Africa (AF); Antarctica (AQ); Asia (AS); Europe (EU); America, North (NA); Oceania (OC); America, South (SA).
- feature 3 consists of the pair: code of origin continent - code of destination continent. Its usefulness can be fully appreciated in the chord diagrams of Section 5.
- features 11, 12, 13 locate the position of the centroids of both origin and destination countries in a classic geographic coordinate system.
They are gathered and integrated from Google DSPL [11] and from the latlng() method of the CountryInfo Python library 5, and then merged together in a tuple (feature 13) built as a specific GeoPandas data structure called "geometry array" 6.
- feature 4 is the measure of distance between origin and destination, computed from the tuple in feature 13 of both countries using the geodesic formulation 7 [27] provided by the GeoPy Python library 8. It has already been observed in [4] that, at county level, much of the estimated effect of distance on migration might come from the relationship between distance and social connectedness: therefore the use of the SCI indicator could explain the variation of migration flows better than geographic distance alone.
- feature 14 consists of the list of countries that share a border with the given country. This feature makes it possible to find out whether the countries of origin and destination share a border, using a straightforward function that checks whether a country name (feature 6) is contained in the list of neighbors of the other, and vice versa. An additional binary feature (e.g. "neighbors", having value True or False) could be derived with this method. Countries having an empty list are islands. The sources for this feature are the following: the GitHub repository in [20], the borders() method of the CountryInfo Python module 9 and Wikipedia [46].
- feature 15 is the measure of the area of the country in square kilometers. It is gathered from The World Bank [37] and integrated with the area() method of the CountryInfo Python module 10.

• Interdisciplinary indicators (features 16-25 in Table 2). Some of these drivers are considered non-traditional data in the context of migration studies, since their use in migration understanding and nowcasting is poorly documented in literature.
Despite this, most of the available studies consider these features relevant in such a context, as they are related to the behavior of international migration trends.
- feature 17 is an indicator that provides per capita 11 annual values of the gross domestic product (GDP) of a country, expressed in current international dollars and converted by the purchasing power parity (PPP) 12 conversion factor. Data is retrieved from The World Bank [36]. The gross domestic product is one of the "Development Indicators", already widely used in literature in combination with global migration.
- features 16, 18 correspond to two lists containing, respectively, the most practiced religions and the most spoken languages in the country (both including official ones and minorities). The benefit of including these columns is to discover whether the countries of origin and destination share some languages or religions (or both), since this could favor a migratory exchange between the two. Rare languages and religions used only in one country and not shared with any other have been removed as meaningless for our purposes. Languages have been gathered from Wikipedia [47], while religions come from DataHub [8] and have been integrated with Wikipedia data [48].
- features 19, 20 indicate the quantity (respectively, as absolute number and as percentage of the total population) of Facebook users that a given country has. The source is World Population Review [50], which refers to the latest available measure for each country (the oldest date back to December 2020).

5 https://pypi.org/project/countryinfo/#latlng
6 https://geopandas.org/en/stable/docs/reference/api/geopandas.points_from_xy.html
7 https://geopy.readthedocs.io/en/stable/#module-geopy.distance
8 https://pypi.org/project/geopy/
9 https://pypi.org/project/countryinfo/#borders
10 https://pypi.org/project/countryinfo/#area
11 calculated as the aggregate of production (GDP) divided by the population size.
12 a detailed definition of PPP provided by the System of National Accounts 1993 Glossary can be found here: https://unstats.un.org/unsd/nationalaccount/glossresults.asp?gID=438

- features 21-25 represent Cultural Indices of a location, intended as dimensions along which the cultural values of that location can be analyzed [26]. Their origin dates back to the work of [22] although, over the decades, independent research branches led to the creation and addition of new ones [49]. Our work includes five of these indicators, of which we provide a brief individual description. They have had several applications in literature (e.g. cross-cultural studies using Twitter data [3]), but the purpose of their inclusion in the MIMI dataset is to use them in an original way: our intention is to explore and understand their possible relation with international migration trends. Data about cultural indicators are available at different NUTS levels, but in our work they only appear at the NUTS0 (country) level, since it is the only one that fits our geographic viewpoint. Features 21-25 are the result of the integration of two different datasets [24, 9]. Unfortunately, they are provided only for 66 of the more than 250 available countries but, despite this, most of them have already been shown to be strongly involved in migration trends (see the behavior of their correlation values with the absolute number of migrants of a country, in Section 5.1). Starting from the cultural dimensions of both countries of origin and destination, a new feature about cultural distance could be obtained: datasets with this configuration already exist [26, 25] although, at the moment, data is available only for a third of the countries (22 in total).
* feature 21 is the Power distance indicator (PDI), which is defined as "the extent to which the less powerful members of organizations and institutions (like the family) accept and expect that power is distributed unequally" [49].
This index describes the extent to which hierarchical relations and unequal distribution of power in organisations and societal institutions are accepted in a culture.
* feature 22 is the Individualism indicator (IDV) which, as opposed to collectivism, explores the "degree to which people in a society are integrated into groups" [49]: it reflects the extent to which people prefer to act as individuals rather than as members of a community.
* feature 23 is the Masculinity indicator (MAS), defined as "a preference in society for achievement, heroism, assertiveness and material rewards for success" [49]: as opposed to femininity, this dimension reveals to what degree traditionally masculine societal values, such as orientation towards accomplishment, prevail over values such as modesty, solidarity or tolerance.
* feature 24 is the Uncertainty avoidance indicator (UAI), defined as "a society's tolerance for ambiguity", in which people embrace or avert an event of something unexpected, unknown, or away from the status quo [49].
* feature 25 is the Long-term orientation indicator (LTO), which associates the connection of the past with current and future actions and challenges. A lower degree of this index (short-term orientation) indicates that traditions are honored and kept [49].

• Demographic features (features 26-33 in Table 2). These features correspond to traditional migration and population measures obtained from official statistics, either from national censuses or from population registries.
- feature 26 is the annual total population of a country, gathered from UN (from which only records with the "Zero migration" variant were selected) and EUROSTAT [19]: these two sources often refer to different groups of countries, so their mutual integration made it possible to cover most of the countries of the dataset. Where both measurements were available for the same country, both were reported. The two sources follow different methodologies, since the annual total population measurement is performed on July 1st by UN and on January 1st by EUROSTAT.
However, their correlation value of approximately 1 indicates that the two measures, related to the same year, are well compatible and almost interchangeable: indeed, missing values of the former have been replaced with the latter, and vice versa.
- feature 27 represents annual population density, defined as the ratio between the annual average population and the land area. Therefore, its unit of measure corresponds to "persons per square kilometre". Data has been retrieved from ESTAT [18].
- features 28, 29: absolute number of migrants (respectively, immigrants and emigrants) per country. Data was taken from ESTAT [14, 12] and from UN datasets on flows (see feature 32 below), selecting from the latter the records having "Total" as country (respectively, origin and destination country).
- features 30, 31 indicate quinquennial NET migration and NET migration rate of each country. The former is the difference between the number of immigrants and the number of emigrants in a given area during the reference period, while the latter is defined as the NET migration per 1,000 persons and so indicates the contribution of migration to the overall level of population change. A positive value indicates that more migrants are entering than leaving a country (NET immigration), while a negative one means that emigrants outnumber immigrants (NET emigration). Values have been taken from the UN Population Division [41, 40]: note that they also apply to EUROSTAT countries, and they have been widely used in literature in combination with them, even though the NET migration rate calculation is based on midyear population (as required by the standard UN methodology).
- feature 32: yearly migration flows for each pair of countries are defined as the number of people who have changed country (i.e. who changed residence).
Unlike a static stock measure, flow data are dynamic, summarising movements over a defined period, and consequently allow for a better understanding of past patterns and the prediction of future trends [1]. Both EUROSTAT and UN divide migration flows into three categories: by residence [44, 42, 13, 17], by citizenship [43, 15] and by country of birth [38, 16]. This is true in EUROSTAT for both inflows and outflows, while in UN it holds only for inflows, as UN outflows exist only by residence. For our purposes, however, we selected EUROSTAT outflows only by residence, since the ones by citizenship and by country of birth cannot properly be defined as "flows", as they lack the destination country.
- feature 33: quinquennial migration stocks for each pair of countries consist of the absolute number of migrants residing in the destination country at a given time. Data is obtained from UN [39] and includes stocks by sex and age.

The entire work was performed in Python 3.8, with the aid of Jupyter software. The initial phase consisted of data collection and acquisition, starting from the exploration of open source portals and proceeding with data selection and download. Initially, only migration flows data were imported. Then a pre-processing phase started, in which we carried out data understanding, cleaning and preparation. This was managed by defining functions that automatically clean and prepare the source datasets. Here our data was subjected to various standard computational processes (such as outlier detection, duplicate handling, notation standardization, etc.). Some of the operations performed at this level included the selection of task-relevant data (detection of valid country-to-country records, removal of aggregations, and elimination of non-bilateral flows).
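The selection of task-relevant records described above could be sketched as follows. This is only an illustration under stated assumptions: the record keys (origin, destination, year, value), the example numbers and the rule that a valid country is a two-letter code are all hypothetical, and the actual cleaning functions used for MIMI are not shown in this work.

```python
import re

# Assumption: single countries are written as two-letter ISO 3166-1 alpha-2
# codes, while aggregates use longer labels such as "Total".
ISO2_PATTERN = re.compile(r"^[A-Z]{2}$")


def is_bilateral(record: dict) -> bool:
    """True when both endpoints look like single countries, not aggregates."""
    return all(
        isinstance(record.get(key), str) and ISO2_PATTERN.match(record[key])
        for key in ("origin", "destination")
    )


def clean_flows(records: list) -> list:
    """Keep bilateral records and drop repeated (origin, destination, year) keys."""
    seen = set()
    cleaned = []
    for record in records:
        if not is_bilateral(record):
            continue  # aggregation rows ("Total") or missing endpoints
        key = (record["origin"], record["destination"], record.get("year"))
        if key in seen:
            continue  # a record with the same key was already kept
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Made-up records, for illustration only.
raw = [
    {"origin": "AL", "destination": "FI", "year": 2019, "value": 120},
    {"origin": "Total", "destination": "FI", "year": 2019, "value": 9000},
    {"origin": "AL", "destination": "FI", "year": 2019, "value": 120},
    {"origin": "BE", "destination": None, "year": 2019, "value": 15},
]
cleaned = clean_flows(raw)
print(len(cleaned))  # 1
```

A pattern check of this kind would not catch two-letter aggregate codes, so in practice a lookup against an explicit list of country codes would probably be more robust.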
The data transformation phase was fundamental to reshape the data into the final structure (previously established by our design choices), so as to obtain a large matrix with pairs of countries as rows. Concretely, this meant converting, grouping, and unstacking records of the source datasets in order to transform them into features (columns). We continued shaping this framework by working on indexing: to obtain the dataset index described in Section 3.2.2, duplicate pairs of countries were not admissible. For this reason, specifically with respect to EUROSTAT flows, we established a priority for the selection of pairs: the union of keys (pairs) was taken by first selecting migration by citizenship, then by residence, and lastly by country of birth.

The following step was data integration, where we collected, included and computed all the other indicators. Geographic and interdisciplinary features related to single countries (5-25 in Table 2) have been processed in a separate dataset since, containing neither demographic data nor information about pairs of countries, it can be reused in different contexts where needed. This countries.csv dataset has undergone the same pre-processing pipeline, but not the transformation one, since it has its own structure and design: it was then merged with the MIMI prototype previously obtained (already structured according to our needs) by matching both countries of origin and destination. Finally, the remaining features (2-4 and demographic 26-33, in Table 2) were integrated by computing them or by following the previously described merging process, matching single countries or pairs when needed. Once integration was completed, it was helpful to check the data semantics and statistics of the resulting dataset and to make some random inspections in order to verify the need for a further cleaning step.
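One possible reading of the priority rule and of the country-level merge described above is sketched below. The function names, the dictionary layout and the column naming scheme (origin_ and destination_ prefixes) are assumptions made for illustration, and the numbers are made up; the sketch only shows the idea of letting earlier sources win and of attaching single-country features to both sides of a pair.

```python
def union_by_priority(*sources: dict) -> dict:
    """Merge pair-keyed dicts; earlier sources win when a pair appears twice."""
    merged = {}
    for source in sources:
        for pair, row in source.items():
            merged.setdefault(pair, row)
    return merged


def attach_country_features(pairs: dict, countries: dict) -> dict:
    """Add origin_* and destination_* columns from a country-level table."""
    enriched = {}
    for pair, row in pairs.items():
        origin, destination = pair.split("-", 1)
        new_row = dict(row)
        for role, code in (("origin", origin), ("destination", destination)):
            for name, value in countries.get(code, {}).items():
                new_row[f"{role}_{name}"] = value
        enriched[pair] = new_row
    return enriched


# Hypothetical inputs, one dict per flow definition, in priority order.
by_citizenship = {"AL-FI": {"flow": 10}}
by_residence = {"AL-FI": {"flow": 12}, "BE-FR": {"flow": 30}}
by_birth = {"BE-FR": {"flow": 28}, "IT-DE": {"flow": 50}}
pairs = union_by_priority(by_citizenship, by_residence, by_birth)

# Hypothetical excerpt of a country-level table such as countries.csv.
countries = {"AL": {"continent": "EU"}, "FI": {"continent": "EU"}}
mimi = attach_country_features(pairs, countries)
print(mimi["AL-FI"])
```

Pairs whose countries are missing from the country-level table simply keep their original columns, which mirrors a left join on the pair table.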
The final data quality assessment phase was one of the longest and most delicate, since many values were missing and this could have had a negative impact on the quality of the resulting knowledge. Missing values have been integrated from the additional sources reported, for each feature, in Section 3.2.2.

In this section our focus is on documenting and describing salient patterns in distributions and correlations of data. We do not seek to provide causal analyses, nor do we want to imply causal relationships at this stage: however, we believe it can be useful to analyze the numerical results obtained, since they may guide future research and lead to interesting progress in human mobility studies. Unless otherwise specified, correlation values have been computed as simple Pearson's correlation [34], measuring the linear relationship between two variables: values of -1 or +1 imply an exact linear relationship, while 0 implies no correlation. P-values have been computed in order to confirm or refute the relevance of each correlation value: results are indicated in heatmaps with a number of asterisks proportional to the relevance obtained.

Notation        Relevance           p-value range
no asterisks    no relevance        p-value ≥ 0.5
*               little relevance    0.1 ≤ p-value < 0.5
**              medium relevance    0.01 ≤ p-value < 0.1
***             high relevance      p-value < 0.01

When no asterisks are reported for all the values in the matrix, all the correlations computed are highly relevant, meaning p-values always below the threshold of 0.01.

Despite the impact of the COVID-19 pandemic on international human mobility, mostly related to travel restrictions and "stay-at-home" measures which reduced internal movements within a country [31], Figures 9 and 12 confirm that the numbers in migration flow statistics did not suffer. However, a consistent flow of returners can be noticed for Thailand, probably due to COVID-19 itself, since in 2020 the pandemic prompted the return of hundreds of thousands of migrants to their countries of origin [33].
Regarding migration stocks in Figure 10, the impact of COVID-19 on the global population of international migrants is difficult to assess, since the latest available data refers to mid-2020, fairly early in the pandemic. However, it is estimated that the pandemic may have reduced the growth in the stock of international migrants by around two million [31, 29]. Moreover, almost all these pairs of countries are included in the "top 20 international migration country-to-country corridors, 2020" list in the World Migration Report 2022 [31] (e.g. Mexico - United States, Syria - Turkey, India - Saudi Arabia, United Arab Emirates and United States, Afghanistan - Iran, Myanmar - Thailand), meaning that the largest communities of permanently residing migrants in a host country have developed over the years for safety reasons.

Boxplots in Figures 11, 12 and 13 display the statistical distribution of migration flow and stock values over the years, divided by sex. Increasing trends and regular patterns over time are clearly recognizable in the time series plotted, as is the evidence that male migration shows larger numbers than female migration (on gender dimensions of human mobility, refer to [30]).

The heatmap in Figure 14 shows correlations between the computed ratio of total migrants to total population of a country and its cultural indicators, while Figures 15 and 16 split that heatmap into immigration and emigration, mapping annual correlation values in a bidimensional plane. Values of almost all indicators initially seem to lie mostly in the upper zone of the plane, showing a fairly strong positive correlation with emigration, until some breakpoint years occur and the correlation value becomes highly negative from then on. This radical change in trend cannot yet be supported and explained by a causal relation, so we limit ourselves to reporting its behavior.
Correlation values related to immigration lie in the middle region of the plot, quite far from the upper and lower extremities, and therefore assume less polarized values. Moreover, they show no trend reversal.

The correlation between NET migration rate and GDP of a country shown in Figure 17 confirms the relation, well documented in literature, between these two variables. The correlation is always positive, meaning that countries with high GDP face a NET immigration trend, and so confirming that high per capita incomes are conducive to mobility [6]. Specifically, human mobility is influenced by GDP values up to more than 10 years back.

The heatmaps in Figure 18 illustrate the trends in Spearman correlations over the years between EUROSTAT migration flows and UN migration stocks. Although the correlation between stocks at a given time t and flows relative to previous years is self-evident (as those same flows will be included in the total count of stocks), it is interesting to notice that fairly strong positive correlations also propagate forward in time: this could mean that the higher the stock count at a given time t, the more migration flows will be shared by the pair of countries.

Finally, Figure 19 explores the changes in trend of NET migration rate for a small sample of countries.

The increasing trend encountered in the previous chart is not present for these distributions, where instead it is possible to notice a regularity in the behavior over time: a gradual descent takes a few years (its ends coincide with the drops in the previous plot) and is then followed by a sudden peak of ascent. The discrepancy between male and female migration is sharper.

Figure 13: Distribution of migration stocks. The five-year measurement prevents a look as detailed as the one available for the flows: nevertheless, an increase in the general trend over the years is quite evident.
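The correlation conventions used in this section (Pearson's coefficient, the asterisk notation for p-values, and the replacement of zero SCI values before taking logarithms) can be sketched as follows. The function names and the example numbers are hypothetical; p-values themselves are not computed here and would, for instance, come from a statistics library.

```python
import math


def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation used in the heatmaps."""
    if p_value < 0.01:
        return "***"  # high relevance
    if p_value < 0.1:
        return "**"   # medium relevance
    if p_value < 0.5:
        return "*"    # little relevance
    return ""         # no relevance


def fill_zero_sci(values):
    """Replace zeros with half of the smallest positive value so log() is defined."""
    floor = min(v for v in values if v > 0) / 2
    return [v if v > 0 else floor for v in values]


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, for illustration only.
sci = [0, 4, 16, 64]
flows = [1.0, 2.0, 4.0, 6.0]
log_sci = [math.log(v) for v in fill_zero_sci(sci)]
r = pearson_r(log_sci, flows)
print(round(r, 3), significance_stars(0.005))
```

Replacing zeros with half of the smallest observed value keeps unavailable pairs at the bottom of the scale without producing an undefined logarithm.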
Estimates of global bilateral migration flows by gender between
Combining social media and survey data to nowcast migrant stocks in the United States
Cross-cultural studies using social networks data
Social connectedness: Measurement, determinants, and effects
The determinants of social connectedness in Europe
Internal migration and development: Comparing migration intensities around the world
Data innovation in demography, migration and human mobility
World religion projections
Modelling international migration flows by integrating multiple data sources
Countries dataset
Emigration by age and sex
Emigration by age group, sex and country of next usual residence
Immigration by age and sex
Immigration by age group, sex and citizenship
Immigration by age group, sex and country of birth
Immigration by age group, sex and country of previous residence
"bordering-countries" GitHub repository, "neighbors.csv" dataset
Multi-aspect Integrated Migration Indicators (MIMI) dataset
Culture's consequences: International differences in work-related values
List of ISO 3166 country codes
ESS/EVS-based indicators of cultural dimensions
A new dataset of cultural distances for European countries and regions
Regional cultural differences within European countries: Evidence from multi-country surveys
Algorithms for geodesics
Friendship as a social process: A substantive and methodological analysis. Freedom and Control in Modern Society
Immobility as the ultimate migration disrupter
Understanding migration journeys from migrants' perspectives
Migration and migrants: A global overview
Migration data in South-eastern Asia
Human migration: the big data perspective
GDP per capita, PPP (current international $)
Land area (sq. km)
UN. International migrant stock 2020
Net migration rate (per 1,000 population)
Net number of migrants
Number of emigrating citizens by future country of usual residence and sex
Number of incoming foreign migrants by country of citizenship and sex
Number of incoming international migrants by previous country of usual residence and sex
List of countries and territories by land borders
List of official languages by country and territory
Wikipedia. Hofstede's cultural dimensions theory - Wikipedia, the free encyclopedia
Facebook users by country 2022

This research was funded by HumMingBird H2020 Program grant number 870661 and SoBigData grant number 871042.

The MIMI dataset v1.0 presented in this study is openly available on Zenodo at the following link: doi.org/10.5281/zenodo.6360651.

The following abbreviations are used in this work:
GDP  Gross Domestic Product
PPP  Purchasing Power Parity
SCI  Social Connectedness Index
UN   United Nations