key: cord-0148856-qdvpgpcr authors: Chi, Guanghua; Fang, Han; Chatterjee, Sourav; Blumenstock, Joshua E. title: Micro-Estimates of Wealth for all Low- and Middle-Income Countries date: 2021-04-15 journal: nan DOI: nan sha: 7b10afb56a27a90a8cd5db3bd50b189288e2d82c doc_id: 148856 cord_uid: qdvpgpcr Many critical policy decisions, from strategic investments to the allocation of humanitarian aid, rely on data about the geographic distribution of wealth and poverty. Yet many poverty maps are out of date or exist only at very coarse levels of granularity. Here we develop the first micro-estimates of wealth and poverty that cover the populated surface of all 135 low and middle-income countries (LMICs) at 2.4km resolution. The estimates are built by applying machine learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, topographic maps, as well as aggregated and de-identified connectivity data from Facebook. We train and calibrate the estimates using nationally-representative household survey data from 56 LMICs, then validate their accuracy using four independent sources of household survey data from 18 countries. We also provide confidence intervals for each micro-estimate to facilitate responsible downstream use. These estimates are provided free for public use in the hope that they enable targeted policy response to the COVID-19 pandemic, provide the foundation for new insights into the causes and consequences of economic development and growth, and promote responsible policymaking in support of the Sustainable Development Goals. Many critical decisions require accurate, quantitative data on the local distribution of wealth and poverty. Governments and non-profit organizations rely on such data to target humanitarian aid and design social protection systems (1, 2) ; businesses use this information to guide their marketing and investment strategies (3) ; these data also provide the foundation for entire fields of basic and applied social science research (4) . Yet reliable socioeconomic data are expensive to collect, and only half of all countries have access to adequate data on poverty (5) . In some cases, the data that do exist are subject to political capture and censorship (6, 7) , and very rarely do such data allow for disaggregation beyond the largest administrative level (8) . The scarcity of quantitative data is thus a major impediment to policymakers and researchers interested in solutions to global poverty and inequality. Data gaps similarly hinder the broad international coalition working toward the Sustainable Development Goals, in particular toward the first goal of ending poverty in all its forms everywhere (9) . To address these data gaps, researchers have developed several approaches to construct poverty maps from non-traditional sources of data. These include methods from small area statistics that combine household sample surveys with comprehensive census data (10) , and more recent use of satellite 'night-lights' (11) (12) (13) , mobile phone data (14) , social media data (15) , high-resolution satellite imagery (16) (17) (18) (19) , or some combination of these (20, 21) . But these efforts have focused on a single continent or a select set of countries, limiting their relevance to development objectives that require a more global perspective. 
Here we develop a novel approach to construct micro-regional wealth estimates, and use this method to create the first complete set of micro-estimates of the distribution of poverty and wealth across all 135 LMICs (Fig. 1a). We use this method to generate, for each of roughly 19.1 million unique 2.4km micro-regions in all global LMICs, an estimate of the average absolute wealth (in dollars) and relative wealth (relative to others in the same country) of the people living in that region. These estimates, which are more granular and comprehensive than previous approaches, make it possible to see extremely local variation in wealth disparities (Fig. 1b and Fig. 1c).

Our approach, outlined in Fig. 2, relies on "ground truth" measurements of household wealth collected through traditional face-to-face surveys with 1,457,315 unique households living in 66,819 villages in 56 different LMICs around the world (Table S1). These Demographic and Health Surveys (DHS), which are independently funded by the U.S. Agency for International Development, contain detailed questions about the economic circumstances of each household, and make it possible to compute a standardized indicator of the average asset-based wealth of each village (see SM1) (8). We then use spatial markers in the survey data to link each village to a vast array of non-traditional digital data. This includes high-resolution satellite imagery, data from mobile phone networks, topographic maps, as well as aggregated and de-identified connectivity data from Facebook (Table S2). These data are processed using deep learning and other computational algorithms, which convert the raw data to a set of quantitative features of each village (Fig. S2). We use these features to train a supervised machine learning (ML) model that predicts the relative wealth (Fig. 1a) and absolute wealth (Fig. S3a) of all populated 2.4km grid cells in LMICs.

The estimates of wealth and poverty are quite accurate. Depending on the method used to evaluate performance, the model explains 56-70% of the actual variation in household-level wealth in LMICs (Fig. 3a). This performance compares favorably to state-of-the-art methods that focus on single countries or continents (16, 19) (see SM4). To provide visual intuition for the fine granularity of the wealth estimates, Fig. 1c shows an enlargement of a region in the outskirts of Cape Town, South Africa. The satellite imagery shows the physical terrain, which juxtaposes high-density urban areas with farmland and undeveloped zones by the airport and off the main highway. The bottom half of the figure shows the wealth estimates for the same region, which highlight the contrast in wealth between these neighboring areas.

To validate the accuracy of these estimates, and to eliminate the possibility that the ML model is 'overfit' on the DHS surveys, we compare the model's estimates to four independent sources of ground truth data. The first test uses data from 15 LMICs that have collected and published census data since 2001 (Table S3). These data contain census survey responses from 27 million unique individuals, including questions about the economic circumstances of each household. Importantly, the census data are independently collected and are never used to train the ML model. In each country, we aggregate the census data at the smallest administrative unit possible and calculate a 'census wealth index' as the average wealth of households in that census unit.
We separately aggregate the 2.4km wealth estimates from the ML model to the same administrative unit. The ML model explains 72% of the variation in household wealth across the 979 census units formed by pooling data from the 15 censuses (Fig. 3c) and, on average, 86% of the variation in household wealth within each of the 15 countries (Fig. S4).

To test the accuracy of the model at the most granular level possible, we obtain three additional sources of survey data that link household wealth information to the exact geocoordinates of each surveyed household. The first dataset, collected by the government of the Togolese Republic (Togo) in 2018-2019, contains a nationally-representative sample of 6,172 households located in 922 unique 2.4km grid cells (Fig. 4a). We find that the ML model's predictions explain 76% of the variation in wealth of these grid cells (Fig. 4b), and 84% of the variation in wealth of cantons, Togo's smallest administrative unit (Fig. 4c). The second dataset, similar to the first but independently collected by the government of Nigeria in 2019, contains a nationally-representative sample of 22,104 households in 2,446 grid cells (Fig. 4d). We find that the ML estimates explain 50% of the variation in grid cell wealth (Fig. 4e) and 71% of the variation in wealth of Local Government Areas (Fig. 4f). We further validate the grid-level predictions using a dataset collected by GiveDirectly, a nonprofit organization that provides humanitarian aid to poor households. In 2018, GiveDirectly surveyed 5,703 households in two counties in Kenya (Fig. 4g), recording a Poverty Probability Index as well as the exact geocoordinates of each household (Fig. 4h). Using these data, we show that even within small rural villages, the ML model's predictions correlate with GiveDirectly's estimates of poverty and wealth (Fig. 4i).

In addition to providing point estimates of the average wealth of the households in each grid cell, we calculate confidence intervals around each estimate (Fig. S3b). These are obtained through standard resampling methods, combined with a more structural approach that models the prediction error as a function of observable characteristics of each location (see SM11). As expected, we find that prediction errors are larger in regions that are far from areas covered by the DHS data (Table S4). While measures of uncertainty are not common in prior work on subregional wealth estimation, we believe this is an important step to help promote the responsible use of such estimates in research and policy settings (22).

We are making these micro-regional estimates of relative wealth and poverty, along with the associated confidence intervals, freely available for public use and analysis. These estimates are provided through an open and interactive data interface that allows scientists and policymakers to explore and download the data (Fig. S1; see http://beta.povertymaps.net/ for a preliminary "beta" version of the interactive interface).

How might these estimates be used to guide real-world policymaking decisions? One key application is in the targeting of social assistance and humanitarian aid. In the months following the onset of the COVID-19 pandemic, hundreds of new social protection programs were launched in LMICs, and in each case, program administrators faced difficult decisions about whom to prioritize for assistance (23). This is because in many LMICs, planners do not have comprehensive data on the income or consumption of individual households (24).
The new estimates provide one potential solution. In simulations, we find that geographic targeting using our micro-estimates allocates a higher share of benefits to the poor (and a lower share of benefits to the non-poor) than geographic targeting approaches based on recent nationally representative household survey data (Table 1 and SM13). This is because the micro-estimates make it possible to target smaller geographic regions than would be possible with traditional survey data, a finding that is consistent with prior work suggesting that more granular targeting can produce large gains in welfare (2, 25, 26). For instance, the most recent DHS survey in Nigeria only surveyed households in 13.8% of all Nigerian wards (the smallest administrative unit in the country); by contrast, the micro-estimates cover 100% of wards. In Togo, existing government surveys only provide poverty estimates that are representative at the regional level (of which there are only 5); we provide estimates for 9,770 distinct tiles. Based on the strength of these results, the Government of Nigeria is using these estimates as the basis for social protection programs that are providing benefits to millions of poor families (27). Likewise, the Government of Togo is using these estimates to target mobile money transfers to hundreds of thousands of the country's poorest mobile subscribers (28). These examples highlight how the ML estimates can improve targeting performance even in countries with robust national statistical offices, like Nigeria and Togo. In the large number of LMICs that have not conducted a recent nationally representative household survey, these micro-estimates create an option for geographic targeting that would otherwise not exist.

The standardized procedure through which these estimates are produced may also be attractive in contexts where political economy considerations might lead to systematic misreporting of data (7) or influence whether new data are collected at all (6). However, this does not imply the ML estimates are apolitical, as maps have a historical tendency to perpetuate existing relations of power (29). One particular concern is that the technology used to construct these estimates may not be transparent to the average user; if not produced or validated by independent bodies, such opacity might create alternative mechanisms for manipulation and misreporting.

While our primary focus is on constructing, validating, and disseminating this new resource, the process of building this dataset produces several insights relevant to the construction of high-resolution poverty maps. For instance, we find that different sources of input data complement each other in improving predictive performance (20, 21). While prior work has focused heavily on satellite imagery, we find that models trained only on satellite data do not perform as well as models that include other input data (Fig. S7a). In particular, information on mobile connectivity is highly predictive of sub-regional wealth, with 5 of the 10 most important features in the model related to connectivity (Fig. S2 and SM5). The global scale of our analysis also reveals intuitive patterns in the geographic generalizability of machine learning models (16, 30, 31). We find that models trained using data in one country are most accurate when applied to neighboring countries (Fig. S6). Models also perform better in countries when trained on countries with similar observable characteristics (Table S4).
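To make this cross-country generalizability analysis concrete, the short Python sketch below builds a Fig. S6-style transfer matrix, in which a model trained on each country is evaluated on every other country. It is an illustrative sketch only: the DataFrame df, its country, rwi, and feature columns, and the use of scikit-learn's GradientBoostingRegressor are assumptions for the example, not the paper's exact implementation.

# Sketch: train a model on each country, evaluate it on every other country,
# and collect an R^2 transfer matrix (as in Fig. S6). Assumes a DataFrame `df`
# with one row per village, a 'country' column, an 'rwi' label column, and
# numeric feature columns (already normalized within each country).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

def transfer_matrix(df: pd.DataFrame, feature_cols, label_col="rwi"):
    countries = sorted(df["country"].unique())
    scores = pd.DataFrame(index=countries, columns=countries, dtype=float)
    for train_c in countries:                      # columns of the matrix
        train = df[df["country"] == train_c]
        model = GradientBoostingRegressor(random_state=0)
        model.fit(train[feature_cols], train[label_col])
        for test_c in countries:                   # rows of the matrix
            test = df[df["country"] == test_c]
            pred = model.predict(test[feature_cols])
            scores.loc[test_c, train_c] = r2_score(test[label_col], pred)
    return scores

# Usage (hypothetical):
# scores = transfer_matrix(df, feature_cols=[c for c in df.columns
#                                            if c not in ("country", "rwi")])

In such a matrix, nearby or economically similar country pairs would be expected to show the higher off-diagonal R² values reported in the text.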
And while much of the model's performance derives from being able to differentiate between urban and rural areas, the model can differentiate variation in wealth within these regions as well (Fig. 3b). Our hope is that these methods and maps can provide a new set of tools to study economic development and growth, guide interventions, monitor and evaluate policies, and track the elimination of poverty worldwide.

Fig. 2 (caption fragment): ... Non-traditional data from satellites and other existing sensors are also sourced from each location. c) These data are used to train a machine learning algorithm that predicts micro-regional poverty from non-traditional data, even in regions where no ground truth data exists.

The ground truth wealth data used to train the predictive models are derived from household surveys conducted by the Demographic and Health Survey (DHS) Program. According to the program, the DHS collects "nationally-representative household surveys that provide data for a wide range of monitoring and impact evaluation indicators in the areas of population, health, and nutrition." i We elected to train our model exclusively on DHS data because it is the most comprehensive single source of publicly available, internationally standardized wealth data that provides household-level wealth estimates with sub-regional geo-markers. The fact that we use the DHS data as our ground truth measure of wealth and poverty means that we are effectively training our machine learning algorithm to reconstruct a DHS-style relative wealth index, albeit at a much finer spatial resolution and in areas where DHS surveys did not occur. This is because we believe the DHS version of a relative wealth index is the best publicly available instrument for consistently measuring wealth across a large number of LMICs. However, it posits a specific, asset-based definition of wealth that does not necessarily capture a broader notion of human development. More broadly, a rich social science literature debates the appropriateness of different measures of human welfare and well-being (4, 33).

Our decision to focus on estimating asset-based wealth, rather than a different measure of socioeconomic status (SES), was motivated by several considerations. First, in developing economies, where large portions of the population do not earn formal wages, measures of income are notoriously unreliable. Instead, researchers and policymakers rely on asset-based wealth indices or measures of consumption expenditures. Between these two, wealth is much less time-consuming to record in a survey; as a result, wealth data are more commonly collected in a standardized format for a large number of countries (34).

We obtain the most recent publicly-available DHS survey data from 56 countries (Table S1). The criteria for inclusion are that the data are available for download through the DHS website (as of March 2020), the data contain asset/wealth information and sub-regional geomarkers, and that the most recent survey was conducted since 2000. The combined dataset contains the survey responses from 1,457,315 household surveys taken across Africa, Asia, Europe, and Latin America. Each individual household survey lasts several hours, and contains several questions related to the socioeconomic status of the household. We focus on a standardized set of questions about assets and housing characteristics. ii
From the responses to these questions, and following standard practice (8, 35), the DHS calculates a single continuous measure of relative household wealth, the Relative Wealth Index (RWI), by taking the first principal component of these 15 questions. It is this DHS-computed RWI that we rely upon as a ground truth measure of wealth. In addition to providing measures of wealth for each household, the DHS indicates the cluster in which each household is located. The 1.5M households are associated with 66,819 unique clusters, where a cluster is roughly equivalent to a village in rural areas and a neighborhood in urban areas. We calculate the average wealth of each "village" cluster by taking the mean RWI of all surveyed households in that cluster. iii This village-level average RWI is the target variable for the machine learning model.

The prediction algorithms rely on data from several different sources (Table S2). To facilitate downstream analysis, all data are converted into features that are aggregated at the level of a 2.4km grid cell. We use 2.4km cells because that is the highest resolution at which many of our input data are available, and it is best suited to the spatial merge with the survey data (see "supervised machine learning" below). We were also concerned that providing estimates of wealth at even smaller grid cells might compromise the privacy of individual households. Thus, if the native resolution of a data source is higher than 2.4km, we aggregate the smaller cells to the 2.4km level by taking the average of the smaller cells. The features input into the model indicate, for each cell, properties such as the average road density, the average elevation, and the average annual precipitation.

Several features related to telecommunications connectivity are obtained from Facebook, which uses proprietary methods to estimate the availability and use of telecommunications infrastructure from de-identified Facebook usage data. iv All estimates are regionally aggregated at the 2.4km level to preserve user privacy. We use estimates of the number of mobile cellular towers in each grid cell, as well as the number of WiFi access points and the number of mobile devices of different types. These measures are based on the infrastructure used by Facebook users, so may not be representative of the full population. To the extent that these features are predictive of regional wealth (which they are), no deeper inference or causal interpretation should be drawn from the empirical association. Rather, these patterns simply indicate that the regional distribution of wealth is correlated with these non-representative measures of telecommunications use.

Since the raw satellite imagery is extremely high-dimensional, we use unsupervised learning algorithms to compress the raw data into a set of 100 features. Specifically, following Jean et al. (16), we use a pre-trained, 50-layer convolutional neural network to convert each 256x256 pixel image into 2048 features, and then extract the first 100 principal components of these 2048-dimensional vectors. v These 100 components explain 97% of the variance of the 2048 features (Fig. S8).

ii The full set of indicators is: electricity in household, telephone, automobile, motorcycle, refrigerator, TV, radio, water supply, cooking fuel, trash disposal, toilet, floor material, wall material, roof material, and rooms in the house.
iii Our main estimates do not use the cluster weights provided by the DHS. We separately evaluate a model that uses these weights to train a weighted regression tree, and find that the predictions of the two models are highly correlated (r = 0.9) and result in similar overall performance (R² = 0.56 without weights vs. R² = 0.54 with weights).
iv https://research.fb.com/category/connectivity/
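The satellite-feature compression described above can be sketched as follows. This is a minimal illustration rather than the production pipeline: an ImageNet-pretrained ResNet-50 from torchvision (v0.13+) stands in for the pre-trained network used in the paper, and the image batch is a random placeholder.

# Sketch of the satellite-feature compression step: a pre-trained 50-layer CNN
# maps each 256x256 image to a 2048-dimensional vector, and PCA keeps the first
# 100 principal components. An ImageNet-pretrained ResNet-50 is used here as a
# stand-in for the network in the paper; `images` is a placeholder batch.
import torch
import torchvision.models as models
from sklearn.decomposition import PCA

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.eval()
# Drop the final classification layer, keeping the 2048-d pooled features.
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

images = torch.rand(200, 3, 256, 256)  # placeholder: 200 RGB tiles, 256x256
with torch.no_grad():
    feats = backbone(images).squeeze(-1).squeeze(-1)   # shape: (200, 2048)

pca = PCA(n_components=100)
components = pca.fit_transform(feats.numpy())          # shape: (200, 100)
print("variance explained:", pca.explained_variance_ratio_.sum())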
All input features are normalized by subtracting the country-specific mean and dividing by the country-specific standard deviation.

We match the ground truth wealth data to the input data using spatial information present in both datasets. The 2.4km grid cells are defined by absolute latitude and longitude coordinates specified by the Bing tile system. vi The DHS data include approximate information about the GPS coordinate of the centroid of each of the 66,819 villages. However, the exact geocoordinates are masked by the DHS program with up to 2km of jitter in urban areas and up to 5km of jitter in rural areas. To ensure that the input data associated with each village cover the village's true location, we include a 2x2 grid of 2.4km cells around the centroid in urban areas, and a 4x4 grid in rural areas. For each village, we then take the population-weighted average of the 112-dimensional feature vectors across the 2x2 or 4x4 set of cells, using existing estimates of the population of 2.4km grid cells (37). This leaves us with a training set of 66,819 villages with wealth labels (calculated from the ground truth data) and 112-dimensional feature vectors (computed from the input data).

We use machine learning algorithms to predict the average RWI of each village from the 112 features associated with that village. We do not perform ex ante feature selection prior to fitting the model. We use a gradient boosted regression tree, a popular and flexible supervised learning algorithm, to map the inputs to the response variable. To tune the hyperparameters of the gradient boosted tree, we use three different approaches to cross-validation: vii

• K-fold cross-validation (labeled "Basic CV" in Fig. 3a). For each country, the labelled data are pooled, and then randomly partitioned into k = 5 equal subsets. A model is trained on all but one subset and tested on the held-out subset. The process is repeated k times and we report average held-out performance for that country. This approach to cross-validation is used most frequently in prior work, but can substantially over-estimate performance (38). This bias arises because both the input (e.g., satellite) and response (RWI) data are spatially auto-correlated, leaving the training and test data not i.i.d. (39). viii

• Leave-one-country-out cross-validation ("Leave-country-out"). For each country, a model is trained using the pooled data from all other 55 countries; the test performance is evaluated on the held-out country (16).

• Spatially-stratified cross-validation ("Spatial CV"). This method ensures that training and test data are sampled from geographically distinct regions (38, 39). In each country, we select a random cell as the training centroid, then define the training dataset as the nearest (k-1)/k percent of cells to that centroid. The remaining 1/k cells from that country form the test dataset. This procedure is repeated k times in each country (see the sketch below).
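A minimal sketch of a single spatially-stratified fold is shown below. The haversine distance and the specific sampling details are assumptions for the example; the paper's exact procedure may differ.

# Sketch of one spatially-stratified fold within a single country: pick a random
# cell as the training centroid, take the nearest (k-1)/k share of cells as the
# training set, and hold out the remaining 1/k as the test set.
import numpy as np

def spatial_fold(lat, lon, k=5, rng=None):
    """lat, lon: arrays of cell centroids (degrees). Returns a boolean train mask;
    the test set is the complement (~mask)."""
    rng = np.random.default_rng(rng)
    centroid = rng.integers(len(lat))
    # Great-circle (haversine) distance from the chosen centroid to every cell.
    lat1, lon1 = np.radians(lat[centroid]), np.radians(lon[centroid])
    lat2, lon2 = np.radians(lat), np.radians(lon)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * 6371.0 * np.arcsin(np.sqrt(a))          # km
    n_train = int(np.floor(len(lat) * (k - 1) / k))
    train_idx = np.argsort(dist)[:n_train]             # nearest (k-1)/k of cells
    mask = np.zeros(len(lat), dtype=bool)
    mask[train_idx] = True
    return mask

# Repeating this k times with different random centroids and averaging held-out
# R^2 yields "Spatial CV"-style numbers.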
Fig. 3a compares the performance of these three methods, by showing the distribution of R² values for each approach to cross-validation (the distribution is formed from 56 countries, where a separate model is trained and cross-validated in each country). The difference in R² resulting from different approaches to cross-validation highlights the potential upward bias in performance that results from spatial auto-correlation in training and test data. By comparison, recent work on wealth prediction in Africa found that a mixture of remote sensing and nightlight imagery explains on average 67% of the variation in wealth (19). That benchmark was based on an approach similar to the "leave-country-out" method shown in Fig. 3a; the slight decline in performance that we observe is likely due to the fact that the 23 countries in Africa studied by (19) are substantially more homogenous than the full set of LMICs that we analyze. Unless noted otherwise, all analysis in this paper uses models based on spatially-stratified cross-validation. While this has the effect of lowering the R² values that we report, we believe it is the most conservative and appropriate method for training machine learning models on geographic data with spatial auto-correlation.

viii In an extreme example, imagine a single town that covers two adjacent grid cells. If one of the grid cells is in the training set and the other is in the testing set, a flexible model could simply learn to detect the town and predict its wealth. This sort of overfitting is not addressed by standard k-fold cross-validation.

To shed light on which of the various data sources are driving the model's predictions, Fig. S2 provides two different indicators of feature importance. Fig. S2a (left panel) indicates the unconditional correlation between the true wealth label and each individual feature, calculated as the R² from a univariate regression of the wealth label on each single feature (each row is a separate regression; with 56 countries, there are 56 R² values that form the distribution of each boxplot). Fig. S2b (right panel) indicates the model gain, which provides an indication of the relative contribution of each feature to the final model (specifically, it is the average gain across all splits in the random forest that use that feature) (40). In general, we find that data related to connectivity, such as the number of cell towers and mobile devices in a region, are the most predictive features; nightlight radiance and population density are also predictive. While no single feature derived from satellite imagery is especially predictive in isolation, the large number of satellite features collectively contribute to model accuracy; this can be seen most directly in Fig. S7a, which compares the predictive performance of models with and without satellite imagery.

To produce the final maps and micro-estimates, as well as the public dataset, we pool data from all 56 countries and train a single model using spatially-stratified cross-validation to tune the model parameters. ix This model maps 112-dimensional feature vectors to wealth estimates. We then pass the 112-dimensional feature vector for each 2.4km grid cell located in an LMIC through this trained model to produce an estimate of the relative wealth (RWI) of each grid cell (Fig. 1). We use the World Bank's List of Country and Lending Groups to define the set of 135 low- and middle-income countries. x
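The pooled training and grid-cell scoring step just described can be sketched as follows. The DataFrames, column names, and scikit-learn estimator are hypothetical stand-ins, and the spatially-stratified hyperparameter search is omitted.

# Sketch of the final prediction step: pool the labeled villages from all 56
# countries, fit one gradient boosted regression tree, then score every 2.4km
# grid cell in the 135 LMICs. `villages` and `tiles` are hypothetical DataFrames
# sharing the same 112 (normalized) feature columns; `villages` also carries the
# village-level 'rwi' label.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def fit_and_score(villages: pd.DataFrame, tiles: pd.DataFrame, feature_cols):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(villages[feature_cols], villages["rwi"])      # pooled training set
    tiles = tiles.copy()
    tiles["rwi_pred"] = model.predict(tiles[feature_cols])  # one estimate per cell
    return model, tiles

# Gain-style feature importances (cf. Fig. S2b) can be read off the fitted model:
# pd.Series(model.feature_importances_, index=feature_cols).sort_values()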
Since we do not normalize these predictions at the country level after they have been generated, we do not expect that each country will have the same within-country RWI distribution (i.e., the amount of "bright" and "dark" spots will differ between countries). To help preserve the privacy of individuals and households, we do not display wealth estimates for 2.4km regions where existing population layers indicate the presence of 50 or fewer individuals in the region (37). Instead, we aggregate neighboring 2.4km tiles (by taking the population-weighted average RWI) until the total estimated population of the larger area is at least 50. The "neighbors" of a tile are those tiles that fall within the larger tile, using the tile boundaries defined by the Bing tile system. xi All of the neighboring 2.4km cells in the larger tile are then assigned the same estimate of RWI (i.e., the population-weighted average).

ix In robustness analysis, we separately constructed complete micro-estimates for all LMICs in which the estimates for all countries without DHS surveys were based on the full model trained on pooled data from the 56 countries with DHS surveys; then, in each of the 56 countries with DHS surveys, we replaced the pooled estimates with the estimates from a model trained exclusively with data from the target country. We find that the average accuracy of this alternative approach (R² = 0.54, using spatial CV) is nearly identical to that of the pooled approach (average R² = 0.56, using spatial CV).
x We use the 2018 version of this list, which includes countries whose Gross National Income per capita was less than $4,045. See https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-andlending-groups
xi The 2.4km estimates correspond to Bing tile level 14; the next largest tile, Bing tile level 13, defines 4.8km grid cells, and so forth.
xii Across the 33 countries with two or more DHS surveys conducted since 2000, the median R² between regional (admin-2) wealth estimates from the most recent DHS survey and the preceding DHS survey is 0.81.

Our main objective is to produce accurate estimates of the current, cross-sectional distribution of wealth and poverty within LMICs. In training the machine learning model described above, we thus use the most recently available version of each data source. The ground truth wealth measurements cover a wide range of years (Table S1); the input data are primarily generated in 2018 (Table S2). This often creates a mismatch between the dates of the input variables and the survey labels for a given region. In practice, this means that our estimates are best at capturing within-country variation in wealth that does not change over a relatively short time horizon (i.e., between the prior survey date and 2018). Analysis of DHS data from LMICs with multiple surveys suggests a high degree of persistence in the within-country variation in wealth (Fig. S11). xii Still, this approximation likely introduces error into our model, and suggests that these estimates are better suited toward applications that require a measure of permanent income than to applications that require an understanding of poverty dynamics. More broadly, we see this model's performance as a benchmark that can be improved upon as more input and survey data become available. In an ideal world, we would obtain historical input data from the same years in which each survey was conducted.
Unfortunately, historical versions of most of the input data described in Table S2 do not exist. Alternatively, we could restrict our analysis to input data that do exist in a historical panel. However, as shown in Fig. S7a , excluding key predictors substantially limits the model's predictive accuracy. Another option would be to only train the model using more recent surveys. In Fig. S9a , we observe that the accuracy of a model trained on the subset of 24 countries that conducted DHS surveys since 2015 is quite similar to the performance of a model trained on all 56 countries with DHS data since 2000. Related, when we validate the model's performance using independently collected census data (see below for details), we find no evidence to suggest that a shorter gap between the date of the DHS training data and the data of the census increases the predictive accuracy of the model (Fig. S10 ). We validate the accuracy of the ML estimates using census data that are collected independently from the DHS data used to train the models. Specifically, we obtain census data from all countries with public IPUMS-I data, where the census occurred since 2000 and where asset data are complete (41). In total, these data cover 15 countries on 3 continents, and capture the survey responses of 27 million individuals (Table S3) . We assign each of these individuals a census wealth index by taking the first principal component of the 13 assets present in the census data. This list is similar to the DHS asset list, but excludes data on motorcycles and rooms in the household. As with the DHS data, the PCA eigenvectors are computed separately for each country. Finally, we compute the average census wealth index over all households within each second administrative unit, the smallest unit that is consistently available across countries. Of the 1,003 census units, 979 have households with wealth information and also contain a 2.4km tile with a centroid inside the unit. (Table S3) ; We further validate the accuracy of the ML estimates at the finest possible spatial resolution by comparing them to three independently-collected household surveys in Togo, Nigeria, and Kenya. In each case, we obtain the original survey data for all households, as well as the exact GPS coordinates of each surveyed household. As with the census data, none of these datasets were used to train the ML model; they thus provide an independent and objective assessment of the accuracy and validity of our new estimates. Togo. As part of the 2018-2019 Enquete Harmonisee sur les Conditions de Vie des Menages (EHCVM), the government of Togo conducted a nationally-representative household survey with 6,172 households. xiii A key advantage of these data is that, in addition to observing a wealth index for each household (calculated as the first principal component of roughly 20 asset-related questions), we observe each household's exact geo-coordinates (Fig. 4a ). The 6,172 households are located in 922 unique 2.4km grid cells (which correspond to 260 unique cantons, the smallest administrative unit in Togo), of the 9,770 total grid cells in the country. We also note that there is nothing Togo-specific in how the ML model is trained: we simply use the estimates generated by the final model that is trained using spatially-stratified cross-validation from all 56 countries with DHS data (also shown in Fig. 1 ). Kenya. 
xiii See https://inseed.tg/
xiv Borno State was excluded for security concerns. See https://www.nigerianstat.gov.ng/nada/index.php/catalog/64

We also validate the accuracy of the grid-cell RWI estimates using GPS-enabled survey data collected in the Kenyan counties of Kilifi and Bomet (Fig. 4g). These data were collected by GiveDirectly, a nonprofit organization that provides unconditional cash transfers to poor households in East Africa. xv When GiveDirectly works in a village, they conduct a socioeconomic survey with every household in the village. The survey includes a standardized set of 10 questions that form the basis for a Poverty Probability Index (PPI), xvi which GiveDirectly uses to determine which households are eligible to receive cash transfers. GiveDirectly also records the exact geocoordinates of each household that they survey (Fig. 4g).

Fig. 4i compares estimates of micro-regional wealth based on GiveDirectly's household PPI census to corresponding estimates of wealth based on the ML model. We calculate the average PPI score of each 2.4km grid cell by taking the mean of the PPI scores of all households in the grid cell. We compare this to the predicted RWI from the ML model. Across the 44 grid cells shown in Fig. 4h (10 from region 1; 26 from region 2; 8 from region 3), the predicted RWI explains 21% of the variation in PPI (Pearson's r = 0.46). Within each region, the correlation between PPI and RWI ranges from 0.41 to 0.78.

While the ML model explains less of the variation in Kenya than it does in Togo, Nigeria, or in the 15 census countries, this is a much more stringent test. This is because the comparison is being done across 44 spatially proximate units (Fig. 4h) in 3 small and relatively homogenous villages. Within these villages, there is less variation in wealth than there is across an entire country (the variance in RWI across the 44 cells is 0.05; across all of Kenya the variance is 0.10). Our other tests, and indeed all prior work of which we are aware, measure R² across entire countries. The Kenya test is also handicapped by the fact that the Kenyan PPI is not strictly a wealth index, containing questions about education, consumption, and housing materials. Measures of wealth and poverty are quite sensitive to the measurement instrument used. xvii To our knowledge, this is the first attempt to compare estimates of micro-regional wealth, based on variation within single villages, to independently-collected household survey data where the exact location of each surveyed household is known. We therefore find it encouraging that the predicted RWI roughly separates wealthier from poorer neighborhoods within these small regions.

The primary intent of the model is to produce estimates of wealth in LMICs, and it is from LMICs that we source all of the ground truth data used to train the model. For completeness, we assess the performance of the model's predictions in high-income nations. This comparison is imperfect, because high-income nations do not typically collect asset-based wealth indices, which is what the ML model is trained to estimate. Instead, we compare the Absolute Wealth Estimates (AWE) of the ML model (see below for details on how these are constructed) to independently-produced data on regional Gross Domestic Product per capita (GDPpc) from 30 member nations of the Organisation for Economic Co-operation and Development (OECD).
xv http://www.givedirectly.org/ xvi See https://www.povertyindex.org/country/kenya xvii For instance, Filmer and Pritchett find that, even within a single survey, the Spearman rank correlation between an asset index and a measure of consumption expenditures ranges from 0.43 (in Pakistan) to 0.64 (in Nepal). 21 These data are collected by the National Statistical Offices of each respective country, through the network of Delegates participating in the Working Party on Territorial Indicators. xviii In each country, we obtain the OECD's estimate of the average GDPpc of each 'small' (TL3) region. xix We separately calculate the AWE of each region by taking the population-weighted average AWE of all 2.4km grid cells in the region. Fig. S5a shows a scatterplot of these 1540 administrative units, sized by population, where the x-axis indicates the OECD-based measure of wealth of the administrative unit and the y-axis indicates the population-weighted average predicted AWE of the administrative unit. Fig. S5b shows the accuracy of the model in each of the 30 countries. The average population-weighted R 2 across the 30 countries is 0.50; the population-weighted regression line R 2 = 0.59 (obtained when pooling the 1540 regions from all 30 countries). We note that the AWE values are generally larger than the OECD estimates of GDPpc (the slope of the regression line in Fig. S5a is 1.35) . This is likely due to the fact that the GDPpc estimates used to construct AWE (sourced from the World Bank) are consistently higher than the GDPpc estimates sourced from the OECD. This comparison is made in Fig. S5c , where we compare, for the 30 OECD nations, the relationship between the World Bank estimate of GDPpc and the average regional GDPpc based on OECD data (the slope of the regression line Fig. S5c is 1.66) . In many applied settings, it is important to have not just a point estimate of the wealth of a particular location, but also to have an understanding of the uncertainty associated with each point estimate. We are encouraged by the fact that we do not find evidence that the model performs any worse in poorer regions (Fig. S12 ), as occurs with nightlights data (16) . Disaggregating this error, we find that model error is lower when the target country is near to many countries with ground truth data used to train the model, and when there are many training observations nearby. This can be seen in Table S4 , where we estimate the error of each individual 2.4km location l by fitting a linear regression of the model's residual at l (in the locations with ground truth data) on observable characteristics of l. We selected a broad set of observable characteristics that include: all of the features used in the predictive model (with the exception of the imagery-based features); how much "ground truth" training data was available near the spatial unit (such as the distance to the nearest DHS cluster); and country-level characteristics (such as average GDP per capita and continent dummy variables). We then regress the model error, in RWI units, of grid cell l on the l's vector of observable characteristics. We show the correlates of model error in Table S4 , column 1. To better understand the sensitivity of these error estimates, we re-estimate the results in column 1 of Table S4 using different subsets of available predictors. Columns 2 and 3 of Table S4 indicate that while the point estimates depend somewhat on the other variables included in the regression, the qualitative patterns are the same. 
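A minimal sketch of this residual-regression step is shown below; the column names are hypothetical, and ordinary least squares stands in for the exact specification reported in Table S4.

# Sketch of the error model: at locations with ground truth, regress the model's
# absolute residual (in RWI units) on observable characteristics of the location,
# then use the fitted regression to predict expected error everywhere else.
import pandas as pd
import statsmodels.api as sm

def fit_error_model(labeled: pd.DataFrame, all_cells: pd.DataFrame, error_cols):
    resid = (labeled["rwi_true"] - labeled["rwi_pred"]).abs()
    X = sm.add_constant(labeled[error_cols])
    model = sm.OLS(resid, X).fit()
    all_cells = all_cells.copy()
    all_cells["expected_error"] = model.predict(sm.add_constant(all_cells[error_cols]))
    return model, all_cells

# error_cols might include, for example, distance to the nearest DHS cluster,
# population density, and country-level GDP per capita (cf. Table S4).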
More importantly, we observe that the actual error estimates (for any given location l) are not very sensitive to the variables included in the model. For instance, Fig. S14 compares the error predicted by the model in column 1 of Table S4 (x-axis) against the error predicted by the two alternative specifications in columns 2 and 3. Fig. S14a shows the correlation between the median error of a country under the original specification and the median error of a country using a new specification that also includes the 100 satellite imagery features as predictors (r = 0.770). Fig. S14b shows the correlation between the median country error under the original model and a model that only includes the set of features that were not used to estimate RWI (r = 0.773). More broadly, Fig. S6 and Fig. S13 indicate that models trained with data from a single country perform best when applied to countries with similar characteristics. To construct Fig. S13, we calculate the cosine similarity between all pairs of countries based on the country-level attributes listed in Table S4. xx

xx Specifically, the features are: area, population, island, landlocked, distance to the closest country with DHS, number of neighboring countries with DHS, GDP per capita, and Gini coefficient.

Our objective in constructing the micro-estimates of model error is to provide policymakers and other users with a sense of where the model is accurate and where it is not. Fig. S3b provides a granular map of expected model error. We also provide country-level summary statistics of model error in Table S5 (i.e., the mean, median, and standard deviation of estimated model error in each country), to provide policymakers in specific countries with at-a-glance estimates of model performance.

The predictive models are trained to estimate the Relative Wealth Index (RWI) of each 2.4km grid cell. The RWI indicates the wealth of that location relative to other locations within the same country. However, certain practical applications require a measure of the absolute wealth of a region that can be more directly compared from one country to another. To provide a rough estimate of the absolute per capita wealth of each grid cell, we use the technique proposed by Hruschka et al. (2015) (42) to convert a country's relative wealth distribution to a distribution of per-capita GDP. This method relies on three parameters to define the shape of the wealth distribution: the mean GDP per capita, as a measure of central tendency (μ_c); the Gini coefficient, as a measure of dispersion (G_c); and a combination of the Pareto and log-normal distributions that is used to estimate skewness. Specifically, our Absolute Wealth Estimate (AWE) of grid cell i in country c is defined by

AWE_{i,c} = F_c^{-1}(r_{i,c}; μ_c, G_c),

where r_{i,c} is the rank of grid cell i's RWI (relative to other cells in c), μ_c is the mean wealth per capita of c, and F_c^{-1} is the inverse cumulative distribution of wealth, which is parameterized exactly following Hruschka et al. (2015). xxi We collect indicators of each country's Gini coefficient and mean per capita GDP from the sources listed in Table S6, and use them to produce the Absolute Wealth Estimates (AWE) shown in Fig. S3a. This conversion requires strong parametric assumptions about the national distribution of wealth, based on information about the average wealth and wealth inequality in each country. These assumptions are not justified in many countries, particularly where Gini estimates are unreliable, or when the ICDF approximation is a poor fit.
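To make this conversion, and the parametric assumptions it requires, concrete, the sketch below implements a simplified version that uses a single log-normal distribution in place of the Pareto/log-normal combination of Hruschka et al.; the Gini-to-sigma formula is the standard one for a log-normal distribution, and the input values in the usage example are made up.

# Simplified sketch of the RWI -> absolute wealth conversion. A single log-normal
# stands in for the Pareto/log-normal combination used in the paper.
# For a log-normal distribution, Gini = 2*Phi(sigma/sqrt(2)) - 1, so sigma can be
# recovered from the Gini coefficient, and the mean is pinned to GDP per capita.
import numpy as np
from scipy import stats

def awe_from_rwi(rwi, gdp_pc_mean, gini):
    """rwi: array of predicted RWI for all cells in one country."""
    sigma = np.sqrt(2) * stats.norm.ppf((gini + 1) / 2)
    mu = np.log(gdp_pc_mean) - sigma**2 / 2     # so the distribution mean equals gdp_pc_mean
    # Percentile rank of each cell's RWI within the country, kept inside (0, 1).
    ranks = (stats.rankdata(rwi) - 0.5) / len(rwi)
    return stats.lognorm.ppf(ranks, s=sigma, scale=np.exp(mu))   # inverse CDF

# Usage with made-up numbers: mean GDP per capita of $2,000 and a Gini of 0.43.
# awe = awe_from_rwi(predicted_rwi, gdp_pc_mean=2000.0, gini=0.43)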
xxi For the Pareto distribution, F_c^{-1} is the inverse cumulative distribution function with shape parameter

Thus, the AWE estimates should be treated with more caution than the RWI estimates, which were carefully validated with several different sources of independent survey data. Fig. S15 shows the global distribution of (predicted) absolute wealth, as derived from the Relative Wealth Index using the above procedure. The figure compares the predicted wealth distribution based on our method to the global income distribution in 2013, as independently estimated by Hellebrandt and Mauro (2015) (43) using household income surveys for more than a hundred countries that were collected through the Luxembourg Income Study. As expected, the average wealth distribution, which is a measure of per capita GDP, is uniformly higher than the estimated income distribution, which reflects actual family incomes (i.e., total economic output does not translate directly to better family outcomes).

To illustrate one practical use case for these micro-estimates, we simulate the scenario in which an anti-poverty program administrator has a fixed budget to distribute to a country's population. Following Ravallion (25) and Elbers et al. (2), we assume that the program will be geographically targeted, such that all individuals within targeted regions will receive the same transfer. Our analysis compares the performance of several different approaches to geographic targeting in Togo (Table S7) and Nigeria (Table S8), with a subset of these results summarized in Table 1. Performance is evaluated using recent nationally-representative household survey data collected in each country (see above for a description of the EHCVM and NLSS datasets used to evaluate targeting outcomes).

In both Table S7 (for Togo) and Table S8 (for Nigeria), Panel A simulates geographic targeting using the high-resolution ML estimates. The first row simulates a scenario in which cash is transferred to households located in the poorest 2.4km tiles of the country; the second row simulates distribution to the households located in the poorest administrative units of the country (the canton is the smallest administrative unit in Togo and the ward is the smallest administrative unit in Nigeria), where the wealth of the administrative unit is calculated as the population-weighted average of the RWI of all tiles in that unit. The first column indicates the number of unique tiles in each country; the second and third columns simply indicate that every spatial unit (tile or canton/ward) has a corresponding wealth estimate. Column 4 indicates the number of spatial units for which ground truth data exist (in the EHCVM or NLSS), and column 5 counts the number of spatial units for which both ML estimates and ground truth data exist. Column 6 indicates the number of households that exist in those spatial units for which there are both ML estimates and ground truth data. This set of households is then used to measure the correlation between the ground truth wealth of each household (i.e., "true wealth") and the ML estimate of the wealth of the spatial unit in which that household is located (i.e., "predicted wealth"), which is reported in Column 7. xxii

In subsequent columns, we assume that the government has a fixed budget which allows it to only target 25% or 50% of the population. We consider the "true poor" to be the 25% or 50% of households in the ground truth survey with the lowest household asset index.
In Panel A, the targeting mechanism we simulate selects the 25% or 50% "predicted poor" households, where the prediction is based on the ML estimate of wealth assigned to the spatial unit in which each household is located. In instances where including one additional spatial unit would imply that more than 25% or 50% of households would receive benefits, households from that region are randomly selected to ensure that exactly 25% or 50% of households receive benefits. Columns 8 and 9 report the accuracy of this targeting mechanism; columns 10 and 11 report the precision and recall. xxiii For comparison, Panels B-D simulate alternative geographic targeting approaches that a policymaker might rely on in the absence of comprehensive household-level data on poverty status, as is the case in many LMICs (44). In these simulations, we assume that the policymaker does not have access to the ML micro-estimates of RWI or the ground truth data from the EHCVM/NLSS that is used to evaluate their allocation decisions. Instead, the policymaker designs a geographic targeting policy based on the most recent DHS survey, which was conducted in 2018 in Nigeria and 2013-14 in Togo. In Panel B, each row corresponds to targeting at a different level of geographic aggregation. For instance, the row labeled "prefecture average" in Panel B of Table S7 assumes that the program will be targeted at the prefecture level, the 2 nd -level administrative region in Togo, such that either all households in the prefecture will receive benefits or none will. Subsequent rows allow for targeting at smaller geographic units. The columns in Panel B are organized similarly to Panel A. Note, however, that it is no longer the case that each spatial unit will necessarily have a "predicted wealth" value. For instance, in the Canton targeting row of Panel B (Column 2) indicates that only 185 cantons have one or more surveyed households in the most recent DHS (i.e., only 47.8% of all cantons). Columns 4-6 are analogous to Panel A. In Column 7, the "predicted wealth" of each household is the average wealth of all households in that region from the most recent DHS. In subsequent columns, the targeting mechanism selects the 25% or 50% xxii This table reports the correlation in wealth at the household level, with one observation per household, using the household survey weights in the EHCVM/NLSS. This approach is most consistent with the targeting simulations, which require that the policymaker estimate each household's wealth. This approach is different than that taken to construct Fig. 4 , which shows correlations at the tile level, with one observation per tile, which is consistent with the earlier objective of evaluating the accuracy of the ML estimates at the geographic level. xxiii Precision and Recall are always equal in these targeting simulations because the fixed budget co nstraint implies that each additional targeting error creates exactly one new false positive and one new false negative. "predicted poor" households, where the prediction is based on the average wealth of all households in that region from the most recent DHS. Panel C simulates targeting in a similar manner to Panel B, with one important difference: In cases where a geographic unit has no surveyed households in the most recent DHS (e.g., 52.2% of all prefectures in Togo), we impute the wealth of that geographic unit by taking the average DHS RWI of all households in the geographic unit closest to the household i. 
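The fixed-budget selection and evaluation used in these panels can be sketched as follows; the DataFrame and column names are hypothetical, and survey weights and the random tie-breaking within the marginal spatial unit are omitted for brevity.

# Sketch of the targeting simulation: rank households by the predicted wealth of
# their spatial unit, give benefits to the poorest-ranked households until the
# budget share is exhausted, and compare against the "true poor" defined by the
# household asset index.
import pandas as pd

def simulate_targeting(hh: pd.DataFrame, share=0.25):
    """hh needs columns: 'true_wealth' (household asset index) and
    'predicted_wealth' (wealth estimate of the household's tile or admin unit)."""
    n_target = int(round(share * len(hh)))
    true_poor = hh["true_wealth"].rank(method="first") <= n_target
    targeted = hh["predicted_wealth"].rank(method="first") <= n_target
    precision = (true_poor & targeted).sum() / targeted.sum()
    recall = (true_poor & targeted).sum() / true_poor.sum()
    accuracy = (true_poor == targeted).mean()
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# With a fixed budget (equal numbers of targeted and truly poor households),
# precision and recall coincide, as noted in the footnote above.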
The imputation in Panel C addresses a fundamental limitation of Panel B, which would otherwise leave policymakers without a mechanism to determine budget allocation in large regions of the country where survey data do not exist. Panel D simulates a "nearest neighbor" approach to targeting, where the wealth of a household i is inferred based on the average wealth of the households in the DHS cluster physically closest to i, irrespective of whether those nearest neighbors are located in the same administrative unit as i.

The targeting simulations highlight three main results. First, the ML estimates allow for geographic targeting at a level of spatial resolution that would not be possible with traditional survey-based data. As highlighted in prior work (25, 26, 2), geographic disaggregation can produce substantial welfare gains. The gains to disaggregation are quantified in the last several columns of Table S7 and Table S8, which highlight how targeting at the tile level increases both precision and recall (i.e., it reduces both errors of exclusion and errors of inclusion) relative to the other targeting options that provide 100% coverage. xxiv In practice, it may be logistically challenging to deliver benefits to such small geographic units, but recent and ongoing work that uses mobile money to deliver cash transfers directly to beneficiaries suggests that this type of approach may soon become feasible (1).

xxiv In Panel B of Table S7, the "canton targeting" approach slightly outperforms the tile-level targeting, but as we discuss below, the approach described in Panel B could not be used to target the majority of cantons in Togo, since only 47.8% of cantons contain households that participated in the DHS.

Second, even if the delivery of benefits will be based on administrative divisions, we find that admin-region targeting based on the ML estimates performs at least as well as, and often better than, admin-region targeting based on recent nationally representative household surveys (i.e., the comparison of the last row of Panel A to the last row of Panel C or Panel D). This is because the ML estimates can be used to construct accurate estimates of the wealth of 100% of administrative units. By contrast, the DHS only surveyed households in 185 (47.8%) cantons in Togo, and only 1218 (13.8%) wards in Nigeria. Thus, a geographic targeting approach relying on the DHS data alone would either require implementation at a larger administrative unit, or would require some other form of imputation into unsurveyed regions (as is the case in Panels C and D), both of which reduce the effectiveness of geographic targeting.

Third, and echoing previous results, the ML estimates are accurate at estimating household wealth (column 7 of Table S7 and Table S8), and are at least as accurate as household wealth estimation based on recent DHS data. In this sense, Table S7 and Table S8 provide a conservative estimate of the gains from using the ML estimates for geographic targeting. Many LMICs do not have a recent nationally representative household survey available; for instance, only 24 of 135 LMICs have conducted a DHS since 2015. For such countries, these micro-estimates create options for geographic targeting that might otherwise not exist.

Finally, we note that the above discussion compares universal geographic targeting using the ML estimates to universal geographic targeting using recent DHS data, such that all individuals in a targeted region receive uniform benefits.
In practice, most real-world programs are more nuanced, and rely on additional targeting criteria (such as proxy means tests and participatory wealth rankings) to determine program eligibility. These additional criteria would be expected to increase the performance of all methods listed in Table S7 and Table S8; we do not simulate those changes to better highlight the gains from geographic disaggregation.

Supplementary figure and table captions (fragments):

• ... Table S2. Box plots indicate median (center line), interquartile range (shaded box), and 1.5x interquartile range (whiskers).
• ... We separately calculate, for each of the 30 OECD countries with available GDP data, the R² that results from regressing predicted AWE on GDPpc, across all admin-2 regions within each country. c) The estimate of a country's GDPpc from the World Bank, which forms the basis for the AWE estimates, is generally larger than the average regional GDPpc as reported in the OECD data. Values on axes represent thousands of US Dollars.
• For each of the 56 countries with ground truth wealth data, a separate model is trained using data from just that country (the columns in the above matrix). Those models are then tested on previously unseen data from each of the countries (the rows in the matrix). Colors indicate the R² between the model's predictions and ground truth. Models generally perform better on nearby and similar countries. Rows and columns are ordered using a hierarchical clustering algorithm (UPGMA).
• The distribution of performance across the 56 LMICs, measured using spatially-stratified cross-validation, is shown as three kernel density plots, one for each subset of input data. The legend reports the average performance (R²) in black, and the average performance using standard cross-validation in red (to facilitate comparison to prior work). b) Scatter-plot shows relationship between the actual wealth index (from survey data) and the predicted wealth index (output by the model), using all 66,819 labeled survey locations on four continents (AF=Africa, AM=Americas, AS=Asia, EU=Europe).
• ... Table S6. b) Coefficients and standard errors from a regression of the country-level R² on country-level characteristics, for the 56 countries with ground truth data, indicates that model performance is slightly worse in upper middle-income countries (relative to the omitted category of lower-middle income countries), but is not significantly different in low-income countries or specific continents.
• ... Table S4, and plotted on the x-axis of both figures) to two alternative model specifications (plotted on the y-axis). Each dot represents a country. a) Alternate model includes 100 satellite-based features (defined by column 2 of Table S4). b) Alternate model limited to only features not used to predict RWI (defined by column 3 of Table S4).
• Orange line shows the global income distribution in 2013, based on household income surveys for more than a hundred countries. Blue line shows the distribution of predicted "absolute wealth", a measure of per capita GDP, which is derived from the Relative Wealth Index that is the focus of this paper.
• ... Table S6. *significant at 10 percent; ** significant at 5 percent; *** significant at 1 percent.

Table S8 | Targeting simulations in Nigeria. a) Panel A simulates the performance of an anti-poverty program that geographically targets households in the poorest 2.4km tiles in Nigeria, using the ML estimates of tile wealth.
Panels B and C simulate the geographic targeting of households in the poorest states (admin-2), Local Government Areas (LGAs, admin-3), and wards (admin-4), where the most recent DHS survey is used to estimate the average wealth of each administrative region. Panel B ignores households in regions with no DHS surveys; Panel C assigns such households the average wealth of the geographic unit closest to the household. Panel D simulates targeting poor households where the wealth of a household is inferred based on the average wealth of the households in the DHS cluster physically closest to it. Simulations are evaluated using the 2019 NLSS survey.

Table S6 | Sources of country-level data. While most of the country-level statistics come from the World Bank's Open Data portal, when the required indicators are missing we use data from the alternative data sources listed above. Sources below. (Note: Gini from closest neighbor, based on orthodromic distance, is used instead.)

References
Machine learning can help get COVID-19 aid to those who need it most
Poverty alleviation through geographic targeting: How much does disaggregation help
Impact of Emerging Markets on Marketing: Rethinking Existing Perspectives and Practices
Measuring Poverty in a Growing World (or Measuring Growth in a Poor World). The Review of Economics and Statistics
Data deprivation: another deprivation to end
That's why the UN should push countries to gather and share data
The Political Economy of Bad Data: Evidence from African Survey and Administrative Statistics
Data for Development: A Needs Assessment for SDG Monitoring and Statistical Capacity Development
Micro-Level Estimation of Poverty and Inequality
A global poverty map derived from satellite data
Measuring Economic Growth from Outer Space
Using luminosity data as a proxy for economic statistics
Predicting poverty and wealth from mobile phone metadata
Using Advertising Audience Estimates to Improve Global Development Statistics
Combining satellite imagery and machine learning to predict poverty
Proceedings of the Ninth ACM/IEEE International Conference on Information and Communication Technologies and Development
Poverty from space: using high-resolution satellite imagery for estimating economic well-being
Using publicly available satellite imagery and deep learning to understand economic well-being in Africa
Mapping poverty using mobile phone and satellite data
Combining disparate data sources for improved poverty prediction and mapping
Don't forget people in the use of big data for development
Social Protection and Jobs Responses to COVID-19: A Real-Time Review of Country Measures
Sourcebook on the Foundations of Social Protection Delivery Systems
Poverty alleviation through regional targeting: a case study for Indonesia. The economics of rural organization
The Welfare Returns to Finer Targeting: The Case of The Progresa Program in Mexico
1m Nigerians to benefit from COVID-19 Cash Transfer, Osinbajo says. The Guardian (2021)
A Clever Strategy to Distribute Covid Aid-With Satellite Data
Geographic thought: a praxis perspective
Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Estimating Economic Characteristics with Phone Data
Proceedings of the European Conference on Computer Vision (ECCV)
The Economics of Poverty: History, Measurement, and Policy
Estimating Wealth Effects Without Expenditure Data-Or Tears: An Application To Educational Enrollments In States Of India*