key: cord-0787342-74pd9tco
authors: Ramiadantsoa, Tanjona; Metcalf, C. Jessica E.; Raherinandrasana, Antso Hasina; Randrianarisoa, Santatra; Rice, Benjamin L.; Wesolowski, Amy; Randriatsarafara, Fidiniaina Mamy; Rasambainarivo, Fidisoa
title: Existing human mobility data sources poorly predicted the spatial spread of SARS-CoV-2 in Madagascar
date: 2021-12-03
journal: Epidemics
DOI: 10.1016/j.epidem.2021.100534
sha: 46a45c942a99d1155dea4744242993f403dec7e2
doc_id: 787342
cord_uid: 74pd9tco

For emerging epidemics such as the COVID-19 pandemic, quantifying travel is a key component of developing accurate predictive models of disease spread to inform public health planning. However, in many low- and middle-income countries (LMICs), traditional data sets on travel such as commuting surveys, as well as non-traditional sources such as mobile phone data, are lacking or, where available, have only rarely been leveraged by the public health community. Evaluating the accuracy of available data to measure transmission-relevant travel may be further hampered by limited reporting of suspected and laboratory-confirmed infections. Here, we leverage case data collected as part of a COVID-19 dashboard collated via daily reports from the Malagasy authorities on reported cases of SARS-CoV-2 across the 22 regions of Madagascar. We compare the order in which regions reported cases with predictions from a SARS-CoV-2 metapopulation model of Madagascar informed using various measures of connectivity, including a gravity model based on different measures of distance, Internal Migration Flow data, and mobile phone data. Overall, the models based on mobile phone connectivity and the gravity model based on Euclidean distance best predicted the observed spread. The ranks of the regions most remote from the capital were more difficult to predict but, interestingly, regions where the mobile phone connectivity model was more accurate differed from those where the gravity model was most accurate. This suggests that there may be additional features of mobility or connectivity that were consistently underestimated using all approaches but are epidemiologically relevant. This work highlights the importance of data availability and strengthening collaboration among different institutions with access to critical data: models are only as good as the data that they use, so building towards effective data-sharing pipelines is essential.

Human mobility underlies the spatial patterns of many infectious diseases (Findlater and Bogoch, 2018; Grenfell et al., 2001; Kramer et al., 2016; Meloni et al., 2011; Tatem et al., 2006; Tizzoni et al., 2014; Wesolowski et al., 2016; Zhou et al., 2020) and will drive the dynamics of emerging epidemics. Quantifying travel patterns is key to predicting where and when the pathogen may spread and therefore to devising measures and policies to contain the epidemics (Wesolowski et al., 2015b; Charu et al., 2017). As demonstrated by the COVID-19 pandemic (Badr et al., 2020; Chang et al., 2021; Kraemer et al., 2020; Nouvellet et al., 2021), a broad range of travel, from international trips to local commuting patterns, drives the spatial spread of SARS-CoV-2. While data are increasingly being used to inform mobility patterns and predictive transmission models for public health planning (Grantz et al., 2020; Kishore et al., 2020; Oliver et al., 2020), these data are often limited in low- and middle-income countries (Gupta et al., 2020), where routine data collection of mobility patterns may be sparse (Wesolowski et al., 2015a, 2016).

Extrapolating generic patterns of mobility, e.g., weights in the gravity model, derived from data from High Income Countries (HIC) to Low & Middle Income Countries (LMICs) may be misleading (Wesolowski et al., 2015a, 2016) given greater subnational heterogeneity. For instance, in Madagascar, beyond the road infrastructure, which is sparse and may be in poor condition (Fig. 1A), there is one semi-functional railroad and a handful of commercial flights directed mostly to tourist destinations. Standard travel estimation algorithms are vastly inaccurate. Travel is also highly dependent on local topography and road conditions, and is expensive as well as time-consuming. Although mobility data have the potential to shed light on how these limitations translate to realized mobility, high-quality data on mobility are limited. There are no systematic digitized data for travelers, access to mobile phones is one of the lowest in the world (41% of the population at 11th place) (Mobile cellular subscriptions), and mobility data derived from the latter are not readily available. Yet the problem of understanding the spatial spread of infectious diseases is persistent, and there is a need to use data to inform decision-making around resource allocation.

Madagascar is a large island 400 km east of Africa with a population size of about 26 million (INSTAT-RGPH3), of which 78% earn less than 1.9 US dollars per day (UNDP-Multidimensional Poverty Index). The pandemic virus, SARS-CoV-2, was officially reported on the 20th of March 2020, with three imported cases from France arriving at the capital airport. One specific case was investigated after a tourist tested positive upon return to France, prompting the first contact tracing efforts in Madagascar. About six months after the first official case, Melaky was the last of the 22 regions to report at least one confirmed case. Madagascar has undergone two waves, with the second declared roughly one year after the first confirmed case (Fig. 1B). At the time of writing (May 2021), a total of 40,474 cases have been reported. Because all airports were closed, minimizing the risks of importation, Madagascar is a near 'closed' system, making it an ideal setting in which to investigate the role of internal mobility in the spread of SARS-CoV-2.

Previous work in Madagascar attributed the low numbers of reported cases to delayed introduction, owing to the quick closure of the international borders, and to the small number of tests performed (Evans et al., 2020). We extend this work here by providing an analysis of the spatial spread of COVID-19 using various measures of mobility to identify which source of mobility is best able to reproduce the spatial dynamics of the outbreak. The lack of an official compiled and accessible database on COVID-19 in Madagascar prompted us to develop a Madagascar-specific dashboard (covid19mg.org), which is used throughout our analysis (see below). We leverage a range of data and modeling tools to better understand the spread of SARS-CoV-2 in a data-limited setting reflecting highly heterogeneous demographics, accessibility, and road networks throughout the country.

SARS-CoV-2 Confirmed Case Data in Madagascar: Since there is no accessible national SARS-CoV-2 database, we compiled data communicated by the Ministry of Health of the Government of Madagascar on national television every day. These include the number of new cases (confirmed by PCR (Polymerase Chain Reaction)), severe clinical cases, deaths, recoveries, and the number of tests (data accessible at: covid19mg.org). The detail and consistency of reporting for each category vary; however, the number of cases per region was reported throughout the time period, allowing us to reconstruct the spatial spread across the country. The reported cases used here only include those confirmed by detection of viral nucleic acids (RT-PCR (Real Time Polymerase Chain Reaction) tests and/or GeneXpert (Rakotosamimanana et al., 2020)). Due to reporting delays, testing delays, and variable patterns of healthcare-seeking behavior, we focused on the order in which cases were reported in each region.

Each region was ranked based on the date when the first and fifth cases were confirmed. The first confirmed case is the most obvious metric on the occurrence of the disease in that region and the spread across the island. However, because foreign tourists tested positive at the early phase of the pandemic, the first case might not necessarily reflect the mobility of Malagasy people (Ministry of Health). We chose the fifth case as an alternative and intermediate threshold to strengthen our results and investigate mobility patterns that are more likely to be non-tourist related travel. As some regions did not even reach the fifth confirmed case, this quantity reflects a balance between moving away from the possible biases inherent in using the first case, yet also capturing outcomes across regions. In the transmission model described below, we then used various mobility matrices to model the occurrence of the first and fifth cases to compare to the reported rank.
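The ranking step described above can be illustrated with a minimal sketch. The function names, the data layout (a dictionary of daily new-case counts per region), and the toy numbers are assumptions made for illustration; they are not the dashboard records or the authors' Mathematica code.

```python
from datetime import date


def date_of_nth_case(daily_counts, n):
    """Return the first date on which the cumulative count reaches n, or None."""
    total = 0
    for day, new_cases in sorted(daily_counts.items()):
        total += new_cases
        if total >= n:
            return day
    return None


def rank_regions(cases_by_region, n):
    """Rank regions (1 = earliest) by the date of their n-th confirmed case.

    Regions that never reach n cases are placed last; tied dates share a rank.
    """
    arrival = {r: date_of_nth_case(c, n) for r, c in cases_by_region.items()}
    reached = sorted(d for d in arrival.values() if d is not None)
    ranks = {}
    for region, day in arrival.items():
        if day is None:
            ranks[region] = len(arrival)
        else:
            ranks[region] = reached.index(day) + 1
    return ranks


# Toy data, invented for illustration only.
toy = {
    "A": {date(2020, 3, 20): 3, date(2020, 3, 25): 4},
    "B": {date(2020, 3, 22): 1, date(2020, 7, 1): 5},
    "C": {date(2020, 4, 2): 6},
    "D": {date(2020, 5, 1): 2},
}
rank_first = rank_regions(toy, 1)
rank_fifth = rank_regions(toy, 5)
```

In this toy example, region B reports its first case early but its fifth case late, so it drops from second to third place when the fifth-case metric is used, which is the kind of discrepancy the two thresholds are meant to expose.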

A mobility matrix describes how many individuals move within and between regions per unit of time (Grenfell and Harwood, 1997). Since our goal is to understand how cases spread across regions, we ignored mobility within a region. We used four types of mobility matrices. The first three matrices are based on the gravity model with various measures of distance (Erlander and Stewart, 1990). The connectivity between regions i and j is defined as $c_{ij} = k \, N_i N_j / d_{ij}^{\gamma}$, where $k$ and $\gamma$ are scaling factors, $N$ is population size, and $d_{ij}$ represents the distance between regions i and j. The distance is either the Euclidean distance between the centroids of the two regions (referred to as the Euclidean model) or the average transit time between the regions (referred to as the transit model). To estimate the average travel time between regions (excluding flying, as it is not the primary mode of transportation), we interviewed national bus companies on the travel times between the capital (Antananarivo) and the capitals of each of the 21 remaining regions. Since Antananarivo is the primary hub of travel (Fig. 1A), we calculated trips between other regions by adding or subtracting the travel times to and from Antananarivo. Travel times were directly obtained for routes that do not pass through the capital (e.g., neighboring remote regions). We also varied the parameter $\gamma$ (0.5, 1, and 1.5), giving a total of eight mobility matrices explored.
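As an illustration of how a gravity-based mobility matrix can be built, the sketch below assumes the common form c_ij = k N_i N_j / d_ij^gamma with Euclidean distance between centroids and a zero diagonal. The symbol names k and gamma, the function names, and the toy populations and coordinates are assumptions for this example rather than values from the study.

```python
import math


def gravity_matrix(pop, coords, gamma=1.0, k=1.0):
    """Connectivity c[i][j] = k * N_i * N_j / d_ij**gamma, with c[i][i] = 0."""
    regions = list(pop)
    c = {i: {} for i in regions}
    for i in regions:
        for j in regions:
            if i == j:
                c[i][j] = 0.0  # within-region movement is ignored
            else:
                d = math.dist(coords[i], coords[j])  # Euclidean distance
                c[i][j] = k * pop[i] * pop[j] / d ** gamma
    return c


def row_normalise(c):
    """Convert each row of weights into destination probabilities."""
    probs = {}
    for i, row in c.items():
        total = sum(row.values())
        probs[i] = {j: w / total for j, w in row.items()}
    return probs


# Toy inputs, invented for illustration only.
pop = {"A": 1000, "B": 500, "C": 100}
coords = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

c = gravity_matrix(pop, coords, gamma=1.0)
c_steep = gravity_matrix(pop, coords, gamma=1.5)
p = row_normalise(c)
```

Replacing `math.dist` with a lookup of transit times would give the transit-time variant; a larger exponent downweights distant pairs more strongly, which is what varying the exponent over 0.5, 1, and 1.5 explores.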

A third gravity-based model was also used: the Internal Migration Flow (flow for short) matrix accessed from the WorldPop project (worldpop.org). The Internal Migration Flow data were developed to study the spread of malaria where no migration data are available. In short, the model estimates the number of people moving between regions by fitting a gravity model extended to account for geographic and socioeconomic factors between 2005 and 2010 (for more details, see Garcia et al., 2015; Sorichetta et al., 2016).

The fourth matrix comes from mobile phone data from Orange Madagascar, one of three main mobile phone operators in Madagascar, which records mobility traced by cell towers. Since current data is not available, we used data from a malaria study in 2015 (Ihantamalala et al., 2018) .

In practice, we used a hierarchical approach to calculate the number of individuals moving across the regions. First, we fixed the total number of individuals moving per unit of time (X). Then, we used a vector P = (P_1, …, At each time step, we draw a random sample of S, E, I, and R individuals to move from region i to region j according to their respective frequency in that population, i.e., mobility is independent of whether individuals are susceptible, exposed, infected, or recovered. We first specify the total number of individuals moving, who are then randomly distributed across the regions using a multinomial distribution with parameters from the mobility matrix (see section above). We simulated the model until time T = 700 days, with 100 replicates. Our analysis depends neither on the size of the time step chosen for the simulation nor on the total number of individuals moving (set to 10,000), as we are comparing relative, rather than absolute, arrivals in each region.
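The movement step can be sketched as follows, under several stated assumptions: movers are allocated to origin-destination pairs by a single multinomial draw weighted by the mobility matrix, travellers are then drawn without replacement from the origin's S, E, I, and R counts, and pairs are processed sequentially. The function names, the data layout, and the toy values are hypothetical, and the within-region epidemic dynamics are omitted.

```python
import random
from collections import Counter

COMPARTMENTS = ("S", "E", "I", "R")


def multinomial(n, weights, rng):
    """Draw multinomial counts for n trials over len(weights) categories."""
    draws = rng.choices(range(len(weights)), weights=weights, k=n)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(len(weights))]


def movement_step(state, mobility, total_movers, rng):
    """Move total_movers individuals between regions for one time step.

    state[r] maps each compartment to a count; mobility[i][j] is a
    non-negative weight for travel from region i to region j.
    """
    regions = list(state)
    pairs = [(i, j) for i in regions for j in regions if i != j]
    weights = [mobility[i][j] for i, j in pairs]
    movers = multinomial(total_movers, weights, rng)
    new_state = {r: dict(state[r]) for r in regions}
    for (i, j), m in zip(pairs, movers):
        counts = [new_state[i][c] for c in COMPARTMENTS]
        m = min(m, sum(counts))  # cannot move more people than are present
        if m == 0:
            continue
        # Travellers are sampled in proportion to compartment sizes, so
        # movement does not depend on infection status (needs Python 3.9+).
        travellers = rng.sample(COMPARTMENTS, k=m, counts=counts)
        for c in travellers:
            new_state[i][c] -= 1
            new_state[j][c] += 1
    return new_state


# Toy inputs, invented for illustration only.
state = {
    "A": {"S": 990, "E": 0, "I": 10, "R": 0},
    "B": {"S": 500, "E": 0, "I": 0, "R": 0},
    "C": {"S": 100, "E": 0, "I": 0, "R": 0},
}
mobility = {
    "A": {"B": 10.0, "C": 1.0},
    "B": {"A": 10.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
rng = random.Random(1)
new_state = movement_step(state, mobility, total_movers=50, rng=rng)
```

Because only the relative timing of arrivals is compared, the absolute value of `total_movers` mainly sets the overall pace of spread rather than the order in which regions are reached.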

For each simulation using Analamanga region (capital) as the initial infected location, we ranked each region based on when the first and fifth cases occurred. We then compared the empirical and simulated rankings using both the cardinality of the matched rankings and Spearman rank correlations. Finally, we explored which regions were difficult to predict using the simulations using the root mean square error (rmse) of the simulated and reported rank.
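The three comparison metrics named above can be written compactly. This is a minimal standard-library sketch; the function names and toy rankings are assumptions for illustration, and ties are handled with average ranks, which may differ from the choices made in the original analysis.

```python
import math


def top_k_overlap(reported, simulated, k=5):
    """Number of regions appearing among the first k in both rankings."""
    def top(ranks):
        return set(sorted(ranks, key=ranks.get)[:k])
    return len(top(reported) & top(simulated))


def average_ranks(values):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def spearman(reported, simulated):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    regions = sorted(reported)
    x = average_ranks([reported[r] for r in regions])
    y = average_ranks([simulated[r] for r in regions])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse_per_region(reported, replicates):
    """Root mean square error of each region's simulated rank across replicates."""
    n = len(replicates)
    return {
        r: math.sqrt(sum((rep[r] - reported[r]) ** 2 for rep in replicates) / n)
        for r in reported
    }


# Toy rankings, invented for illustration only.
reported = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
same = dict(reported)
reverse = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "F": 1}
errors = rmse_per_region(reported, [same, reverse])
```

Regions with a large per-region RMSE are those whose arrival order the simulations reproduce poorly, which is how hard-to-predict regions can be flagged.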

In addition to the mechanistic model, we analyzed the statistical relationships between the mobility matrices and the order of arrival. We used Network-Based Diffusion Analysis (NBDA) in the R statistical environment using the NBDA package v0.7.10. In network-based diffusion analysis, the order in which the regions (nodes) reported the first or the fifth case (acquired a trait) is compared to their position in the network to assess whether the trait is acquired through interactions with other nodes (Hoppitt et al., 2010). The model fits a diffusion model to the reported data; more precisely, it estimates a scalar (s) that controls the importance of the diffusion matrix (here the mobility matrix) in explaining the order of acquisition. Significance is obtained by comparing the log-likelihood ratio between the fitted model and a null model where the scalar is set to 0.
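To make the logic of this test concrete, the sketch below implements a simplified order-of-acquisition likelihood and a likelihood-ratio test against s = 0. It is not the NBDA package's code: the rate form 1 + s * (sum of connections to already-affected nodes), the grid search for s, and the toy network are all assumptions chosen for illustration.

```python
import math


def order_log_likelihood(s, network, order):
    """Log-likelihood of an acquisition order under a simple diffusion model.

    network[i][j] is the connection strength from node j into node i;
    order lists the nodes in the order in which they acquired the trait.
    """
    informed = set()
    loglik = 0.0
    for node in order:
        naive = [k for k in network if k not in informed]
        # Assumed relative rate: baseline 1 plus s times exposure to informed nodes.
        rate = {k: 1.0 + s * sum(network[k][j] for j in informed) for k in naive}
        loglik += math.log(rate[node] / sum(rate.values()))
        informed.add(node)
    return loglik


def likelihood_ratio_test(network, order, grid):
    """Grid-search s, then compare the fitted model with the null model s = 0."""
    s_hat = max(grid, key=lambda s: order_log_likelihood(s, network, order))
    stat = 2.0 * (
        order_log_likelihood(s_hat, network, order)
        - order_log_likelihood(0.0, network, order)
    )
    stat = max(stat, 0.0)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return s_hat, stat, p_value


# Toy chain network A-B-C-D, invented for illustration only.
network = {
    "A": {"B": 1.0, "C": 0.0, "D": 0.0},
    "B": {"A": 1.0, "C": 1.0, "D": 0.0},
    "C": {"A": 0.0, "B": 1.0, "D": 1.0},
    "D": {"A": 0.0, "B": 0.0, "C": 1.0},
}
order = ["A", "B", "C", "D"]
grid = [i / 10 for i in range(0, 51)]
s_hat, stat, p_value = likelihood_ratio_test(network, order, grid)
```

With s = 0 every remaining node is equally likely to be next, so the null log-likelihood depends only on the number of nodes; a mobility matrix is informative to the extent that weighting by it raises the likelihood of the observed order.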

The data compilation, metapopulation models, and figures were conducted with Mathematica 12.0 (Wolfram Research Inc.). All code is available in the repository https://github.com/ramiadantsoa/mobilityMada.

The timing of arrival of the 1st case and the 5th case yielded different rankings (Fig. 2). For instance, although the Menabe (ME) and Diana (DI) regions had their first confirmed case in March, the 5th case only occurred in July. Atsimo Atsinanana (AA) had fewer than five cases as of February 2021. Among the first five regions, the 1st and 5th case metrics agree on three regions: Analamanga (AN), Atsinanana (AT), and Matsiatra Ambony (MA). For the 1st case metric, the remaining regions were Menabe (ME) in the west and Diana (DI) in the north, whereas for the 5th case metric, the remaining regions were Alaotra Mangoro (AL) in the east and Atsimo Andrefana (AD) and Anosy (AS) in the south.

To assess the differences among the mobility matrices, we ranked the regions according to the number of individuals entering each region (Fig. 3). The gravity models are quite similar: the Spearman correlation between the ranks is 0.98 between the Euclidean and the Internal Migration Flow models and 0.88 between the Euclidean and the transit models. The gravity model based on Euclidean distance ranks the eastern regions in the central highland higher, whereas the transit model ranks the southern regions higher (Fig. 3A, B). Although the east is indeed closer, the terrain is steep and winding, lengthening trip duration. The Internal Migration Flow provides a similar ranking except that it ranks Atsinanana (AT), the second largest economic region in Madagascar, higher and also includes Atsimo-Andrefana (Fig. 3C). The mobile phone model is markedly different and heterogeneous: its correlation with the Euclidean-based gravity model is 0.26. It ranks the remote northern and southern regions Diana (DI) and Androy (AY) more highly (Fig. 3D).

Whether we looked at the overlap of the first five regions or the Spearman rank correlation for all regions, all mobility matrices better predicted the 5th case than the 1st case (Fig. 4 and Fig. S2). When predicting the first five regions for the 5th case, the mobile phone model performed best (mean = 1.9 regions correctly predicted, Fig. 4A). Surprisingly, the null model has a higher mean number of regions correctly predicted than the gravity-based mobility matrices when predicting the first five regions for the 1st case (1.5 vs 1.0, 1.0, and 0.9 for the Euclidean distance, transit time, and Internal Migration Flow, respectively). The Internal Migration Flow model had the worst performance among the mobility matrices investigated (Fig. 4A). We also looked at the first ten regions, and the results are quite similar except that the gravity models performed better than the null model (Fig. S2).

In comparing simulated versus reported ranks, the overall median Spearman rank correlation was highest for the model using the gravity model based on Euclidean distance (0.56), followed by the gravity model based on transit time (0.53), the mobile phone model (0.4), and the Internal Migration Flow (0.38) (Fig. 4B). Increasing the exponent in the gravity matrix improved the predictive ability of the metapopulation model (Fig. S2; as the exponent increases from 0.5 to 1 to 1.5, the median of the distribution of the correlation increases from 0.45 to 0.53 to 0.56 for the gravity model based on Euclidean distance and from 0.44 to 0.49 to 0.53 for the gravity model based on transit time). Interestingly, the highest correlation for the 5th case among all replicates was obtained with the null model, with a value of 0.81.

Fig. 4. A) The overlap (mean, minimum, and maximum number of regions correctly predicted) according to the first five regions reporting infection. B) The distribution of the Spearman rank correlation between each replicate and the reported rank. N, E, T, F, and M denote, respectively, the null model, the gravity model based on Euclidean distance between centroids, the gravity model based on transit time, the Internal Migration Flow, and the mobile phone model. The reported ranks are based on either the 1st (red) or the 5th (blue) confirmed case (Analamanga is excluded as it was used as the initial condition).

Given the challenge in predicting the order of the reported cases, we investigated if some regions are more difficult to predict than others. Overall, the southern area of the country was consistently the most difficult to predict (Fig. 5) . However, for all other areas of the country, there were no consistent patterns by the type of model. 

Finally, we compared the performance of each mobility matrix with a statistical approach. Table 1 shows the estimated parameter, s, representing the importance of the matrix in explaining the pattern, and the significance value. For the 1st case, the models are quite similar, showing intermediate s (0.5 < s < 0.8), but none are significantly different from a null model. For the 5th case, the ranking of the mobility matrices is inconsistent with the predictions of the mechanistic model. The gravity model based on transit time performs poorly, with the lowest s, while the Internal Migration Flow model has the highest s but is not statistically significantly better than the null model. Overall, only the mobile phone model was significantly associated (p < 0.02) with the order of detection, and only for the 5th case. Complete results for all mobility matrices are shown in Table S1.

Understanding what types of mobility data and models can best predict spatial dynamics of infectious diseases, and particularly emergent pathogens, could importantly contribute to allocating scarce resources, prioritizing where improvements in healthcare and surveillance will be vital, and estimating the possible pace and severity of the epidemic (Grenfell et al., 2001; Rice et al., 2021; Tatem et al., 2006; Zhou et al., 2020). Often in low- and middle-income settings, there are few data sets on human travel available (Wesolowski et al., 2016), and limited surveillance data to estimate spatial dynamics directly from the pattern of cases (e.g., as in Bjørnstad and Grenfell, 2008). Here, we leverage a range of possible data sources on human mobility in order to better understand the spread of SARS-CoV-2, by integrating matrices describing mobility between regions into a spatial model of SARS-CoV-2 and a network-based diffusion analysis. By comparing the simulated trajectories with data, we evaluate the ability of different measures of mobility to predict the spatial spread of SARS-CoV-2 in Madagascar.

A major challenge in approaches of this kind is data availability. First, from the side of infectious disease data, there is limited availability of case numbers, and although many major cities in Madagascar have uniquely detailed mortality records (Masquelier et al., 2019) with scale and scope adequate to detect major outbreaks, data compilation and accessibility to the research and public health community have lagged. To fill this gap in the landscape of public health communication in Madagascar, we developed a dashboard by collating data from daily televised reports, and these are the data that we use in our analyses. The quality of the data on cases can thus only be as good as these available reports. For instance, daily reporting was interrupted between the 13th of October 2020 and the 13th of March 2021 and was replaced by weekly cumulative numbers.

Uncertainties in the case data will be of most concern if there are marked spatial differences in testing and reporting. Our analysis indicates large differences in the rank of the regions confirming the first case and the fifth case. The first reported case, especially in the first five regions, is most likely driven by imported cases, while the fifth reported case is likely to emerge as a result of onward transmission. Importantly, different locations may have different probabilities of both early detection and onward transmission. As an example, Menabe and Diana were among the first five regions to report a first case but then lagged before the fifth case was reported. These regions are popular tourist destinations and are easily accessed by air. Perhaps in part due to this demographic, the first cases in these regions were quickly isolated and contact tracing was swiftly established, so that chains of transmission were slow to develop, delaying the fifth reported case. One positive interpretation of this pattern is that with adequate testing and contact tracing (as we hypothesize could have occurred following detection among tourists), spread from imported cases could be quickly controlled. Rolling out and prioritizing these strategies early on could thus have had an impact on curbing disease spread. Typically, data on air travelers are digitized and detailed, and could be leveraged to identify the regions most at risk of such early introductions.

Turning from the availability of case data to the availability of mobility data, there is a further set of challenges. The first three mobility matrices that we use were formulated from gravity-based models with varying degrees of realism, encompassing, for example, the distance between two regions calculated using the Euclidean distance between the centroids of the regions or the actual transit time from transport companies, while the fourth mobility matrix we use is directly based on mobile phone data. Our comparison focused on relative magnitudes of movement between regions rather than absolute magnitudes of movement, given the various uncertainties in the data available to develop a fully parameterized model. None of these approaches were able to correctly predict all of the reported patterns of spatial spread in Madagascar, although on average they all performed better than a null model. The best performing models were either the simplest (gravity with Euclidean distance) or extrapolated from data on mobility (mobile phone data). Adding realism in the gravity model with either transit data or Internal Migration Flows did not improve prediction. In a few rare replicates, the null model actually generated the most accurate predictions, indicating the unpredictable nature of spread. Yet on average, the simplest gravity model most likely captures the core diffusion component of the process (infection ultimately spills into neighboring regions when the number of cases is high enough). Inference based on mobile phone data, despite these data having been processed over five years ago (Ihantamalala et al., 2018; Wesolowski et al., 2016), had the best performance, perhaps because it captures a wider diversity of connections than are commonly predicted by a gravity model (Ihantamalala et al., 2018; Oliver et al., 2020; Tizzoni et al., 2014; Wesolowski et al., 2015a, 2015b, 2016).
In fact, the simplest gravity and mobile phone models performed well in different (non-overlapping) regions, suggesting that they capture different important aspects of mobility. Notably, some approaches performed strikingly poorly (e.g., the Internal Migration Flow model), indicating a need for caution in deciding what model or metric of mobility to use. Finally, the performance of the mobile phone data relative to other measures strongly suggests that accurate, up-to-date measurements of mobility from this source (rather than the 2015 data we were compelled to use) might have opened the way to anticipating spread and reacting appropriately. Designing regulatory pipelines that efficiently enable sharing of detailed yet anonymous mobility data from mobile phone companies in such times of crisis should be a priority in Madagascar, as it has been elsewhere (Grantz et al., 2020; Kishore et al., 2020; Oliver et al., 2020).

As most models did not reliably predict the rank of the timing of either the first or the fifth case per region (Figs. 4 and 5), this work is still some steps away from driving policy recommendations. In particular, all models largely failed to predict the spatial patterns in the South, a less populated and less connected area of the country, possibly because stochasticity and thus rare events are overwhelmingly important, and perhaps also because delays in testing and data reporting, given the remoteness of the region, blurred the signal in numbers of cases. A key direction for expanding this work is to identify where models and data provide reasonable predictions and where they do not. The analysis reported here provides a potential starting point for further sensitivity analyses that explore core drivers of expectations of outcomes given the topography of the network, providing general expectations for pathogen spread. Uncertainty in case data is a very general issue in developing a mechanistic understanding of infectious diseases (for example, case numbers often apparently paradoxically increase with vaccination coverage, but this is actually a result of concomitant improvements in surveillance (Prada et al., 2018)). Various approaches to correcting for biases are available (Becker and Grenfell, 2017; Jarvis et al., 2021), but transparency in data generation mechanisms is an essential component.

There are a number of caveats associated with this work. In particular, by focusing simply on region population size and connectivity patterns, the model simplifies a number of aspects that may be important to the pace of spread of SARS-CoV-2, such as within-region dynamics (i.e., some regions may be more internally connected than others), as well as interventions including travel bans and how these changed connectivity over the first months of spread. However, the better performance of the mobile phone model compared to the gravity-based models in both the mechanistic and statistical analyses suggests that the connectivity matrix used to link the regions is the core of the problem.

In this work, we leverage a range of possible data sources on human mobility and use a set of mechanistic and statistical models to explain the spatial spread of COVID-19 in Madagascar. We had little success in reproducing the spread. The approximated mobility matrices poorly characterize mobility in Madagascar. Furthermore, uncertainties arising at every stage from testing to reporting might further complicate prediction of emergent pathogens' trajectories. Our analysis provides a first step towards models that can capture the spread of an emergent pathogen. It also highlights the centrality of data availability and of strengthening collaboration among different institutions with access to critical data: models are only as good as the data that they use, so building towards effective data-sharing pipelines is essential.

Authors Contributions: TR, BLR, CJEM, AW, FR conceived and designed the paper, TR and FR wrote and performed the analyses. SR gathered the data from the bus company, TR, BLR, CJEM, AW, FR, AHR, and FMR wrote the manuscript.

The authors declare no competing interests.

Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study

SARS-CoV-2 (COVID-19) by the numbers

tsiR: An R package for time-series Susceptible-Infected-Recovered models of epidemics

Hazards, spatial transmission and timing of outbreaks in epidemic metapopulations

Aggregated mobility data could help fight COVID-19

Mobility network models of COVID-19 explain inequities and inform reopening

Human mobility and the spatial transmission of influenza in the United States

The Gravity Model in Transportation Analysis: Theory and Extensions

Reconciling model predictions with low reported cases of COVID-19 in Sub-Saharan Africa: insights from Madagascar

Human Mobility and the Global Spread of Infectious Diseases: A Focus on Air Travel

Modeling internal migration flows in sub-Saharan Africa using census microdata

The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology

(Meta)population dynamics of infectious diseases

Travelling waves and spatial hierarchies in measles epidemics

The need for COVID-19 research in low-and middle-income countries

Detecting social transmission in networks

Estimating sources and sinks of malaria parasites in Madagascar

Recensement Général de la Population et de l'Habitation

Measuring the unknown: an estimator and simulation study for assessing case reporting during epidemics

Measuring mobility to monitor travel and physical distancing interventions: a common framework for mobile phone data analysis. The Lancet Digital Health

The effect of human mobility and control measures on the COVID-19 epidemic in China

Spatial spread of the West Africa Ebola epidemic

Estimating cause-specific mortality in Madagascar: an evaluation of death notification data from the capital city

Modeling human mobility responses to the large-scale spreading of infectious diseases

Mobile cellular subscriptions (per 100 people) [WWW Document]

Reduction in mobility and COVID-19 transmission

Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle

Improving measles incidence inference using age-structured serological data

GeneXpert for the diagnosis of COVID-19 in LMICs

Monitoring for outbreak-associated excess mortality in an African city: Detection limits in Antananarivo

Variation in SARS-CoV-2 outbreaks across sub-Saharan Africa

Mapping internal connectivity through human migration in malaria endemic countries

Global transport networks and infectious disease spread

On the Use of Human Mobility Proxies for Modeling Epidemics

Regional Mobility in Sub-Saharan Africa

Impact of human mobility on the emergence of dengue epidemics in Pakistan

Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: a modelling study using mobile phone data. The Lancet Digital Health

We thank Orange for sharing the original mobile phone data. We also thank Valerie Ranaivoson, Domoina Nadia Andriamamenosoa and Faly Aritiana Fabien for their assistance, support, and encouragement.

Funding: AW is funded by a Career Award at the Scientific Interface by the Burroughs Wellcome Fund, by the National Library of Medicine of the National Institutes of Health (grant number DP2LM013102), and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (grant number 1R01AI160780-01). CJEM, BLR and FR were supported by funding from the Centre for Health and Wellbeing, Princeton University.


Highlights: Traditional datasets on mobility are lacking in many LMICs. Proxy mobility matrices poorly characterize the spatial spread of COVID-19 in Madagascar. Data availability and access are critical to improving model performance. Building effective data-sharing pipelines among institutions is essential.