key: cord-0708710-iigjgeg5 authors: Guo, Feng; Huang, Yiping; Wang, Jingyi; Wang, Xue title: The informal economy at times of COVID-19 pandemic date: 2021-11-23 journal: China economic review DOI: 10.1016/j.chieco.2021.101722 sha: 1625849a0b562b37f548bbedb55566b37b948cdc doc_id: 708710 cord_uid: iigjgeg5

We provide a first view of the vulnerable informal economy after the blows from COVID-19, using transaction-level business data on around 80 million owners of offline micro businesses (OMBs) from the largest Fintech company in China and employing a machine learning method for causal inference. We find that OMB activities in China experienced an immediate and dramatic drop of 50% during the trough. The businesses had rebounded to around 80% of where they should have been seven weeks after the COVID-19 outbreak, but remained at this level until the end of our time window. We find a larger disruption for OMBs in urban areas, for female merchants, and for merchants who did not grow up in the places where they conducted business. We discuss the implications for policy support to the most vulnerable, and highlight the importance of taking full advantage of digital development to track the informal economy.

Obtaining accurate statistics about the informal economy is one of the most daunting challenges in most developing countries (Blackburn et al., 2012; Capasso and Jappelli, 2013), and it is especially hard to gauge the pain felt by the informal economy during a major crisis such as the COVID-19 pandemic. Although the pandemic dealt heavy blows to society as a whole, it hit informal workers disproportionately. For example, street vendors mainly work in the services sector, are usually self-employed or informally employed without social insurance, and mainly operate micro and family enterprises. Measuring the size of the informal economy accurately and examining the impact on it is important for making effective economic policy decisions.
In China, this is becoming possible thanks to the rapid digitalization of the economy, as paying with digital payment tools has become a daily occurrence. According to the People's Bank of China, the proportion of adults using digital payments was 82.39 percent in 2018. Domestic users of Alipay and WeChat Pay, China's two largest digital payment service providers, have exceeded 900 million. People use digital payment tools for online and offline transactions, and even small businesses such as street shops and peddlers have adopted digital payments such as Alipay, WeChat Pay, UnionPay, and others. The accumulation of digital information enables us to approximate the size of the informal sector and gauge the extent to which it has been hit by the COVID-19 pandemic.

In China, mobility restrictions like lockdowns and social distancing measures have greatly contributed to the containment of the spread of the disease (Chinazzi et al., 2020; Tian et al., 2020), and the number of domestic new cases came under control in March 2020. However, aggressive countermeasures, such as stringent lockdowns, have also imposed tremendous economic costs on the country in the short run. China's economy shrank by 6.8 percent year-on-year in the first quarter of 2020, the first contraction in the past four decades. As a result, vulnerable groups are likely experiencing a deterioration of their income and livelihood. China emerged from a two-month containment phase and moved into the mitigation stage by early April 2020.

In this article, we explore the impacts felt by China's offline micro businesses (OMBs) in the informal sector, whose owners are mainly self-employed in the services sector and not able to work from home. Tens of millions of OMBs have been disproportionately affected by the pandemic and lockdown measures. First, OMBs operate largely in the informal services sector of the economy, and their workers are usually self-employed or informally employed without social insurance.
Most OMBs employ low-skilled workers who could not work from home during the pandemic. Second, most OMBs survive hand-to-mouth, with limited savings and a lack of access to unemployment benefits. As a result, they rely heavily on cash flows and short-term loans. Those employed in the gig economy are particularly vulnerable to dramatic collapses of income and loss of livelihood. The negative effects on this group are likely to have a long-lasting impact and are a leading cause of persistent inequalities and low mobility (Lustig and Tommasi, 2020).

Using weekly data on around 80 million "QR code merchants" (Mashang in Chinese) from Ant Group (hereinafter "Ant"), the largest Fintech company in China, this article studies the initial impacts of COVID-19 on OMBs and their recovery once the virus was largely under control. The QR code merchants are OMBs that collect payments via Alipay, a digital payment tool of Ant. According to official public statistics, there are around 100 million registered individual businesses (see footnote 3), which is comparable to our sample of around 80 million OMBs. Our sample spans from December 31, 2019 to April 2, 2020 and the corresponding lunar calendar dates in 2018 and 2019. We use the lunar calendar dates to account for the seasonality in economic activities around the Lunar New Year. A notable day in COVID-19 control and prevention was January 20, 2020 (five days before the Lunar New Year), when human-to-human transmission of the coronavirus was confirmed and reported to the public. Thus, we use that day as the start of the outbreak and the event date in the following analysis. We define the periods before and after January 20 (December 26 in the lunar calendar) as the pre- and post-virus periods, respectively, and use the corresponding lunar calendar dates to define the same periods in 2018 and 2019.
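The date alignment can be illustrated with a minimal sketch. It assumes, as a simplification, that the "corresponding lunar calendar date" is the day with the same offset from Lunar New Year's Day; this matches the lunar date exactly only when the preceding lunar months have the same length in both years. The function names, and the choice to count the event day itself as post-virus, are illustrative assumptions rather than details taken from the paper.

```python
from datetime import date

# Gregorian dates of Lunar New Year's Day in the three sample years.
LUNAR_NEW_YEAR = {
    2018: date(2018, 2, 16),
    2019: date(2019, 2, 5),
    2020: date(2020, 1, 25),
}

# Event date used in the paper: confirmation of human-to-human transmission.
EVENT_2020 = date(2020, 1, 20)


def align_to_year(d: date, from_year: int, to_year: int) -> date:
    """Map a date to the day with the same offset from Lunar New Year in another year.

    Assumption: a fixed day offset stands in for a true lunar-calendar lookup.
    """
    offset = d - LUNAR_NEW_YEAR[from_year]
    return LUNAR_NEW_YEAR[to_year] + offset


def period(d: date, year: int) -> str:
    """Label a date as pre- or post-virus relative to the aligned event date.

    Assumption: the event day itself is counted as post-virus.
    """
    event = align_to_year(EVENT_2020, 2020, year)
    return "post-virus" if d >= event else "pre-virus"


event_2019 = align_to_year(EVENT_2020, 2020, 2019)
event_2018 = align_to_year(EVENT_2020, 2020, 2018)
```

Under this offset rule, the event date of January 20, 2020 maps to January 31, 2019 and February 11, 2018, each five days before Lunar New Year's Day.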
3 https://mp.weixin.qq.com/s/qyGNRQTlQhWIMWvFJFguGw

A simple year-on-year change in OMB activities would lead to mismeasurement of the real economic impacts of the pandemic, because the businesses were likely on a growing or declining path relative to the same period of the previous year even in the absence of the COVID-19 pandemic (see footnote 4). We first predict the counterfactuals using a machine learning technique and then interpret the difference between the realized and counterfactual results as the causal impact of the pandemic (Athey, 2017). Specifically, we merge OMB data with other economic, population, and geographic characteristics at the Thiessen-polygon level (see footnote 5). We predict the counterfactual activities of OMBs in the post-virus period in 2020 by modelling the relationship between the activities of OMBs in the post-virus period and the other feature variables. The feature variables include OMB activities around the same period in the previous year, OMB activities in the weeks before the COVID-19 outbreak in 2020, cross-section data as of December 2019 on the economic environment, population, and geography, and panel data such as meteorological characteristics in 2020. The parameters come from a gradient boosting decision tree (GBDT) model trained on the data from 2018 and 2019, which captures the relationship between the activities of OMBs in the post-virus period in 2019 and the feature variables. The difference between the predicted counterfactuals and the actual values in 2020 gives an estimate of the real impact of the pandemic on OMBs.

We find a massive decline in the number and sales turnover of active OMBs in the post-virus period in China. The number and sales turnover of active merchants bottomed out in the second week of the post-virus period, January 31 to February 6, and remained in a downturn for the following three weeks.
Relative to the counterfactual level estimated with the machine learning technique, the average weekly drops in the number and sales turnover of active merchants were around 50 percent between January 31 and February 20, after which OMB activity started to rebound. As of early April, one month after the trough, OMB activities had bounced back to around 80 percent of their counterfactual levels. In addition, we find that the announcements of government lockdown policies explain only a limited portion of the overall decline in OMB activities, and what matters most may plausibly be voluntary containment measures.

OMBs in urban areas bore the hardest hit. The largest weekly decline in the number of active merchants was about 54 percent in urban areas, compared with a 41 percent contraction in rural areas. In addition, we see a simultaneous contraction of business activity during the entire month of February, followed by a nationwide synchronous recovery starting at the end of February, although there were regional variations in the spread of the virus. Female merchants saw drops of around 53 and 57 percent in the number and sales turnover of active OMBs during the trough, respectively, and these drops were about 5 and 9 percentage points larger than the average for male merchants. The decrease in economic activities was larger for the outsiders, the owners who were not born in the province where they conducted business, with a drop in the number of active merchants that was 7 percentage points larger than that of the natives who were managing businesses in their birth provinces. Generally speaking, the outsiders do not enjoy the same social benefits as the natives and are more vulnerable to shocks.

4 Please see Appendix B.1 for more details.

5 Please see section 2 and Appendix A for more details on Thiessen polygons.
Although research on COVID-19 is growing fast, as far as we know, our paper is the first to study the impact on OMBs, which were disproportionately hard hit by the pandemic and lockdown measures. Other studies focus on the spread, containment, and economic and political consequences of COVID-19 and previous pandemics, given the significant damage they cause (Atkeson, 2020; Baker et al., 2020; Barro et al., 2020; Chen et al., 2020; Eichenbaum et al., 2020; Fang et al., 2020). Dai et al. (2020) examine how the exposure of Chinese registered firms to the COVID-19 shock varied with a cluster index (measuring spatial agglomeration of firms in related industries) at the county level. However, few studies estimate the real impact on hard-hit informal workers, who were particularly vulnerable to dramatic collapse during the COVID-19 pandemic.

In addition, we contribute to the broader literature on the informal economy. Informal businesses are inherently difficult to identify, because most of them are small-scale, frequently family-based, and perhaps low-productivity businesses with much informality and nearly no records of social security. However, informal businesses such as OMBs contribute significantly to employment, especially in developing countries (La Porta and Shleifer, 2014; Maloney, 2004). Informal workers in the Asia-Pacific region account for nearly 60 percent of nonfarm employment, ranging from around 20 percent in Japan to over 80 percent in Myanmar and Cambodia. OMBs in China operate largely in the informal services sector, and most of them could not work from home during the pandemic. We identify informal businesses in the gig economy by taking advantage of their digital footprints, using data from the world's largest Fintech company.
Although research on small and medium enterprises (SMEs) is growing rapidly, there are very few national datasets that provide information on the economic activities of micro businesses, especially of those that are not registered with the Industry and Commerce Department.

We employ data from Ant Group on the weekly number and sales turnover of around 80 million QR code merchants from December 31, 2019 to April 2, 2020 and for the corresponding lunar calendar dates in 2018 and 2019. On January 20, human-to-human transmission of the coronavirus was confirmed by Chinese authorities, which marked a dramatic change in how the evolving pandemic was managed and contained. Thus, we define the periods before and after January 20 (December 26, lunar calendar) as the pre- and post-virus periods, respectively, and define the same periods in 2018 and 2019 using the corresponding lunar calendar dates. Lunar New Year's Eve in 2020 fell on January 24, four days after the event date, and there is usually a seven-day Lunar New Year holiday. Thus, the first timeslot in the post-virus period includes the three days before Lunar New Year's Eve and seven days afterward, taking the potential impact of the holiday into account.

We propose to aggregate the OMB data at the Thiessen-polygon level. Thiessen polygons, otherwise known as Voronoi diagrams, are an essential method for the analysis of proximity and neighborhood. The method defines an area around a center point, where every location is nearer to this point than to all the others. In our analysis, each OMB belonging to a specific Thiessen polygon is the closest to its own center point, compared with the distance to any other center point. We establish 138,629 polygons across mainland China. In addition, we collected raster data on variables that may affect OMB activities, such as economic development, population, and geographic characteristics.
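The nearest-center rule that defines a Thiessen polygon can be illustrated with a short sketch. It assumes planar coordinates and a brute-force search; real longitude and latitude data would need a projection or geodesic distance, and a spatial index would be needed at the scale of tens of millions of merchants. The coordinates and function names are hypothetical.

```python
import math


def nearest_center(point, centers):
    """Return the index of the center closest to `point` (planar Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def count_by_polygon(merchants, centers):
    """Count merchants per Thiessen polygon by assigning each to its nearest center."""
    counts = [0] * len(centers)
    for location in merchants:
        counts[nearest_center(location, centers)] += 1
    return counts


# Hypothetical center points and merchant locations on a flat grid.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
merchants = [(1.0, 1.0), (9.0, 1.0), (2.0, 8.0), (4.0, 1.0)]
counts = count_by_polygon(merchants, centers)
```

Assigning every location to its nearest center is equivalent to asking which Voronoi cell the location falls in, so the polygons never need to be drawn explicitly for the aggregation step.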
The customers of OMBs are mainly nearby, as OMBs are offline and small in scale, and the surroundings of OMBs play a significant role in their daily business. First, given the spread of the virus and the sensitivity of OMB activity to the weather, we include meteorological conditions in the analysis, such as temperature, wind speed, air pressure, humidity, and precipitation. Second, we obtained data on around 35 million "points of interest" from the AutoNavi (Gaode in Chinese) Application Programming Interface in December 2019, to proxy for local general conditions. Third, we have cross-section data as of December 2019 on raster economic development, population, and geographic variables, specifically, nighttime lights data with a 500-meter spatial resolution, population data at the 1,000-meter grid cell level, and elevation at the 30-meter grid cell level. We further calculate the driving distance from the center of the polygon to that of the county, prefecture-level city, and capital city of the province, reflecting transportation convenience at the polygon level.

To estimate the real impact of the pandemic on OMBs, we need to predict the counterfactual level of OMB economic activities without the COVID-19 outbreak. The difference between the actual transactions and the counterfactuals gives the real drop. Most studies that focus on the economic impact of COVID-19 use linear regression in a difference-in-differences (DID) specification (Chen et al., 2020; Fang et al., 2020), but in our setting and with our data, two issues related to the linear DID specification would lead to biased estimates. First, it is not clear that the explanatory variables, including economic, population, and geographic characteristics, and the activities of OMBs in the previous year and in the pre-virus period, are linearly related to the changes in OMB activities in the post-virus period.
It is also not clear ex ante which factors among the dozens of variables would be most relevant. Running a kitchen-sink regression would risk overfitting the data and estimating spurious relations between regressors and regressand (Rossi and Utkus, 2020). Instead, we rely on a machine learning method known as GBDT. GBDT not only accommodates large conditioning information sets, but also allows for non-linearities without overfitting or falling prey to the so-called curse of dimensionality. Second, the core assumption that identifies the treatment effect in DID estimators is the Parallel Paths assumption, namely that the average change in outcome for the treated in the absence of treatment equals the average change in outcome for the non-treated. However, QR code merchants in China have grown rapidly since 2018 and were still on a growth path over these two years. The average growth of OMB activities in 2020 in the absence of the COVID-19 pandemic is unlikely to equal that in 2019, which invalidates the Parallel Paths assumption. In fact, we tested the paths of the number and sales turnover of active OMBs in the pre-virus period in 2019 and 2020 and did find a non-parallel pre-trend. Our setting, which employs GBDT and nonparametric estimation, allows for more flexible dynamic responses of OMB activities to the feature variables. The basic assumption in our model is that the "relationship" between inputs and response variables, rather than the growth path of OMBs in 2019, still holds in 2020, which relaxes the restrictions on the average changes of the outcome variables. To be more specific, in our setting, we predict the counterfactual level of OMB activities in the post-virus period [...] (1) and (2) but shifting the event date three weeks backwards.
In other words, we assume that the pseudo-event date is 30 December, 2019, and follow the basic logic of the models in section 3 to predict OMB activities between 30 December, 2019 and 20 January, 2020. In this way, the weeks between 30 December, 2019 and 20 January, 2020 are pseudo post-virus periods, and the parameters used to predict OMB activities in these three weeks are those estimated using data from the previous two years. The red dashed lines in Figure 1 show the predicted paths in 2020, while the red solid lines show the actual paths. The predicted paths in the pre-virus period in 2020 largely coincide with the actual paths (weeks -1, -2 and -3), implying that the relationship between feature variables and OMB activities in the pre-virus period in 2019 does hold in the same period in 2020. Therefore, it seems reasonable to believe that the relationship in the true post-virus period (the weeks after 20 January, 2020) would also have been stable in 2020 had there been no COVID-19 pandemic.

We first explore how business operations were affected by the pandemic. Figure [...]

We then provide a detailed exploration of the impacts on OMB activities. Figure 2 shows the time series of the ratio of actual values to their counterfactuals in terms of the number and sales turnover of OMBs. A ratio of 1 indicates that the actual economic activity of OMBs was exactly at the level where it should have been, and the smaller the ratio, the sharper the decline in business activities. The difference between the ratio and 1 gives the percentage change in the weekly number and sales turnover of active OMBs relative to their counterfactuals.
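The counterfactual logic and the ratio just defined can be sketched end to end on toy data. This is only an illustration under stated assumptions: it swaps in depth-one trees (stumps) with squared-error loss, written in plain Python so that it is self-contained, whereas the paper's actual GBDT implementation, features, and hyperparameters are not reproduced here. All numbers and names are hypothetical.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    mean_res = sum(residuals) / len(residuals)
    # Fallback when no split exists: every row goes left and gets the mean residual.
    best = (float("inf"), 0, float("inf"), mean_res, mean_res)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            l_val, r_val = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - l_val) ** 2 for r in left) + sum((r - r_val) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, thr, l_val, r_val)
    return best[1:]


def fit_gbdt(X, y, n_rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, l_val, r_val = fit_stump(X, residuals)
        stumps.append((j, thr, l_val, r_val))
        pred = [p + lr * (l_val if row[j] <= thr else r_val) for row, p in zip(X, pred)]
    return base, lr, stumps


def predict(model, X):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return [base + lr * sum(l if row[j] <= t else r for j, t, l, r in stumps) for row in X]


# Hypothetical training rows from the earlier year, one per polygon:
# [pre-period activity, same post-period activity one year before] -> post-period activity
X_train = [[100, 80], [200, 150], [300, 260], [400, 330], [500, 450]]
y_train = [90, 180, 270, 360, 450]
model = fit_gbdt(X_train, y_train)

# Hypothetical 2020 features for one polygon and its observed post-virus activity.
x_2020 = [300, 270]
actual_2020 = 135.0
counterfactual = predict(model, [x_2020])[0]

ratio = actual_2020 / counterfactual  # 1 means activity is at its counterfactual level
pct_change = ratio - 1                # negative values are declines
```

The design mirrors the text: the model only ever sees pre-pandemic relationships, so whatever gap opens between `actual_2020` and `counterfactual` is attributed to the shock rather than to the pre-existing growth path.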
We have seen an immediate drop and partial recovery in the number and sales turnover of active [...] different regions, and the specific terms and requirements of the lockdown also differed across provinces and cities, which makes it very challenging to define the effective day of the lockdown policy. However, because the mismeasurement arising from differences between the various definitions of the policy effective date is unlikely to exceed one week, the challenge is largely mitigated when we define the effective week instead. Therefore, we follow the information from He et al. (2020) [...]

where the dependent variable is the logarithm of the number or sales turnover of OMBs in a given polygon and city in a given week. The lead indicator takes a value of one for OMBs in each polygon of a city in the corresponding week before the lockdown policies took effect, and zero otherwise, while the lag indicator takes a value of one for OMBs in polygons of a city in the corresponding week after lockdown, and zero otherwise.

[...] zero, which implies that there is no systematic difference in the trends between the treatment and control groups before the city lockdown. Therefore, we assume that parallel trends across the two groups would hold in the absence of the lockdown. In addition, the lagged terms (l > 0) are negative and statistically significant. The number of active OMBs dropped by 4 to 8 percent during the first seven weeks after lockdown, and OMB sales fell by 6 to 11 percent in the same period. The average impact of lockdown moderated from the seventh week after lockdown, which is in line with the fact that some cities started to lift their lockdown policies then in the hope of restarting the economy.

We then estimate the average impacts of lockdown policies by comparing the average changes of OMB activities in the treatment group (lockdown cities) relative to the control group (non-lockdown cities).
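The treatment-versus-control comparison can be illustrated in its simplest two-group, two-period form. This is a minimal sketch with made-up log outcomes; it omits the fixed effects and time-varying controls used in the regression version, and the function name is hypothetical.

```python
import math
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences on (treated, post, outcome) rows.

    Returns (change in treated group) minus (change in control group).
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])


# Hypothetical log weekly activity for polygons in lockdown and non-lockdown cities.
rows = [
    (True, False, 5.0), (True, False, 5.2),    # lockdown cities, before
    (True, True, 4.4), (True, True, 4.6),      # lockdown cities, after
    (False, False, 4.0), (False, False, 4.2),  # non-lockdown cities, before
    (False, True, 3.5), (False, True, 3.7),    # non-lockdown cities, after
]

effect = did_estimate(rows)         # in log points
pct_effect = math.exp(effect) - 1   # exact percentage change implied by a log effect
```

Because the outcome is in logarithms, a small coefficient is close to a percentage change, and `math.exp(effect) - 1` gives the exact conversion.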
Most cities had repealed their lockdown regulations starting from the seventh week after January 20, 2020. Therefore, we keep the dataset spanning from week -3 to week 6, i.e., December 31, 2019 to March 5, 2020, to avoid contamination from the lifting of local government regulations. We evaluate the aggregate impacts of lockdown policies with a standard DID regression:

$$y_{i,c,t} = \beta_0 + \beta_1 D_{c,t} + \gamma X_{i,c,t} + \mu_i + \lambda_t + \varepsilon_{i,c,t} \tag{4}$$

where $y_{i,c,t}$ is the logarithm of the number or sales turnover of OMBs in polygon $i$ and city $c$ in the $t$th week. $D_{c,t}$ takes a value of one for OMBs in each polygon in city $c$ in week $t$ once the lockdown policies have taken effect, and zero otherwise. $X_{i,c,t}$ is a vector of time-varying control variables, the same as in Equation (3). The terms $\mu_i$ and $\lambda_t$ capture the polygon and week fixed effects, respectively.

Table 1 presents the estimated results of Equation (4). Columns (1) and (2) show the estimated impacts on the number of active OMBs. The coefficient on $D_{c,t}$ is -0.08 and statistically significant at the 1 percent level. The results indicate that, compared with OMBs in cities without formal lockdown policies, the number of active OMBs declined by about 8 percent when including weather controls and polygon and week fixed effects. We further include the weekly new confirmed cases and deaths in each city to control for COVID-19 prevalence, and find that the number of active OMBs dropped by 5.9 percent, suggesting that the changes in the number of active OMBs caused by city lockdown cannot be fully explained by the virus itself. Columns (3) and (4) [...]

We move on to study the economic effects on OMBs across urban and rural areas. We classify the polygons into two types of regions based on the classification using raster nighttime lights data in 2019, reflecting the variations in the level of economic activity, with urban areas being the more active.
To match the granularity of our data, we rely on nighttime lights and population data to classify [...]

Generally, LOT determines an optimal threshold according to ancillary data (e.g., socioeconomic data, medium- to high-resolution remote sensing data, and so forth) and extracts areas with nighttime light brightness greater than the optimal threshold as urban areas. We adopt an approach similar to that of Dou et al. (2017) to determine the optimal threshold value for built-up urban area extraction. First, two types of nighttime light images for each city were extracted from the global data sets by using a mask polygon of the administrative boundary. Then, a threshold equal to the minimum digital number value was used to segment the images into urban areas and non-urban areas. The absolute difference between the area extracted using the VIIRS nighttime lights data and the reference data was recorded. This process was iterated by increasing the threshold value until it reached the maximum pixel value of the image. The threshold value that produced the minimum difference was selected as the threshold for urban built-up area extraction of the city.

We evenly allocate the population raster data at 1-kilometer resolution into 500-meter raster cells, namely, each 1-kilometer raster cell is split into four 500-meter ones. In this way, we classify the population into two groups living in the above-defined subregions based on the nighttime lights data. In this paper, a polygon may contain many 500-meter grid cells, which could belong to different subregions. Thus, we classify each polygon according to the subregion in which most of its population (over 50 percent) lives. For example, if over 50 percent of the population in a polygon lives in the urban area, we classify this polygon as an urban area.

[...] week, while sales turnover in urban areas did not start to bounce back until the fifth week.
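The threshold search and the majority-population rule described in this section can be sketched as follows. The sketch assumes that candidate thresholds are the distinct pixel values of the image (the step size is not stated in the text), that every pixel covers the same area, and that the reference built-up area is given; all names and numbers are hypothetical.

```python
def optimal_threshold(pixels, reference_area, pixel_area=1.0):
    """Scan thresholds from the minimum to the maximum pixel value and keep the
    one whose extracted urban area is closest to the reference area."""
    best_thr, best_gap = None, float("inf")
    for thr in sorted(set(pixels)):
        extracted = pixel_area * sum(1 for value in pixels if value > thr)
        gap = abs(extracted - reference_area)
        if gap < best_gap:
            best_thr, best_gap = thr, gap
    return best_thr


def split_evenly(population_1km):
    """Allocate a 1-kilometer cell's population evenly to its four 500-meter cells."""
    return [population_1km / 4] * 4


def classify_polygon(cells, threshold):
    """Label a polygon by where most of its population lives.

    `cells` is a list of (brightness, population) pairs for its 500-meter cells.
    """
    total = sum(pop for _, pop in cells)
    urban = sum(pop for brightness, pop in cells if brightness > threshold)
    return "urban" if total > 0 and urban / total > 0.5 else "rural"


# Hypothetical nighttime light values for one city and a reference built-up area.
pixels = [1, 2, 3, 10, 12, 15, 20, 30]
threshold = optimal_threshold(pixels, reference_area=3.0)

label_a = classify_polygon([(30, 500.0), (5, 300.0), (20, 100.0)], threshold)
label_b = classify_polygon([(30, 100.0), (5, 300.0)], threshold)
```

In this toy city the search settles on the brightness value that leaves exactly three pixels above it, and the two example polygons fall on opposite sides of the 50 percent rule.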
Urban areas bore the hardest hit of the shock, indicating that the great lockdown following the outbreak created greater disruptions in the previously more economically active areas. However, OMB activities relative to their counterfactuals started to converge to a similar level between urban and rural areas in the seventh week, that is, in early March. There are three possible explanations for the larger disruption in urban areas. First, during the worst period, it is plausible that urban communities imposed more stringent restrictions than rural villages because of the higher population density and thus higher transmission risks, which directly led to [...] OMBs would have been more willing to start operation due to the relatively lower population density in rural villages.

Note: 1-4. The same as in Figure 2. 5. The two types of regions are classified based on the raster nighttime lights data in 2018, reflecting the variation in the levels of economic activity, with urban areas being the more active.

We further estimate the impact for groups of different genders in the subsamples, and study the heterogeneity of the impact by differentiating whether the owner was born in the place of residence. We [...]

We find that the red line in the right panel of Figure 5 always lies below the dark line, indicating that the drop in sales turnover of female merchants was always larger than that of male owners. At the end of our sample period, the tenth week after the COVID-19 outbreak, the sales turnover of male merchants had rebounded to around 80 percent of its counterfactual level, whereas female merchants had recovered only 75 percent of their business. It is possible that the harder hit for female OMB owners will have a long-lasting effect.

Note: 1-4. The same as in Figure 2. 5. The OMBs are classified into two groups by the gender of their owners.
We explore the heterogeneous impact on different OMBs by differentiating whether the owner was born in the place of residence. We group the owners who were born in the same province where they conducted business and label them as natives. We label the other owners, who were not born in the provinces where they were managing businesses, as outsiders. For example, if an OMB owner A was born in province B and had a business in province C, then he or she was labelled as an outsider, but he or she would be a native if the business were in province B. Given that the OMBs in our sample are all very small in scale and have no branches, every owner conducted business in only one province.

Figure 6 illustrates the estimated decline in the economic activities of OMBs for natives and outsiders. The decrease in economic activities was larger for the outsiders, with the number of active merchants at 43 percent and sales turnover at 51 percent of their counterfactual levels during the trough period. In the worst week, the drop in the number of active outsiders was 7 percentage points larger than that of active natives. We find that OMB activities relative to their counterfactuals were at nearly the same level for outsiders and natives in the tenth week after the COVID-19 outbreak, one month after the work resumption, although the number of active outsiders was still far from the normal level, which reflects a positive effect of lifted restrictions but a relatively lagged return of migrant workers.

Note: 1-4. The same as in Figure 2. 5. OMBs are classified into two groups by differentiating whether the owner was born in the place of residence. The owner who was born in the same city where he or she conducted business is labelled as a native, while the other owners, who were not born in the cities where they were managing businesses, are outsiders.
We suggest that obtaining accurate statistics about the informal economy is possible by taking full advantage of the digitalization of the economy and the widespread use of mobile payments. Using transaction-level business data on around 80 million OMBs from the largest Fintech company in China and a machine learning technique to predict the counterfactual path of OMB activities in 2020, we have provided a first view of vulnerable informal workers after the blows from COVID-19. The tens of millions of OMBs in China work largely in the informal services sector of the economy, mainly as micro and family-based enterprises, and employ low-skilled workers who mostly could not work from home during the COVID-19 pandemic. Many OMBs rely heavily on cash flows and short-term loans due to limited savings. If they cannot work for an extended period of time, the whole family would be at risk. Effective policies are needed to ease the pain felt by vulnerable OMBs in the gig economy and mitigate the potential poverty and inequality impacts.

We find that the number and sales of OMBs experienced an immediate and dramatic collapse, with the biggest weekly contraction of around 50 percent, while the decline due to lockdown policy was modest. OMBs in urban areas experienced a sharper contraction during the trough, with a decrease of around 54 percent in the number and sales turnover of active merchants in the worst week, compared with drops of 41 and 43 percent in the number and sales turnover of active merchants for rural OMBs, respectively. Female merchants were hit harder than male merchants, with drops in the number of active merchants and sales turnover during the trough that were 5 and 9 percentage points greater, respectively, than those of male owners.
We have also seen that the business owners who were not born in the province where they conducted business were disproportionately disrupted, and they are often migrants in the city without social insurance. In short, we find that the most vulnerable workers in the gig economy were hit hard by the COVID-19 pandemic. Therefore, we suggest an inclusive policy response to mitigate the impact of the crisis and support this vulnerable group. Special attention should be paid to the potential urban poor, who may find it hard to make a living.

The economic activities of OMBs plummeted during the whole month of February, before starting a sharp rebound at the end of February. The businesses had rebounded to around 80 percent of where they should have been seven weeks after the COVID-19 outbreak, hovering at this level for an additional three weeks until the end of our sample period. The quick recovery of OMBs following the nationwide encouragement of work resumption provides evidence of the necessity of prioritizing containment of the virus and the importance of government support in reopening the economy. Although informal businesses are small and vulnerable, a sharp recovery is likely once the spread of the virus is largely or totally under control. However, the pain felt by informal workers is real, and we see a bottleneck for further rebound and an unbalanced recovery across groups. The negative effects on vulnerable informal workers as a whole, especially on women and migrants, are likely to be long-lasting. We therefore suggest a more sustained policy response to ensure adequate longer-term support for the most vulnerable amid the new normal of epidemic prevention and control.

OMBs' quick recovery would not have been possible without significant policy support.
For example, as of February 12, 2020, at least 25 provinces in China had rolled out up to 90 measures to support small and medium-size enterprises, among which the most frequently mentioned measures included support for reopening, delay in fee collection (electricity, water, and gas), tax payment delay, rent or tax deduction, delay or refund of social security contributions, lower financing costs, and strengthened financing support. We caution that these are short-term responses and a partial analysis, only months after the outbreak of the virus; however, we highlight the importance of taking full advantage of digital development to measure the size of the informal sector and track its economic activities.

There are around 80 million offline micro business (OMB) merchants in our data set. With the data privacy issue in mind, we aggregate the OMB data at an appropriate geographical level. The widely used delineation based on administrative units, census tracts, and other established areas leads to a great loss of information in our large sample. For example, it would be unfortunate if we assigned 80 million OMB merchants to 2,800 counties. A principle of the proposed aggregation method is to start at the lowest possible geographical level and let the area adjust adaptively according to local business activities, namely, the tracts should be densely distributed in areas with higher population density and a higher level of economic activity, and sparsely distributed elsewhere. This avoids a substantial increase in the number of initial territorial units, as would occur with uniform small grid cells, and ensures a closer relationship between the number of OMBs and the characteristics of economic development, population, and geography in every single tract. It also preserves the granular information in the data to a large extent. We propose to create Thiessen polygons and assign the OMBs to each polygon.
Thiessen polygons, otherwise known as Voronoi diagrams, are an essential method for the analysis of proximity and neighborhood. They have been applied to, for example, airtime transfers and mobile communications, police enforcement and the spatial distribution of crime, determinants of HIV infection (Bertocchi and Dimico, 2019), and traffic crashes. The method defines an area around a center point in which every location is nearer to this point than to any other center point. In our analysis, each OMB belonging to a specific Thiessen polygon is therefore closer to that polygon's center point than to any other center point. A key step in delineating Thiessen polygons is determining the center point of each polygon. We propose to use bank branches (including self-service branches) as the centers of the polygons. On the one hand, the distribution of bank branches is in line with the proposed principle of aggregation: branches are densely distributed in areas with active businesses and economies. On the other hand, bank branches cover almost all places across the country and, given the importance of financial services in business activities, it is reasonable to use them as the center points of the polygons for the data aggregation. We merge the bank branches within each 500-meter grid cell into one center point by taking the average of their geographic coordinates, and we establish 138,629 polygons around these points across mainland China.

We employ weekly data from Ant Financial, the largest Fintech company in China, on around 80 million QR code merchants (Mashang in Chinese), aggregated at the Thiessen polygon level from December 31, 2019 to April 2, 2020 and for the corresponding lunar calendar dates in 2018 and 2019. Table B.1 shows a full picture of the time window in our analysis.
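The polygon construction described above can be illustrated with a minimal sketch. It assumes coordinates already projected to meters, uses made-up branch and merchant locations, and finds the nearest center by brute force; the function names `merge_branches` and `assign_polygon` are hypothetical and are not taken from the paper. Assigning a point to its nearest center is equivalent to locating the Thiessen polygon that contains it.

```python
from collections import defaultdict
from math import floor, hypot

CELL_M = 500.0  # grid-cell size used to merge nearby bank branches


def merge_branches(branches, cell=CELL_M):
    """Average the coordinates of all branches that share a grid cell."""
    cells = defaultdict(list)
    for x, y in branches:
        cells[(floor(x / cell), floor(y / cell))].append((x, y))
    centers = []
    for key in sorted(cells):  # sorted so the polygon ids are reproducible
        pts = cells[key]
        centers.append((sum(p[0] for p in pts) / len(pts),
                        sum(p[1] for p in pts) / len(pts)))
    return centers


def assign_polygon(point, centers):
    """Index of the nearest center, i.e. the Thiessen polygon holding point."""
    return min(range(len(centers)),
               key=lambda i: hypot(point[0] - centers[i][0],
                                   point[1] - centers[i][1]))


# Illustrative (made-up) projected coordinates in meters.
branches = [(100.0, 100.0), (300.0, 200.0), (1200.0, 900.0), (2600.0, 400.0)]
merchants = [(250.0, 180.0), (1100.0, 700.0), (2400.0, 450.0), (700.0, 500.0)]

centers = merge_branches(branches)  # the first two branches share a cell
labels = [assign_polygon(m, centers) for m in merchants]

# Aggregate merchants per polygon, as one would before any analysis.
counts = defaultdict(int)
for lab in labels:
    counts[lab] += 1
print(centers, labels, dict(counts))
```

At the scale of 80 million merchants and 138,629 centers, a spatial index (for example a k-d tree) and geodesic distances would be the natural replacement for the brute-force search used here.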
We highlight four key facts about the QR code merchants, the offline micro businesses (OMBs) that adopt the digital payment solution. First, they are offline micro businesses that accept customer payments via an Alipay QR code specialized for merchants, including businesses that are registered or unregistered with the Industry and Commerce Department. Second, they are mostly self-employed and family-based businesses. Only individuals, not legal entities, may apply for the merchant QR code, which matches the nature of self-employed OMBs, where the owner and the business entity are hardly distinguishable. This excludes from our sample the large enterprises that operate and settle in the name of a company, such as large supermarkets, chain stores, restaurants, and so forth. Third, they are overwhelmingly very small businesses. The average annual revenue was less than 400,000 yuan (about US$56,000) for 99 percent of these merchants in 2019. According to the classification of the Ministry of Industry and Information Technology, retail businesses with annual revenue of less than 1 million yuan are small enterprises. Fourth, they are quite representative of the OMBs in China. The Alipay merchant QR code is tailored to OMBs: merchant accounts have lower withdrawal fees than personal accounts, and applying for the account and the associated QR code takes only a few minutes and a few taps. Thus, OMBs have strong incentives to use the QR code solution to accept payments, especially since mobile payment has become the first choice in daily transactions for most Chinese people. The total economic cost of the coronavirus is yet to unfold, but it is evident that OMBs, which are largely in the services sector and cannot work from home, have borne the brunt of the lockdowns to combat COVID-19.
In the two weeks before January 20, when people and governments were little aware of the coming epidemic, the sales turnover of OMBs increased by 33.1 percent year-on-year (yoy). Following the rapid escalation of the virus, the sales of OMBs dropped sharply, by 39.4 percent yoy over the two weeks after January 31, 2020 (January 7, lunar calendar, seven days after the Lunar New Year). Figure B.1 shows the city-level yoy change in OMBs' sales turnover around the two weeks of the Lunar New Year holiday. Amid the nationwide virus outbreak, few OMBs were immune from the hard hit. The yoy change in OMBs' sales turnover appears to have remained positive in some western provinces, like Inner Mongolia, Qinghai, Tibet, and Guangxi; however, compared with growth of up to 80 to 90 percent in the two weeks before the Lunar New Year, the decline in sales was sharp in the post-virus period. The negative economic impact on OMBs is therefore likely to be much worse than the drop shown by the yoy change, because implied sales were on a growth path well above the previous year's level.

We collected raster data on variables that may affect OMBs' activities, such as economic development, population, and geographic characteristics. First, we include meteorological variables in the analysis. The National Meteorological Information Center of China provides temperature, wind speed, air pressure, humidity, and precipitation from more than 800 weather stations at the daily level. Second, we obtained data on 35 million points of interest from the AutoNavi (Gaode in Chinese) Application Programming Interface, for 18 primary categories, in December 2019. Third, we have raster nighttime lights data with a 500-meter spatial resolution, population data at the 1,000-meter grid cell level, and elevation at the 30-meter grid cell level.
We further calculate the driving distance from the center of the polygon to that of the county, the prefecture-level city, and the capital city of each province, respectively, reflecting transportation convenience at the polygon level. To merge these variables with the OMBs' activities at the polygon level on a weekly basis, we first calculate the average of the variables each week in the window of analysis for each polygon, and then match the variables to the weekly average of OMBs' activities based on the polygon code. Table B.2 presents the weekly summary statistics of these variables. We divide the sample in each year into two periods: the pre- and the post-outbreak periods. Panel A reports the summary statistics for the OMBs' activities and meteorological variables at the polygon level. Relative to the pre-period, during the 10 weeks after the event date, the number of active OMBs fell by 11.28 percent in 2018, 10.38 percent in 2019, and 33.20 percent in 2020, and the sales turnover dropped by 1.41 percent in 2018, 1.31 percent in 2019, and 3.35 percent in 2020. The overall decrease in the post-virus period reflects the seasonality around the Lunar New Year; however, we see a relatively sharper decline in OMBs' activities in 2020. The average contraction of OMBs' activities over the 10 weeks since the outbreak, relative to the pre-virus period, does not directly tell us the causal impact of the pandemic; we therefore examine the causal collapse of OMBs' activities in the post-virus period on a weekly basis.

In this section, we clarify the details of the prediction of the counterfactual path. The gradient boosting decision tree (GBDT), first proposed in Friedman (2001), is one of the typical machine learning algorithms (Chen and Guestrin, 2016) and has a wide range of academic applications. Its recent popularity is mainly due to its high accuracy and its fast training and prediction speed.
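To make the counterfactual idea concrete, the following is a minimal sketch of gradient boosting with depth-one regression trees under squared-error loss, written in pure Python. It is a toy stand-in and not the paper's implementation: the two features, the synthetic data, the 50 percent shock, and all function names are assumptions made purely for illustration, and cross-validated hyperparameter tuning is omitted. The model is fitted on "normal-period" data, scored by R² on a held-out 10 percent, and then used to predict what activity would have been, so that observed activity can be compared with the prediction.

```python
import random


def fit_stump(X, resid):
    """Return (feature, threshold, left_value, right_value) of the depth-one
    tree that minimises squared error on the current residuals."""
    n, total, best = len(X), sum(resid), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += resid[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right = total - left
            # Maximising this quantity is equivalent to minimising the SSE.
            gain = left * left / k + right * right / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left / k, right / (n - k))
    return best[1:]


def fit_gbdt(X, y, n_trees=200, lr=0.1):
    """Boosting loop: each stump fits the residuals of the current model."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lv, rv = fit_stump(X, resid)
        stumps.append((j, thr, lv, rv))
        pred = [p + lr * (lv if x[j] <= thr else rv) for p, x in zip(pred, X)]
    return base, lr, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lv if x[j] <= thr else rv
                            for j, thr, lv, rv in stumps) for x in X]


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Synthetic polygon-level data (assumption): two features standing in for,
# say, night lights and population, and a log-sales-like outcome.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(300)]
y = [5 + 3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in X]

# Hold out 10 percent as a test set the model never sees during fitting.
X_train, y_train, X_test, y_test = X[:270], y[:270], X[270:], y[270:]
model = fit_gbdt(X_train, y_train)
r2_test = r_squared(y_test, predict(model, X_test))

# Hypothetical shock: observed activity is half of what it would have been.
observed = [0.5 * v for v in y_test]
counterfactual = predict(model, X_test)
gap = sum(observed) / sum(counterfactual) - 1  # shortfall vs. prediction
print(round(r2_test, 3), round(gap, 3))
```

In practice one would use an established library implementation with deeper trees and tuned hyperparameters; the point of the sketch is only that the estimated impact is the gap between observed activity and the model's prediction of the no-pandemic path.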
Three specifics are worth noting in the process of predicting the OMBs' activity. First, we train the GBDT model at the Thiessen polygon level in 2019. We predict the weekly number and sales of active OMBs in a 10-week time window in the post-virus period; thus, we need to train a total of 20 (= 10 × 2) models, one for each response variable in each week, with a feature list of 56 variables. Second, 10 percent of the data set is selected as the test set; we then split the remaining data into five sets and tune the model by five-fold cross-validation to select the optimized hyperparameters. Next, using the optimized parameters, we establish the prediction model for OMBs' activity and carry out the prediction analysis on the test set. We evaluate the generalization ability of the model by calculating the coefficient R² on the test set, which the model has never seen; R² measures how much of the variation in the target number and sales turnover of OMBs is explained by the algorithm's predictions. The larger the R², the higher the predictive accuracy for OMBs' activity and the stronger the generalization ability. The R² values of the models predicting OMBs' activity are above 0.90. Finally, we refit the model with the selected hyperparameters on the whole data set for 2018 and 2019 and establish the prediction model for the post-virus counterfactual path in 2020 based on the final updated parameters. See Tables C.1 and C.2 for more details on the hyperparameters and performance of the model.

lunar calendar). 2. In our analysis, week 1 includes the three days before Lunar New Year's Eve and seven days afterward to take the potential impact from the holiday into account; all the other weeks include seven days. 3.
The time windows

The long-term determinants of female HIV infection in Africa: The slave trade, polygyny, and sexual behavior
Airtime transfers and mobile communications: Evidence in the aftermath of natural disasters
XGBoost: A scalable tree boosting system
Greedy function approximation: A gradient boosting machine
Traffic crash analysis with point-of-interest spatial clustering
Measuring the effect heterogeneity of police enforcement actions across spatial contexts
Spatial tessellations: Concepts and applications of Voronoi diagrams
Beyond prediction: Using big data for policy problems
What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios
How does household spending respond to an epidemic? Consumption during the 2020 COVID-19 pandemic
The coronavirus and the great influenza pandemic: Lessons from the "Spanish flu" for the coronavirus's potential effects on mortality and economic activity
Tax evasion, the underground economy and financial development
Financial development and the underground economy
The impact of the COVID-19 pandemic on consumption: Learning from high frequency transaction data
Industrial clusters, networks and resilience to the COVID-19 shock in China
Urban land extraction using VIIRS nighttime light data: An evaluation of three popular methods
The macroeconomics of epidemics
Human mobility restrictions and the spread of the novel coronavirus (2019-nCoV) in China
Which night lights data should we use in economics, and where
The short-term impacts of COVID-19 lockdown on urban air pollution in China
Validation of urban boundaries derived from global night-time satellite imagery
Informality and development
COVID-19 and social protection of poor and vulnerable groups in Latin America: A conceptual framework
Who benefits from robo-advising? Evidence from machine learning (No. 3552671), SSRN Scholarly Paper
Evaluation of NPP-VIIRS night-time light composite data for extracting built-up urban areas
An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China (2020)