key: cord-0896768-jiy08e6e
authors: Dasgupta, Nataraj
title: Using satellite images of nighttime lights to predict the economic impact of COVID-19 in India
date: 2022-05-24
journal: Adv Space Res
DOI: 10.1016/j.asr.2022.05.039
sha: 2eb08ab3797cc7a026e4c7c85d8546df3f9343c4
doc_id: 896768
cord_uid: jiy08e6e

The outbreak of COVID-19 in early 2020 heralded a deep global recession not seen since the Second World War. With entire nations in lockdown, burgeoning economies such as India's plunged into a downward spiral. The conventional instruments for estimating the short-term economic impact of a pandemic are limited, and as a result, it is challenging to implement timely monetary policies to mitigate the financial impact of such unforeseen events. This study investigates the promise of using nighttime images of lights on Earth, also known as nightlight (NTL), captured by the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument onboard the Suomi National Polar-Orbiting Partnership (Suomi NPP) satellite mission, to measure the economic cost of the pandemic in India. First, a novel data processing framework was developed for a recently released radiance dataset, VNP46A1, part of NASA's Black Marble suite of NTL products. Second, the elasticity of nightlight to India's national Gross Domestic Product (GDP), i.e., the percentage change in GDP associated with a 1% change in nightlight, was estimated using panel regression. Machine learning was then used to predict the Year-over-Year (YoY) change in GDP during Fiscal Year (FY) 2020Q1. Electricity consumption, known to closely track economic output, and precipitation were included as additional features to improve model performance. A strong relationship of both electricity usage and nightlight with GDP was observed. The model predicted a YoY contraction of 24% in FY2020Q1, almost identical to the official GDP decline of 23.9% later announced by the Indian Government.
Based on the findings, the study concludes that nightlight, along with electricity usage, can be an invaluable proxy for estimating the cost of short-term supply-demand shocks such as COVID-19, and should be explored further.

The outbreak of the novel coronavirus disease (COVID-19) […]

The most important aspect of nightlight that makes it a good fit for India is that the process of estimation is com[…]

The impact of COVID-19 in India during the study period was analysed using a three-step process. First, radiance values (expressed in units of nW·cm⁻²·sr⁻¹, i.e., nanowatts per square centimetre per steradian) were extracted from VNP46A1 using a custom, cloud-based architecture. Second, the elasticity, i.e., the percentage change in national quarterly GDP associated with a 1% change in nightlight, was estimated using panel regression. Finally, machine learning (ML) algorithms were used to predict the Year-over-Year (YoY) change in GDP during FY2020Q1.

The paper is structured as follows: Section 2 (Background) traces the use of panel regression and machine learning in nightlight-related research and discusses some of the challenges of using NTL data. Section 3 (Data) outlines the datasets used in the study. Section 4 (Methodology) discusses the process of extracting radiance values from VNP46A1, along with the specifications for panel regression and machine learning. Section 5 (Results) presents the estimates of elasticity and the predicted economic impact of COVID-19 at the national and sector level. Section 6 (Analysis and Discussion) examines the results and their practical implications. Section 7 (Conclusion) closes with a brief discussion of limitations and further research.
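The YoY change used as the prediction target in the third step compares each quarter with the same quarter one year earlier. A minimal Python sketch, assuming a plain list of quarterly values in chronological order; the function name `yoy_change` and the sample numbers are hypothetical and for illustration only, not the study's data:

```python
def yoy_change(quarterly):
    """Year-over-Year change of a quarterly series.

    Each quarter is compared with the same quarter one year (4 positions)
    earlier; the first four quarters have no comparison point and get None.
    """
    changes = []
    for i, value in enumerate(quarterly):
        if i < 4:
            changes.append(None)
        else:
            changes.append(value / quarterly[i - 4] - 1.0)
    return changes


# Hypothetical quarterly GDP levels (illustrative only, not official data)
gdp = [100.0, 102.0, 104.0, 106.0, 76.0]
print([None if c is None else round(c, 3) for c in yoy_change(gdp)])
# → [None, None, None, None, -0.24]
```

Comparing against the same quarter of the previous year, rather than the preceding quarter, removes ordinary seasonal swings from the comparison.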
The use of nightlight for socio-economic research started in the mid-1970s (Croft, 1973; Croft, 1978; Welch, 1980; Foster, 1983), covering a range of research topics from estimating energy consumption to industrial activity (Elvidge et al., 1999). However, use of the data was limited, as they were available only on physical film strips (Zhao et al., 2019). The digitisation of the nightlight data archive by the National Oceanic and Atmospheric Administration (NOAA) in 1992 made them accessible to the wider research community.

In the early 2010s, using NTL data from 1992-2003 for a cohort of 188 countries, Henderson et al. (2012), economists at Brown University, showed that national income data could be complemented with nightlight to empirically estimate true income growth. Estimates from Henderson's model, especially for countries with low-quality economic data, were more realistic and aligned with the public consensus, even though the results sometimes disagreed with official figures published by governments that may have been inflated or deflated to serve political purposes. Following the findings of this landmark study, nightlight came to be regarded as a viable and reliable proxy for serious economic research. Henderson's paper also marked the first time that a panel regression framework had been used to study nightlight data, and the method subsequently gained broad acceptance in nightlight-related research.

[…] Value Added (GVA) (Doll et al., 2000; Ebener et al., 2005) […]

Beyer et al. (2020) also used sub-annual, quarterly data, similar to Prakash et al. (2019), to estimate quarter-over-quarter (QoQ) effects, but included electricity usage as an additional regressor. In the panel model, Beyer et al. (2020) found that electricity consumption was significant at the 1% level, underscoring the importance of this predictor in measuring economic metrics.
The current study resembles research by Prakash et al. (2019) and Beyer et al. (2020) and models quarterly national GDP using panel regression to examine the elasticity of GDP to nightlight. […] Subash et al. (2018)) and, more generally, machine learning algorithms such as Gradient Boosting (Bansal et al., 2020), Random Forests (Otchia & Asongu, 2019) and Support Vector Machines (Pandey et al., 2013), among others, have become commonplace in nightlight research.

In this study, a machine learning (ML) framework using raw nightlight radiance values, similar to the approach of Otchia & Asongu (2019), was used to estimate the economic cost of COVID-19 during FY2020Q1. Several ML algorithms were benchmarked over a range of hyper-parameters, and the algorithm with the best overall performance was used to build the final model. Following the example of Beyer et al. (2020), electricity consumption, in addition to data on precipitation and population, was also used to train the ML models.

The NTL products, VNP46Ax, were scheduled for release during 2020-2021. These datasets, produced using the Black Marble algorithm, took advantage of the high low-light sensitivity of VIIRS and represented a major improvement in the quality of publicly available NTL data (Román et al., 2018). However, despite these improvements, VNP46A1 was suboptimal compared to VNP46A2, which had not yet been released during the course of this study. Specifically, VNP46A1 did not contain BRDF-corrected radiance layers and, being top-of-atmosphere (TOA) NTL, was affected by noise from moon illumination, clouds and other artifacts (Xu & Qiang, 2021). Although this study attempted to perform robust data cleansing, additional data beyond nightlight had to be included to improve ML accuracy.
Three major sources of data were used in this study: Indian economic, nightlight and electricity usage data.

[…] GeoTIFF, a type of matrix in which each cell is associated with a latitude-longitude coordinate (geo-coordinate) in addition to a numeric value. Each GeoTIFF in this case is 2,400 rows × 2,400 columns.

The daily VNP46A1 HDF5 files contain 26 GeoTIFF layers, of which 4 were relevant to this analysis: DNB Radiance, Cloud Mask, Solar Zenith and Moon Illumination Fraction. Details are provided in Table 6. The layer that contains the geo-coded nightlight radiance values is the 'DNB Radiance' layer. Due to extraneous artifacts, the data in this layer require further processing, as discussed in Section 4. Figure 1 shows a visual representation of NTL recorded in March and May of 2020.

Data on state-level electricity consumption were obtained from monthly reports published by the Power System Operation Corporation of India (POSOCO) from April 2012 to June 2020 and then aggregated to compute quarterly, national-level usage. In this dataset, electricity consumption is recorded as actual drawal (as opposed to projected usage), measured in MU (million units), where 1 MU = 1 million kWh.

To improve model accuracy, data on precipitation and population were included in the ML model. Precipitation data were obtained from the website of the Indian Agricultural Research Institute (IARI, 2020). Monthly rainfall, measured in mm, was aggregated by quarter for FY2012Q1 to FY2020Q1. Population estimates were obtained from Trading Economics (tradingeconomics.com, 2019) for the same time horizon.

[…] The approach taken in this study uses a combination of all these methods, with some additional steps based on a study by Stathakis & Liakos (2021).
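Both the electricity and the rainfall series are monthly, while the models work with fiscal quarters. The sketch below shows one way to perform that aggregation in Python, assuming India's April-to-March fiscal year labelled by its starting calendar year (so April to June 2020 maps to FY2020Q1, consistent with the FY2012Q1 to FY2020Q1 horizon above). The helper names `fiscal_quarter` and `aggregate_quarterly` and the sample rows are hypothetical; passing rows for every state simply sums them into a national total.

```python
from collections import defaultdict


def fiscal_quarter(year, month):
    """Map a calendar (year, month) to an Indian fiscal (FY, quarter).

    Assumes the fiscal year starts in April and is labelled by the calendar
    year in which it starts: April-June 2020 -> (2020, 1),
    January-March 2021 -> (2020, 4).
    """
    fy = year if month >= 4 else year - 1
    quarter = (month - 4) % 12 // 3 + 1
    return fy, quarter


def aggregate_quarterly(rows):
    """Sum monthly values into fiscal quarters.

    rows: iterable of (year, month, value), e.g. drawal in MU or rainfall
    in mm. Rows for several states in the same month are simply added
    together, which yields a national total.
    """
    totals = defaultdict(float)
    for year, month, value in rows:
        totals[fiscal_quarter(year, month)] += value
    return dict(sorted(totals.items()))


# Hypothetical monthly national drawal in MU (illustrative values only)
rows = [(2020, 4, 100.0), (2020, 5, 120.0), (2020, 6, 110.0), (2020, 7, 90.0)]
print(aggregate_quarterly(rows))  # {(2020, 1): 330.0, (2020, 2): 90.0}
```

Summing is appropriate for both series here, since quarterly electricity drawal and quarterly rainfall are totals of the monthly figures.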
Specifically, after removing clouds and moonlight, pixels representing direct sunlight, non-land surfaces, snow and satellite shadow were set to null values in the DNB Radiance layers. Median-based time averaging over 90-day windows, each representing a quarter, was then performed to extract nightlight for the corresponding time frame. A summary of the layers used is shown in Table 10. Figure 10 shows a flowchart of the overall workflow.

The cloud layer required special handling, as it contained 16-bit values with information encoded at different bit positions, as shown in Figure 8. The details of the bit-array processing and Boolean matrix operations are involved and are given in Section 9.4 of the Appendix. The quarterly sum of the radiance values, known as "Sum of Lights", was thereafter obtained by summing the radiance of the pixels within the national boundary. Equation 9.4.2 shows the matrix-wise Boolean operations that were required to produce the final output.

Parallel processing tools were used extensively due to the scale of the data. In particular, the multiprocessing library in Python (v3.6.6), the foreach and doParallel packages in R (v4.0.0), and GNU Parallel on Ubuntu Linux v18.04.5 (Tange, 2011) were used for parallel processing. The Geospatial Data Abstraction Library (GDAL) was used for geospatial analytics, and the R package exactextractr was used to extract radiance values. A custom tool was also developed using Adobe Java Libraries for MS Office and pattern-matching libraries in R to automatically extract monthly state-level electricity consumption data from images in the POSOCO PDF reports. A custom Amazon Web Services (AWS) cloud-based hardware architecture using high-performance, high-I/O servers was implemented for faster data processing (Figure 11).
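The bit decoding, masking and median compositing described above can be sketched with NumPy. This is a simplified illustration, not the study's pipeline: the bit positions in `clear_sky_mask` are an assumption based on our reading of the Black Marble User Guide and should be checked against Figure 8, and the other screens (solar zenith, moon illumination, land/water) are omitted here but would be combined into the same Boolean stack with `&`. All function names and the tiny 2 × 2 example are hypothetical.

```python
import warnings

import numpy as np


def extract_bits(values, start, width):
    """Integer stored in `width` bits of each value, beginning at bit `start`."""
    values = np.asarray(values, dtype=np.uint16)
    return (values >> start) & ((1 << width) - 1)


def clear_sky_mask(qf_cloud):
    """True where a pixel of the cloud-mask layer is usable.

    ASSUMED bit layout (verify against Figure 8 / the Black Marble User
    Guide): bits 6-7 cloud confidence (0 = confident clear),
    bit 8 shadow detected, bit 10 snow/ice surface.
    """
    confident_clear = extract_bits(qf_cloud, 6, 2) == 0
    no_shadow = extract_bits(qf_cloud, 8, 1) == 0
    no_snow = extract_bits(qf_cloud, 10, 1) == 0
    return confident_clear & no_shadow & no_snow


def sum_of_lights(daily_radiance, daily_keep, inside_boundary):
    """Median composite of a stack of daily rasters, summed inside a boundary.

    daily_radiance:  (days, rows, cols) radiance values
    daily_keep:      (days, rows, cols) bool, True where a pixel passed all screens
    inside_boundary: (rows, cols) bool, True for pixels inside the boundary
    """
    stack = np.where(daily_keep, daily_radiance, np.nan)
    with warnings.catch_warnings():
        # Pixels never kept on any day give an all-NaN slice; leave them NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composite = np.nanmedian(stack, axis=0)
    return float(np.nansum(np.where(inside_boundary, composite, np.nan)))


# Tiny hypothetical example: three "days" of a 2 x 2 tile
radiance = np.array([[[1, 2], [3, 4]],
                     [[2, 2], [3, 40]],
                     [[3, 2], [9, 4]]], dtype=float)
keep = clear_sky_mask(np.zeros(radiance.shape, dtype=np.uint16))  # all clear
boundary = np.array([[True, True], [True, False]])
print(sum_of_lights(radiance, keep, boundary))  # 7.0
```

Taking the median per pixel, rather than the mean, keeps a single anomalous night (such as the value 40 above) from inflating the quarterly composite.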
$$\ln(\mathit{GDP})_t = \beta_1 \ln(\mathit{SumOfLights})_t + \beta_2 \ln(\mathit{SumElectricity})_t + \cdots$$

Generally, in NTL research, the elasticity of nightlight is used for predictive analysis. In this study, instead of using the elasticity, a machine learning approach was implemented. Several ML algorithms were evaluated with national GDP as the outcome variable. Specifically, Support Vector Machine (SVM) (Cortes & Vapnik, 1995) with a Radial Basis Function (RBF) kernel, K-Nearest Neighbours (Altman, 1992), Random Forest (Breiman, 2001), eXtreme Gradient Boosting (XGBoost), which implements a Gradient Boosted Model (Chen & Guestrin, 2016), and Lasso/Ridge Regression (Tibshirani, 1996) were compared to assess individual model performance.

The predictors shown in Table 2 were used for feature engineering. First, the features were scaled and centred to have mean 0 and a range between (0,1). Next, the dataset was split into 80/20 training/test samples. For each algorithm, a 5-fold repeated cross-validation was performed over a set of hyper-parameters. The best model was selected based on the lowest Root Mean Square Error (RMSE) on the training set. The model was then retrained using a narrower, more targeted range of hyper-parameters.

Objective Function = min […]

The sector-level predictions shown in […]

As seen in the Pooled OLS model, the coefficient of Sum of Lights was significant across all models. In the Pooled OLS (1) model, a 1% increase in Sum of Lights increases GDP by 0.189%. When Sum of Electricity is added to the model, the coefficient drops to 0.113, although it remains significant. Sum of Electricity is itself significant at the 1% level, and its coefficient suggests that a 1% increase in electricity usage increases GDP by 0.313%.

In Fixed Effect models (3-6) with quarterly FE, the coefficient of Sum of Lights is 0.189 when it is the only independent variable.
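Because the equation above is in log-log form, each slope coefficient is an elasticity: a coefficient of 0.189 on Sum of Lights reads as "a 1% increase in Sum of Lights is associated with a 0.189% increase in GDP". The sketch below estimates such coefficients by ordinary least squares with NumPy. It is pooled OLS with optional quarter dummies as a simple stand-in for quarterly fixed effects, not the panel estimator used in the study (the reference list cites the R plm package); the function name `loglog_elasticities` and the synthetic data are hypothetical.

```python
import numpy as np


def loglog_elasticities(gdp, regressors, quarter=None):
    """OLS of ln(GDP) on the logs of the regressors; slopes are elasticities.

    gdp:        (n,) positive values
    regressors: (n, k) positive values, e.g. Sum of Lights, Sum of Electricity
    quarter:    optional (n,) quarter labels 1-4; if given, dummies for
                quarters 2-4 are added (quarter 1 is the baseline) as a
                simple stand-in for quarterly fixed effects.
    Returns the k slope coefficients (beta_1, ..., beta_k).
    """
    y = np.log(np.asarray(gdp, dtype=float))
    X = np.log(np.asarray(regressors, dtype=float))
    if X.ndim == 1:
        X = X[:, None]
    columns = [np.ones(len(y))] + list(X.T)
    if quarter is not None:
        q = np.asarray(quarter)
        columns += [(q == k).astype(float) for k in (2, 3, 4)]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:1 + X.shape[1]]


# Synthetic check: data generated with elasticities 0.19 and 0.31
rng = np.random.default_rng(0)
lights = rng.uniform(1.0, 10.0, 40)
electricity = rng.uniform(1.0, 10.0, 40)
quarter = np.tile([1, 2, 3, 4], 10)
gdp = 3.0 * lights**0.19 * electricity**0.31 * np.exp(0.05 * (quarter == 3))
print(loglog_elasticities(gdp, np.column_stack([lights, electricity]), quarter))
# approximately [0.19, 0.31]
```

Taking logs turns the multiplicative relationship into a linear one, so a standard least-squares solve recovers the elasticities directly.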
The coefficient drops to 0.023 when population is added. Population is significant at the 1% level across all models in which it was included, with an elasticity above 5.5. The magnitude is very high but not surprising: an increase in population enlarges the labour pool and hence GDP, the national output. The coefficient of Sum of Lights falls by almost 50%, to 0.10, when Sum of Electricity is added. It is well known that electricity usage is closely correlated with GDP (Lin & Shi, 2020), and the results are consistent with findings in earlier studies.

The projected 24% contraction of national GDP in FY2020Q1 was a significant change from the average YoY GDP increase of 6% in the years prior to 2020 (Plecher, 2018) […]

On August 31, 2020, while the current study was under review, the Indian Government announced official GDP figures for FY2020Q1, which put the YoY decline at 23.9% (Mundle, 2020). The GDP decline of 24% predicted in this study turned out to be a much more accurate estimate, almost identical to the actual result.

The industrial sector-wise YoY estimates shown in Table 4 were lower than estimates published by the Observer Research Foundation (ORF) (Mukhopadhyay, 2020) and McKinsey (Gupta et al., 2020). For instance, the impact on the real estate sector was estimated to be approximately -17.3% according to ORF, whereas the nightlight […]

Second, based on low RMSE values, the study found that nightlight and electricity usage could predict the short-term impact of a supply-demand shock with a high degree of accuracy at the national level in India. Electricity consumption had high predictive value, and since data on electricity usage are easier to obtain, it should be explored further, similar to studies already conducted elsewhere (Fezzi & Fanghella, 2020).
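The RMSE values referred to above come from the model-selection procedure described in the Methodology: scaling, an 80/20 train/test split, repeated 5-fold cross-validation over a hyper-parameter grid, and selection by lowest RMSE. The NumPy sketch below illustrates that procedure with ridge regression only, as one of the compared model families; it is not the study's code. The centring-and-range scaling is one assumed reading of "mean 0 and a range between (0,1)", and all function names, the penalty grid and the data are hypothetical.

```python
import numpy as np


def center_and_scale(X_train, X_test):
    """Centre each feature on its training mean and divide by its training range.

    An assumed reading of the scaling step; statistics come from the
    training split only, so the test split does not leak into them.
    """
    mean = X_train.mean(axis=0)
    span = X_train.max(axis=0) - X_train.min(axis=0)
    span = np.where(span > 0, span, 1.0)
    return (X_train - mean) / span, (X_test - mean) / span


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def repeated_kfold_rmse(X, y, lam, k=5, repeats=3, seed=0):
    """Mean RMSE over `repeats` random k-fold splits of the training data."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(y)), k)
        for i in range(k):
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            coef = fit_ridge(X[train], y[train], lam)
            scores.append(rmse(y[folds[i]], predict(coef, X[folds[i]])))
    return float(np.mean(scores))


# Hypothetical data: 32 quarters x 3 features (not the study's data)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(32, 3))
y = X @ np.array([0.5, 0.3, 0.0]) + rng.normal(0.0, 0.1, 32)

# 80/20 train/test split
order = rng.permutation(len(y))
cut = int(0.8 * len(y))
train_idx, test_idx = order[:cut], order[cut:]
X_train, X_test = center_and_scale(X[train_idx], X[test_idx])
y_train, y_test = y[train_idx], y[test_idx]

# Pick the penalty with the lowest repeated 5-fold CV RMSE, then refit
grid = [0.0, 0.01, 0.1, 1.0, 10.0]
cv_scores = {lam: repeated_kfold_rmse(X_train, y_train, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)
coef = fit_ridge(X_train, y_train, best_lam)
print(best_lam, rmse(y_test, predict(coef, X_test)))
```

The held-out 20% is touched only once, after the penalty has been chosen by cross-validation on the training portion, so its RMSE is an honest estimate of out-of-sample error.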
Precipitation data were also found to improve model performance, and in general, linear ML models performed better at the national level.

In countries such as India, due to a lack of data, changes in the output of the informal economy cannot be estimated in the short run. This means that Real GDP data are not available to institutions immediately, and it can sometimes take years before the numbers are known (Bhandari & Roychowdhury, 2011). Hence, nightlight and electricity usage could be invaluable proxies and leading indicators of economic change.

Nightlight might not always be a suitable proxy for estimating the economic impact of a pandemic. During COVID-19, nearly all companies in the services sector adopted work-from-home policies. A study by Lenovo Research India claimed that remote working had improved productivity by 83% (Lenovo, 2020). In urban areas dominated by the services sector, even though nighttime lights had gone down significantly due to reduced highway traffic, business closures and other factors, this did not lead to a proportional decrease in the output of the services sector. Hence, using nightlight for estimating impact in the services […]

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Using daily nighttime lights to monitor spatiotemporal patterns of human lifestyle under COVID-19: The case of Saudi Arabia
An introduction to kernel and nearest-neighbor nonparametric regression
Pandemic induced changes in economic activity around African protected areas captured through night-time light data
Temporal prediction of socio-economic indicators using satellite imagery
Nightlights as a development indicator: The estimation of gross provincial product (GPP) in Turkey
India can hide unemployment data, but not the truth
Examining the economic impact of COVID-19 in India through daily electricity consumption and nighttime light intensity
Night lights and economic activity in India: A study using DMSP-OLS night time images
Random forests
Tracking the COVID-19 crisis with high-resolution transaction data
Will GST exacerbate regional divergence?
Shedding light on regional growth and convergence in India
XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
The Cost of the Covid-19 Crisis: Lockdowns, Macroeconomic Expectations, and Consumer Spending
Support-vector networks
Burning waste gas in oil fields
Nighttime images of the earth from space
Panel Data Econometrics with R: the plm package
[…] in nighttime lights during COVID-19 lockdown over Delhi, India
Environmental Resilience and Transformation in Times of COVID-19
Night-time imagery as a tool for global mapping of socioeconomic parameters and greenhouse gas emissions
Consistent covariance matrix estimation with spatially dependent panel data
How much should we trust the dictator's GDP estimates?
Covid19 impact on India to be more than 40.9 billion in first quarter. covid19-impact-india-more-40-9-billion-in-first-quarter-64501
The coronavirus recession and its implications
Session II: Traditional Estimation Practices: Determining the Level and Growth of the Informal Economy
Measuring Informal Economy in India: Indian Experience
Industrial growth in sub-Saharan Africa: Evidence from machine learning with insights from nightlight satellite images
Monitoring urbanization dynamics in India using DMSP/OLS night time lights and SPOT-VGT data
GDP of India: growth rate until […]
Night-time luminosity: Does it brighten understanding of economic activity in India? Reserve Bank of India Occasional Papers
Supply and demand shocks in the COVID-19 pandemic: an industry and occupation perspective. Oxford Review of Economic Policy
NASA's Black Marble nighttime lights product suite. Remote Sensing of Environment
Black Marble User Guide Version 1.0
SBI Research: Ecowrap
Measurement of blooming effect of DMSP-OLS nighttime light data based on NPP-VIIRS data
Median shift lunar correction for VIIRS
Satellite data and machine learning tools for predicting poverty in rural India
GNU Parallel - the command-line power tool. ;login: The USENIX Magazine
Regression shrinkage and selection via the lasso

Nataraj Dasgupta, Jan 16, 2020