key: cord-0817235-pmvg7e9y
authors: Yudistira, N.; Sumitro, S. B.; Nahas, A. C.; Riama, N. F.
title: Learning Where to Look for COVID-19 Growth: Multivariate Analysis of COVID-19 Cases Over Time using Explainable Convolution-LSTM
date: 2021-02-16
DOI: 10.1101/2021.02.13.21251683
sha: 817c7b7f204f3b3f71cf38c9b675e2b2c8d9a782
doc_id: 817235
cord_uid: pmvg7e9y

Identifying the determinant factors that contribute to the prediction requires multivariate analysis that captures coarse-to-fine contextual information. Preliminary descriptive analysis shows that environmental factors such as UV (ultraviolet) radiation are among the essential factors to consider when examining the drivers of the COVID-19 epidemic. During summer, UV can inactivate viruses that live in the air and on the surfaces of objects, especially at noon in tropical or subtropical countries. However, it may not be significant in closed spaces such as workplaces or in areas with intensive human-to-human transmission, especially densely populated areas. COVID-19 pandemic growth patterns differ over time between northern subtropical, southern subtropical, and tropical countries. Moreover, education, government, morphological, health, economic, and behavioral factors contribute to the growth of COVID-19. Multivariate analysis via visual attribution of an explainable Convolution-LSTM is used to identify the factors that contribute most to the growth of daily COVID-19 cases. For future work, the data to be analyzed should be more detailed in terms of the region and the period over which the time-series sample is acquired. The explainable Convolution-LSTM code is available here: https://github.com/cbasemaster/time-series-attribution

... worldwide [12], which is good for the environment. Moreover, vaccine development is not sufficient, and a vaccine takes a long time to discover [13].
Therefore, urgent, large-scale, and natural immunity is needed. Some technologies have been developed using UV light [14] [15]. Based on the above evidence, we investigate how UV rays dynamically affect the spread of COVID-19 depending on geographic location, pollution levels, and human activities.

Multivariate time-series analysis is a better choice for analyzing the growth of the COVID-19 pandemic because the growth depends on multiple interdependent factors over time. The classification of multivariate time series is also an emerging topic in machine learning [19]. Deep neural networks (DNNs) can capture information from large data sets, which makes them strong candidates for classification tasks [20] [21]. The ability of DNNs to generate meaningful feature representations during learning has attracted attention in the machine learning and data science communities. In this study, we use interpretable DNN forecasts to perform multivariate time-series analysis. The explanations help identify the critical joint characteristics for predicting daily COVID-19 cases over a period of time. One study using an interpretable DNN is the multivariate multi-factory PV energy prediction of Assaf et al. [24], which uses a two-stage convolutional neural network (CNN).

This multidimensional study uses three data sets: COVID-19 growth and its attributes, UV, and people mobility. The time-series data cover 2020-03-22 to 2020-09-11. The selected countries are located in tropical, northern subtropical, and southern subtropical regions. The worldwide confirmed COVID-19 cases, the UV index and pollution, and the people mobility time series were taken from Our World in Data [1], the Tropospheric Emission Monitoring Internet Service (TEMIS) [2] [16], and Google Mobility [3], respectively. Location-specific data, such as the UV index and pollution in Jakarta, Indonesia, were taken from the Indonesia Meteorology, Climatology, and Geophysical Agency (BMKG) [4].
Confirmed, recovered, and death cases of COVID-19 in Jakarta were obtained from the Indonesia Ministry of Health [5].

medRxiv preprint doi: https://doi.org/10.1101/2021.02.13.21251683; this version posted February 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

In some countries, such as Italy, Spain, or France, deaths began to flatten only once stringency reached its highest level. After China tightened its stringency beyond the initial level, its death curve flattened. The data set references given above can be consulted for the detailed definition of each factor. Data from 55 countries of varying geographical area and population are used to train the DNNs and to derive their explanations. Note that all factors are normalized into the 0-1 range before being fed into the DNNs.

The UV index (UVIEF) is derived from the measured solar radiation in the UV spectrum that arrives at the surface. It is calculated by considering the proportional contribution of UV-A and UV-B, two of the three wavelength-based types of UV radiation. UV-B is the UV radiation with wavelengths in the range 280-315 nm, while the wavelength of UV-A is between 315 nm and 400 nm. UV spectra are captured by the Global Atmosphere Watch (GAW) station. In this study, the ...
... than the other, and vice versa. This changing pace occurred between two adjacent groups, either blue with green or green with red. At the beginning of May, blue countries were starting to converge and, conversely, red countries were starting to emerge, although outlier countries exist. These phenomena can possibly be explained by Fig. 1d, where the daily mean UV index in red countries decreases monotonically as winter approaches, while the opposite holds for blue countries. This suggests that COVID-19 may be a seasonal pandemic that depends on geographical location.

The key drivers used in the human mobility analysis are the dynamics of community activities during the pandemic, which depend on geographical location (here focusing on tropical countries). After the first outbreak, human mobility changed relative to the pre-pandemic period because of lockdowns or government restrictions on outdoor activity. To see the effect of these restrictions, we investigate the activity dynamics relative to COVID-19 growth. For this purpose, we use Google Mobility data, which cover six different activities. These are the percent change from baseline for grocery and pharmacy, workplaces, transit stations, retail and recreation, residential, and parks. To see the effect of reducing activity intensity, we analyse the time-lagged correlation between activity dynamics and COVID-19 growth. That is, we examine the impact of activity reductions on COVID-19 growth patterns several days later. The countries investigated in this study are India, Brazil, Malaysia, Saudi Arabia, Indonesia, and Thailand (tropical countries).
... percentage change from baseline. The period of low activity lasted around a month, from the middle of April to May 2020. After May, increasing activity was recorded, indicating that people had adapted to new normal life. Fig. 2d shows the weekly mean confirmed cases in Malaysia, which grew starting from the middle of March 2020. From May 2020 onward, however, the weekly mean confirmed cases decreased. Note that the number of confirmed cases here has also been normalized across all countries considered in this dataset. Fig. 2e shows that Indonesia's human mobility reached its lowest percent change from baseline in the middle of May 2020 and then gradually increased. This increase shows that people had adapted to new normal life. Fig. 2f shows the weekly mean confirmed cases in Indonesia, which grew exponentially from the end of March 2020 as the number of tests increased. Note that the number of confirmed cases here has also been normalized across all countries considered in this dataset. To answer the question of whether there is any correlation between the decrease in activities before the new normal and the weekly mean confirmed cases after the new normal, a time-lagged cross-correlation was carried out.
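The time-lagged cross-correlation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `pearson` and `lagged_correlation`, the toy series, and the lag range are all introduced here for illustration. The idea is to pair the mobility value at week t with the confirmed cases at week t + lag and compute the Pearson correlation for each lag.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def lagged_correlation(mobility, cases, max_lag):
    """Correlate mobility[t] with cases[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = mobility[: len(mobility) - lag]  # drop the last `lag` mobility values
        y = cases[lag:]                      # drop the first `lag` case values
        result[lag] = pearson(x, y)
    return result


# Toy weekly series (hypothetical values, not data from the study).
mobility = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
# Cases are built to mirror mobility two weeks later with the sign flipped.
cases = [9.0, 9.0] + [-m for m in mobility[:-2]]

corr = lagged_correlation(mobility, cases, max_lag=3)
best_lag = min(corr, key=corr.get)  # lag with the most negative correlation
print(best_lag, corr)
```

The sketch reports only the correlation coefficient per lag. The p-values quoted in the text would need a separate significance test; `scipy.stats.pearsonr`, for example, returns both the coefficient and a p-value.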
... to weekly mean residential percent change from baseline by -0.33, with a correlation p-value of 0.03. This means that weekly mean confirmed cases correlate with the weekly mean workplaces, transit stations, and residential percent change from baseline at a statistically significant level (p < 0.05).
The forward propagation flows through the input layer, hidden layers, and output layer, followed by a Sigmoid activation. After the learning phase, gradient-based optimization via backpropagation is utilized, from which attribution maps (saliency maps) are generated. The visual attribution extracts attention to the features that are relevant to the final spatio-temporal time-series predictions (Fig. 5). Specifically, the method called Grad-CAM [23] is used to create the attribution maps. Grad-CAM is applied to the last hidden layer, where its output activation is weighted with the importance weight associated with the time-series predictions, followed by a Rectified Linear Unit (ReLU) activation. The formulation of Grad-CAM for the LSTM hidden layer is given in equation 2.

We divide the dataset into training and validation sets of 55 and 4 countries, respectively. The total time-series length for all features is 174 days, from 2020-03-22 to 2020-09-11. We compare three architectures: a one-layer 1D CNN, a one-layer LSTM, and the proposed Convolution-LSTM (Conv-LSTM). Each is evaluated on the validation data using Root Mean Squared Error (RMSE). The number of epochs is set to 3000.
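For the Grad-CAM weighting described above, the standard formulation from [23], written for a single time axis, may serve as a reference point. The notation is an assumption introduced here and may differ from the authors' equation 2: A_t^k is the activation of hidden unit k of the last hidden layer at time step t, T is the number of time steps, \hat{y} is the predicted output, \alpha_k is the importance weight of unit k, and L_t is the attribution at time step t.

```latex
\alpha_k = \frac{1}{T} \sum_{t=1}^{T} \frac{\partial \hat{y}}{\partial A_t^{k}},
\qquad
L_t = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k \, A_t^{k} \right)
```

In this form, the gradients are averaged over time to give one weight per hidden unit, and the ReLU keeps only the contributions that increase the prediction.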
We use the Adam optimizer to update the weights at each iteration with a learning rate of 0.001. The architecture with the best validation score is selected as the visual explanation model.

Even though the reconstruction result hardly matches the pattern of the actual one, the RMSE is 0.008, which is the lowest compared to the other models (Table 2). The improper reconstruction is presumably because Indonesia is a large country with heterogeneous character and behavior. Compared to Sweden, Norway, or Italy, Indonesia is larger in terms of population and geographic area, which leads to the need for more detailed and complete data in terms of region and period. Another suggestion is to feed more countries into the model so that it can generalize well and reduce overfitting. For future work, the data to be analyzed should be more detailed in terms of the region and the time-series sample period.
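The RMSE-based comparison described above can be sketched as follows. This is a minimal illustration only: the function names, the toy series, and the candidate predictions are hypothetical and are not results from the study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_best(actual, predictions_by_model):
    """Return (name, score) of the candidate with the lowest RMSE."""
    scores = {name: rmse(actual, pred) for name, pred in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy normalized daily-case series in the 0-1 range (hypothetical values).
actual = [0.0, 0.5, 1.0, 0.5]
candidates = {
    "model_a": [0.0, 0.5, 1.0, 0.5],  # reproduces the series exactly
    "model_b": [0.5, 0.5, 0.5, 0.5],  # flat prediction
}
best_name, best_score = select_best(actual, candidates)
print(best_name, best_score)
```

Because all factors are normalized into the 0-1 range, an RMSE value is read on that same scale. As the text notes, a low RMSE does not by itself guarantee that the reconstructed curve follows the shape of the actual one.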
Coincidentally, they are located in a similar geographical region, the northern subtropical area, which differs from Indonesia. It can be concluded that more samples are necessary to generalize to countries that have a pattern similar to Indonesia's COVID ...
... [25]. The visual attribution map also shows that government stringency is one of the critical factors contributing to daily new COVID-19 cases. Consequently, as shown in ...

While the environmental factor correlates with the global spread of the COVID-19 pandemic, we believe that it is not a standalone factor. Other factors, such as morphological and behavioral factors, influence the spread and growth of COVID-19 cases. Human activity also influences the spread and growth of COVID-19 cases via human-to-human transmission, especially in workplaces, residences, and groceries, where direct human interaction is intense in closed spaces. Based on direct evidence, even though there is an indication that COVID-19 is seasonal like the flu, with conditions alternating between northern subtropical, tropical, and southern subtropical locations, some further precautions should be taken into account:
1. The new normal is an inevitable part of daily life, in which wearing a mask, hand washing, increasing hospital capacity, and minimizing activities that create crowds should be a priority.
2. Open space is safer than a crowded closed room because of UV light and air circulation. Unfortunately, most victims are infected in closed spaces such as groceries, residences, and workplaces.
3. For tropical countries, an abundance of UV light helps to withstand the spread of COVID-19, especially in open space. There must be a good balance between indoor and outdoor activities, especially at noon, when the UV index is high. In closed spaces like workplaces, it is suggested to expose the room to UV rays before leaving when working hours end.
4. For residents of subtropical countries, wearing a mask is compulsory in open spaces and during the cold season. By the time of the summer season ...

A novel coronavirus outbreak of global health concern
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - the latest 2019 novel coronavirus outbreak in Wuhan, China
On the airborne aspect of COVID-19 coronavirus
Identifying airborne transmission as the dominant route for the spread of COVID-19
The effect of environmental parameters on the survival of airborne infectious agents
The International Bank for Reconstruction and Development/The World Bank
Inactivation of airborne viruses by ultraviolet irradiation
UV-C irradiation: A new viral inactivation method for biopharmaceuticals. American Pharmaceutical Review
Estimated inactivation of coronaviruses by solar radiation with special reference to COVID-19
Interrelations of UV-global/global/diffuse solar irradiance components and UV-global attenuation on air pollution episode days in Athens, Greece
Ultraviolet radiation and its interaction with air pollution, in: UV Radiation in Global Climate Change
The coronavirus pandemic in five powerful charts
The timetable for a coronavirus vaccine is 18 months. Experts say that's risky
Advances in ultraviolet light technology for non-thermal processing of liquid foods
Use of laser-UV for inactivation of virus in blood products
Surface UV radiation monitoring based on GOME and SCIAMACHY
Johns Hopkins Coronavirus Resource Center
A brief survey on sequence classification
Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks
Deep learning for time series classification: a review
Learning to forget: Continual prediction with LSTM
Grad-CAM: Visual explanations from deep networks via gradient-based localization
Explainable Deep Neural Networks for Multivariate Time Series Predictions
'Closing borders is ridiculous': the epidemiologist behind Sweden's controversial coronavirus strategy