key: cord-0848641-198o9ssx
authors: Chew, Alvin Wei Ze; Wang, Ying; Zhang, Limao
title: Correlating Dynamic Climate Conditions and Socioeconomic-Governmental Factors to Spatiotemporal Spread of COVID-19 via Semantic Segmentation Deep Learning Analysis
date: 2021-08-05
journal: Sustain Cities Soc
DOI: 10.1016/j.scs.2021.103231
sha: a2a7f5378a5982778ae50e2e0ad21889585d8b0d
doc_id: 848641
cord_uid: 198o9ssx

In this study, we develop a deep learning model to forecast the transmission rate of COVID-19 globally, via a proposed G parameter, as a function of fused data features which encompass selected climate conditions, socioeconomic factors, and restrictive governmental factors. A 2-step optimization process is adopted for the model's data fusion component, which systematically performs the following: (Step I) determining the optimal climate feature which can achieve a good precision score (> 70%) when predicting the spatial class distribution of the G parameter on a global scale consisting of 251 countries, followed by (Step II) fusing the optimal climate feature with 11 selected socioeconomic-governmental factors to further improve the model's predictive capability. The results obtained so far from the model's testing step indicate that land surface temperature day (LSTD) has the strongest correlation with the global G parameter over time, achieving an average precision score of 72%. When coupled with relevant socioeconomic-governmental factors, the model's average precision score improves to 77%. In the local-scale analysis for selected countries, our proposed model can provide insights into the relationship between the fused data features and the respective local G parameter, achieving an average accuracy score of 79%.

The ongoing novel coronavirus disease 2019 (COVID-19) is already considered one of the greatest challenges faced by humanity in the 21st century. On 11 Mar 2020, COVID-19 was officially declared a global pandemic by the World Health Organization (Mbbs et al. 2020). Compared to the 2003 SARS outbreak in Asia, COVID-19 is much more infectious, hence bringing about new lifestyle norms such as enforced safe distancing in public places, compulsory mask wearing, etc. At present, the number of confirmed COVID-19 cases worldwide has already crossed 190 million, while the number of deaths due to this new virus exceeds 4 million globally. While vaccinations are gradually being deployed worldwide, there are still unanswered questions pertaining to the nature of this complex disease and its variants. One specific area of concern is the unknown relationship between climate conditions and the transmission rate of COVID-19. For example, Germany, France, and the UK imposed their 2nd nationwide lockdown in late 2020 due to a spike in their numbers of confirmed COVID-19 cases which coincided with their winter season.

The spread of coronavirus (SARS-related) in urbanized cities was previously shown to depend on a multitude of climate parameters, specifically temperature, wind speed, and humidity (Dalziel et al. 2018; Maiti et al. 2021; Viezzer and Biondi 2021). Yuan et al. (2006) demonstrated that the 2003 SARS outbreak in Beijing, China peaked under the following conditions: (a) mean temperature of 16.9°C; (b) mean relative humidity of 52.2%; (c) wind speed of 2.8 m/s. Most recently, Prata et al. (2020) highlighted that the daily number of confirmed COVID-19 cases in Brazil had an inverse relationship with temperatures ranging between 16.8°C and 27.4°C from Feb to Mar 2020, while Şahin (2020) observed that the total number of confirmed cases reported in several cities of Turkey correlated with the wind speed recorded 14 days earlier and with the temperature on the day itself. Other similar studies for specific countries, such as China, Spain, and the United States (Abdollahi and Rahbaralam 2020; Liu et al. 2020; Runkle et al. 2020; Shi et al. 2020), have also been performed since the virus' inception.

Other recent studies have reported positive or minimal correlations (Ahmadi et al. 2020; Bashir et al. 2020; Tosepu et al. 2020; Xie and Zhu 2020; Yao et al. 2020) between the transmission rate of COVID-19 and climate conditions, especially for cases where data is generally limited. Analysis for large countries spanning multiple climate zones was also restricted and inconsistent in several reported results (Awasthi et al. 2020; Baker et al. 2020; Chiyomaru and Takemoto 2020; Islam et al. 2020; Le et al. 2020; Pan et al. 2021; Sobral et al. 2020). More broadly, Wu et al. (2020) and Guo et al. (2021) investigated the effects of ambient temperature on the spread of COVID-19 worldwide and reported a negative correlation between the two, hence highlighting the likelihood that higher temperatures can help to control the global transmission rate of COVID-19.

In general, the above-mentioned studies indicate that the transmission rate of COVID-19 can be directly or indirectly associated with different climate conditions; the nature of this association, however, remains an open and important research question to be investigated further. In March 2021, the United Nations highlighted the possibility of COVID-19 becoming a seasonal disease by pointing out that respiratory viral infections are often seasonal, with influenza peaking in the autumn-winter period and cold-related coronaviruses circulating under temperate conditions. On the other hand, it is also worth noting that almost all documented pandemics in humanity's history originated from the exploitation of the natural environment (e.g., wildlife), which is directly linked to climate change (Rodó et al. 2021). For example, due to economic development and globalization over the past several decades, global warming has caused wildlife to move towards the colder poles to avoid the increasing global temperatures. Consequently, there is a likelihood that they interact with other wildlife which they usually would not, hence creating opportunities for unknown pathogens and viruses to enter new hosts. Likewise, continuous deforestation forces animals to migrate and come into contact with other wildlife, hence increasing the tendency for transmission of unknown pathogens and viruses, which can subsequently lead to a spillover of infections from wildlife to humanity (Rodó et al. 2021). Overall, we can recognize that climate can be linked to COVID-19 or to pandemic occurrences in general, on both a short- and long-term basis, hence underlining the need for natural environment protections, as part of the present global sustainability mission, to minimize the risk of future outbreaks/pandemics.

While temporal climate conditions do affect (directly or indirectly) the spread of COVID-19, the exact seasonality and dynamic association between COVID-19 and climate conditions are still not well understood quantitatively. Continued studies are therefore required to correlate the relevant climate conditions with the present virus spread. It is hoped that epidemiological prediction models which account for climate and weather processes can help to identify populations who are at risk from COVID-19, and thus improve overall surveillance and control measures (Kroumpouzos et al. 2020). At the same time, it is common knowledge that socioeconomic factors and restrictive governmental policies also control the transmission rate of COVID-19 in the different countries. Hence, any novel prediction model for COVID-19 must also account for the relevant socioeconomic-governmental factors, on top of selected climate features, when modelling the spreading rate of COVID-19 in both the global and local contexts. In our present study, the focus lies in developing an alternative prediction tool which can fuse the different types of data features pertaining to climate conditions, socioeconomic factors, and restrictive governmental policies to model and forecast the spatiotemporal growth rate in the number of confirmed COVID-19 cases at both the global and local scales. We note that the global scale relates to an aggregation of 251 countries in our analysis, subject to data availability, while the local scale corresponds to any specific country of the modeler's choice. In addition, we propose an optimization workflow for the data fusion process in the model development phase which can attain a balance among the computational runtime for training the model, logic flow, and the model's resulting predictive accuracy/precision. This paper is structured as follows. Section 2 reviews related studies which model the complex relationship between the transmission rate of COVID-19 and multiple factors pertaining to climate, socioeconomic conditions and/or restrictive governmental policies. Section 3 describes the selected data features, pertaining to the combined climate-socioeconomic-governmental factors, and the output targets for our proposed image-to-image semantic segmentation analysis, which models and forecasts the spatial class distribution of the rate of increase in the number of confirmed COVID-19 cases across the different countries, via a proposed G parameter. Section 4 summarizes the prediction results derived from the model's testing step, in relation to the proposed classification task for the G parameter via segmentation analysis using the selected combination(s) of data features. In the same section, we also investigate several major countries from the different continents to examine the level of agreement between the actual and predicted classes for the respective G parameter on a local scale, hence ensuring that the proposed prediction model can provide useful insights in both the global and local contexts. Section 5 compares the model's prediction capability with other relevant studies, followed by a discussion of the implications of our proposed prediction tool and how the tool can be useful to the relevant stakeholders in building towards a post-Covid era. Finally, Section 6 summarizes the significance of our study and the key findings obtained so far.

As the global battle against COVID-19 continues, new information analysing the complex relationship between the transmission rate of COVID-19 and a multitude of factors which include, but are not limited to, climate conditions, social control measures, economic stability, etc., has emerged continuously. In the following, we provide a literature overview of the most important and relevant studies carried out so far on the above-listed relationship(s), which have contributed to the scientific community's qualitative and quantitative understanding of the current topic.

Investigations of the relationship between climate conditions and the spread of COVID-19 have so far, to the best of our knowledge, focused on empirical one-dimensional (1D) quantitative formulations which aggregate multiple meteorological parameters/features via weightages calibrated on available data. For example, Wu et al. (2020) proposed a novel core generalized additive model (GAM) equation which predicts the number of daily new cases or the number of deaths in the different countries with a lead-time of 1 day as a function of wind speed, the population's median age, time, weather variables, etc. It is, however, likely that the selected input features of their prediction model are constrained to the extent that the features remain static during the model's computational runs, hence not maximising the model's predictive accuracy. Guo et al. (2021) used multiple average meteorological parameters, which are first weighted via population density, for the modelling step at the city- or country-level. Likewise, the number of input features does not grow during the computational runs, hence constraining the data-driven analysis. More broadly, for analysing infectious diseases, Kim (2021) leveraged exploratory spatial data analysis and a one-dimensional (1D) spatial regression method to relate infectious disease emergence data, specifically for measles and the 2015 Middle East Respiratory Syndrome (MERS), to 14 urban characteristics by analysing 225 spatial urban units in South Korea. The author's results demonstrated that new infectious diseases such as COVID-19 can be influenced by ecological factors, i.e., climate change. At the same time, the author also highlighted the need to model the spatial relationship between weather conditions and the spread of COVID-19 in future studies to better quantify their complex relationship.

To further underline the important relationship between climate conditions and the transmission rate of COVID-19, it is well known that the disease, caused by SARS-CoV-2, was first detected in low-temperature areas of China towards the end of 2019, followed by major outbreaks in Japan, South Korea, Northern Italy, and Germany, which were also experiencing their own winter periods (Kroumpouzos et al. 2020). The cold weather patterns observed for COVID-19 (5°C-11°C air temperature, and 47%-70% relative humidity) are close to known laboratory conditions that favour CoV survival (4°C air temperature, 20%-80% relative humidity) (Casanova et al. 2010). On the other hand, the gradual spread, and subsequent epicentre development, to India, Thailand, Singapore, the Middle East, etc., which usually experience higher temperatures during the year, is attributed to global travelling and close human contact before the respective national lockdowns and air-travel controls. Therefore, while it is important to recognize that climate conditions do affect the transmission rate of COVID-19 globally, socioeconomic factors and/or restrictive governmental policies also influence the level of virus spread in the different countries. For example, the virus transmission rate generally depends on the living environments, population densities, hygiene, space availability, etc., which can significantly affect the virus spread in relatively warmer areas (Huang et al. 2020; .

Since the pandemic's inception, several important studies have investigated the relationship between the socioeconomic factors highlighted above and the transmission rate of COVID-19 as a whole. For example, Li et al. (2021) performed clustering analysis and developed a novel structural equation model to quantify the relationship(s) between built environment attributes and the cluster size of COVID-19 in Hangzhou, China. Their results demonstrated statistically significant influences of commercial strength and the availability of transportation infrastructure on the number of confirmed cases in any infectious cluster. Similar clustering analysis has also been performed by Das et al. (2021) in the context of Kolkata megacity, India, by correlating the level of living environment deprivation with the spatial clustering of COVID-19 hotspots in Kolkata. Multiple data regression models were also leveraged for the modelling analysis, where zero-inflated negative binomial regression (ZINBR) was found to best correlate the living environment deprivation factor with the spatial clustering of COVID-19 hotspots in Kolkata. Sannigrahi et al. (2020) also used different 1D regression models to determine that total population size and income level are the most important factors regulating the number of deaths due to COVID-19 in the entire European region. At the same time, the authors also highlighted the need to model the influences of environmental conditions and socioecological statuses on the transmission and death rates of COVID-19 in future studies. Other similar studies investigating the quantitative relationships between a defined socioeconomic-environmental nexus and the transmission rate(s) of COVID-19 in different countries or city states such as Washington, America (Hu et al. 2021), New York City, America (Fu and Zhai 2021), or America as a whole (Maiti et al. 2021) have been well documented. Those studies generally report that income level, level of social distancing, and stay-at-home behaviours are key socioeconomic factors for controlling the transmission and death rates of COVID-19 in urbanized cities.

The effect of social distancing has been investigated by several studies. For example, Ugail et al. (2021) proposed a novel methodology which leverages the well-known circle packing problem to re-configure common physical environments and generate the optimal design space in near real time, with the objective of minimizing the infection rate of COVID-19 among the individuals within the defined space. On the other hand, Sun and Zhai (2020) extended the traditional perfect-mixing-based Wells-Riley model by introducing 2 new indices which model the effects of social distancing and space ventilation on the infection probability of COVID-19 in building spaces and transportation vehicles. Overall, their studies concluded that increasing social distance can significantly reduce the infection rate by an average of 30% during the first 30 min with appropriate ventilation conditions.

In summary, the above-outlined research studies have been very useful to improve the community's qualitative and quantitative understanding of the relationship between the different combinations of environmental conditions and socioeconomic factors, and the transmission rate of COVID-19 as the target parameter. However, going forward, several research gaps, which have also been highlighted by previous research works summarized above, can be delineated at this stage, namely:

• The previous studies, as discussed above, mainly focus on an individual category of features (e.g., socioeconomic factors under one main category) to model either the transmission and/or death rate(s) of COVID-19. However, we are aware that there is a complex climate-socioeconomic-governmental nexus (which may be larger still) that influences the level of COVID-19 spread in the different countries. To the best of our knowledge, no prediction tool has so far been developed to predict/classify the spatiotemporal growth rate in the number of confirmed COVID-19 cases, on both the global and local scales, as a function of climate conditions, socioeconomic factors, and restrictive governmental policies taken together.

• Most of the previous research studies focused on specific case studies which include, but are not limited to, America, India, China, etc. In contrast, we are hopeful that a single competent prediction model can be developed, and built upon beyond this study, to predict/classify the growth rate in the number of confirmed COVID-19 cases for any specific country of the modeler's choice with a defined lead-time.

• To the best of our knowledge, no studies so far have leveraged two-dimensional (2D) input features to model the spatiotemporal growth rate in the number of confirmed COVID-19 cases on both the global and local scales. In general, 2D or 3D model features such as image representations of defined data features can provide more insights into their complex dynamics and how they may better correlate with the transmission rate of COVID-19 worldwide, owing to their higher data dimensions. In contrast, 1D time-series regression modelling, which is commonly used in the above-discussed studies, may limit the data-driven analysis due to the lower data dimensions of the input features. We are thus hopeful that using 2D input features in our proposed deep learning method can capture the complex and high-dimensional correlation between the spatiotemporal growth rate in the number of confirmed COVID-19 cases (as the target objective) and the combination of climate conditions, socioeconomic factors, and restrictive governmental policies (as the data input features) on both the global and local scales.

In this study, we thus develop a customized encoder-decoder deep learning architecture which is capable of extracting high-level features from 2D image representations of a range of input data parameters, which include defined climate parameters, socioeconomic factors, and restrictive governmental policies. For convenience, we term the combination of the different data features "climate-socioeconomic-governmental features". The high-level features extracted by the encoder component of the deep learning model are then further processed to reconstruct 2D map representations which quantify the spatial class distributions of the rate of increase in the number of confirmed COVID-19 cases worldwide, via a proposed G parameter. To the best of our knowledge, this form of data exploitation for modelling the spread of COVID-19 has not been investigated in the literature since the virus' inception. We are thus hopeful that the formulated approach of using 2D image representations of the climate-socioeconomic-governmental features to model and forecast the proposed G parameter, on both the global and local scales, can support resilience and sustainability development in a post-Covid era where COVID-19 is expected to become endemic in our society. For example, if the COVID-19 virus is found to correlate with seasonal weather variations and to result in higher infection rates during certain periods, the quantitative information from the proposed prediction model can assist healthcare management to better prepare for possible spikes in community cases, especially when various socioeconomic factors which are modelled as other input data parameters in the proposed tool, such as economic activities and global travelling, are back to normality.

In the following, we first describe, in-detail, the selected data parameters pertaining to the defined climate-socioeconomic-governmental features which will be leveraged to train our prediction tool to model and forecast the proposed G parameter, as representative of the rate of increase in the number of confirmed COVID-19 cases, both on the global and local scales via classification analysis. Details relating to our deep learning model development, coupled with the proposed data fusion optimization process, are then subsequently provided.

The dynamic climate conditions/features are quantified using open-source two-dimensional (2D) satellite images extracted from the NASA Earth Observations (NEO) portal (https://neo.sci.gsfc.nasa.gov/). Our current analysis focuses on multiple climate parameters, namely: (a) Land Surface Temperature Day (LSTD); (b) Land Surface Temperature Night (LSTN); (c) Cloud Fraction (CF); (d) Cloud Optical Thickness (COT); and (e) Cloud Particle Radius (CPR), subject to their data availability between Jan 2020 and Oct 2020, which aligns with the global COVID-19 pandemic situation. Table 1 describes the characteristics of each climate feature and its possible relationship with the transmission rate of COVID-19 globally. Note that the NASA satellite images for the different climate parameters are collected via the Moderate Resolution Imaging Spectroradiometer (MODIS), an instrument on board NASA's Terra and Aqua satellites. As summarized in Table 1, it is likely that various climate features overlap with one another to some extent. For example, the CF and COT features directly affect the amount of sunlight entering the earth's lower atmosphere and reaching ground level, hence influencing the LSTD and LSTN variations over space and time. As such, it is important to first evaluate quantitatively which climate feature is most influential in affecting the transmission rate of COVID-19, to avoid overlapping features, before fusing it with the other defined socioeconomic-governmental factors as part of the data fusion optimization process which will be described in subsequent sub-section(s).

For each of the selected climate features, 257 daily 2D NASA satellite images, covering 22 Jan 2020 to 4 Oct 2020 and having dimensions of 180 (width) and 360 (height) pixels/grids, have been collated for our deep learning analysis. Examples of the 2D images for LSTD, LSTN, CF, COT, and CPR are shown in Figures 1a to 1e, respectively. Note that a 2D image can be represented numerically as a 3D array, with the 3rd dimension pertaining to the number of features to be modelled, while the 1st and 2nd dimensions define the width and height of the image itself. For example, for typical red-green-blue (RGB) images, the 3rd dimension has a size of 3 to represent the total number of RGB features, where each colour feature consists of its own 2D array. As such, the original 2D image can be denoted as w × h × 3, where w and h represent the width and height of the image respectively.

In our study, we instead convert the RGB colour representation of each 2D image into normalized quantitative values for each specific climate feature, with respect to the known range of values from Table 2, by associating each value with the respective colour tone at each grid/pixel in the original image. As a rule of thumb, the brighter colours (red, orange) generally represent the higher ranges of values for each climate feature as summarized in Table 2, and the darker colours (blue) the lower ranges. By doing so, we can analyse every 2D satellite image with a single feature in its corresponding 3D array, where the 3rd dimension is now represented by a single value for the colour intensity instead of the conventional RGB features.
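The colour-to-value conversion described above can be illustrated with a minimal sketch. The source does not spell out the conversion procedure, so the nearest-colour lookup below is an assumption, as are the function name `rgb_to_intensity`, the three-colour ramp, and the `vmin`/`vmax` placeholders standing in for the Table 2 ranges.

```python
import numpy as np

def rgb_to_intensity(rgb, ramp, vmin=0.0, vmax=1.0, background=(0, 0, 0)):
    """Map a (rows, cols, 3) RGB image to a (rows, cols, 1) single-feature array.

    ramp: (k, 3) colours ordered from lowest to highest value (hypothetical).
    vmin, vmax: value range of the climate feature (placeholders for Table 2).
    Background pixels become NaN so they can be excluded from later statistics.
    """
    rgb = np.asarray(rgb, dtype=float)
    ramp = np.asarray(ramp, dtype=float)
    # Squared distance from every pixel to every ramp colour: (rows, cols, k).
    dist = ((rgb[:, :, None, :] - ramp[None, None, :, :]) ** 2).sum(axis=-1)
    nearest = dist.argmin(axis=-1)
    # Position of the nearest ramp colour, rescaled to [vmin, vmax].
    values = vmin + (nearest / (len(ramp) - 1)) * (vmax - vmin)
    is_background = np.all(rgb == np.asarray(background, dtype=float), axis=-1)
    values = np.where(is_background, np.nan, values)
    return values[:, :, None]

# Hypothetical ramp: blue (low) -> orange -> red (high).
ramp = [(0, 0, 255), (255, 165, 0), (255, 0, 0)]
image = np.array([[(0, 0, 0), (0, 0, 250), (255, 10, 0)]])
intensity = rgb_to_intensity(image, ramp)
print(intensity.shape)
```

A finer ramp (more anchor colours) would give a finer value resolution; the three-colour ramp is only meant to show the mechanics.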

When processing each 2D image, we compute the mean and standard deviation of the valid grid/pixel values, using the respective range of values (from Table 2) for the corresponding climate feature. Each valid value in every grid/pixel location is then normalized via the following scaling formula:

$$\bar{x}_i = \frac{x_i - \mu}{\sigma} \qquad (1)$$

where $\bar{x}_i$ represents the normalized value for grid/pixel $i$, $x_i$ represents the original value for grid/pixel $i$, which excludes the background representation (black colour in Figures 1a-1e), $\mu$ represents the mean of all valid values in the original 2D image, and $\sigma$ represents the standard deviation of all valid values in the same 2D image.

Table 1. Characteristics of each climate feature and its possible relationship with the transmission rate of COVID-19.

Land Surface Temperature Day (LSTD)

Description: LSTD represents the skin temperature of whatever is on the land (snow, ice, human-made structures, etc.). Note that LSTD is not indicative of, or equivalent to, surface air temperature values. LSTD can also be correlated with changes in weather and climate patterns, for example, how an increase in atmospheric greenhouse gases affects LSTD variations over time.

Possible relationship with COVID-19: Temperature has been speculated to influence the spread of COVID-19, where countries with colder land temperatures may face greater challenges in controlling its spread rate. For example, in early Nov 2020, Germany, France, and the UK imposed their 2nd lockdown period, which aligned with the start of their winter season.

Land Surface Temperature Night (LSTN)

Description: Identical in its data collection and investigative purpose to LSTD, with the exception that LSTN data are collected during the night.

Possible relationship with COVID-19: Same speculated relationship to COVID-19 as that of LSTD.

Cloud Fraction (CF)

Description: CF is generally indicative of the amount of sunlight energy that reaches the Earth from the Sun as well as the amount of energy that radiates from Earth back into space, hence affecting Earth's temperature and humidity in different locations. As a rule of thumb, the higher the CF values, the greater the amount of sunlight reflected back into space; at the same time, higher CF values can also trap a greater amount of heat radiated from the earth's surface, preventing it from escaping back into space.

Possible relationship with COVID-19: Building upon LSTD and LSTN, we need to investigate any possible relationship between CF and the transmission rate of COVID-19 globally, since CF directly affects land temperature.

Cloud Optical Thickness (COT)

Description: COT generally measures the amount of sunlight passing through the clouds to reach the Earth's surface. As a rule of thumb, the higher the COT values, the greater the amount of sunlight scattered and reflected by the available clouds.

Possible relationship with COVID-19 (COT and CPR): Since COT and CPR affect the amount of sunlight reaching the Earth's surface, the earth's land temperature is again affected. Therefore, it is again useful to investigate the relationship between COT/CPR and the spread of COVID-19 globally, as discussed for the preceding climate features.

Cloud Particle Radius (CPR)

Description: Like COT, CPR contributes to the amount of sunlight passing through the clouds to reach the Earth's surface. Generally, smaller particles produce brighter and more reflective clouds, which reflect sunlight back into space and cool the planet. Clouds with larger particles absorb more shortwave infrared light and, conversely, clouds with smaller particles absorb less shortwave infrared light.
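The scaling in Equation (1) amounts to standardizing each image over its valid pixels only. The following is a minimal sketch under stated assumptions: background pixels are marked as NaN, and the population standard deviation is used, since the source does not say which estimator was applied. The function name `normalize_valid` is illustrative.

```python
import numpy as np

def normalize_valid(image):
    """Standardize the valid (non-NaN) pixels of a 2D array as in Equation (1).

    Background pixels (assumed to be NaN) are left as NaN. Assumes the image
    holds at least two distinct valid values, so the deviation is non-zero.
    """
    image = np.asarray(image, dtype=float)
    valid = ~np.isnan(image)
    mu = image[valid].mean()       # mean of all valid values
    sigma = image[valid].std()     # population standard deviation (assumption)
    out = np.full_like(image, np.nan)
    out[valid] = (image[valid] - mu) / sigma
    return out

# Tiny illustrative image with one background pixel.
example = np.array([[np.nan, 1.0], [2.0, 3.0]])
normalized = normalize_valid(example)
```

After this step the valid pixels of every feature have zero mean and unit deviation, which puts climate features with different physical units on a common scale.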

As discussed, besides the climate features, other relevant socioeconomic factors and restrictive governmental policies which are associated with the transmission rate of COVID-19 on both the global and local scales are also included as the model's input features, as part of the data fusion step in our analysis. The relevant socioeconomic-governmental factors (a total of 11 additional data features) are extracted from another open-source database (https://ourworldindata.org/coronavirus-data), as summarized in Table 3. All factors listed in Table 3, except for "Government stringency index", are categorical, with unique classes for different ranges of values. For example, the "Level of income support" factor is quantified via 3 labelled classes, namely: (Class 1) no support for lost salary; (Class 2) support > 50% of lost salary; and (Class 3) support < 50% of lost salary. Hence, to set up a common basis for correlating with the climate features, which are already represented as normalized floating-point values, all categorical data features are also normalized into discrete values via the same normalization formula from Equation (1), where the mean and standard deviation are now computed from the distribution of the classes for each data feature. The same normalization procedure is applied to the discrete values of the "Government stringency index" factor, so that all data features, inclusive of the different climate features, share the same scaling. As discussed earlier, each of the climate features listed in Table 1 is represented as a 2D image, i.e., a 3D array, of w × h × 1, where the value of 1 for the 3rd dimension represents the single colour intensity feature. Data fusion of the above-listed 11 socioeconomic-governmental factors is then performed by constructing an initial 3D array of these factors for each climate feature, followed by concatenating it with the respective colour intensity feature, hence resulting in a 3D array of size w × h × 12 for each climate feature from Table 1. Note that w (width) and h (height) of each 3D array are still fixed at 180 and 360, which controls the total number of pixels/grids in each image (3D array).
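The fusion step described above is a concatenation along the feature axis. A minimal sketch, assuming the 11 factors have already been rasterized onto the same 180 × 360 grid and normalized (the function name `fuse_features` and the zero/one filler arrays are illustrative only):

```python
import numpy as np

def fuse_features(climate, factors):
    """Concatenate per-pixel factor channels with one climate channel.

    climate: array of shape (w, h) or (w, h, 1), the colour intensity feature.
    factors: array of shape (w, h, n), e.g. n = 11 socioeconomic-governmental factors.
    Returns an array of shape (w, h, n + 1).
    """
    climate = np.asarray(climate, dtype=float)
    if climate.ndim == 2:
        climate = climate[:, :, None]
    factors = np.asarray(factors, dtype=float)
    if factors.shape[:2] != climate.shape[:2]:
        raise ValueError("climate and factor grids must have the same width and height")
    return np.concatenate([factors, climate], axis=-1)

climate = np.zeros((180, 360, 1))   # placeholder climate channel
factors = np.ones((180, 360, 11))   # placeholder factor channels
fused = fuse_features(climate, factors)
print(fused.shape)  # (180, 360, 12)
```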

The global recorded numbers for the confirmed COVID-19 cases are extracted from another open-source database (https://ourworldindata.org/coronavirus-data). As mentioned earlier, we propose a G parameter ( ), as defined in Equation (2), which measures the daily growth rate in the total number of confirmed cases on the global scale. At this stage, the global scale encompasses 251 countries in total, where the respective total number of the confirmed cases for each country can be extracted from the same above-stated database. Figure 2a illustrates the cumulated number of confirmed cases globally for the selected period between 22 Jan 2020 and 4 Oct 2020, while Figure 2b represents the global variation of the proposed parameter for the same period.

$$G_t = \frac{N_t - N_{t-1}}{N_{t-1}} \qquad (2)$$

where $N_t$ represents the global number of confirmed COVID-19 cases at time $t$, and $N_{t-1}$ represents the global number of confirmed COVID-19 cases from the previous day. For example, the G value on 31 January 2020 is computed by dividing the difference between the numbers of confirmed COVID-19 cases on 31 January and 30 January 2020 by the number of confirmed cases on the latter date.
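As a small worked example of the growth-rate definition, the sketch below computes the daily G values in percent from a cumulative case series. The function name `growth_rate_percent`, the made-up case counts, and the choice to return NaN when the previous day's count is zero are assumptions for illustration only.

```python
import numpy as np

def growth_rate_percent(cumulative_cases):
    """Daily growth rate (today - yesterday) / yesterday, returned in percent."""
    n = np.asarray(cumulative_cases, dtype=float)
    prev, curr = n[:-1], n[1:]
    g = np.full(prev.shape, np.nan)  # NaN where the previous count is zero (assumption)
    np.divide(curr - prev, prev, out=g, where=prev > 0)
    return 100.0 * g

cases = [100, 110, 121, 121]  # made-up cumulative counts, not real data
g = growth_rate_percent(cases)
print(g)
```

The output has one fewer entry than the input because the first day has no preceding day to compare against.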

At the same time, we also compute the individual G values for all 251 countries using their respective reported numbers of confirmed COVID-19 cases between 22 Jan 2020 and 4 Oct 2020. The computed G values (in percentage quantities) are then grouped into 6 main classes, as summarized below:

≤ 10.0%
• Class 4: 10.0% < G ≤ 25.0%
• Class 5: G > 25.0%

As part of our data fusion process, the multiple parameters pertaining to the defined climate-socioeconomic-governmental factors are fused to construct a new 2D input image (3D array) of size 180 x 360 x d (64,800 x d grids), where each grid holds the normalized value(s) of the d data features being fused together. The resulting output target is also a 2D map representation with dimensions of 180 x 360 x 1, where the 3rd dimension now represents a specific class (Class 0 to 5) indicating the rate of increase in confirmed cases given by the G parameter defined earlier. The 251 countries are then mapped to assigned grids/pixels on the 2D input and target maps, based upon their known longitude and latitude coordinates for the defined size of the 2D maps.

For example, Figure 3 illustrates that if Country X occupies a selected pool of pixels in the defined 2D map of 180 x 360, then the same pool of pixels in the input and output maps, which have identical width and height, will be tagged respectively with the normalized values of the modelled data features (fusion of climate-socioeconomic-governmental factors) and the resulting G classes. The 2D input maps for the fused features are then fed into our personalized deep learning encoder-decoder model to predict the 2D output maps of the G classes for all 251 countries at their respective pixels/grids, as part of the direct image-to-image segmentation analysis. More details on our proposed deep learning model are provided in the next section.

Figure 3. Example of generating input and output 2D maps, i.e., 3D arrays, based upon assigned pixels/grids for specific countries.
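The pixel assignment described above can be sketched as follows, under the assumption that the 180 x 360 grid corresponds to one cell per degree of latitude and longitude, with row 0 at the northern edge and column 0 at the western edge. The paper does not state this convention, so the helper names `latlon_to_pixel` and `paint_country` and the example coordinates are hypothetical.

```python
import numpy as np

N_ROWS, N_COLS = 180, 360  # one cell per degree (assumed convention)

def latlon_to_pixel(lat, lon):
    """Map latitude in [-90, 90] and longitude in [-180, 180] to (row, col)."""
    row = int(np.clip(np.floor(90.0 - lat), 0, N_ROWS - 1))   # row 0 is the northern edge
    col = int(np.clip(np.floor(lon + 180.0), 0, N_COLS - 1))  # col 0 is the western edge
    return row, col

def paint_country(target_map, cells, g_class):
    """Write one G class into every pixel assigned to a country."""
    rows, cols = zip(*cells)
    target_map[list(rows), list(cols)] = g_class
    return target_map

target = np.zeros((N_ROWS, N_COLS), dtype=np.int64)
# Arbitrary 2 x 2 degree patch standing in for "Country X".
cells = [latlon_to_pixel(lat, lon) for lat in (1.5, 2.5) for lon in (103.5, 104.5)]
paint_country(target, cells, g_class=4)
print(sorted(set(cells)))
```

The same list of cells would be reused to write the normalized feature values into the input map, which keeps the input and target maps aligned pixel by pixel.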

To optimize the features selection process during the prediction model's training phase, Figure  4 illustrates a 2-step optimization workflow to determine the optimal set of fused data features which can best maximize the trained model's predictive performance on a defined testing dataset. Descriptions of the 2 steps involved in the proposed optimization workflow from Figure 4 are as follows:

Step I: Train our proposed deep learning prediction model with each of the 5 selected climate features (LSTD, LSTN, CF, COT, and CPR), and determine which climate feature maximizes the model's precision score on the defined testing dataset, with a precision score of at least 70% as our first optimization target. Again, we note that this step is performed to avoid overlapping climate features, as described earlier, where multiple features can be associated with one another, for example, the dependence of LSTN and LSTD on COT and CF as summarized in Table 1.

Step II: Fuse the optimal climate feature with the 11 listed socioeconomic-governmental factors (from Table 3), followed by the same model training process from Step I, to further optimize the prediction model's predictive performance on the same testing dataset.

Figure 4. Example of generating input and output 2D maps, i.e., 3D arrays, based upon assigned pixels/grids for specific countries.

Step I: Optimize selection of climate feature

To determine the optimal climate feature in Step I, we investigate a series of Scenarios that map the 2D maps of each independent climate feature to 2D target maps of the same dimensions, which represent the spatial class distributions of the proposed G metric for the same period between 22 Jan 2020 and 4 Oct 2020. Table 4 summarizes the series of Scenarios, where each main Scenario investigates the relationship of a specific climate feature with the different G classes for all 251 countries over time, while each Sub-Scenario within a main Scenario investigates the effect of varying lead-times of the same climate feature on the modelled output target. We again note that the locations of the different countries are mapped to specific pixels on the 2D input and output maps based upon their known longitude and latitude coordinates, even though the original satellite images for the CF, COT, and CPR features do not contain the specific contours of the 251 countries, unlike those in Figures 1a and 1b. As illustrated in Figure 4, the optimization objective in Step I is also to determine the optimal set of hyperparameters (batch size, epochs, type of Sub-Scenario) for which a specific climate feature raises the model's precision score to at least 70% when evaluating the model's predictions on the defined testing dataset. The selected hyperparameters are then used as the primary set for training the same prediction model with the fused data features in the subsequent Step II, before additional tuning to improve the model's predictive capability.
Step II: Data Fusion

Upon determining the optimal climate feature from Step I of our proposed optimization workflow in Figure 4, we proceed to fuse the selected climate feature with the 11 listed socioeconomic-governmental factors from Table 3 to construct a new 3D array of size (180 x 360 x 12), as previously shown in Figure 3. Note that our first approach models all 11 socioeconomic-governmental factors, coupled with the optimal climate feature, to train and validate our proposed deep learning prediction model, followed by testing the trained model on the same testing dataset from Step I. Again, in Step II, we continue to tune the model's hyperparameters to maximize the model's predictive performance in terms of the precision and overall accuracy scores. Finally, Algorithm 1 summarizes the key computational steps involved in Steps I and II of our proposed workflow to optimize the data fusion process.

re-train prediction model with additional hyperparameters tuning
6. end
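The control flow of the 2-step workflow can be sketched as a simple search loop. This is only a schematic reading of the description above, not a reproduction of Algorithm 1: the scoring callables stand in for a full train, validate, and test cycle, the lead-time Sub-Scenarios are omitted for brevity, and the toy scores below are made-up numbers rather than results from the paper.

```python
from itertools import product

CLIMATE_FEATURES = ["LSTD", "LSTN", "CF", "COT", "CPR"]
BATCH_SIZES = [4, 8, 16, 32]

def select_best(candidates, score_fn, threshold):
    """Return (best_candidate, best_score, threshold_met) over all candidates."""
    scored = [(score_fn(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score, best_score >= threshold

def two_step_search(score_climate, score_fused):
    # Step I: pick the climate feature and batch size with the highest test precision.
    step1 = list(product(CLIMATE_FEATURES, BATCH_SIZES))
    (feature, batch), p1, ok1 = select_best(step1, lambda c: score_climate(*c), 0.70)
    # Step II: fuse the chosen feature with the 11 factors and re-tune the batch size.
    step2 = [(feature, b) for b in BATCH_SIZES]
    (_, batch2), p2, ok2 = select_best(step2, lambda c: score_fused(*c), 0.75)
    return {"feature": feature, "step1": (batch, p1, ok1), "step2": (batch2, p2, ok2)}

# Hypothetical stand-ins for "train the model and return its test precision".
def toy_climate(feature, batch):
    base = {"LSTD": 0.80, "LSTN": 0.60, "CF": 0.50, "COT": 0.55, "CPR": 0.65}[feature]
    return base + (0.01 if batch == 8 else 0.0)

def toy_fused(feature, batch):
    return toy_climate(feature, batch) + 0.05

result = two_step_search(toy_climate, toy_fused)
print(result["feature"], result["step1"][0], result["step2"][0])
```

The 70% and 75% thresholds are the Step I and Step II precision targets quoted in the text; everything else in the sketch is illustrative.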

The direct semantic segmentation analysis for both Steps I and II in our proposed optimization workflow (Figure 4) is performed with our own personalized encoder-decoder algorithm, as illustrated in Figure 5. At each epoch (i.e., iteration), the input to the encoder-decoder model is a defined quantity (batch size) of 3D arrays (2D maps) representing a specific climate parameter, while the output target is the same number of 3D arrays, where each array represents the spatial class distribution of the proposed G parameter for all 251 countries in their respective assigned pixels. Table 5 summarizes the details of the different hidden layers in our proposed encoder-decoder model. The learning rate and the number of epochs are maintained at 0.0001 and 50 respectively, and no dropout is applied during model training at this stage. The batch size is varied among 4, 8, 16, and 32 for training the model, while the Adam optimizer and the cross-entropy cost function are selected for the model training and validation steps. Our proposed model uses the mini-batch gradient descent method to minimize the selected cost function, where the different mini-batch sizes (4, 8, 16, and 32) of the 2D input and output maps are designated with respect to the different lead-time Sub-Scenarios analysed in Table 4. Finally, the model is implemented using Keras version 2.2.4 and TensorFlow version 1.14.0, as developed by Google (Tensorflow.org).

Figure 5. Personalized encoder-decoder deep learning model, with pipeline parallelism protocol, to perform direct image-to-image segmentation analysis of defined 2D maps of climate features to a 2D map representing the spatial class distribution of the G values for all 251 countries.

The principles of the convolutional neural network (CNN), which has shown promising results in many classification tasks using image-based features, establish the basis for our proposed encoder-decoder model. A traditional CNN layer learns local features by sliding kernel filters over a feature map, originally derived from 2D images of defined dimensions, which serves as the input to a deep learning model. As such, there is a continuous transformation of the original low-level input features into high-level features, which are expected to maximize the model's predictive accuracy. In CNN layers, the activation function is usually the ReLU function, which accelerates the model's training phase and also mitigates vanishing gradients (Schmidhuber 2015). A pooling layer is also used to subsample the output of a CNN layer to reduce the total computational cost while still capturing the high-level features. A batch normalization layer can also be adopted to avoid model overfitting and to improve the model's training performance (Ioffe and Szegedy 2015).
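To make the three building blocks above concrete, the following NumPy sketch applies one kernel filter, a ReLU activation, and a pooling step to a tiny 2D map. It is a didactic illustration only: the layer sizes in Table 5 are not reproduced, max pooling with a 2 x 2 window is an assumed choice, and the kernel values are arbitrary.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D map (cross-correlation, no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Set negative activations to zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5 x 5 input map
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])      # arbitrary illustrative filter
features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features.shape)  # (2, 2)
```

The 5 x 5 input shrinks to 4 x 4 after the convolution and to 2 x 2 after pooling, which is the subsampling effect the paragraph refers to.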

To optimize the computational workflow of our proposed model, pipeline parallelism is performed with 2 Nvidia V100 GPU cards on Azure Cloud. The 1st and 2nd GPU cards are assigned to the encoder (GPU-0) and decoder (GPU-1) architectures (see Figure 5) respectively. Pipeline parallelism is initiated at every epoch as follows. During the forward pass, the first mini-batch of 2D maps for the corresponding climate feature is passed into the encoder architecture, and the extracted high-level features are then passed into the decoder architecture via an assigned master thread. The master thread also stores a copy of the encoder model's initial weights for updating in the later backpropagation step. At the same time, the encoder model immediately processes the next mini-batch of 2D satellite images while the decoder model processes the first mini-batch of high-level features extracted by the encoder model. Note that another set of randomly initialized weights is used by the encoder model to process the second mini-batch of data, as the first backpropagation step has not yet been initiated at this stage to update the first set of initialized weights for the encoder model. However, since a significant number of batches will be leveraged for model training and validation at each epoch, we would expect relatively fast convergence of the model's accuracy at every epoch regardless of the model's initial weights.

After completing the first forward pass, the decoder model will then initiate the first backpropagation step by updating its initial weights using the prediction results derived from the first mini-batch of data. At this stage, GPU-0 (encoder model) will remain idle while the Master thread stores a copy of the high-level features extracted from the second mini batch of data using the encoder model. Upon completing the first back-propagation step, GPU-1 (decoder model) will start the forward pass for the second mini-batch of high-level features, as retrieved from the main memory storage from the master thread, derived from GPU-0 earlier. Concurrently, GPU-0 commences on its backpropagation step using the encoder model's very first initial weights, as retrieved from the same main memory storage, and the respective backpropagation results derived from GPU-1 for the first mini-batch of data. Only after completing its own backpropagation step, GPU-0 will then initiate the forward pass for the 3 rd mini-batch of 2D input maps using the encoder's updated weights from the completed backpropagation step. The same data copying, and operational steps then repeat for both the encoder and decoder models. The summarized difference between the runs with pipeline parallelism (with 2 GPU cards) and without, i.e., with just 1 GPU card, for the first 2 mini-batches of data is outlined in Table 6 . Note that both GPU cards are maintained within the same machine, hence no inter-machines communication time is incurred.

Finally, the team proposed an encoder-decoder model for the classification analysis, instead of a traditional classification model, for the following reasons:

• All 251 countries vary in size, which affects their assigned number of pixels within the 2D map. As such, it becomes difficult to design a common input vector size that represents all countries when modelling with traditional classification models. To do so, a significant amount of zero padding would likely be required for smaller countries, which may affect the model's resulting accuracy in the local scale analysis of those countries.

• By analysing all countries within a single 2D map for each day, it becomes possible to correlate neighbouring countries for the modelled target, as we would expect climate conditions to be reasonably close or similar for countries within the same continent or region. The convolutional layers deployed in the encoder-decoder model can capture the boundary relationship among neighbouring grids as a representation of countries within the same continent or region.

believes that the latter application provides the most useful insights at this stage.

Recall (Equation 4) is defined as the number of TPs divided by the number of TPs plus the number of false negatives (FNs). The F1-score combines precision and recall as their harmonic mean and is defined in Equation (5). In our analysis, a TP indicates that a specific class (0 to 5) of the computed G parameter at a given grid/pixel is correctly predicted by the model, which is the most desirable outcome for multi-class classification analysis.

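$$P = \frac{TP}{TP + FP} \qquad (3)$$

$$R = \frac{TP}{TP + FN} \qquad (4)$$

$$F_1 = \frac{2 \, P \, R}{P + R} \qquad (5)$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives; $P$ (precision) measures the average precision magnitude within a time interval; $R$ (recall) measures the completeness of the classification performance within a time interval; and $F_1$ measures the weighted average of the precision and recall, which reaches its best value at 1 and its worst at 0.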
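The three metrics can be computed per class over all pixels of a predicted map, as in the sketch below. How the paper averages the scores across classes and days is not stated in this section, so the function simply returns one (precision, recall, F1) triple per class; the function name and the tiny label arrays are hypothetical.

```python
import numpy as np

def per_class_scores(y_true, y_pred, n_classes=6):
    """Precision, recall and F1 for each class over all pixels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    scores = {}
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (float(precision), float(recall), float(f1))
    return scores

y_true = np.array([[0, 0, 1], [1, 5, 5]])  # made-up target classes
y_pred = np.array([[0, 1, 1], [1, 5, 0]])  # made-up predicted classes
scores = per_class_scores(y_true, y_pred)
print(scores[1])
```

Classes that never occur in either map receive zeros here, which is one common convention; a different convention would change the averaged scores.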

Adhering to our proposed 2-step optimization workflow (see Figure 4 and Algorithm 1) for the data fusion process, this section systematically describes the results obtained from Steps I and II of the proposed workflow. Again, we highlight that the best possible set of hyperparameters, as determined from Step I, will be leveraged as the 1st set of primary hyperparameters for training (and validating) our proposed prediction model in Step II, which involves the data fusion between the optimal climate feature and all 11 socioeconomic-governmental factors, followed by additional model tuning to further maximize the model's predictive capability if possible. Finally, we provide some high-level details on the application of the proposed prediction model to possibly control the spread of COVID-19 in local communities in near real-time.

For each of the main Scenarios as summarized in Table 4 , we first evaluate the model's predictive performance on a global scale by computing the average precision, recall, and F1-score values derived from the model's testing step, with respect to Equations (3 -5). The dataset is first split into a 70:30 ratio, where the first 70% of the data pool, starting from 22 Jan 2020, is used for model training and validation. The remaining 30% is then used for model testing. Within the former 70% of data, we further split the dataset internally into 80:20 for training and validating the proposed encoder-decoder model respectively. We note that no random shuffling of the dataset is performed for the data splitting step.
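The splitting scheme above can be sketched with index ranges over the ordered days. The sketch assumes that the validation portion is taken from the end of the first 70% of days, which the text does not specify; the function name and the 100-day example are hypothetical.

```python
def chronological_split(n_days, test_frac=0.30, val_frac=0.20):
    """Return index ranges for train, validation and test without shuffling."""
    n_trainval = int(round(n_days * (1.0 - test_frac)))
    n_train = int(round(n_trainval * (1.0 - val_frac)))
    train = range(0, n_train)
    val = range(n_train, n_trainval)  # assumed to follow the training days
    test = range(n_trainval, n_days)
    return train, val, test

train, val, test = chronological_split(100)
print(len(train), len(val), len(test))  # 56 14 30
```

Keeping the days in order means the test set always lies later in time than the training and validation sets, which matches the statement that no random shuffling is performed.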

Tables 7 to 11 summarize the average precision, recall, and F1-score values computed from the model's testing step by varying the batch size (4, 8, 16, and 32) for model training and validation in each main Scenario and its corresponding Sub-Scenarios, as listed in Table 4. The results indicate that the model's precision score from the testing step ranks the modelled climate features as follows: LSTD > CPR > COT > LSTN > CF. Readers are also referred to Figure 7, which illustrates the training loss and validation accuracy plots for the best Sub-Scenario determined for each climate feature using the corresponding hyperparameters. The current maximum precision score attained from the model's testing step, as part of the Step I optimization objective, is approximately 72.1%, obtained by using the average 2D satellite image of the LSTD parameter over the preceding 7 days (see Table 10, Sub-Scenario 4f), which highlights the usefulness of reducing the "noise" in satellite images via the prior averaging step. In addition, the results align with those of recent studies (Wu et al., 2020; Guo et al., 2021), which concluded that temperature does have some correlation with the transmission rate of COVID-19. In our analysis, however, we focus on land temperature instead of ambient temperature, although both parameters are expected to be correlated to an extent.

Comparing the LSTD and LSTN parameters, the former has a stronger relative correlation with the rate of increase in COVID-19 cases worldwide, with an approximate 10% difference between their respective best precision scores from the model's testing step (see Tables 10 and 11, Scenarios 4f), which could possibly be attributed to the following reasons. Firstly, there are generally fewer interactions among individuals at night, hence lowering the probability of infection spread via human contact. In contrast, human activity, even during lockdown periods in different countries, tends to be more significant during the day. The observed LSTD may thus coincide with the transmission rate of COVID-19, as it is likely that the bulk of infections occurs during the day. At the same time, the land temperature during the day tends to be higher, which may suppress the spread of COVID-19 worldwide, as reported in previous studies (Wu et al., 2020; Guo et al., 2021), hence resulting in more variation in the computed G values over space and time. Consequently, the greater variation in the measured values may better match the G class predictions when using LSTD as the input feature. This point will be further examined in the later sections, where we quantitatively investigate the relationship between the LSTD parameter and the G class variations for specific major countries in the local scale analysis.

When using CPR, COT, and CF as input features, the resulting model performance is less ideal than that of LSTD. The results make intuitive sense since CPR, COT, and CF are expected to affect the amount of sunlight absorbed by the ground surface (see Table 1), hence there is a degree of relationship between CPR/COT/CF and LSTD/LSTN, where temperature is the dependent feature. This observation aligns with our previous statement that there may be overlapping relationship(s) among the different climate features, hence underlining the need to first identify the most suitable, i.e., optimal, climate feature which maximizes the prediction model's precision score on the defined testing dataset as part of Step I in our proposed optimization workflow for the data fusion process. On the whole, the better predictive performance derived from using LSTD confirms the following: (i) temperature (land and likely ambient) is complexly affected by the other climate features (CPR, COT, and CF), hence this data parameter is likely to be the most useful for modelling the rate of increase in the number of COVID-19 cases across the 251 countries; and (ii) temperature can therefore be taken as the key climate feature which significantly affects the rate of increase of COVID-19 cases globally, thus reducing the dimensions of the problem being modelled in Step II of the optimization workflow.

Figure 6. Training loss and validation accuracy plots for the best Sub-Scenarios determined for each of the climate features modelled: (a) Sub-Scenario 1a - CF; (b) Sub-Scenario 2a - COT; (c) Sub-Scenario 3f - CPR; (d) Sub-Scenario 4f - LSTD; (e) Sub-Scenario 5f - LSTN.

In summary, the optimal configuration at the completion of Step I of the optimization workflow is defined as follows: (i) Land Surface Temperature Day (LSTD) climate feature; (ii) batch size of 8; and (iii) average 2D image representation using 7 days of lead-time from Scenario 4f.
In the later Step II, the optimal feature in LSTD is fused with the other 11 socioeconomic-governmental factors to construct a new 3D array of 180 x 360 x 12 in size to undergo model training (and validation) with the same model configurations from Scenario 4f, followed by testing the trained model on the same testing dataset.
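The 7-day lead-time averaging used in Scenario 4f can be sketched as a sliding mean over a stack of daily maps. The window convention is an assumption here: each averaged map covers the 7 days immediately preceding its target day and excludes the target day itself. The function name and the toy data are hypothetical.

```python
import numpy as np

def lead_time_average(daily_maps, lead_days=7):
    """Average the maps of the `lead_days` days preceding each target day.

    daily_maps has shape (T, H, W). Entry k of the result is the mean of
    days k .. k + lead_days - 1 and pairs with the target of day k + lead_days.
    """
    daily_maps = np.asarray(daily_maps, dtype=float)
    n_out = daily_maps.shape[0] - lead_days
    return np.stack([daily_maps[k:k + lead_days].mean(axis=0) for k in range(n_out)])

# Toy stack of 10 days on a 2 x 3 grid, where every pixel of day t equals t.
maps = np.arange(10, dtype=float)[:, None, None] * np.ones((1, 2, 3))
avg = lead_time_average(maps, lead_days=7)
print(avg.shape)  # (3, 2, 3)
```

Averaging over several days smooths out day-to-day fluctuations in the satellite maps, which is the noise-reduction effect the text attributes to this step.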

We further investigate the possible correlations between the defined input feature (climate feature) and target output (spatial class distributions of G) at multiple lead-times of 3, 5, and 7 days, via the traditional cosine similarity (CS) metric as defined in Equation (6). This metric enables us to evaluate the level of similarity between the temporal trends of the input feature and target output a priori. While we do not expect to observe very high similarity between 2 disparate quantities (climate conditions and transmission rate of COVID-19), the CS computations for each climate feature may provide a preliminary indication of their respective statistical relationship with the spread of COVID-19 globally. In addition, the analysis may complement our finding, from the preceding sub-section, on which specific climate feature best models the proposed G parameter on the testing dataset for the optimal combination of hyperparameters and lead-time Sub-Scenario.
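A minimal sketch of the CS computation between one input map and one target map is given below, using the standard definition of cosine similarity on flattened arrays. The function name, the handling of all-zero maps, and the synthetic 180 x 360 maps are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(x_map, y_map):
    """Cosine similarity between two maps after flattening them to 1D."""
    x = np.asarray(x_map, dtype=float).ravel()
    y = np.asarray(y_map, dtype=float).ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0  # 0.0 for all-zero maps (assumption)

a = np.ones((180, 360))
b = 2.0 * np.ones((180, 360))  # same direction as a, different magnitude
c = np.zeros((180, 360))
c[0, 0] = 1.0                  # overlaps with a in a single pixel only

print(a.ravel().shape)  # (64800,)
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Because the metric depends only on direction, scaling a map leaves the similarity unchanged, while maps that overlap in few pixels score close to zero.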

For 2 defined images of size w × h, they can first be flattened into respective 1D arrays. In our analysis, the resulting size of the 1D arrays for each single climate feature is 64,800, obtained by flattening the corresponding 2D maps of the input data feature and target output a priori. The CS index for each pair of 1D arrays (X, Y) can be quantified as follows:

$$CS(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|} \qquad (6)$$

where ‖X‖ is the norm of X, ‖Y‖ is the norm of Y, and 1 − CS(X, Y) is the cosine distance between X and Y. As a rule of thumb, an increasing similarity between X and Y is achieved with decreasing cosine distance between them. For each day with respect to the defined lead-time (3, 5, or 7 days), we compute the CS values for every pair of input feature and target output, both in 2D map representations, for all climate features being modelled. Figures 6a to 6e illustrate the temporal variations of the similarity index values for each climate feature respectively at the different lead-times of 3, 5, and 7 days during the period between late Jan 2020 and Sep 2020 (200 days).

To cross-validate our overall finding that LSTD, or temperature, is the key climate feature, we cross-referenced previous studies, as summarized in the points below, to evaluate whether similar findings can be made at this stage.

• Wu et al. (2020) reported that temperature and humidity can generally suppress the spread of the COVID-19 virus, where a 1°C increase resulted in a 3.08% reduction in the daily number of new cases reported globally, based on an analysis of 161 countries. Their key finding supports our correlation analysis, where our present results report a negative correlation between LSTD and the global G parameter with a p-value greater than the significance value of 5%.

• Guo et al. (2021) further affirm that temperature is the most important factor affecting the incidence rate of COVID-19, where higher temperatures can reduce the virus spread globally, based on an analysis of 190 countries. Likewise, their overall finding aligns with our present results for the LSTD feature.

• Our present prediction model is built upon the analysis of 251 countries, hence the scale is larger than that of Wu et al. (2020) and Guo et al. (2021), while noting the differing methodologies among the studies. Despite the variations, we have shown consistent findings, in terms of underlining the negative correlation between temperature and the transmission rate of COVID-19, with the above 2 key studies, which are among the few that focus on developing a model which considers the global context instead of individual countries/states.

• However, in those previous studies, there is no extension of the respective prediction model to the local scale analysis, i.e., direct prediction for specific countries, based upon the modeller's choice, from the macroscale global model within the end-to-end prediction framework proposed in our present study. Hence, we are hopeful of the additional novelty introduced by our proposed approach, coupled with the 2-step optimization process for the data fusion step, in modelling the global and local G parameters.

Building upon Scenario 4f (batch size of 8 for the LSTD parameter) from Table 10, which provides the highest precision score of 72% at this stage, we perform a local analysis for several major countries from different continents, namely: (a) Indonesia (Asia); (b) Brazil (South America); (c) Germany (Europe); (d) Somalia (Africa); and (e) Australia (Australian Continent), by examining the level of agreement between their predicted and actual G classes over time. Likewise, a lead-time of 7 days is used to perform the model predictions, starting from 1 Feb 2020, using the same encoder-decoder model from Figure 5.

We now proceed to fuse the selected climate feature, LSTD, with the 11 listed socioeconomic-governmental factors from Table 3. Note that the new 3D array from the data fusion process has a size of (180 x 360 x 12), as previously shown in Figure 3. Likewise, we leverage the same model configuration of batch sizes ranging from 8 to 32, 25 epochs, and a lead-time of 7 days for constructing the average 2D map representation of the LSTD parameter to train and validate the same deep learning prediction model, followed by testing the trained model on the same testing dataset from Step I. We also note that the same 7-day lead-time is used to construct the average 2D map representation for all 11 socioeconomic-governmental factors, as part of Sub-Scenario (f) from Table 4. All other model configurations, such as the sizes of the training (validation) and testing datasets, remain unchanged from Step I of the optimization process. In addition, we increase the threshold precision score from Figure 4 to a minimum value of 75% to evaluate whether the optimized set of hyperparameters, as previously determined in Step I, can be maintained for the same model training step in Step II. Figure 9 illustrates the new training loss and validation accuracy profiles derived for batch sizes of 4, 8, 16, and 32. The figure generally shows that similar validation accuracy scores can be achieved with the fused features at the varying batch sizes, when compared to Figure 6d, hence indicating that the model evaluation on the testing dataset will be more useful for determining the overall usefulness of the data fusion process. By increasing the total input feature dimensions 12-fold in Step II of our proposed optimization process, we would expect an overall improvement in the trained model's predictive performance on the same testing dataset, as summarized in Table 13, when compared to Sub-Scenario 4f in Table 10.
At this stage, the highest average precision score attained from the same batch size of 8 is 77.1%, hence fulfilling our current threshold precision score of 75%. As such, the improvement in the average precision score on the global scale indicates that the classes predictions for all 251 countries are expected to better align with their actual classes for the analyzed period. Using the latest trained prediction model, as derived from the optimal model configuration of batch size of 8 and lead-time of 7 days in Table 13 , coupled with the fused data features, we then proceed to predict the respective classes for the same 5 major countries in Australia, Brazil, Germany, Indonesia, and Somalia as examined previously. Table 14 summarizes the local accuracy scores obtained from the trained prediction model where we can generally observe an average improvement of 5% from the previous scores in Table 12 , hence highlighting the overall benefit provided by the additional dimensions derived from fusing the 11 socioeconomic-governmental factors with LSTD. Using the latest results obtained, the following new observations/pointers can be made.

• To address our previous point on the G class predictions for Brazil, the inclusion of the relevant socioeconomic-governmental factors may now better explain the observed higher rate of increase in the number of confirmed COVID-19 cases during the country's warmer period (approximately Day 90 to Day 180). Specifically, we note that during this period, Brazil's "Government stringency index" parameter (see Table 3) ranged between 75.0 and 81.0, which can be considered reasonably strict in terms of the level of the government's control measures. However, there were no strict stay-home restrictions imposed in Brazil in the same period, where the classes for the "Stay-at-home restrictions" parameter (see Table 3) ranged between Classes 1 and 2. Hence, the general population was still able to leave home with some level of freedom, which may have resulted in continual human contact. The gradual reduction in the rate of increase in the number of confirmed COVID-19 cases in Brazil from Day 100 onwards, as illustrated in Figure 8b, is likely ascribed to Class 3 of the "Stay-at-home restrictions" parameter being maintained consistently for the next 6 months. At the same time, we also note that other social measures such as "Face covering", "Cancellation of public events and gatherings", "Schools and workplaces closures", "Controls for domestic travels", and "Controls for international travels" were maintained at their respective highest or 2nd highest classes during the same period, which may have controlled the virus spread.

• The example of Germany in Figure 8c also highlights the usefulness of including the relevant socioeconomic-governmental factors for controlling the rate of increase in the number of COVID-19 cases from as early as Day 25 onwards, which was around mid-Feb to Mar 2020. For example, schools were recommended to be closed (Class 2 for the "Schools and workplaces closures" parameter from Table 3) from 25 Feb 2020 onwards, hence limiting the interactions among youths, which may have been useful in controlling the virus spread in Germany since Day 25. In addition, we note that the different control measures ("Stay home restrictions", "Schools and workplaces closures", etc.) became less stringent (lower classifications from Table 3) from Day 160 onwards in Germany, which was close to the usual autumn/winter period. Hence, the colder temperatures observed at the same time may have accelerated the virus spread, which aligns with our previous point on Australia (Figure 8e), where the nation experienced its winter period between approximately Day 100 and Day 160. In summary, the results again underline that higher temperatures have a greater likelihood of suppressing the virus spread, for a given set of socioeconomic-governmental factors, as compared to lower temperatures which, in turn, may accelerate the spread of COVID-19.

While the data fusion step between the optimal LSDT parameter and the 11 socioeconomic-governmental factors does improve the proposed prediction model's accuracy in predicting the global and local classes of the defined G parameter, the current highest precision and accuracy scores remain below 90%. This is likely due to the lack of data features that can effectively account for "wrongdoings" by individuals in the different local communities, which may have led to unexpected increases in the G parameter in the respective countries. Examples of such "wrongdoings" include, but are not limited to, breaking stay-home restrictions and social-distancing rules. The current prediction model cannot predict such dynamic behaviour; doing so may require finer scales of analysis within the community. Sudden increases in the growth rate of the number of confirmed COVID-19 cases can generally be observed for all 5 countries (Figures 8a-8e), where Class 5 occurred unexpectedly on certain days or periods during the 200 days analysed, as shown in the respective figures. The inclusion of present vaccination rates can also be considered in future modelling studies using the same concept.

In a near real-time context, the proposed prediction model, after the required hyperparameter tuning with the fused data features during its training phase, may help communities build resilience and pursue sustainable development in a post-COVID era where COVID-19 is expected to become endemic. Examples of possible applications are as follows:

Since the LSDT feature is expected to affect the virus spread, especially in colder countries according to the current results, the prediction model can help healthcare management better prepare for likely spike(s) in the number of confirmed COVID-19 cases, given a defined set of socioeconomic-governmental factors as the fused input features and a defined lead-time. In Western countries, where the winter season usually coincides with the festive period, the increase in social gatherings, i.e., a lower class of the "Stay-at-home restrictions" parameter from Table 3, will likely amplify the virus spread given the current negative correlation with LSDT. Hence, the prediction model is likely to forecast a larger local G class over time in those countries.

By doing so, the local government can be informed beforehand, with a defined lead-time, and undertake the best possible social control measures to manage the amount of social interaction and mitigate, to an extent, the virus spread locally. At the same time, healthcare management can better prepare medical facilities and resources to handle an expected increase in COVID-19 patients.

In today's context, many countries and regions in the Asia-Pacific, including Hong Kong, Singapore, Australia, China, and others, are still reluctant to fully open up their economies, hence incurring the risk of being left out of the gradual global recovery. The prediction model may thus be useful to these countries for forecasting the rate of increase in the local G parameter, given any input combination of the LSDT and the other 11 socioeconomic-governmental factors into the trained prediction model. By being able to forecast with a reasonable lead-time, the local government or the different stakeholders can undertake appropriate control measures to mitigate the virus spread, hence alleviating the impact on their economies in the long run.

For example, by imposing work-from-home requirements when there is evidence of a forecasted spike in local community cases, the nation's economy can still function to some degree, instead of experiencing a more adverse situation where a sudden rise in the number of COVID-19 patients affects businesses and other sectors of the economy.

The current model workflow (Figure 4, Algorithm 1) and architecture give the modeler the flexibility to continue the data fusion process with other data features which may further improve the present model's precision and accuracy scores. For example, as discussed previously, the model currently omits data features which can quantify "wrongdoings" by individuals in the local community, which have been shown to contribute to the virus spread. The inclusion of vaccination rates in the different countries should also be considered in future modelling analysis using the same approach as proposed in this study. We are hopeful that additional useful data features for training the same prediction model will further improve its accuracy and precision in due time. Finally, we note that the same approach can also be extended to forecast the transmission rate of other infectious diseases by varying the lead-times and data features involved in re-training the prediction model.
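The forecasting use cases above all rest on pairing the fused features observed on a given day with the G class observed a fixed lead-time later. The following is a minimal sketch of that pairing for a single country, not the workflow of Algorithm 1: the function name `make_lead_time_pairs`, the 14-day lead-time, the array shapes, and the randomly generated values are illustrative assumptions, and fusion is approximated here by simple column-wise concatenation.

```python
import numpy as np

def make_lead_time_pairs(features, g_classes, lead_time):
    """Pair features on day t with the G class on day t + lead_time.

    features  : array of shape (n_days, n_features)
    g_classes : array of shape (n_days,)
    lead_time : positive number of days to forecast ahead (assumed value)
    """
    features = np.asarray(features)
    g_classes = np.asarray(g_classes)
    n_days = features.shape[0]
    if g_classes.shape[0] != n_days:
        raise ValueError("features and g_classes must cover the same days")
    if not 0 < lead_time < n_days:
        raise ValueError("lead_time must be between 1 and n_days - 1")
    # Drop the last `lead_time` days of inputs and the first `lead_time`
    # days of targets so that row i of X aligns with day i + lead_time of y.
    return features[:-lead_time], g_classes[lead_time:]

# Hypothetical data for one country over 200 days (values are placeholders).
rng = np.random.default_rng(0)
lsdt = rng.normal(size=200)                     # daily LSDT values
factors = rng.integers(0, 5, size=(200, 11))    # 11 factor classes per day
g = rng.integers(1, 6, size=200)                # daily G class labels

# Assumed fusion: stack LSDT and the 11 factors into 12 feature columns.
fused = np.concatenate([lsdt[:, None], factors], axis=1)

X, y = make_lead_time_pairs(fused, g, lead_time=14)
print(X.shape, y.shape)
```

Changing `lead_time` or adding further columns to `fused` (for instance, vaccination rates) leaves the pairing logic unchanged, which mirrors the flexibility described above.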

In this paper, we propose an alternative deep learning model, based upon semantic segmentation analysis, to quantitatively correlate climate-socioeconomic-governmental factors with the growth rate in the number of confirmed COVID-19 cases, on both global and local scales, via a proposed G parameter. To the best of our knowledge, this form of data exploitation for COVID-19 data analytics has not been investigated in the literature so far. The global analysis encompasses a 2D map representation of 251 countries, while enabling a direct local scale analysis of any country of the modeler's choice. Overall, the model development phase comprises a 2-step optimization process, where Step I first determines the optimal climate feature which has the strongest correlation with the global and local G parameter. So far, when considering a period of 200 days between late January and October 2020 with respect to different lead-times, the obtained results indicate that land surface day temperature (LSDT) correlates best with the global G parameter, with an average precision score of 72% from the model's testing step. This observation is further supported by our subsequent feature analysis, where cosine similarity computations show LSDT to be moderately correlated with the spread of COVID-19, especially during the early stages of 2020 after the virus first emerged. At the same time, we also examine the relationship between LSDT and the rate of increase in COVID-19 cases for several countries across the different continents, namely Somalia, Australia, Germany, Indonesia, and Brazil. The results obtained from Step I indicate an average accuracy score of 65 to 70% for forecasting the classes in those countries with respect to the LSDT climate feature.
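As an illustration of the cosine similarity computation mentioned above, the sketch below compares two series of equal length. How the study prepares its inputs (for example, whether the 2D maps are flattened or the values normalized) is not restated here, so the flattening, the optional mean-centering, and the sample values are assumptions made purely for illustration.

```python
import numpy as np

def cosine_similarity(a, b, center=False):
    """Cosine similarity between two arrays of equal size.

    With center=True, both inputs are mean-centered first, in which case
    the result equals the Pearson correlation coefficient.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.size != b.size:
        raise ValueError("inputs must have the same number of elements")
    if center:
        a = a - a.mean()
        b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero inputs
    return float(np.dot(a, b) / denom)

# Hypothetical series: temperature rising while the growth class falls.
lsdt = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([4.0, 3.0, 2.0, 1.0])

sim_raw = cosine_similarity(lsdt, g)
sim_centered = cosine_similarity(lsdt, g, center=True)
print(sim_raw, sim_centered)
```

For strictly positive inputs the uncentered value is always positive, so the centered variant is the one that can expose a negative relationship such as the one reported between LSDT and the G parameter.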

Step II of the proposed optimization process then fuses the same LSDT climate feature with 11 selected socioeconomic-governmental factors to further improve the model's predictive accuracy for both the global and local analyses. By doing so, the average precision score for the global analysis increases to 77% from the model's testing step, while the local scale analysis for the 5 different countries investigated shows an average accuracy improvement of 5% in prediction performance. We then describe qualitatively how the LSDT climate feature interacts with the different socioeconomic-governmental factors in modelling and forecasting the predicted classes of the local G parameter in Germany and Brazil, as examples, with defined lead-times.
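To make the two reported metrics concrete, the sketch below computes a class-averaged precision and an overall accuracy from a true and a predicted class map. It assumes that the "average precision score" is a macro-average over the G classes that are actually predicted and that the classes are labelled 1 to 5; both are assumptions for illustration, and the function names and the toy 2x2 maps are hypothetical.

```python
import numpy as np

def macro_precision(y_true, y_pred, classes=(1, 2, 3, 4, 5)):
    """Mean of per-class precision over the classes that are predicted."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    scores = []
    for c in classes:
        predicted_c = y_pred == c
        n_predicted = predicted_c.sum()
        if n_predicted == 0:
            continue  # precision is undefined for a class never predicted
        true_positives = np.logical_and(predicted_c, y_true == c).sum()
        scores.append(true_positives / n_predicted)
    return float(np.mean(scores)) if scores else 0.0

def accuracy(y_true, y_pred):
    """Fraction of pixels (or days) whose class is predicted correctly."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float((y_true == y_pred).mean())

# Toy 2x2 class maps (placeholder values, not study data).
true_map = np.array([[1, 2], [3, 3]])
pred_map = np.array([[1, 2], [3, 2]])

p = macro_precision(true_map, pred_map)
a = accuracy(true_map, pred_map)
print(p, a)
```

Here Classes 1 and 3 each have a precision of 1.0 and Class 2 has 0.5, giving a macro-average of about 0.83, while 3 of the 4 pixels are correct, giving an accuracy of 0.75.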

Going forward, the team is working on multiple items that build upon the present analysis to continue contributing to the global efforts in combating the pandemic. Our subsequent studies will focus on: (a) further fusing the climate-socioeconomic-governmental features with the community's emotional responses towards the virus, via social media deep learning analysis, to model the transmission rates of COVID-19 on the global and local country scales; and (b) developing an open-source platform tool that can provide near real-time predictions of the different classes of the G parameter at both the global and local country scales as a function of the fused data features encompassing climate conditions and socioeconomic-governmental factors.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The authors declare no conflict of interest. This study was supported in part by Microsoft Corporation through the AI for Health COVID-19 Azure Compute Grant (ID: 00011000272) and by the Start-Up Grant at Nanyang Technological University, Singapore (No. 04INS000423C120).