key: cord-0741176-cx603kt3 authors: Muhaidat, Jihan; Albatayneh, Aiman; Abdallah, Ramez; Papamichael, Iliana; Chatziparaskeva, Georgia title: Predicting COVID-19 future trends for different European countries using Pearson correlation date: 2022-05-12 journal: EuroMediterr J Environ Integr DOI: 10.1007/s41207-022-00307-5 sha: 2a403bb9e532bd773b4f84d0f7f79d028302f380 doc_id: 741176 cord_uid: cx603kt3 The ability to accurately forecast the number of COVID-19 cases and future case trends would certainly assist governments and various organisations in strategising and preparing for the newly infected cases well in advance. Many predictions have failed to foresee future COVID-19 cases due to the lack of reliable data; however, such data are now widely available for predicting future trends in COVID-19 after more than one and a half years of the pandemic. Also, various countries are closely monitoring other countries that are experiencing a surge in COVID-19 cases in the expectation of similar scenarios, but this does not always produce correct results, as no research has identified specific correlations between different countries in terms of COVID-19 cases. During the past 18 months, many nations have watched countries whose COVID-19 cases have risen sharply, in anticipation of handling the situation themselves. However, this did not provide accurate results, as no research was conducted that compared countries to determine if their COVID-19 case trends were correlated. As official data on COVID-19 cases has become increasingly available, using the Pearson correlation technique to pinpoint the countries that should be closely monitored will help governments plan and prepare for the number of infections that are expected in the future at an early stage. 
In this study, a simple and real-time prediction of COVID-19 cases incorporating existing variables of coronavirus variants was used to explore the correlation among different European countries in terms of the number of COVID-19 cases officially recorded on a daily basis. Data from selected countries over the past 76 weeks were analysed using a Pearson correlation technique to determine if there were correlations between case trends and geographical position. The correlation coefficient (r) was employed for identifying whether the different countries in Europe were interrelated, with r > 0.85 indicating they were very strongly correlated, 0.85 > r > 0.8 indicating that they were strongly correlated, 0.8 > r > 0.7 indicating that they were moderately correlated, and r < 0.7 indicating that the examined countries were either weakly correlated or that a correlation did not exist. The results showed that although some neighbouring countries are strongly correlated, other countries that are not geographically close are also correlated. In addition, some countries on opposite sides of Europe (Belgium and Armenia) are also correlated. Other countries (France, Iceland, Israel, Kosovo, San Marino, Spain, Sweden and Turkey) were either weakly correlated or had no relationship at all. According to the World Health Organization (WHO), a pandemic is briefly defined as a worldwide epidemic that crosses international borders (Loizia et al. 2021) . Widely known as SARS-COV2, coronavirus was initially detected in the city of Wuhan in China (Shereen et al. 2020 ). Although the virus was first detected in December 2019, several countries around the world began reporting infections in late January 2020. As understanding of the virus was limited during the early period of the pandemic, it was able to spread rapidly worldwide. As a result, the WHO quickly announced that the crisis had become a pandemic on 11 March 2020. 
Responsible Editor: Antonis Zorpas.
The infection spread rapidly across the globe, with almost 3.4 million confirmed positive cases by May 2020 and 5,705,754 deaths until February 2022 (Loizia et al. 2021; WHO 2022b). Specifically, according to the WHO, the US has the highest number of cases (78,932,322), followed by India (43,004,005) and Brazil (29,478,039) (WHO 2022a). The virus has affected people of all ages, although older people with at least one pre-existing condition or those who have previously had respiratory problems have been more affected (Zhou et al. 2020). There has been an exponential increase in the number of COVID-19 cases, which has put pressure on medical systems, causing a shortage of medical equipment and health professionals in many countries as well as a decrease in patient admissions to hospital, as illustrated in Malta (73,725 cases), where the fear of contracting COVID-19 in hospital settings, combined with a drop in communicable diseases due to school closures, led to a decrease in severe paediatric admissions of 57.7% in medical facilities (Degiorgio et al. 2022). The demand for intensive care units, respirators, medical equipment, physicians and new healthcare workers was high. Specifically, the importation of protective garments increased in Europe by 145% compared to the number prior to the pandemic, while facemask demand increased from approximately 1 billion to 17 billion pieces per year (Morales-Contreras et al. 2021). At the same time, while 77,000 ventilators were able to cover the demand across the globe in 2019, in April 2020, New York City alone required the addition of 30,000 machines to its healthcare facilities, causing ventilator manufacturers to increase their production by 30-50%. This increase was still not sufficient to cope with the demand for a 500-1000% increase in ventilator production worldwide (World Economic Forum 2020).
Indeed, as has recently been observed in Tunisia (1,029,762 cases), many nations still face difficulties in obtaining sufficient medical supplies (BBC News 2021; WHO 2022a). Due to the rapid increase in COVID-19 infections, the ability to predict how the virus will spread at an early stage has become very important, as this can assist governments and local authorities in planning the necessary measures, such as increasing healthcare staff and medical supplies, among other factors. As a result of the speed of the global transmission of SARS-COV2, it is critical that countries are able to accurately predict patterns in advance, in order to develop strategies for each situation. If governments and other institutions can precisely forecast how the number of cases will progress in the future, this will certainly be beneficial, as they will have the opportunity to prepare appropriate plans and strategies for dealing with new cases before the event. Traditional statistical techniques such as the autoregressive integrated moving average (ARIMA) have been used by several researchers to predict time series values in various fields (ArunKumar et al. 2022; Chyon et al. 2022; Salmna and Kanigoro 2021; Fanoodi et al. 2019). ARIMA models make it easy to predict a given time series based on its own previous values. However, the accuracy of these models is limited, as they are not able to precisely capture the effects of external factors. In one study, ARIMA was used to predict the numbers of infections, deaths and recoveries from the virus. The ARIMA model was validated using the Akaike information criterion (AIC) values, which were 20, 14 and 16 for cumulative recorded cases, deaths and recoveries, respectively. In another study, an ARIMA-based framework was used to predict future coronavirus trends (Ceylan 2020).
A study was performed in the European countries of France, Spain and Italy using a model that used the logistic, Weibull and Hill equations to identify infection rates and determine the power index values for the 10 nations whose infection rates were the highest (Hembram and Kumar 2021). In a recent study on the COVID-19 pandemic, it was suggested that supervised learning models could be used to predict coronavirus trends. The researchers used a variety of predictive models, including linear regression, support vector machines and exponential smoothing, in order to predict trends in the number of individuals who recovered, died or became infected with the virus on a daily basis. The calculated optimal values for both the mean absolute error (MAE) and the root mean square error (RMSE) in the number of new cases and recovery rates were obtained (Šegota et al. 2021). In another study by Sahai et al. (2020), an ARIMA model was used to predict transmission rates in the five nations hardest hit by the pandemic. The parameters of the model were determined using a combination of Rissanen and Hannan algorithms. The study predictions assessed the number of individuals who would be affected in the United States, Brazil and India (Sahai et al. 2020). Lee et al. (2022) attempted to forecast COVID-19 cases locally in Korea by geographical area, with the goal of improving the forecasting framework to reflect the effects of governmental control interventions (i.e. social distancing, gathering restrictions, change of policies) using mathematical models. Methods included using the susceptible-exposed-infectious-hospitalised-recovered model (SEIHR) to estimate the effective reproduction number (Rt) by Korean geographical area and a statistical model to estimate the instantaneous reproduction number, comparing epidemic curves and comparing high-, intermediate-, and low-intensity control interventions.
According to the different intensities of the control interventions, the mathematical model predicted COVID-19 transmission dynamics with good accuracy and interpretability. Forecasting was implemented successfully, with the majority of the future observed cases predicted within the 95% confidence interval; however, there were some failures corresponding to sudden increases in new cases. The forecast for the Gangwon area, in particular, had low accuracy due to large variations in daily confirmed cases over time (ranging from 3 to 38 cases) and variations caused by clustered infections at schools, churches, etc. Additionally, researchers used the SutteARIMA method to predict confirmed short-term COVID-19 infections in addition to the Spain Market Index (IBEX 35). In contrast to the ARIMA model, according to the mean absolute percentage error (MAPE) values, the SutteARIMA method predicted the number of new confirmed cases every day (Ahmar and del Val 2020). Researchers developed a simple ARIMA model to estimate how many people would be infected and recover from SARS-COV2 once lockdown measures in Italy were eased (Chintalapudi et al. 2020). They used a variety of techniques to make their predictions, such as relevant Google trends of particular search terms that were associated with the COVID-19 pandemic (Prasanth et al. 2021), in addition to artificial intelligence (AI) techniques for nonlinear time series prediction problems (Bhimala et al. 2021). Long short-term memory (LSTM) is a form of recurrent neural network (RNN) that has the capacity to remember values across given intervals. It is a method that produces effective results for the classification and prediction of unrecognised intrusions. LSTM networks have been used in studies (Dutta and Bandyopadhyay 2020; Huang et al. 2020; Tomar and Gupta 2020; Pal et al. 2020) as a deep learning technique to predict COVID-19 cases.
In one study, Chimmula and Zhang (2020) tried to identify the main features needed to assess the development of the pandemic in Canada. In another study, by Rauf et al. (2021) , an LSTM model was developed to predict the progression of infection rates in Canada. In addition, the researchers compared the results for Canada with the infection rates documented for the United States and Italy (Rauf et al. 2021) . Moreover, an LSTM model was used to predict 30 days in advance how many people would get sick, recover and die from COVID-19. Furthermore, the researchers provided evidence confirming the effectiveness of strategies aimed at preventing transmission of the virus, such as lockdowns and social distancing (Tomar and Gupta 2020) . Researchers used a variational LSTM autoencoder model to predict coronavirus patterns on a global basis. In addition to using data to show the trend in cases in the past, the researchers used various urban properties as well as actions taken by governments to curb the virus, including closing workplaces and schools, restricting public gatherings and limiting public transportation, and predicted how the pandemic will develop in the UK, the US and India (Ibrahim et al. 2021) . Google Trends has been used by researchers to capture the effects of the pandemic on different social and economical aspects. As mentioned by Prasanth et al. (2021) , Google Trends was used to capture how frequently certain terms related to COVID-19 were searched for in order to effectively predict transmission rates using the grey wolf optimiser (GWO) optimised LSTM model. The LSTM model was enhanced using metaheuristics by tuning the hyperparameters within the pandemic domain (Prasanth et al. 2021) . Similarly, Simionescu and Raišienė (2021) investigated the impact of the COVID-19 pandemic on employment expectations using Google Trends data in panel autoregressive distributed lag (panel ARDL) models and a Bayesian multilevel model. 
Between 2020 and 2021, COVID-19 searches on Google were found to have a negative impact on employment expectations in new EU Member States. Through the models, it was confirmed that the unemployment rate had a significant negative impact on employment expectations of the public, while the findings of the study supported economic policies that aimed at the reduction of labour market tension and at increasing employment expectations. In another study, researchers attempted to predict the number of recorded COVID-19 cases within China using an adaptive neuro-fuzzy inference system (ANFIS) based on a flower pollination algorithm (FPA) and utilising the salp swarm algorithm (SSA). They determined the parameters of ANFIS by combining FPA and SSA. The performance of FPA was validated by comparing the results with existing modified ANFIS models, such as particle swarm optimisation (PSO), a genetic algorithm (GA), artificial bee colony (ABC) and FPA (Al-qaness et al. 2020). A method has been proposed that can predict the reproduction number (R0) according to the susceptible, infected, recovered and deceased (SIRD) model, along with further important parameters, in order to estimate the transmission pattern of COVID-19 across China (Anastassopoulou et al. 2020). Officially recorded COVID-19 infection cases were predicted in real time for multiple nations, and the assessment of risk was conducted in nations that had been affected the most by the virus via the application of the regression tree algorithm (Chakraborty and Ghosh 2020). Researchers in Pakistan tried to predict verified COVID-19 cases by using a simple moving average technique (Chaudhry et al. 2020). Additionally, they used a logistic growth model incorporating five parameters to simulate and predict how the virus would spread across the US; however, it was noted in the study that the accuracy of the model was reliant on measures applied by federal and state authorities. Juhn et al.
(2021) show that geospatial analysis reveals new geographic risk factors (i.e. geospatial trends in the prevalence of test-confirmed cases of COVID-19) that might be significantly responsible for the overall burden of COVID's contributions to racial/ethnic and socioeconomic inequalities in the community, in order to guide community outreach efforts (e.g. public health, education and the vaccine rollout) for populations at risk for COVID-19. According to Scarpone et al. (2020) , the strongest predictors of COVID-19 incidence on a country scale (in Germany) were related to community interconnections, geographical location, infrastructure and transportation. In addition, the research of Sun et al. (2020) predicts that a lockdown may be an effective measure to limit the relationship between COVID-19 spread and geographical location, and the relationship between COVID-19 spread and population density. The findings of Mustanski et al. (2022) also indicate that exposure to SARS-COV2 may be more consistent across neighbourhoods within the city of Chicago (USA) than previously thought based on reported COVID-19 case rates. This suggests that factors other than differential seroprevalence, such as geographical location, may play a role in driving disparities in COVID-19 outcomes. Mustanski et al. (2022) reports that the number of active cases in the United States can potentially be used as a warning sign and true predictor of the future trajectory of the epidemic based on geographical spread. Regarding COVID-19 pandemic forecasting, Laatifi et al. (2022) tested machine-learning-based models to predict the severity of the cases in Morocco (1,162,288 cases) (WHO 2022a). 337 COVID-19 patients from Cheikh Zaid Hospital were divided into categories according to the severity of the illness, and machine learning models were used along with reduction algorithms to predict the severity and sickness of the patient. 
The proposed method was intended to aid hospitals and medical facilities in determining who should be seen first and who has a higher priority for hospital admission. The transmission of SARS-COV2 can be affected by many different variables, including rate and type of vaccination, age range, preventative measures and treatment approaches, the rate of infection, the seriousness of the infection, the actions of the population and policy decisions by governments. Previous attempts to predict epidemics have been largely unsuccessful due to the increase in frequency of international travel, which is resulting in an increase in transmission rates and therefore an inability to track an emerging disease over time (BMC 2015). The process can be enhanced by intricately modelling predictive distributions rather than focusing on point estimates, taking into account multiple effect dimensions, and performing regular assessments of the models based on their verified performance (Ioannidis et al. 2020). On the other hand, the virus has begun to mutate, causing new variants to emerge, which has made it increasingly difficult to forecast how the pandemic will continue. The constant evolution of the virus creates new mutations, which are expected to generate novel variants of the virus. While variants are known to occur and dissipate, it has been reported that multiple variants of the virus have been detected across the globe as the pandemic has progressed. Viruses undergo a process of continual and constant change and diversification, which is exemplified by the recent emergence of new mutations, four of which have attracted special attention from world authorities (Centers for Disease Control and Prevention). Reports have indicated that these variants can be transmitted more easily and rapidly compared to other variants, which could lead to an increase in the number of infections.
If such a situation arises, hospitals will be flooded due to the increase in the number of hospitalisations along with a possible increase in the death rate. According to Al-Raeei (2022), data regarding the COVID-19 pandemic from Cyprus, Egypt, Spain and another nine countries until September 2020 revealed no relationship between the force of infection and timing, weather or geographical location. Even so, there has been no research (to the authors' knowledge) correlating the increase in SARS-COV2 infections with the geographical location of a country, rather than at a local level. The present study focuses specifically on European countries due to the fact that their data recording systems are superior to those in other regions. For example, the majority of the nations in Africa do not have confirmed measurements that validate COVID-19 cases on a daily basis, which results in most of those countries being out of sync with the rest of the world. Therefore, an assumption can be made that caution should be exercised when interpreting daily COVID-19 case data due to significant differences in the recording processes employed, as it is evident that many cases are not documented (Bohk-Ewald et al. 2021; Musa et al. 2021) . Many attempts to predict future trends in COVID-19 cases have failed, and a critical factor in this lack of success is the unavailability of official data. Nevertheless, as the pandemic has now been ongoing for more than 18 months, the data continue to expand in volume, which could lead to improvements in the forecast accuracy. Furthermore, many countries are meticulously examining data from nations who have reported that their case numbers are rapidly increasing in order to predict whether similar patterns will occur in their own populations. 
However, the reliability of the results is not always high, as one country's trends may not conform to another's standards, and researchers have not been able to confirm that countries are correlated in terms of their COVID-19 cases. Since the virus first emerged approximately 1.5 years ago, almost the entire globe has been affected by the increase in the number of infections, which has led to various reactions and strategies based on the daily reported cases. The transmission of SARS-COV2 throughout the population can be affected by a number of factors, such as rate and type of vaccination, age group, gender, and preventative measures implemented by the authorities and policy decisions by governments, which are frequently hard to measure or predict. Indeed, certain variables are always changing, such as the virus itself, which has evolved to produce a number of different mutations, thus rendering the process of making accurate predictions of future trends particularly difficult. The ability to accurately predict COVID-19 infection cases and future trends before the event would certainly be of interest to the authorities and various institutions, as they could use it to develop suitable and appropriate strategies and to prepare for new cases before they occur. The present research aimed at the basic and real-time prediction of the spread of SARS-COV2 infections in the near future by analysing the correlations among countries in terms of their officially recorded daily COVID-19 cases. According to our search of the literature, there have been no previous attempts to forecast how COVID-19 cases will spread in the near future by analysing how the case numbers of different countries in Europe are correlated. 
It is possible to easily implement the proposed method (country correlations) to predict the transmission behaviour of viruses going forward because it is critical that authorities can precisely forecast new COVID-19 cases in order to formulate appropriate strategies and use available resources effectively. The present study analysed data regarding the number of daily official COVID-19 cases for each country across Europe in order to produce a possible correlation between the number of cases and geographical position, using Pearson correlation and the Microsoft Excel Analysis ToolPak, without taking into account the limitations of the present study (as presented in the section "Limitations of the current study"). The results (very strongly, strongly, moderately, and not related) for the correlation statistics are presented in Figs. 1, 2, 3, 4 and 5 and Table 1 of the current paper. During the analysis, the number of SARS-COV2 infections verified per day was examined for the European countries for the period from 22 January 2020 to 28 August 2021 (approximately 76 weeks) using the correlation method to determine whether the different countries were correlated. A strong correlation among two or more nations suggests that they have had highly similar COVID-19 case patterns during the past 535 days, leading to the expectation that such a correlation would be reflected in the future. The identification of such correlations will allow particular countries to easily scrutinise the country/countries with which they are correlated to generate accurate predictions of COVID-19 infection cases and patterns in the upcoming period. This prediction will also include variables that may change in the future, such as mutations of the virus. In order to determine whether the investigated countries are correlated, up-to-date data reflecting the number of cases were gathered. 
In this context, epidemiological data showing daily COVID-19 infection cases recorded in countries across Europe after 22 January 2020 were obtained from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), along with other resources such as the WHO (Johns Hopkins University: Center for Systems Science and Engineering (CSSE) 2020) as well as the Novel Coronavirus (COVID-19) Cases database (Humanitarian Data Exchange (HDX) 2021). These data covered more than 45 nations and were used to analyse the correlations between the case numbers in the different countries between 22 January 2020 and 22 August 2021. The website of the WHO (https://worldhealthorg.shinyapps.io/covid/) provides the latest data on daily SARS-COV2 infections on a global basis, and users can use the "Overlay" functionality to select at most nine countries in order to compare their data. The statistics used in this research were extracted from the WHO COVID-19 Explorer to ensure that they were consistent with the WHO layout. It is important to note that the WHO does not require users to request approval to utilise or download data from the WHO COVID-19 Explorer (World Health Organization 2020). Correlation is a statistical measurement that reflects the nature of the linear relationship between two different variables. It is a frequently used instrument that is capable of describing relationships, although it does not explain cause and effect. When using a correlation method (the most commonly used being Pearson correlation), a correlation coefficient (r) value from −1 to 1 is allocated, where a value of 0 denotes no linear correlation (as r moves towards zero, the linear relation weakens), 1 denotes a perfect positive linear correlation, and −1 denotes a perfect negative linear correlation.
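As an illustration of the coefficient described above, the following is a minimal sketch of the Pearson correlation computed directly from its definition (the covariance of the two series divided by the product of their standard deviations). The study itself used the Microsoft Excel Analysis ToolPak; the function name `pearson_r` and the two short case series are hypothetical and serve only as an example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily case counts for two countries (illustrative values only)
country_a = [10, 20, 35, 50, 40, 30]
country_b = [12, 22, 30, 55, 45, 28]
r = pearson_r(country_a, country_b)
```

A value of `r` close to 1 indicates that the two series rise and fall together; as noted above, it says nothing about cause and effect.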
For the purpose of this research, the Microsoft Excel 2010 Analysis ToolPak add-in was utilised for calculating the correlation coefficients (Microsoft 2021) amongst more than 45 nations included in the WHO database between 22 January 2020 and 28 August 2021. Countries with r > 0.85 are considered to be very strongly correlated, while for r values in the intervals of 0.8-0.85, 0.7-0.8, and < 0.7, the countries concerned are considered to be strongly correlated, moderately correlated, and either weakly correlated or uncorrelated, respectively. The design of the current study was subject to limitations. To investigate the geographical relationship between infection cases in Europe, other variables such as the rate and type of vaccination, age range, preventative measures and treatment approaches, actions of the population and policy decisions by governments were excluded. Similarly, environmental pollution (i.e. air pollution) and demographic factors that possibly affect the spread of SARS-COV2 were also excluded. The reason for setting the boundaries of the statistical research in this way was to independently investigate the effect of geographical position on the correlation between European countries' trends in COVID-19 cases, so as to provide a possible real-time prediction of the direction and trend of the spread without the need for multiple correlation coefficient statistical analysis. At the same time, the quick development of variations in the virus could not be included as a variable in the study for two reasons. Firstly, knowledge regarding the mutations is still limited, and thus the rate at which they are distributed could not be allowed to interfere with geographical data. Secondly, the future number of variations across the EU could not be predicted, as the virus is undergoing a continuous adaptation process of diversification where it constantly forms new mutations with rates of infection that are, for now, impossible to predict (Centers for Disease Control and Prevention 2021).
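The correlation bands used in this study (very strong, strong, moderate, and weak or none) can be sketched as a small classification function. The function name `correlation_band`, the handling of values that fall exactly on a threshold, and the example coefficients are assumptions made for illustration; the text gives the bands with strict inequalities and does not say how boundary values are treated.

```python
def correlation_band(r):
    """Map a correlation coefficient r to the bands described in the text.

    Boundary handling is an assumption: exactly 0.85 falls in "strong",
    exactly 0.8 and exactly 0.7 fall in "moderate".
    """
    if r > 0.85:
        return "very strong"
    if r > 0.8:
        return "strong"
    if r >= 0.7:
        return "moderate"
    return "weak or none"


# Hypothetical coefficients for a few country pairs (not results from the study)
example_pairs = {
    ("Country A", "Country B"): 0.91,
    ("Country A", "Country C"): 0.83,
    ("Country A", "Country D"): 0.74,
    ("Country A", "Country E"): 0.32,
}
bands = {pair: correlation_band(value) for pair, value in example_pairs.items()}
```

Applied to a full matrix of pairwise coefficients, such a function would reproduce the kind of grouping summarised in Table 1.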
Furthermore, as the recording process of each country cannot be validated in terms of its continuity, a limitation on our ability to correlate the results arises. At the same time, since the variable (the number of COVID-19 cases) remains the same, an assumption was made that all recorded data were obtained with the same methodology. During this work, Excel was used to generate coefficients for the investigated nations. Figure 1 presents only a small portion of the work done on the correlation coefficients using Microsoft Excel. The correlations between neighbouring countries are presented in Table 1. In this research, if the correlation coefficient (r) between nations was greater than 0.8, this indicated that they were strongly correlated and that the numbers of daily COVID-19 infections in the different countries were significantly related. In Fig. 2, the numbers of new verified cases of COVID-19 (confirmed on a regular basis to smooth the curve) per million population for the investigated nations are shown. As can be observed in Fig. 2a, Austria and Romania are strongly correlated with regard to the trend in COVID-19 cases, as the number of cases fluctuates within both countries with only a brief lag between their maximum case numbers (approximately 1 week); for example, in Austria, cases peaked on 10 November 2020, while the peak occurred on 17 November 2020 in Romania. While Austria and Ukraine are also correlated (Fig. 2b), the time lag between the peaks for both countries is longer: it exceeds 2 weeks (it occurred on 26 November 2020 in Ukraine). Austria and Georgia are moderately correlated, with the number of COVID-19 cases peaking on 7 December 2020 in Georgia, which was approximately 25 days after the peak in Austria. Austria was not found to be correlated with Ireland (as shown in Fig. 2d), because when the number of cases rose in Austria during the studied period, it was observed to decrease or remain at the same level in Ireland, and vice versa.
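The time lags discussed above were obtained by comparing peak dates. One possible way to estimate such a lag numerically, which is not the method used in this study, is to shift one series by a number of days and keep the shift that gives the highest Pearson correlation. The sketch below assumes this lagged-correlation approach; the function names and the two synthetic bell-shaped curves (peaking on day 20 and day 27) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) maximising the correlation of leader[t] with follower[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        # Drop the last `lag` days of the leader and the first `lag` days of the follower
        x = leader[: len(leader) - lag]
        y = follower[lag:]
        r = pearson_r(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic epidemic curves: same shape, the second peaking 7 days later
leader = [math.exp(-(((t - 20) / 5) ** 2)) for t in range(60)]
follower = [math.exp(-(((t - 27) / 5) ** 2)) for t in range(60)]
lag, r_at_lag = best_lag(leader, follower, max_lag=21)
```

For these synthetic curves the estimated lag is 7 days, which mirrors the one-week offset reported between the Austrian and Romanian peaks. Real case data are noisier, so the result would need to be checked against the plotted curves.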
Analysis of the correlations among the examined countries using the available data on daily COVID-19 cases for the examined period (from the end of January 2020) revealed that the following countries are highly correlated: Albania, Austria, Azerbaijan, Croatia, Czechia, Denmark, Estonia, Finland, Georgia, Hungary, Italy, Lithuania, Luxembourg, Moldova, Montenegro, North Macedonia, Poland, Romania, Slovakia, Slovenia, Switzerland and Ukraine. However, the following nations are either weakly correlated (r < 0.7) or have no relationship at all (i.e. there is no country in Europe with which they are correlated): France, Iceland, Israel, Kosovo, San Marino, Spain, Sweden and Turkey. Conversely, four of the countries in central Europe have strong correlations with five other countries; for example, Romania is strongly correlated with Austria, North Macedonia, Italy, Poland and Ukraine (see Fig. 3a), while Ukraine is also strongly correlated with five countries (see Fig. 3b), North Macedonia has correlations with Italy, Austria, Moldova, Poland and Romania (see Fig. 3c), and, lastly, Poland has correlations with North Macedonia, Italy, Hungary, Austria and Romania (see Fig. 3d). Although it is observed that some adjacent countries are strongly correlated, the findings also indicate that countries that are not close geographically are also correlated. For example, there is a correlation between Azerbaijan and Croatia, which are located in Eastern and Central Europe, respectively. Additionally, Belgium and Armenia, which are situated on opposite sides of Europe, are also correlated (see Fig. 4). The number of nations that are highly correlated is significantly lower than the number that are moderately correlated. This is because the majority of these nations did not experience exactly the same trends in terms of COVID-19 cases, which means that it may not be possible to accurately predict when cases will suddenly increase.
To illustrate, although a moderate correlation was detected between Albania and Moldova, as both had peaks within a period of 1 week in December 2020, Moldova experienced its second peak around 2 months after Albania, as illustrated in Fig. 5. At the start of the analysis, we determined which countries in Europe experienced similar patterns in terms of daily official COVID-19 cases, along with the strengths of the correlations. Table 1 presents a summary of the outcomes of this analysis. The findings indicate that while there are correlations between nations in group A and those in group B, it is not always the case that nations within group B are strongly correlated. This means that correlations of various degrees have been identified between groups A and B (B-I, B-II, B-III). It should be taken into account that the data used in the analysis were current as of 4 July 2021.
The ability to accurately forecast the number of COVID-19 cases and future case trends would certainly assist governments and various organizations in strategizing and preparing for newly infected cases well in advance. Many predictions have failed to foresee future COVID-19 cases due to the lack of reliable data; however, such data are now widely available for predicting future trends in COVID-19 after more than one and a half years of the pandemic. Also, various countries are closely monitoring other countries that are experiencing a surge in COVID-19 cases in the expectation of similar scenarios, but this does not always produce correct results, as no research has identified specific correlations between different countries in terms of COVID-19 cases. As a result of the exponential growth in COVID-19 cases, it has become increasingly necessary to be able to predict the spread of the virus in advance, as this will help governments and local authorities to plan required actions, including the deployment of human resources and medical equipment, among other factors.
The ability to accurately predict the number of future cases will undoubtedly be advantageous for governments and different organizations, as they can develop suitable strategies and plans for new emerging infections before they occur. Due to the increase in the number of COVID-19 cases, it is essential to predict how the virus will spread in advance, as this will help both governments and local authorities to plan appropriate actions, such as the deployment of human resources and medical supplies. A number of variables can influence how COVID-19 is spread, such as the type and rate of vaccination, age group, gender, the transmission rate, mortality rate, the actions of citizens within the country, and measures implemented by the authorities. Efforts to predict epidemics in the past have had mixed levels of success, a problem highlighted by the current COVID-19 pandemic. In this study, we identified European countries that had similar trends in terms of the number of COVID-19 cases recorded on a daily basis over the past 76 weeks, with varying levels of correlation. The results indicated that Central European countries have more correlations with other countries compared to the rest of Europe. The limitations of the research mentioned in the section "Limitations of the current study" imply that geographical location cannot be considered the only vital correlation factor, as some countries that are not geographically close are related; indeed, some countries situated on opposite sides of Europe are correlated (e.g. Belgium and Armenia). Some countries (France, Iceland, Israel, Kosovo, San Marino, Spain, Sweden and Turkey) need to be considered individually, as they were either weakly correlated or unrelated to any other country. Further study of the geographical position of European countries compared to the spread of COVID-19 is needed to provide further correlations between them. 
As for the outliers of the statistical analysis (countries that show no correlation between geographical position and case number trend), further study of possible correlated variables other than geographical position, such as genetic similarities, age range, and vaccination rate and type, between populations of neighbouring countries, both independently and with multiple correlation analysis, could be a beneficial area of investigation for decoding the spread of SARS-COV2.
Acknowledgements We greatly thank Johns Hopkins University and the World Health Organization for publicly and freely providing COVID-19 data.
Conflict of interest All authors declare that they have no conflicts of interest.
Jihan Muhaidat 1 · Aiman Albatayneh 2 · Ramez Abdallah 3 · Iliana Papamichael 4 · Georgia Chatziparaskeva 4