key: cord-0925130-78ycyidn authors: Lo, Kai Lisa; Zhang, Minglei; Chen, Yanhui; Mi, Jinhong Jackson title: Forecasting the Trend of COVID-19 Considering the Impacts of Public Health Interventions: An Application of FGM and Buffer Level date: 2021-09-07 journal: J Healthc Inform Res DOI: 10.1007/s41666-021-00103-w sha: 1c7e53fbcf65c66e5fa026f6942d553a82f63511 doc_id: 925130 cord_uid: 78ycyidn PURPOSE: COVID-19 is still spreading around the world. To improve the subsequent control of COVID-19, it is essential to measure and predict the future scale of the outbreak. METHODS: This paper uses a rolling mechanism and grid search to find the best fractional order of the Fractional Order Accumulation Grey Model (FGM). A buffer level, based on the general form of the weakening buffer operator, is proposed to measure the effect of government control measures on the epidemic. The buffer level is associated with the Government Response Stringency Index and the Mobility Index. RESULTS: Firstly, the model proposed in this paper outperforms the ARIMA model, which has been widely used to predict confirmed COVID-19 cases. Secondly, in using the buffer level to modify the FGM, this paper finds that government measures require the active cooperation of the public and often take effect with a time lag. Only when the government increases its stringency and the public complies can the spread of COVID-19 be slowed down. Control measures alone, without an active public response, do not slow down the epidemic. Thirdly, based on the Mobility Index and the Government Response Stringency Index in December, this paper predicts the cumulative confirmed cases at the end of January under different scenarios corresponding to different buffer levels. 
The study suggests that the world should remain highly vigilant and keep taking corresponding control measures against the outbreak of COVID-19. CONCLUSIONS: Both government control measures and public compliance are important in the battle against COVID-19. Government control measures have a time-lag effect, and the lag is about 9 days. When the government increases its stringency and the public cooperates, the weakening buffer operator with a proper buffer level must be considered in the prediction process. These prediction methods can be applied to the prediction of COVID-19 confirmed cases in the future or to the trends of other epidemics.

Global cumulative confirmed cases by regions. Note: The histogram shows the cumulative number of confirmed cases in various regions since April; the cumulative confirmed cases worldwide are mainly distributed in Asia, Europe, and North America.

... and the scale of confirmed cases is still essential, which can improve the subsequent control of ...

The prediction of confirmed cases has been a hot topic since the outbreak of COVID-19. Because accurate prediction of confirmed cases can help governments arrange appropriate preventive measures and allocate medical resources scientifically, hundreds of related research articles have been published in the past few months. Generally speaking, the models used in these articles fall into three categories. The first category uses linear regression models, among which ARMA/ARIMA is the most commonly used. For example, Sharma et al. [27] used ARIMA (2,1,0) to forecast short-term infected and recovered cases for Saudi Arabia. de Lima et al. [7] integrated ARIMA models for the real-time prediction of cumulative COVID-19 cases in Brazil. Choi and Ahn [4] forecasted daily imported COVID-19 cases in South Korea with ARIMAX. Malki et al. [21] used the seasonal ARIMA model to predict the spread of COVID-19 in several selected countries. 
Other similar studies include Abuhasel et al. [1, 28, 29], Singh et al. [28, 29], and Yousaf et al. [36]. Other linear methods include quantile regression [23] and the Kalman filter [28, 29, 37]. The second category uses artificial intelligence methods, since the increase of confirmed cases is nonlinear. With this in mind, Hazarika and Gupta [11] proposed wavelet-coupled random vector functional link networks to predict the cumulative number of positive cases for five countries. Chimmula and Zhang [3] adopted LSTM networks to predict future infections in Canada. Other nonlinear growth models constitute the third category. The susceptible-infected-recovered (SIR) model is the most widely used model in this category and is very popular for simulating epidemic trends. Liu et al. [16] used a modified SIR model to simulate the spread of COVID-19 in the USA. A further study developed a generalized SEIR (susceptible-exposed-infected-recovered) model for the spread of COVID-19, taking into account mildly and symptomatically infected individuals. Venkatasen et al. [30] and Sarkar et al. [26] adopted different forms of the SIR model for India. Besides the SIR model, Mahanty et al. [20] used the Gompertz and Verhulst models to predict the number of infected cases for the next 28 days. Among the three mainstream forecasting approaches, the first two are data-driven, and most of these studies use only the number of confirmed cases, such as Sharma et al. [27], Chimmula and Zhang [3], and Hazarika and Gupta [11]. Studies using the SIR model usually account for the impact of prevention measures. For example, M. Liu et al. [16] took into account the effect of social distancing, and Venkatasen et al. [30] examined the effectiveness of containment and lockdown. Uncertain systems often suffer from problems such as incomplete information and inaccurate data. As a result, constructing complex models often fails to achieve correspondingly accurate results. 
When data are scarce, grey system theory makes full use of the known "minimum information," gives priority to extracting the more valuable "new information," and models the whole system. It can be used to predict the future trend of the system (Deng, 1982; [19]). To improve predictive accuracy, many scholars have derived further models from the traditional GM (1,1), such as the Grey Verhulst model [8, 32, 38], the Grey Bernoulli model [2, 33], and the Fractional Order Accumulation Grey Model [35]. Grey system models have also been applied to the prediction of the COVID-19 epidemic, because they have proved effective for predicting data with exponential growth. Given the exponential growth in the number of cumulative confirmed cases, Şahin and Şahin [25] employed GM (1,1), the nonlinear grey Bernoulli model (NGBM (1,1)), and the fractional nonlinear grey Bernoulli model (FANGBM (1,1)) to predict the number of cumulative confirmed cases. Liu et al. [17] suggested that the Fractional Order Accumulation Grey Model can accurately predict the spread of the disease. Both studies focused on early-stage trends of COVID-19, but neither took into account the impact of social distancing, home quarantine orders, and lockdowns. With the passage of time, however, the world has become deeply involved in the secondary impact of COVID-19, and attention to the epidemic trend of the disease remains crucial. It is of more practical significance to predict the epidemic trend while taking into account the influence of prevention and containment. When the development of a system is disturbed, its future trend changes accordingly. The forecasting problem then lies not in the pros and cons of the model, but in the fact that the information provided by the system cannot accurately reflect the future trend. It is difficult to use that information directly to make accurate predictions. 
To deal with this problem in system prediction, Liu [18] proposed the concept of the buffer operator in the grey system predictive method. An operator that slows down the growth rate of the sequence is called a weakening buffer operator (WBO). Many scholars have proposed a variety of buffer operators based on different data to accurately predict systems with disturbances [5, 6, 31, 34]; among them, Wei and Kong [34] summarized the forms of buffer operators and proposed a general form. As COVID-19 spreads across the world, many disturbance factors enter the prediction of cumulative confirmed cases and affect the future trend. Home quarantine orders, gathering restrictions, school closures, and international movement restrictions issued by governments can effectively slow the spread of the virus, keep the public from being infected, and ultimately control the spread of the epidemic. However, the implementation of control measures varies greatly among the people of different countries, and public awareness of COVID-19 and the related control measures lags behind. When predicting the future trend of the cumulative number of confirmed cases, the introduction of government control measures and the public response should therefore also be taken into consideration. In China, for example, after experts confirmed that SARS-CoV-2 could be transmitted from person to person, a lockdown was imposed in Wuhan on January 23, 2020. Further control measures followed, such as gathering restrictions, mandatory mask wearing in public places, partial traffic control, and restrictions on international movement. From February 2, home quarantine orders were comprehensively strengthened nationwide. China was able to control the COVID-19 epidemic quickly because it adopted a comprehensive approach to controlling the spread of the disease. 
This is diametrically opposed to the early concept of herd immunity in some western countries, which underestimated the severity of the epidemic and delayed control measures such as home quarantine orders. China's COVID-19 epidemic occurred in January, which coincided with the Spring Festival, when China's annual Spring Festival travel rush was about to reach its peak. Visiting relatives and friends during the Spring Festival has been a Chinese tradition for thousands of years. Large-scale interpersonal contact accelerated the spread of the disease, and many people did not realize the seriousness of the COVID-19 epidemic in the early stage after the lockdown in Wuhan. The Chinese government publicized the prevention and control of the COVID-19 epidemic extensively and fully implemented control measures, including persuading those who failed to comply with the regulations and holding accountable those who seriously violated them. More and more people in China took the initiative to follow control measures and stay at home. As shown in Fig. 2, this led to a peak in the number of new daily cases in China on February 12, and the growth rate slowed significantly from March. The indicator used to measure the level of government control measures in Fig. 2 is the Government Response Stringency Index (hereinafter referred to as "the Stringency Index") constructed by scholars from the University of Oxford. In this paper, we use the rolling fractional order accumulation grey model to predict the cumulative number of confirmed cases in various countries. To include control measures and public response in the prediction process and obtain more accurate results, we propose the buffer level ∆ based on the general form of the weakening buffer operator proposed by Wei and Kong [34]. 
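We calculate the buffer level ∆ from the historical data of countries and introduce an appropriate buffer level in the prediction according to the actual control measures and public response of each country. Introducing different buffer levels not only enhances the forecasting accuracy but also provides predicted values under different scenarios. The remainder of the paper is arranged as follows: Section 2 introduces the model and methods, including FGM and buffer level. Section 3 shows the model ...

GM (1,1) is the most classical and basic model in grey system theory. Driven by practice, academic research on GM (1,1) has been very active over the past three decades. Scholars have derived many models from the traditional GM (1,1) to improve forecast precision and apply them to new fields. The Fractional Order Accumulation Grey Model (FGM) is one of these derived models. A standard FGM (1,1) is established as follows.

Definition 1 It is assumed that the original sequence is $X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n)\right)$ and that its $r$-order accumulated sequence is $X^{(r)} = \left(x^{(r)}(1), x^{(r)}(2), \cdots, x^{(r)}(n)\right)$, in which

$$x^{(r)}(k) = \sum_{i=1}^{k} C_{k-i+r-1}^{k-i}\, x^{(0)}(i), \qquad C_{r-1}^{0} = 1, \quad C_{k-i+r-1}^{k-i} = \frac{(k-i+r-1)(k-i+r-2)\cdots(r+1)\,r}{(k-i)!}.$$

According to the definition of the Gamma function, $X^{(r)}$ can be rewritten in the form of the Gamma function as follows.

$$x^{(r)}(k) = \sum_{i=1}^{k} \frac{\Gamma(r+k-i)}{\Gamma(k-i+1)\,\Gamma(r)}\, x^{(0)}(i)$$

$Z^{(r)}$ represents the generated mean sequence of consecutive neighbors of $X^{(r)}$: $Z^{(r)} = \left(z^{(r)}(1), z^{(r)}(2), \cdots, z^{(r)}(n)\right)$, in which

$$z^{(r)}(k) = \frac{1}{2}\left[x^{(r)}(k) + x^{(r)}(k-1)\right], \qquad k = 2, 3, \cdots, n.$$

Definition 2 If $X^{(0)}$, $X^{(r)}$, $Z^{(r)}$ are described as in Definition 1, the following formula is obtained.

$$x^{(r)}(k) - x^{(r)}(k-1) + a\, z^{(r)}(k) = u$$

This formula is the original form of FGM (1,1), and

$$\frac{dx^{(r)}(t)}{dt} + a\, x^{(r)}(t) = u$$

is the whitening equation of FGM (1,1).

Theorem 1 If $X^{(0)}$, $X^{(r)}$ are described as in Definition 1, the parameter vector $\hat{a} = [a, u]^{T}$ can be estimated by the least squares method as follows.

$$\hat{a} = \left(B^{T}B\right)^{-1}B^{T}Y, \qquad B = \begin{bmatrix} -z^{(r)}(2) & 1 \\ -z^{(r)}(3) & 1 \\ \vdots & \vdots \\ -z^{(r)}(n) & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} x^{(r)}(2) - x^{(r)}(1) \\ x^{(r)}(3) - x^{(r)}(2) \\ \vdots \\ x^{(r)}(n) - x^{(r)}(n-1) \end{bmatrix}$$

The solution to the whitening equation of FGM (1,1) is as follows.

$$\hat{x}^{(r)}(k+1) = \left(x^{(0)}(1) - \frac{u}{a}\right)e^{-ak} + \frac{u}{a} \qquad (7)$$

By substituting the parameter vector $\hat{a} = [a, u]^{T}$ into formula (7), the fitted sequence $\hat{X}^{(r)}$ can be obtained. The fitted sequence $\hat{X}^{(0)}$ can then be calculated by the $r$-order inverse accumulation. 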
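To make the FGM (1,1) steps concrete, the following is a minimal Python sketch, assuming the standard FGM (1,1) formulation attributed to Wu et al. [35] (fractional accumulation, least-squares estimation of a and u, exponential time response, inverse accumulation). The function names `frac_coeffs`, `frac_accumulate`, `fgm11_fit`, and `fgm11_forecast` are hypothetical, the data are synthetic, and this is not the authors' code.

```python
import numpy as np

def frac_coeffs(order, n):
    """Coefficients c_j = order*(order+1)*...*(order+j-1) / j!, with c_0 = 1."""
    c = np.ones(n)
    for j in range(1, n):
        c[j] = c[j - 1] * (order + j - 1) / j
    return c

def frac_accumulate(x, order):
    """Fractional accumulation: out[k] = sum_{i<=k} c_{k-i} * x[i].

    order = 1 is the ordinary cumulative sum; order = -r undoes order = r.
    """
    x = np.asarray(x, dtype=float)
    c = frac_coeffs(order, len(x))
    return np.array([np.dot(c[k::-1], x[:k + 1]) for k in range(len(x))])

def fgm11_fit(x0, r):
    """Least-squares estimate of (a, u) in x_r(k) - x_r(k-1) + a*z(k) = u."""
    xr = frac_accumulate(x0, r)
    z = 0.5 * (xr[1:] + xr[:-1])             # mean of consecutive neighbours
    B = np.column_stack([-z, np.ones_like(z)])
    Y = np.diff(xr)
    a, u = np.linalg.lstsq(B, Y, rcond=None)[0]
    return a, u

def fgm11_forecast(x0, r, steps):
    """Fitted values for the input window followed by `steps` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    a, u = fgm11_fit(x0, r)                  # assumption: a != 0
    k = np.arange(len(x0) + steps)
    xr_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a
    return frac_accumulate(xr_hat, -r)       # r-order inverse accumulation

# Usage on synthetic data (10% growth per step, not real case counts)
window = 100.0 * 1.1 ** np.arange(8)
print(fgm11_forecast(window, r=0.5, steps=3)[-3:])
```

With r = 1 the sketch reduces to the classical GM (1,1), which is a convenient sanity check before trying fractional orders.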
The rolling mechanism updates the initial value of the sequence, excludes early information that has lost its timeliness, and adds new, timely information to the sequence, which can effectively enhance predictive accuracy. This is in line with the principle of new information priority, one of the basic principles of grey system theory: the effect of new information on cognition is greater than that of old information, which is the information view of the whole of grey system theory [19]. Many studies that use grey system methods to model the confirmed, recovered, and death cases of COVID-19 adopt the rolling mechanism [17, 38]. An example of the rolling mechanism used in this paper is shown in Fig. 3. The original sequence is $x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n)$. The rolling sequence length is set first; it is $n-1$ in Fig. 3. The FGM (1,1) is established on the rolling sequence $x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n-1)$. In the in-sample prediction, the predicted value $\hat{x}^{(0)}(n)$ is used to evaluate the predictive accuracy of the model. When the rolling sequence is updated, its initial value is excluded and the original value $x^{(0)}(n)$ is added. In the out-of-sample prediction, the real value $x^{(0)}(n+1)$ remains unknown. So, when the rolling sequence is updated, its initial value is excluded and the predicted value $\hat{x}^{(0)}(n+1)$ is added instead of the real value. As the process goes on, the new predicted value $\hat{x}^{(0)}(n+2)$ is added in the next update. Through this mechanism, we make full use of all the new information in the original sequence and exclude it when its timeliness weakens, which can effectively improve predictive accuracy. Compared with the GM (1,1) model, FGM (1,1) is characterized by the use of fractional order accumulation. According to Wu et al. [35], when the fractional order r increases, the FGM model increases the weight of the old information. 
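The two rolling modes described above can be sketched as follows. This is an illustrative sketch only: `one_step` stands for any function that maps a rolling window to the next predicted value (the paper uses FGM (1,1) here), and the ratio-based forecaster in the usage lines is a hypothetical stand-in chosen to keep the example short.

```python
import numpy as np

def rolling_in_sample(series, window, one_step):
    """In-sample mode: predict x(t) from the `window` real values before it.

    After each step the oldest value is dropped and the real value is added.
    """
    series = np.asarray(series, dtype=float)
    return np.array([one_step(series[t - window:t])
                     for t in range(window, len(series))])

def rolling_out_of_sample(series, window, horizon, one_step):
    """Out-of-sample mode: real values are unknown, so each prediction
    is appended to the window in place of the real value."""
    win = [float(v) for v in series[-window:]]
    preds = []
    for _ in range(horizon):
        nxt = float(one_step(np.array(win)))
        preds.append(nxt)
        win = win[1:] + [nxt]                # drop oldest value, add prediction
    return np.array(preds)

# Hypothetical stand-in forecaster: repeat the last observed growth ratio.
def last_ratio(w):
    return w[-1] * w[-1] / w[-2]

series = 2.0 ** np.arange(10)                # synthetic doubling sequence
print(rolling_in_sample(series, 5, last_ratio))
print(rolling_out_of_sample(series, 5, 3, last_ratio))
```

Passing the forecaster as an argument keeps the rolling logic independent of the model, so the same loop can wrap FGM (1,1) or a benchmark such as ARMA.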
Because a larger r gives more weight to old information, FGM (1,1) with a relatively small r is suitable for modeling short-memory processes, whereas FGM (1,1) with a relatively large r is suitable for modeling long-memory processes. In this paper, we use grid search to optimize the fractional order r. Grid search is an enumeration method for tuning parameters and is well suited to small-scale, bounded parameter tuning. In the previous literature, the fractional order r is accurate to one decimal place. We extend it to three decimal places, which generates 1000 grid points on the interval (0,1). We compare the in-sample predictive accuracy obtained on the 1000 grid points, select the r with the highest predictive accuracy, and output its subsequent predicted values. The flow chart of the rolling FGM (1,1) is shown in Fig. 4.

Fig. 3 Note: In the in-sample prediction, the predicted value $\hat{x}^{(0)}(n+1)$ is estimated by the updated original sequence. In the out-of-sample prediction, the predicted value $\hat{x}^{(0)}(n+2)$ is estimated both by the updated original sequence and the predicted value $\hat{x}^{(0)}(n+1)$.

To include control measures and public response in the prediction process and improve forecasting accuracy, we propose the buffer level ∆ based on the general form of the weakening buffer operator proposed by Wei and Kong [34], who summarized the forms of buffer operators and proposed a general form. In this general form, $\omega = (\omega_1, \omega_2, \cdots, \omega_n)$ is the weight vector corresponding to the sequence, in which $\omega_i > 0$, $i = 1, 2, \ldots, n$, and the sequence with buffer operator $D$ is given by formula (11) with parameter $\alpha$. When $\alpha < 0$, D is a weakening buffer operator (WBO) for both monotonically increasing and monotonically decreasing sequences. When $\alpha > 0$, D is a strengthening buffer operator (SBO) for both monotonically increasing and monotonically decreasing sequences. When $\alpha = 0$, D is an identity operator, which means there is no buffering. When control measures are reinforced and the public responds actively, the growth of the cumulative number of confirmed cases slows down. 
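The behavior of the three cases of α can be illustrated with a small sketch. The formula used below is an assumption, not necessarily the exact general form of Wei and Kong [34]: it multiplies each value by the ratio of that value to the weighted mean of the remaining tail, raised to the power α. This choice reproduces the properties stated above (identity at α = 0, weakening for α < 0, strengthening for α > 0) and reduces to the weighted average of the tail at α = −1. The name `buffer_operator` is hypothetical, and the sequence is assumed to be positive.

```python
import numpy as np

def buffer_operator(x, alpha, weights=None):
    """Assumed buffer operator: x(k)d = x(k) * (x(k) / m_k) ** alpha,
    where m_k is the weighted mean of x(k), ..., x(n).

    alpha = 0 leaves the sequence unchanged; alpha < 0 flattens it
    (weakening); alpha > 0 steepens it (strengthening).
    """
    x = np.asarray(x, dtype=float)           # assumption: all values > 0
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    # Reverse cumulative sums give the weighted mean of every tail x(k..n).
    tail_mean = np.cumsum((w * x)[::-1])[::-1] / np.cumsum(w[::-1])[::-1]
    return x * (x / tail_mean) ** alpha

x = np.array([1.0, 2.0, 3.0, 4.0])           # synthetic increasing sequence
print(buffer_operator(x, 0.0))               # identity: no buffering
print(buffer_operator(x, -0.5))              # weakening: earlier values raised
print(buffer_operator(x, 0.5))               # strengthening: earlier values lowered
```

In every case the last value is left unchanged, so the buffered sequence ends at the same point as the original one.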
Because reinforced control measures and an active public response slow the growth of cumulative confirmed cases, we only consider the WBO and the identity operator in the general form, that is, $\alpha \le 0$. We want to obtain the buffer operator from historical data, so we propose the buffer level ∆ based on the general form of the weakening buffer operator, defined by $\Delta = -\alpha$. As $\alpha \le 0$, it follows that $\Delta \ge 0$. For simplicity, it is assumed that $\omega_1 = \omega_2 = \cdots = \omega_n$, under which formula (11) simplifies.

Proposition 2 Δ is defined as the buffer level. When $x^{(0)}(k) > 1$, the weakening effect on the sequence is enhanced as Δ increases.

Proof Let $\Delta_1 < \Delta_2$, which means the original sequence is buffered at two different levels, with buffer operators $D_1$ and $D_2$ respectively. Comparing the buffered sequences $X^{(0)}D_1$ and $X^{(0)}D_2$, both when $X^{(0)}$ is monotonically increasing and when it is monotonically decreasing, shows that the curve of $X^{(0)}D_2$ is flatter than that of $X^{(0)}D_1$. In other words, buffering reduces the speed of increase or decrease, and the larger the value of Δ, the greater the reduction. Because Δ is the negative of α, the weakening operator D constructed from Δ satisfies the three axioms of buffer operators proposed by S. Liu [18].

Fig. 4 Flow chart of rolling FGM (1,1). Note: This is the flow chart of the rolling FGM (1,1) model. After obtaining the raw data, we set the rolling sequence length and generate the initial rolling sequence. After the optimal parameter is calculated by grid search, we establish the FGM (1,1) model and update the rolling sequence until the predictive aim is reached.

The calculation of the buffer level takes China and Japan as examples. The early epidemic control in both China and Japan achieved certain results. After the lockdown was implemented in Wuhan, China, on January 23, control measures were strengthened comprehensively nationwide. China's Stringency Index rose rapidly from 26.39 before the lockdown of Wuhan to 81.02 in early March. 
The growth in the number of daily confirmed cases slowed significantly from March, and the spread of the disease was initially brought under control. The buffer level at the beginning of March is estimated by constructing a corresponding weakening buffer operator that minimizes the MAPE (mean absolute percentage error) of the sequence over a period of time after March. The estimated buffer level ∆ of China's epidemic control in early March is about 1.5, which is larger than the buffer level of the average weakening buffer operator (AWBO). The early epidemic of COVID-19 in Japan is shown in Fig. 5. With the emergence of a small-scale spread of COVID-19 on Feb. 6, 2020, the Japanese government gradually began to take measures such as gathering restrictions, international movement restrictions, and information campaigns. On April 7, the Japanese government issued the home quarantine order. In the following week, Japan's Stringency Index reached 47.22, which was the peak of the index as of the end of November. We use the Google COVID-19 mobility dataset to construct a Mobility Index. This dataset contains the mobility trends of people in various places compared to the baseline (Jan 3 to Feb 6). We choose retail and recreation mobility and residential mobility. Generally speaking, a reduction in retail and recreation mobility and an increase in residential mobility mean that people respond actively to government control measures. However, the raw mobility data have an obvious weekly periodicity, so we apply a seven-day moving average to smooth it out. As the retail and recreation mobility is almost always negative, we take its additive inverse and add it to the residential mobility. The resulting Mobility Index reflects the public response to COVID-19 and to government control measures. 
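The construction of the Mobility Index described above can be sketched in a few lines. The sketch assumes a trailing seven-day window (the text does not say whether the moving average is trailing or centered) and uses synthetic percentage changes rather than the Google dataset; the function names `moving_average` and `mobility_index` are hypothetical.

```python
import numpy as np

def moving_average(x, window=7):
    """Moving average over `window` consecutive days.

    mode="valid" returns len(x) - window + 1 values, so the first
    window - 1 days have no smoothed value (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

def mobility_index(retail_recreation, residential, window=7):
    """Additive inverse of smoothed retail/recreation mobility
    plus smoothed residential mobility."""
    retail = moving_average(retail_recreation, window)
    resid = moving_average(residential, window)
    return -retail + resid

# Synthetic percentage changes from baseline with a weekly pattern
# (weekdays followed by a weekend dip), repeated for three weeks.
retail = np.tile([-30.0, -30.0, -30.0, -30.0, -30.0, -45.0, -45.0], 3)
resid = np.tile([12.0, 12.0, 12.0, 12.0, 12.0, 8.0, 8.0], 3)
print(mobility_index(retail, resid))
```

Because any seven consecutive days cover exactly one weekly cycle, the smoothed index is flat for this synthetic input, which is the effect the moving average is meant to achieve.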
Since Google does not provide the relevant services in China, we calculate the buffer level of China on the basis of the Stringency Index alone. For Japan and other countries, we take both the Stringency Index and the Mobility Index into consideration. After the home quarantine order was implemented in April, the Mobility Index increased significantly and the epidemic was effectively controlled. The buffer level of Japan, estimated from late April to mid-May, is about 0.9, slightly smaller than 1, the buffer level of the AWBO. The raw data of the Stringency Index are collected from publicly available sources such as news articles and government press releases and compiled by the team from Oxford University. The raw data of the Mobility Index are collected and compiled by Google. The data on cumulative confirmed cases used in this paper are collected from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The data used for training and testing cover January 2 to November 26, 2020, and the total length of the sequence is 330. Nine countries are included in our model and prediction: China, Japan, the USA, France, the UK, Germany, Brazil, India, and Argentina. We first select the optimal rolling sequence length for each country. Table 1 reports the MAPE of different rolling sequence lengths in the nine countries. Except for China, which has large outliers in the early stage when the length is 7 or 8, the MAPEs of these countries are all less than 10%. This provides preliminary evidence that the rolling FGM (1,1) fits the cumulative number of confirmed cases in these countries well. In this paper, the length with the minimum MAPE is selected as the optimal length for the corresponding country. The actual values and the values predicted by the optimal rolling FGM (1,1) in the various countries are shown in Fig. 6. 
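The selection rule above (compute the in-sample MAPE for each candidate rolling length and keep the minimum) can be sketched as follows, together with the MAPE and RMSE metrics used later in the comparison with ARMA. The forecaster passed in as `one_step` is a hypothetical stand-in for the rolling FGM (1,1), the candidate lengths 5 to 10 follow the note to Table 1, and how the authors align the evaluation period across lengths is not stated, so the alignment used here is an assumption.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * float(np.mean(np.abs((a - p) / a)))

def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def best_rolling_length(series, lengths, one_step):
    """Return the rolling length with the smallest in-sample MAPE.

    Assumption: each length is scored on all points it can predict,
    so shorter windows are evaluated on slightly more points.
    """
    series = np.asarray(series, dtype=float)
    scores = {}
    for n in lengths:
        preds = [one_step(series[t - n:t]) for t in range(n, len(series))]
        scores[n] = mape(series[n:], preds)
    return min(scores, key=scores.get), scores

# Hypothetical stand-in forecaster: average growth ratio over the window.
def mean_ratio(w):
    return w[-1] * (w[-1] / w[0]) ** (1.0 / (len(w) - 1))

cases = 1000.0 * np.cumprod(1.0 + 0.05 * np.exp(-0.03 * np.arange(60)))  # synthetic
best, table = best_rolling_length(cases, range(5, 11), mean_ratio)
print(best, table)
```

The same two metric functions can score any forecast against the test set, which keeps the comparison between models on identical terms.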
Apart from China, where the cumulative number of confirmed cases is stable, the eight other countries show an obvious growth trend in Fig. 6. In this paper, the rolling FGM (1,1) is compared with ARMA, which is the most commonly used time series model for the cumulative confirmed cases. The specifications of ARMA differ across countries. We use AIC and BIC to choose the best order of ARMA; the specifications are shown in the Appendix (Table 7). We use the data from January 2 to October 31 as the training set and the data from November 1 to November 26 as the test set. Table 2 reports the predictive accuracy on a 14-day test set (from November 1 to November 14), and Table 3 reports the predictive accuracy on a 26-day test set (from November 1 to November 26). The predictive accuracy is calculated in both MAPE and RMSE. According to Tables 2 and 3, the predictive accuracy of the rolling FGM (1,1) is better than that of ARMA on the 14-day and 26-day test sets. The MAPEs of the 14-day test sets in all countries are less than 10%, and the MAPEs of the 26-day test sets are less than 15%. This indicates that the FGM (1,1) model can effectively predict the cumulative number of confirmed cases. Figure 7 compares the performance of ARMA and FGM (1,1) on the 26-day test set. With the exception of Brazil and India, the FGM (1,1) model shows good predictive ability. For the forecast of cumulative confirmed cases in Brazil and India in November, our model does not outperform ARMA; it predicts more cumulative confirmed cases in Brazil and India than were reported. In fact, related studies point out that the official cumulative confirmed cases of Brazil and India may be understated. Marinho et al. [22] pointed out that there was little statistical correlation in the number of confirmed cases in Brazil, because as of October 29, 2020, less than 7% of the population had been tested. 
The number of confirmed cases in Brazil was underestimated due to the limitations of testing capacity and speed. Mukherjee [24] pointed out that the testing speed in India was lower than in the early stage of the outbreak, and that the extensive use of RAT (Rapid Antigen Test) alone instead of the RAT + RT-PCR testing protocol led to a large number of false negatives, which seriously affected the reliability of Indian official data. As a result, FGM (1,1), which has proved effective for predicting data with exponential growth, does not achieve excellent predictive results for these two countries. Because stronger control measures and public response have a weakening effect on the future cumulative confirmed number, using all the new information and excluding the old information in a timely manner may still fail to predict the future trend accurately. This feature appears in the prediction of cumulative confirmed cases in Italy, for which the MAPE reaches 25.2%. The problem comes down to ignoring the control measures and public response. Applying the buffer level to adjust the rolling sequence of Italy yields better predictive accuracy. Since the end of October, Italy has strengthened control measures such as traffic control and home quarantine orders, raising the Stringency Index gradually from 50 to 66.7 in early November. As shown in Fig. 8, the Mobility Index increased at the same time. This indicates that people actively responded to these policies and partly changed their lifestyles. These factors have a weakening effect on the future trend, so using FGM (1,1) alone produces a relatively large error. In mid and late March 2020, the Italian government implemented relatively strict control measures and the public complied with them. This brought the growth of the COVID-19 epidemic under control, so we use the historical data to estimate the buffer level at that time to be 0.55. 
By applying the buffer level to the data in early November (9 or 10 days after the increase of the Stringency Index), the predictive accuracy is greatly improved, with MAPE decreasing from 25.2% to 2.84%. The MAPE of FGM in the 14-day test set is the same as that of ARMA, both at 0.89%. The Stringency Index increased from 66.67 to 79.85 on November 6, and the Mobility Index continued to increase at the same time. By applying the buffer level to the data on November 15 (9 or 10 days after the increase of the Stringency Index), the predictive accuracy of FGM (1,1) is better than that of ARMA in the 12-day test set (from Nov 15, 2020, to Nov 26, 2020). The comparison of model accuracy is shown in Table 4. From the example of Italy, we find that government control measures require the active cooperation of the public and often take effect with a time lag. Applying the buffer level at that time can enhance the predictive accuracy of FGM (1,1). The result for France provides a similar example. As shown in Fig. 9, the French government strengthened the control measures on October 30, 2020, and the Mobility Index increased at the same time. However, the measures had only been in place for 2 days at the start of the test set, so FGM is better than ARMA in predicting cumulative cases in the 26-day test set (from Nov 1, 2020, to Nov 26, 2020). We use the historical data from mid and late March 2020 to estimate the buffer level at that time to be 0.25. Applying the buffer level to the data of November 9 (9 or 10 days after the increase of the Stringency Index) can enhance the predictive accuracy in the 18-day test set, as shown in Table 5.

Table 1 Note: We calculate the MAPE of the FGM (1,1) model with a rolling sequence length of 5-10 to select the optimal rolling sequence length for each country. For the USA, Britain, and Germany, the optimal rolling sequence length is 6; for the other countries, it is 5. 
The example of the USA also supports our finding. As shown in Fig. 10, the Stringency Index of the USA rose twice, on November 8 and November 16. However, there was no significant increase in the Mobility Index, which means that Americans did not respond actively to the control measures. As a result, there was no weakening effect on the cumulative number of confirmed cases in the USA, which continued to grow exponentially throughout November. A better result is obtained by using FGM (1,1) with a buffer level of 0. The Stringency Index and Mobility Index of Brazil and India remained stable at the end of October and in November, so there was no weakening effect on the cumulative number of confirmed cases, and we do not use the buffer level to improve our predictive accuracy. Our model predicts that the cumulative number of confirmed cases in Brazil and India is much higher than the reported number, mainly due to the bias between the actual and reported numbers mentioned above (Figs. 11 and 12).

Fig. 6 Predictive results of the cumulative cases in different countries using FGM (1,1). Note: The picture shows the predictive results of the cumulative cases in different countries using the FGM (1,1); the MAPE of these models is less than 3%.

Table 2 Fourteen-day model accuracy (from Nov 1, 2020, to Nov 14, 2020). Note: This table reports the predictive accuracy of FGM (1,1) and ARMA in the 14-day test set. Calculated in both MAPE and RMSE, the predictive accuracy of FGM (1,1) is better than that of the ARMA model except for India and Brazil. We believe that this may be due to the limited COVID-19 testing speed of Brazil and India. As a result, FGM (1,1), which has proved effective for predicting data with exponential growth, does not achieve excellent predictive results. 
As the buffer level can be used to improve predictive accuracy in the test set, it can also be used to predict cumulative confirmed cases under different scenarios. In this section, we predict the cumulative confirmed cases from January 1 to 31, 2021, in Japan, the USA, France, Britain, Germany, Italy, Brazil, India, and Argentina under different scenarios, based on the data as of December 31, 2020. Different scenarios use different buffer levels to represent the mitigation effect that the combination of control measures and public response has on the spread of the epidemic. We use the buffer levels of 0.55 and 0.9 as benchmarks, based on the early data in Italy and Japan. Δ = 0.55 means that the country has effective epidemic control, while Δ = 0.9 means that the country has better epidemic control. The cumulative number of confirmed cases in Japan as of December 31 was 235,788. The actual number of confirmed cases in Japan on January 31 falls in the range of the predicted values with buffer levels of 0 to 0.9, and it is closest to the predicted result with a buffer level of 0.55 (Fig. 13). The cumulative number of confirmed cases in the USA as of December 31 was 20,098,800. Judging from the past trend, the cumulative number of confirmed cases in the USA has grown exponentially since the end of October. The actual number of confirmed cases in the USA on January 31 falls in the range of the predicted values with buffer levels of 0 to 0.9, and it is closest to the predicted result with a buffer level of 0.55 (Fig. 14). The cumulative number of confirmed cases in France as of December 31 was 2,620,425. Judging from the past trend, the growth rate of cumulative confirmed cases in France has decreased since mid to late November.

Fig. 9 Since the end of October, France has strengthened control measures and people have actively responded to these policies. 
The actual number of confirmed cases in France falls within the range of the predicted values with buffer levels from 0 to 0.9 (Fig. 15).

The cumulative number of confirmed cases in the UK as of December 31 was 2,488,780. Judging from the past trend, the growth rate of cumulative confirmed cases in the UK has increased since mid-December. The actual number of confirmed cases in the UK on January 31 falls within the range of the predicted values with buffer levels from 0 to 0.9, and it is closest to the predictive result with a buffer level of 0.9 (Fig. 16).

The cumulative number of confirmed cases in Germany as of December 31 was 1,756,248. Judging from the past trend, the cumulative number of confirmed cases in Germany has grown exponentially since the end of October. The actual number of confirmed cases in Germany on January 31 is closest to the predictive result with a buffer level of 0.9, which is consistent with our finding (Fig. 17).

The cumulative number of confirmed cases in Italy as of December 31 was 2,107,166. Judging from the past trend, the growth rate of cumulative confirmed cases in Italy has decreased since the end of October. The actual number of confirmed cases in Italy on January 31 falls within the range of the predicted values, and it is closest to the predictive result with a buffer level of 0.9 (Fig. 18).

The cumulative number of confirmed cases in Brazil as of December 31 was 7,681,032. Judging from the past trend, the cumulative number of confirmed cases in Brazil has grown exponentially since the end of October. The officially announced cumulative number of confirmed cases in Brazil on January 31 is closest to the predictive result with a buffer level of 0.9. However, because of the limited testing capacity in the early stage, the actual cumulative number of confirmed cases may be closer to the value predicted by FGM (1,1) with a buffer level of 0 (Fig. 19).
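The comparison made for each country, namely whether the reported value falls inside the span of the scenario forecasts and which buffer level comes closest, can be sketched as follows. This is an illustrative Python helper, not the authors' code; the function name and all numbers are hypothetical placeholders rather than values from the paper.

```python
def compare_scenarios(predictions, actual):
    """predictions maps buffer level -> predicted cumulative cases.

    Returns (inside_range, closest_level): whether `actual` lies between the
    smallest and largest scenario forecast, and the buffer level whose
    forecast has the smallest absolute error.
    """
    lowest = min(predictions.values())
    highest = max(predictions.values())
    closest_level = min(predictions, key=lambda lvl: abs(predictions[lvl] - actual))
    return lowest <= actual <= highest, closest_level

# Hypothetical scenario forecasts for one country on the target date.
scenario_forecasts = {0.0: 430_000, 0.55: 395_000, 0.9: 370_000}
print(compare_scenarios(scenario_forecasts, actual=390_000))  # (True, 0.55)
```

A reported value outside the scenario range, as described for India in this section, would return `False` in the first position.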
The cumulative number of confirmed cases in India as of December 31 was 10,286,234. Judging from the past trend, the cumulative number of confirmed cases in India has grown exponentially since the end of October. However, the actual data in India did not fall within our forecasts, mainly because of the limited COVID-19 testing speed in India mentioned above. In late January, the number of daily confirmed cases grew at an almost fixed rate, probably because of the limited testing speed, which results in an underestimation of confirmed cases (Fig. 20).

The cumulative number of confirmed cases in Argentina as of December 31 was 1,625,514. Judging from the past trend, the cumulative number of confirmed cases in Argentina has increased slowly since the end of October. The actual number of confirmed cases in Argentina on January 31 falls within the range of the predicted values with buffer levels from 0 to 0.9, and it is closest to the predictive result with a buffer level of 0 (Fig. 21).

We forecast the cumulative number of confirmed cases on January 31 to be about 402,417 in Japan, 27,271,051 in the USA, 3,096,515 in France, 4,130,429 in the UK, 2,389,240 in Germany, 2,523,042 in Italy, 9,250,362 in Brazil, 11,368,243 in India, and 1,944,242 in Argentina. Table 6 reports the predictive accuracy on the January test set (from January 1 to January 14) for the different countries. The predictive accuracy is calculated in both MAPE and RMSE. FGM (1,1) with buffer levels shows a strong capacity to predict the cumulative confirmed cases in these countries. The MAPEs on the test sets in these countries are less than 5%.

In this paper, we adopt the rolling FGM (1,1) to predict the cumulative number of confirmed cases. FGM is one of the forefront methods in grey system modeling.
It makes full use of known information when little data is available and performs well in predicting future trends. Our marginal contribution is the concept of buffer level, proposed on the basis of the general form of the weakening buffer operator (WBO). Combining the buffer level with FGM in an appropriate way can improve the forecasting accuracy. The examples of Italy, France, and the USA show that government measures require the active cooperation of the public and often take effect only after a time lag. Only when a government increases its stringency and the public observes the order can the spread of COVID-19 be slowed down.

Table 7 The results of the augmented Dickey-Fuller test and the ARMA specifications

The practical significance of this paper lies in the comprehensive consideration of government control measures and public response in the prediction. Our main marginal contribution lies in the concept of buffer level, which takes into account both government control measures and public response in predicting cumulative confirmed cases. In practice, we find that only when government control measures are in place and people respond actively to these measures can the spread of COVID-19 be slowed down. Appropriate use of the buffer level according to government stringency and mobility can improve the predictive accuracy of FGM (1,1). Buffer levels are obtained from countries' historical data and can be used to calculate the predicted values in various scenarios. We predict the cumulative confirmed cases in various countries with buffer levels of 0.55 and 0.9. Based on the Government Response Stringency Index and Mobility Index in December, we forecast the most likely results at the end of January. Compared with the actual data in January, FGM (1,1) with a buffer level has high predictive accuracy.
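To make the combination of fractional accumulation, the GM(1,1) fit, and a weakening buffer step concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' Matlab implementation: the fractional accumulation and the least-squares fit follow the standard grey-model formulation, while `weaken` uses a variable-weight operator of the form level · x(n) + (1 − level) · x(k), and treating that weight as the buffer level Δ is an assumption. The paper's rolling mechanism and grid search over the order are not reproduced, and all function names are hypothetical.

```python
import math

def frac_accumulate(x, r):
    """r-order accumulation; r = 1 is the cumulative sum, and order -r undoes order r."""
    coef = [1.0]
    for j in range(1, len(x)):
        coef.append(coef[-1] * (r + j - 1) / j)  # generalized binomial C(r + j - 1, j)
    return [sum(coef[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]

def fit_gm11(y):
    """Least-squares (a, b) in y[k] - y[k-1] + a * z[k] = b, with z[k] = (y[k] + y[k-1]) / 2."""
    z = [0.5 * (y[k] + y[k - 1]) for k in range(1, len(y))]
    d = [y[k] - y[k - 1] for k in range(1, len(y))]
    m, sz, sd = len(z), sum(z), sum(d)
    szz = sum(v * v for v in z)
    szd = sum(u * v for u, v in zip(z, d))
    slope = (m * szd - sz * sd) / (m * szz - sz * sz)  # regression d = slope * z + b
    return -slope, (sd - slope * sz) / m

def fgm_forecast(x, r, horizon):
    """Fit FGM(1,1) of order r to x and return `horizon` out-of-sample values."""
    y = frac_accumulate(x, r)
    a, b = fit_gm11(y)
    n = len(x)
    # Time response of the accumulated series, then inverse accumulation of order r.
    y_hat = [(x[0] - b / a) * math.exp(-a * k) + b / a for k in range(n + horizon)]
    return frac_accumulate(y_hat, -r)[n:]

def weaken(x, level):
    """Assumed variable-weight weakening buffer operator; level = 0 leaves x unchanged."""
    return [level * x[-1] + (1.0 - level) * v for v in x]

# Hypothetical exponentially growing series (not real case counts).
series = [100.0 * 1.1 ** k for k in range(8)]
print(fgm_forecast(series, 1.0, 3))               # continues the growth trend
print(fgm_forecast(weaken(series, 0.9), 1.0, 3))  # flatter path under strong buffering
```

In this sketch a higher level pulls earlier observations toward the latest value, so the fitted growth rate and the resulting forecast are lower, which mirrors the role the buffer level plays in the scenario forecasts.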
Based on rolling FGM (1,1) with different buffer levels and the indices in December, we forecast the cumulative number of confirmed cases in Japan, the USA, France, the UK, Germany, Italy, Brazil, India, and Argentina from January 1 to 31. The results indicate that countries around the world should cooperate to control the second wave of the COVID-19 epidemic as soon as possible.

All the data are open data. We collected them from GitHub (https://covid19datahub.io/articles/doc/data.html).

In this paper, we coded with Matlab R2013b.

The authors declare no competing interests.

Journal of Healthcare Informatics Research