key: cord-0761004-ujzys2ub authors: Vaishnav, Vaibhav; Vajpai, Jayashri title: Assessment of impact of relaxation in lockdown and forecast of preparation for combating COVID-19 pandemic in India using Group Method of Data Handling date: 2020-08-07 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110191 sha: 85bb69031c44b6bacdf471122b95571fcd4cab8d doc_id: 761004 cord_uid: ujzys2ub Ever since the outbreak of novel coronavirus in December 2019, lockdown has been identified as the only effective measure across the world to stop the community spread of this pandemic. India implemented a complete shutdown across the nation from March 25, 2020 as lockdown I and went on to extend it by giving timely partial relaxations in the form of lockdown II, III & IV. This paper statistically analyses the impact of relaxation during Lockdown III and IV on coronavirus disease (COVID) spread in India using the Group Method of Data Handling (GMDH) to forecast the number of active cases using time series analysis and hence the required medical infrastructure for the period of next six months. The Group Method of Data Handling is a novel self organized data mining technique with data driven adaptive learning capability which grasps the auto correlative relations between the samples and gives a high forecasting accuracy irrespective of the length and stochasticity of a time series. The GMDH model has been first validated and standardised by forecasting the number of active and confirmed cases during lockdown III-IV with an accuracy of 2.58% and 2.00% respectively. Thereafter, the number of active cases has been forecasted for the rest of 2020 to predict the impact of lockdown relaxation on spread of COVID-19 and indicate preparatory measures necessary to counter it. Human civilisations have been periodically challenged by the onset of infectious diseases. In the realm of infectious diseases, a pandemic is the worst case scenario. 
The latest one in the series of pandemics has been caused by the family of coronaviruses. Coronaviruses are pleomorphic, single-stranded ribonucleic acid (RNA) viruses. The "novel" coronavirus is a new strain that has not been previously identified in humans. The name derives from the crown-like appearance produced by the club-shaped projections that stud the viral envelope. The 21st century saw its first pandemic in 2002 as Severe Acute Respiratory Syndrome (SARS-CoV), followed by Middle East Respiratory Syndrome (MERS-CoV) [1]. Today the world is fighting another pandemic known as Coronavirus disease 2019, abbreviated as COVID-19. The initial cases of COVID-19 were reported on 8 December 2019 in Wuhan, Hubei Province, China. Cases were reported after exposure to the local Huanan South China seafood market that sells a variety of wild animals, suggesting that the zoonotic coronavirus crossed the barrier from animal to human at this wet market [2]. COVID-19 is caused by the virus termed 2019-nCoV (Novel Coronavirus 2019, 2020) by the World Health Organization (WHO) and SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) by the International Committee on Taxonomy of Viruses. The COVID-19 virus is categorized by WHO as a β-CoV of group 2B [3]. The genome of this virus has been identified and it resembles those of SARS-CoV (80% similarity) and MERS-CoV (50% similarity) [4, 5]. As of 30/06/2020, the world has registered 1,01,85,374 confirmed cases and 5,03,862 deaths due to COVID-19. With nearly 25% of the total cases in the world, the USA has been the most affected country, followed by Brazil, Russia, and India. The first confirmed case of novel coronavirus in India was reported on 30 January 2020, in the state of Kerala. As of today, India has reported 5,66,840 confirmed cases and 16,893 deaths due to COVID-19 [6, 7]. The coronavirus spreads through sneezing, cough droplets and contact.
This virus tends to enter the body through the mouth, nose, and eyes [8]. It is speculated that the virus may infect a person within a radius of about 6 ft (1.8 m). The virus can survive from about 2 h to a few days in sneeze and cough droplets lying on a surface or the ground. Studies have shown that the infection can spread through fomites, but this is not the major source of infection. The virus has been detected in the stools of patients, but no infection via stool has been reported. Similar to SARS-CoV, nCoV infects cells of the respiratory tract through the angiotensin-converting enzyme 2 (ACE2) receptors [9]. A proteolytic cleavage occurs in the SARS-CoV S protein at position S2', mediating membrane fusion and viral infectivity [10]. Infection is likely to arise if a person comes in direct or indirect contact with any of the body fluids of an infected person. The complete clinical picture of the disease is not fully known; however, the symptoms vary from mild to severe. The risk is greater at the extremes of age and in patients having other health problems like lung diseases, diabetes, heart diseases, and cancer. The common signs of infection are fatigue, muscle pain, sneezing, sore throat, dry cough, high fever, respiratory problems, etc., with some severe cases leading to pneumonia, serious respiratory syndrome, kidney failure and death. The incubation period as reported by the World Health Organization is 2-10 days [11]. Prevention and management are the most important aspects of controlling the spread of COVID-19. Thus, the need arises for collective efforts by the public and the government. Simple steps like avoiding sneezing and coughing in public places, covering the mouth and nose with a mask during sneezing and coughing, and frequent cleaning of hands with soap and alcohol-based sanitizers are essential. It is advised to avoid interaction with persons showing symptoms of respiratory problems like sneezing, coughing, breathing difficulty, etc.
Different nations have imposed lockdowns of different durations and natures as per the severity and number of cases in their country. The Government of India, too, has introduced social distancing as a precaution to avoid the possibility of a large-scale population movement that can promote the community spread of the disease. The purpose of these initiatives is the restriction of social interaction in workplaces, schools, and other public spheres, except for essential public services such as hospitals. In the absence of a vaccine, social distancing has been identified as the most commonly used prevention and control strategy [12]. The Indian government implemented a nationwide lockdown on 25th March, 2020 to slow the spread of COVID-19. It was the world's largest lockdown, which sent 1.3 billion people into isolation. After 40 days and one extension, owing to the declining economy and GDP, the government decided to give some relaxations from 3 May, 2020. Since the beginning of lockdown 3.0, the country has seen an unprecedented growth in confirmed cases. This paper analyses the effect of opening or relaxation of lockdown on novel coronavirus spread in India using time series analysis and forecasts its nationwide quantitative spread for the next six months, thereby giving an extra edge to the medical fraternity in combating COVID-19. Although the mortality rate in India today is 2.98%, with the world's 4th highest number of cases and 2nd largest population, the next few months are very critical in deciding the overall long-term impact COVID-19 will have on the Indian population and economy. This paper is organized as follows: Section 2 gives a review of time series forecasting, Section 3 elaborates the GMDH technique and algorithm, Section 4 discusses the application and results, and finally Section 5 presents the conclusion. A time series is a set of quantitative observations on a variable of interest arranged in sequential order.
Over the years, time series analysis has been used to study the statistical properties of data and propose a suitable mathematical model for the data generating process so as to forecast the future values of a time series. Time series forecasting has been used to forecast future values of numerous economic, demographic, climatic, financial and industrial variables. Monthly electrical peak load demands, daily minimum temperatures, yearly population growth, weekly industrial emissions and hourly manufactured units are all examples of time series. The idea of stochasticity in time series analysis was first introduced in the early 20th century by Yule [13] and Kolmogorov [14], who proposed that every time series can be considered as a stochastic process. This idea became the foundation of the first autoregressive and moving average models by researchers like Slutsky, Walker, Yaglom and Yule. In 1970, Box & Jenkins integrated the existing knowledge and proposed the historic Autoregressive Integrated Moving Average (ARIMA) model [15], which became a stepping stone for the application of modern time series analysis and forecasting in various areas of science. Since then, univariate ARIMA, multivariate (Vector) ARIMA, Seasonal ARIMA (SARIMA), Autoregressive Moving Average with Exogenous Inputs (ARMAX), exponential smoothing, multiple regression, etc. have been used for time series forecasting in all domains of quantitative measurement. Forecasting in the field of epidemics aims at estimating the size and impact of an infectious disease in the near future. The above mentioned time series models have been used from time to time to predict the impact of various infectious diseases across the world. Teng et al. [16] have used an ARIMA (0,1,3) model for dynamic forecasting of worldwide Zika virus outbreaks in November 2016. Similarly, Li et al.
[17] have applied piecewise exponential smoothing on logarithmically transformed data to predict the epidemiological trend of measles in Shandong Province, China for the year 2005. Linear and Poisson regression have been used by Pelat et al. [18] for retrospective detection of pneumonia and influenza mortality, and prospective surveillance of diarrhoea in France from 1968-1999. With the corona outbreak, researchers throughout the world have used ARIMA and exponential smoothing models to forecast the effects of COVID-19 across countries like China, USA, Italy, India, Canada, France, South Korea and UK for different forecasting horizons [19] [20] [21] [22] [23] [24]. Apart from all these methods, the most commonly used epidemiological mathematical model for disease modelling has been the SIR model [25], where the abbreviation stands for the number of susceptible, infected and recovered individuals in the total population. It is used for prediction of the infected population given the values of the transmission rate and recovery rate of the disease in the population. The SIR and modified SIR models have been used to predict the cycle of almost all epidemics, be it the Ebola virus in Africa [26], measles in Britain [27], smallpox in Bangladesh [28], H1N1 in Japan [29], influenza in Hong Kong [30] or COVID-19 in different countries across the world [31, 32, 33]. But all the above mentioned time series models are parametric in nature, i.e., one needs to specify a fixed set of parameters which describe the relationship between input and output variables and which are estimated from time series realizations. Hence, the data generating process is hidden and the model structure has to be prespecified. Moreover, most of these models are accurate only for linear and stationary time series. Although most diseases form seasonal time series, predicting the spread of a disease that has never been encountered before is a challenging task.
The time series associated with many diseases are highly dynamic in nature and cannot be approximated accurately by using the traditional epidemiological and statistical models mentioned before. Therefore, it is necessary to use advanced computing models like neural networks for their modelling. Neural networks impose fewer prior assumptions on the data generating process and are more robust and tolerant to nonlinearity in the forecasting data. Hence, with strong adaptive learning capability, neural networks have been widely used to model various disease time series which exhibit complex nonlinear patterns. Zhang et al. [34] have shown that neural networks outperform the SARIMA model in forecasting typhoid fever incidence in Guangxi province of China for the year 2010. Chakraborty et al. [35] have forecasted the dengue epidemic for the San Juan and Iquitos regions using a hybrid neural network ARIMA model to capture both linearity and nonlinearity in the time series. Zhu et al. [36] have proposed a novel deep neural network to forecast the influenza outbreak in Guangzhou, China for the year 2018. Owing to its exponential growth, the novel coronavirus outbreak has also been predicted using various versions of neural networks in the past three months. Chimmula & Zhang [37] have made predictions for Canada using long short-term memory neural nets while Huang et al. [38] have used a deep convolutional neural network for forecasting confirmed cases in China. Uhlig et al. [39] have combined the epidemiological and neural network approaches to prepare an online dashboard for forecasting and providing COVID-19 prognoses for all countries of Europe and South East Asia. But despite the strong nonlinear mapping ability of neural networks, researchers have been able to reduce but not completely remove their requirement of prior information about the system under investigation.
Despite their combination with other optimization techniques in hybrid networks, their black box nature still remains as an important limitation that cannot be ignored. Also, in all the above examples where neural networks have been used for forecasting, the length of all the time series is sufficiently large with high correlation between data samples for optimum training of the network. This is not the case with COVID 19. In India, the virus has marked its presence by March 2020 end and the associated time series is not large enough as compared to datasets of previous pandemic outbreaks. Therefore, to model the growth and effects of COVID 19, a highly self organized model is required that is competent enough to decode the nonlinear trend within data even from a short length of time series. GMDH is one such nonparametric nonlinear model which automatically extracts knowledge from data samples and trains itself without any prior knowledge about the system. It is an advanced neural network with a nonlinear optimization process which is capable of predicting real, dynamic and chaotic time series without affecting the forecasting accuracy. The authors of this paper have already used GMDH from forecasting the number of monthly airline passengers (a benchmark linear time series in forecasting literature) [40] to forecasting monthly peak electrical load demand for India's largest state, Rajasthan (a real time non linear time series) [41] . In both the cases, GMDH has performed better than reported references. Apart from this in past one decade, GMDH has been used for forecasting wind speed [42] , reservoir water levels [43] , daily traffic flow [44] , stock indices [45] , significant wave height [46] , turbidity [47] , industry market demand [48] , cash demand in ATMs [49] , local vehicle population [50] and even oil prices [51] . 
In the field of disease forecasting, GMDH has recently been used to predict the number of patients with lower respiratory disease due to air pollution [52] and the total number of knee and hip replacements in arthritis patients [53], but it has not yet been used to predict the size of an epidemic. This paper, for the first time, proposes GMDH to predict the growth of a pandemic like COVID-19, after explaining the algorithm in the next section. GMDH is an inductive self-organizing technique proposed by the Ukrainian scientist A. G. Ivakhnenko [54] which identifies the internal structure of nonlinear systems by extracting knowledge from data samples. The GMDH network uses polynomials to model the mathematical relationship between multiple inputs and a single output. It is an advanced version of the perceptron [55] where the total number of layers and the number of nodes in each layer of the network are not prespecified but are automatically decided as the calculation proceeds and the network evolves. The layers and respective nodes in each layer are linked by a quadratic transfer function. The weights of these transfer functions are calculated by solving Gauss normal equations for a group of inputs at a time rather than randomly searching among all inputs, hence the name Group Method of Data Handling. The passage of each node to the next layer is determined by the survival of the fittest criterion, until the final optimized model with minimum error is achieved. The GMDH uses the nonlinear Kolmogorov-Gabor [56] polynomial as the output equation for network development, given as follows:
$$y = a_0 + \sum_{i=1}^{N} a_i x_i + \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} x_i x_j + \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} a_{ijk} x_i x_j x_k + \dots \quad (1)$$
where $a_0, a_i, a_{ij}, a_{ijk}, \dots$ denote the polynomial weights and $x_1, x_2, x_3, \dots, x_N$ denote the input variables. The above equation is linear in the weights and nonlinear in the variables $x$. The detailed algorithm is explained in the next subsection.
I. Division - the first step consists of modelling the data structure, where the dataset is divided into a training set and a checking set.
The training set is used for estimating the weights of the polynomial transfer functions whereas the checking set is used for selection of the fittest node in a respective layer. Different ratios of training to checking set observations can be used to examine the potential of the algorithm on different datasets with different statistical properties.
II. Generation - with polynomials as partial functions, each GMDH network is made up of a variable number of hidden layers, with each layer having a variable number of nodes, the method of selection of which is described in the next two steps. Two inputs are fed at each node and undergo a quadratic transfer function as per the Ivakhnenko [54] polynomial:
$$y^*(x_1, x_2) = a + b x_1 + c x_2 + d x_1^2 + e x_2^2 + f x_1 x_2 \quad (2)$$
where $a, b, c, d, e, f$ are the coefficients of the polynomial for the pair of input variables $x_1, x_2$. If $n$ is the number of input variables at a layer, then the number of nodes in that layer is given by ${}^{n}C_{2} = n(n-1)/2$, and so is the number of Ivakhnenko polynomials for that layer. The number of nodes that enter the next layer is governed by the regularity criterion as described in step IV. Hence, groups of many lower order polynomials are used for successive approximation at each layer rather than one higher order polynomial with all the input variables and terms of all powers.
III. Parameter estimation - the regression coefficients of the node transfer function are determined using the training set and the least squares method, i.e., forming a network such that the square of the difference between the actual output $y_i$ present in the training set and the predicted output $y_i^*$ is minimum for each pair of input variables [57].
IV. Self selection - the outputs of the polynomial transfer functions in a layer serve as inputs to the next layer, but the number of variables to be passed on to the next layer is determined by using a regularity criterion and the checking set.
The regularity criterion 'R' measures the mean squared error between the predicted and actual value for each node (but this time using the checking set). If the value of R is less than a threshold value, then the node output is passed as an input to the next layer, otherwise it is eliminated. This self-selection procedure is analogous to Darwin's theory of evolution. The regularity criterion is given as follows:
$$R = \frac{1}{N_c} \sum_{i=1}^{N_c} \left( y_i - y_i^* \right)^2 \quad (3)$$
where $N_c$ is the number of observations in the checking set, $y_i$ is the actual output and $y_i^*$ is the predicted output of the node.
V. Model fitting - after selecting the fittest node outputs for the next layer, the value of $R_{min}$ for each layer is recorded. This process is repeated until the GMDH model begins to show overfitting, i.e., until the value of $R_{min}$ for a layer is greater than the value of $R_{min}$ for the previous layer. Hence, the polynomial with the least value of R is chosen as the best polynomial and the model so formed as the most optimal model, as shown in Fig. 1.
A simplified GMDH model has been shown in Fig. 2. Let $x_1, x_2, x_3, x_4$ be the inputs. For 4 inputs, the total number of pairs fed to the first layer is ${}^{4}C_{2} = 6$. For the six input pairs $(x_1, x_2)$, $(x_1, x_3)$, $(x_1, x_4)$, $(x_2, x_3)$, $(x_2, x_4)$, $(x_3, x_4)$, let the outputs of the polynomial transfer functions (as per Eq. 2) be $y_{11}, y_{12}, y_{13}, y_{14}, y_{15}$ and $y_{16}$ respectively. As described in step IV, only those outputs which fit the regularity criterion are passed as inputs to the next layer. Assuming that $y_{11}$, $y_{13}$ and $y_{15}$ pass the regularity criterion (as indicated by dark circles), they become inputs for the second layer. Now, for 3 inputs, ${}^{3}C_{2} = 3$ pairs are generated, i.e. $(y_{11}, y_{13})$, $(y_{11}, y_{15})$ and $(y_{13}, y_{15})$ respectively. The outputs of the second layer are given by $y_{21}$, $y_{22}$ and $y_{23}$. Assuming that $y_{23}$ fails the regularity criterion, $y_{21}$ and $y_{22}$ are passed on to the third layer as inputs, whose output $y_{31}$ is determined as the best polynomial. The above example has taken 4 inputs for simplicity.
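The generation, parameter estimation and self-selection steps can be illustrated with a minimal Python sketch of a single GMDH layer. This is not the authors' implementation: the synthetic data, the helper names (`design_matrix`, `fit_layer`, `select_nodes`) and the threshold value are assumptions made purely for illustration, and NumPy's least-squares solver is used in place of an explicit solution of the normal equations.

```python
from itertools import combinations

import numpy as np


def design_matrix(x1, x2):
    """Terms of the two-input quadratic polynomial: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])


def fit_layer(X_train, y_train, X_check, y_check):
    """Fit one quadratic node per input pair and score it on the checking set."""
    nodes = []
    for i, j in combinations(range(X_train.shape[1]), 2):
        # Parameter estimation: least squares on the training set.
        A = design_matrix(X_train[:, i], X_train[:, j])
        coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        # Regularity criterion: mean squared error on the checking set.
        pred = design_matrix(X_check[:, i], X_check[:, j]) @ coef
        r = float(np.mean((y_check - pred) ** 2))
        nodes.append({"pair": (i, j), "coef": coef, "R": r})
    return sorted(nodes, key=lambda node: node["R"])


def select_nodes(nodes, threshold):
    """Self selection: keep only nodes whose R is below the threshold."""
    return [node for node in nodes if node["R"] < threshold]


# Synthetic data (assumption): the target depends only on inputs 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
X_train, X_check = X[:40], X[40:]
y_train, y_check = y[:40], y[40:]

nodes = fit_layer(X_train, y_train, X_check, y_check)
survivors = select_nodes(nodes, threshold=1e-6)  # hypothetical threshold
print(len(nodes))                             # 6 candidate nodes for 4 inputs
print([node["pair"] for node in survivors])   # zero-based input indices
```

With four inputs the layer builds six candidate nodes, and here only the node fed by the two inputs that actually generate the target survives. A full network would feed the surviving node outputs into the next layer and stop once the smallest R stops decreasing.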
This exercise can be done for any number of inputs until only one output passes the regularity criterion. The total number of layers in this network is three, with 6, 3 and 1 nodes respectively, which have been determined automatically using the regularity criterion as the algorithm proceeds. It is also important to note that with four inputs $(x_1, x_2, x_3, x_4)$, the complete polynomial with terms of all powers would have had a total of 70 terms. Hence, determining a fourth order polynomial fit would have involved a simultaneous estimation of 70 parameters, whereas GMDH involves calculation of only 6 parameters at a time as per the Ivakhnenko polynomial. This saves computation time and makes GMDH preferable over other techniques to solve large dimensional problems when the data sequence is comparatively short. The next section discusses the application of this GMDH algorithm for COVID-19 forecasting. Ever since the outbreak of the global pandemic COVID-19, medical researchers throughout the world have been working on the development of a vaccine, but till this time COVID-19 remains incurable and social distancing has been identified as the only preventive measure. As of 30/06/2020, India has reported 5,66,840 confirmed novel coronavirus cases with 3,34,822 recoveries and 16,893 deaths, as shown in Fig. 3 [58]. India is home to the world's second largest population and has a significantly higher population density as compared to the USA, Brazil, Russia, UK and Italy, the countries most adversely affected by coronavirus. Countries like the USA, UK and Italy reported the highest number of deaths due to COVID-19 [7], in spite of having the world's best medical infrastructure. Learning from this challenging situation of most of the developed nations, India took a proactive decision amongst the South East Asian countries even when the roots of the pandemic were not so deep in the region.
To break the infection chain in its very early stages, the Government of India announced a complete lockdown across the nation for 21 days from 25 March, 2020 to 14 April, 2020. It was historical in the sense that [...] After bearing a 50 day complete shutdown of all social and economic activities, the government announced lockdown III with some relaxation in norms. These norms granted some allowances to the public, like permitting them to get out of their homes between 7 AM and 7 PM and use vehicles with 50% occupancy. Public and private sector offices were also allowed to resume functioning with one third of their employees, and so were shops and industries, while strictly following the norms of social distancing. All the districts were divided into red, orange, green and containment zones as per the number of cases in the region. The government imposed huge fines for [...] these to get reported during lockdown I and II, as shown in Fig. 4. As shown in Table 1, the daily average number of reported cases also increased more than four times from lockdown II to lockdown IV. Along with the increase in average recovery rate, the decrease in average growth rate of daily cases was also an important parameter for the government to partially remove the lockdown. As visible from Table 1, the average growth rate decreased to 5.09% (lockdown IV) from 15.73% (lockdown I), but the percentage decrease in growth rate in going from lockdown II to III was merely 17.80%, whereas it was 53.11% in going from lockdown I to II. Apart from the relaxation in isolation norms, there had been two more important reasons for the sudden increase in the number of positive cases: increased testing capacity and migration of labourers. As per the daily bulletin published by the Indian Council of Medical Research (ICMR) [59], the total samples tested during lockdown III were almost equivalent to the total samples tested during the first two lockdowns, as shown in Fig. 5 and Table 1.
Although the tests performed per million population in India are still far fewer than those performed by the countries above India in the tally of total confirmed cases, the testing capabilities have been strengthened by setting up more labs during the lockdown period and revising testing norms. Further, the mass movement of internal migrants across the nation also emerged as an important cause of virus transmission. As per a report published by the World Bank [60], around 40 million internal migrants were affected by the lockdown. After the first lockdown, with a complete halt on transportation services, an increasingly large number of migrants started heading towards their native places on foot. Hence, the government decided to run special trains and buses to regulate the interstate transfer of migrants from 1st May, 2020 onwards. As per the PIB bulletin of 28th May, 2020, a total of 3543 'Shramik Special' trains took 48 lakh migrants back to their home states from different parts of the country in the span of 26 days (01/05/2020 to 26/05/2020) during lockdown III-IV [61]. The migrant inflow resulted in the spreading of the infection to rural areas and to districts which were earlier in the green zone. All these reasons resulted in tremendous growth in the number of cases post 3rd May, 2020. As visible from Fig. 6, till the end of lockdown IV, the highest spike in the number of daily reported cases since the outbreak was observed on 31st May, 2020, with 8380 cases reported in a single day. The last parameter used to conclude the effect of relaxation in lockdown is the doubling time, which has been used worldwide by statisticians to predict the growth of coronavirus. Doubling time is the time taken by the number of infections to double from a given day. The doubling time for India on 24/03/2020, before imposition of the first lockdown, was 3.4 days. It has been quoted at several places that at this rate, without lockdown, India would have surpassed 100000 cases by the end of April, a mark it actually reached on 19/05/2020.
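The relationship between doubling time and daily growth rate can be made concrete with a short Python sketch. It assumes a constant daily growth rate over the period considered, which is a simplification; the paper does not state the exact formula or averaging window behind its reported doubling times, and the function names below are hypothetical.

```python
import math


def doubling_time(daily_growth_rate):
    """Days needed for cumulative cases to double at a constant daily growth rate."""
    if daily_growth_rate <= 0:
        raise ValueError("daily growth rate must be positive")
    return math.log(2) / math.log(1 + daily_growth_rate)


def implied_growth_rate(doubling_days):
    """Constant daily growth rate that corresponds to a given doubling time."""
    if doubling_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (1 / doubling_days) - 1


# Doubling time of 3.4 days quoted for 24/03/2020.
rate = implied_growth_rate(3.4)
print(f"{rate:.3f}")  # about 0.226, i.e. roughly 22.6% growth per day
```

Under this assumption a longer doubling time corresponds directly to a lower daily growth rate, which is why the two quantities are read together when comparing the lockdown phases.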
Undoubtedly, as per Table 1, the average doubling time increased over the four editions of lockdown, but it can be easily noticed from Fig. 6 that during lockdown I and II the doubling time showed a very steady growth whereas during lockdown III it remained nearly constant at around 11 days [62]. Also, it is important to note that the average growth rate of doubling time decreased drastically from 3.61% to 0.67% in going from lockdown II to lockdown III, when restrictions were partially removed. Hence, this analysis concludes the factual description of COVID-19 growth in India due to relaxation in lockdown III-IV. After four stages of lockdown, the government has finally started to unlock the nation in phases from 01/06/2020. The night curfew has been reduced to 10 PM to 5 AM, travel restrictions have been lifted and economic activities have been allowed in the entire nation except for containment zones. All the time series used in this section (total confirmed cases, total deaths, total recovered cases and daily reported cases) have been formulated using data available from the daily COVID-19 bulletin published by the Press Information Bureau (PIB) of India [63], the daily situation reports published by the World Health Organization [64] and the website of the Ministry of Health and Family Welfare, Government of India [65]. Although various online COVID dashboards and the media primarily talk about the confirmed, recovered and deceased cases, the number of active cases is the most important parameter to analyse the impact of coronavirus and the quality of healthcare services in a country. Active cases are defined as the number of cases which remain after subtracting the recovered and deceased patients from the total confirmed positive cases on a given date.
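As a small worked illustration of this definition, the following Python sketch derives the active cases from the confirmed, recovered and deceased totals reported for 30/06/2020 earlier in the paper. The function and variable names are hypothetical and the snippet is only an arithmetic check, not part of the authors' pipeline.

```python
def active_cases(confirmed, recovered, deceased):
    """Active cases = confirmed cases minus recovered and deceased patients."""
    return confirmed - recovered - deceased


# Totals for India as of 30/06/2020, as quoted in the text.
confirmed, recovered, deceased = 566_840, 334_822, 16_893

active = active_cases(confirmed, recovered, deceased)
recovery_rate = 100 * recovered / confirmed   # percent of confirmed cases
fatality_rate = 100 * deceased / confirmed    # percent of confirmed cases

print(active)                    # 215125
print(round(fatality_rate, 2))   # 2.98
```

Applying the same subtraction day by day to the three cumulative series yields the active cases time series used for forecasting.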
Although India has crossed the 5,00,000 mark in terms of total confirmed COVID-19 cases, with a recovery rate of 59.06% as on 30th June, 2020, India has managed to keep the number of active cases in check by imposing lockdown in the early stages of infection. With 5,66,840 total confirmed cases, India has a total of 2,15,125 active cases today. The number of active cases helps the government to mark containment zones and monitor the growth of the virus in a particular region. The active cases time series has been formulated using the time series of confirmed, recovered and deceased cases. As explained in subsection 4.1, lockdown III-IV were quite different from lockdown I and II in terms of both impact and statistics. The complete dataset for total active cases in India till lockdown IV has been divided into two sets, i.e. time series 1 and 2 respectively. With total active cases from 31/01/2020 to 03/05/2020 (lockdown I-II), time series 1 contains 94 observations, whereas time series 2 consists of 28 observations from 04/05/20 to 31/05/20 (lockdown III-IV). As mentioned before, GMDH is a highly self-organized data mining technique which extracts the correlation between data samples without any prior knowledge about the time series and enhances the forecasting accuracy. The GMDH model is applied to time series 1 to forecast time series 2 so as to validate and standardise the model. The results are as shown in Table 2 and Fig. 8. The Mean Absolute Percentage Error (MAPE) has been used as the accuracy criterion. The GMDH model has forecasted the active cases with a MAPE of 2.58%. Following the same division of dataset and forecasting horizon, the total number of confirmed cases has also been forecasted for lockdown III-IV with a MAPE of 2.00%, which depicts the efficiency of GMDH as a nonlinear forecaster. The results are as shown in Table 3 and Fig. 9. [...] Table 4.
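For reference, the MAPE used as the accuracy criterion above can be computed as in the following Python sketch. The function name and the sample values are hypothetical and serve only to illustrate the formula; they are not the forecasts reported in Tables 2 and 3.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Hypothetical values, for illustration only.
actual = [100, 200, 400]
forecast = [110, 190, 400]
print(mape(actual, forecast))  # close to 5 (percentage errors of 10%, 5% and 0%)
```

Because each error is scaled by the actual value, MAPE allows forecasts of series with very different magnitudes, such as active and confirmed cases, to be compared on the same percentage scale.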
With successive unlocking of the country after every lockdown and the unavailability of a vaccine, the GMDH forecasts are suggestive of the fact that the novel coronavirus is here to stay for a long time. However, the model predicts that the growth rate of active cases will slow down with time, which is consistent with the steady growth in recovery rate since the lockdown ended on 31/05/2020. The model suggests that the active cases in the country will reach 2 lakh by the end of June but it will take 1.5 more months for them to reach the mark of 4 lakh and 2 more months for the double to [...] Fig. 10 suggests that there would be a total of 7,73,558 active COVID-19 cases by the end of year 2020. The fortnightly requirements of ICU beds, ventilators and oxygen support systems have been tabulated in Table 4 as per the forecasted active cases. With limited data and the limitations of data science, it is impossible for any model to accurately predict the number of people affected by COVID-19, but a forecasted range of affected people may keep the medical fraternity and the government a step ahead in fighting the pandemic. Several epidemiological and soft computing models have been used by researchers all over the world to forecast the quantitative growth of novel coronavirus across different regions, but the short length and nonlinearity exhibited by the time series of COVID cases have been an incessant challenge. In this paper, an applied soft computing technique has been used to predict the number of active COVID-19 patients and ventilator requirements in India for the upcoming months of 2020 by considering the available data published up to 30/06/2020 by the Ministry of Health, Government of India. The country has shown a considerable growth in recovery rate till now and the death rate is also low as compared to the most severely affected nations worldwide, but it has entered the top five nations in terms of number of confirmed cases and is presently at an alarming fourth position worldwide.
Therefore, with the world's second largest population and a rising number of cases, these GMDH predictions can guide the healthcare system in terms of preparedness and management of apparatus, particularly ventilators. The sequence of lockdowns has prevented the growth from becoming exponential, but it has not been able to bring about a decrease in the number of daily reported cases. Every lockdown has witnessed a successive rise in the average number of daily reported cases; 19,458 cases were reported on 29/06/2020. Hence, these predictions are very important as the country is getting ready to fully unlock in phases. Apart from forecasting active cases, the effect of the initial relaxations by the government and the impact of the partial unlocking of the nation during lockdown III and lockdown IV have also been covered in this paper. An in-depth numerical analysis has been carried out in terms of standard COVID parameters such as doubling rate, average daily reported cases, average growth rate, total samples tested, recoveries and deaths, to elaborate the impact of lockdown III and IV in increasing the COVID spread as compared to lockdown I and II. The degree of statistical variation between the time series of the four lockdowns has been explained, and their forecasting has then been used to standardise the GMDH model. The model's accuracy in matching the actual number of cases for lockdown III-IV supports the reliability of the forecasted number of active cases for the next six months. The lockdown certainly gave the government preparation time for boosting the medical infrastructure and limiting the growth rate of the virus, but no economy, whether developing or developed, can afford a lockdown for a long time. Lockdown V, which is better called unlock I, was certainly a necessary event, but the unavoidable risk to which it has exposed the citizens of the nation cannot be overlooked.
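Two of the parameters named above, the average growth rate and the doubling time, can be related by a short calculation. The Python sketch below is illustrative only: it assumes the common definitions (mean day-over-day relative change, and doubling under constant compound daily growth), which may differ from the exact definitions used in the analysis, and it treats 1.5 months as 45 days. All function names and the toy series are hypothetical.

```python
import math


def average_daily_growth_rate(series):
    """Mean of day-over-day relative changes, as a fraction per day."""
    rates = [(b - a) / a for a, b in zip(series, series[1:])]
    return sum(rates) / len(rates)


def doubling_time(daily_growth_rate):
    """Days needed to double under constant compound daily growth."""
    return math.log(2) / math.log(1 + daily_growth_rate)


def implied_daily_growth(doubling_days):
    """Constant daily growth rate that doubles a count in doubling_days days."""
    return 2 ** (1 / doubling_days) - 1


# Toy series growing by 10% per day (hypothetical values).
toy_rate = average_daily_growth_rate([100.0, 110.0, 121.0])
toy_doubling = doubling_time(toy_rate)

# Growth rate implied by a doubling from 2 lakh to 4 lakh active cases
# in about 1.5 months, taken here as 45 days (an assumption).
rate_45 = implied_daily_growth(45)

print(round(toy_rate, 3), round(toy_doubling, 2), round(100 * rate_45, 2))
```

Under these assumptions, a doubling over roughly 45 days corresponds to an average growth of about 1.5% per day, which gives a sense of scale for the slowdown that the forecast describes.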
Applied soft computing and these active case forecasts are among the many ways in which one can assist the government in combating this risk. Moreover, the USA and China have reported success in conducting preliminary trials of a coronavirus vaccine, but even if they succeed in trials on larger populations in the next stages, as hopefully they will, it will take a minimum of six months to one year for the vaccine to reach the market. Hence, a prediction over a period of six months has been made using GMDH in terms of active cases. The present trend and the predictions both suggest that the novel coronavirus is a problem we will have to live with for a longer period of time, and that today prevention is the measure, not just better than cure.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
I would like to thank Dr. Ekta Kanojia for giving an insight into the virology of COVID-19 and thereby helping me build a strong foundation for this paper.