key: cord-0736153-6ub0dpmo
authors: Bhardwaj, Shivam; Sharma, Sunil; Bhardwaj, Rashmi
title: Machine Learned Hybrid Gaussian Analysis of COVID-19 Pandemic in India
date: 2021-08-02
journal: Results Phys
DOI: 10.1016/j.rinp.2021.104630
sha: e142022e33944b94098102e62c3d7c432c27fb40
doc_id: 736153
cord_uid: 6ub0dpmo
This article discusses short-term forecasting of novel coronavirus (COVID-19) data for infected, recovered and active cases in India using a machine-learned hybrid Gaussian and ARIMA method. The COVID-19 data are obtained from Worldometer and the MOH (Ministry of Health, India) and are analyzed for the period from January 30, 2020 (the first reported case) to October 15, 2020. Using ARIMA (2,1,0), we obtain a short-term forecast up to October 31, 2020. Several goodness-of-fit statistics were tested to evaluate the forecasting methods, and the results show that ARIMA (2,1,0) gives the better forecast for this data. It is observed that the COVID-19 data follow quadratic behavior and that, in the long run, the spread shows a high peak roughly estimated at September 18, 2020. Using nonlinear regression, it is also observed that the long-run trend follows a Gaussian mixture model. It is concluded that COVID-19 will show a secondary shock wave in November 2020. India is approaching herd immunity. It is also observed that the impact of the pandemic will last about 441 to 465 days and that the pandemic will end between April and May 2021. The primary peak was observed in September 2020, and the secondary shock wave is expected around November 2020 with a sharp peak. Thus, people should follow precautionary measures and maintain social distancing with all safety measures, as the pandemic is not under control owing to the non-availability of medicines.
Symptoms include shivering and fever; breathlessness; diarrhea; headache and body ache.
Further, the virus exhibits various astute qualities that help it sustain itself, such as travelling within us undetected (unless we are tested properly), infecting everything its host touches, and reproducing itself liberally. However, its origin has not yet been confirmed [2]. Initially, it was suspected to have originated in Wuhan's South China Seafood Wholesale Market; later, other possible origins were proposed. Some scientists claimed that the cross-species transmission may have been from snakes to humans; however, this claim was disputed [3].
Coronaviruses (CV) are RNA viruses that are respiratory pathogens. Coronavirus transmission is zoonotic, i.e. between animals and people. They can cause illnesses ranging from benign seasonal ones like the common cold to more severe public health emergencies like Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). A novel strain of coronavirus that had not previously been identified in humans was identified in 2019 [4]. The death rate from COVID-19 is considered very low for many age groups, but the virus has turned out to be deadly for people above the age of 60. Owing to its spread in over 200 countries, the World Health Organization (WHO) declared it a pandemic on 11 March 2020. The COVID-19 fatality rate by age (as on February 11, 2020) is shown in Fig. 1 [6]. With an estimated reproduction number >1 (range 2.6-4.7), early reports predicted a potential coronavirus outbreak [7].
Forecasts have been made for China and some other regions using mathematical and traditional time-series prediction models [40]. A mathematical model-based prediction was achieved at an early stage of the outbreak of this virus in China [41]. The pneumonia outbreak has been explored extensively via a coronavirus genome of probable bat origin [42].
Coronavirus can have harmful effects on our bodies and livelihood. Some of them are: 1.
Forming Blood Clots in the Human Body: Doctors and scientists across the globe are witnessing a surfeit of clotting-related disorders, ranging from an innocuous skin bruise on the foot, sometimes called "Covid toe", to life-threatening strokes and vein blockage. The issue is evident in clots (thrombi) that form in patients' arterial catheters and in the filters used to support failing kidneys. The more dangerous the blood clots, the more they impede blood circulation in the lungs and cause breathlessness.
2. Causes Silent Hypoxia: Researchers and doctors have reported a medical condition known as "happy" or "silent" hypoxia, in which patients have extremely low blood oxygen levels and yet show no symptoms of breathlessness. They now advocate its early detection as a means to avoid a deadly disorder called "Covid pneumonia", a malign condition found in patients who are severely affected by COVID-19. It is preceded by silent hypoxia, a form of oxygen deprivation that is harder to detect than regular hypoxia. In numerous cases, COVID-19 patients with silent hypoxia did not exhibit signs such as coughing or shortness of breath until their oxygen levels fell very low, at which point there is a risk of acute respiratory distress syndrome (ARDS) and organ failure.
As the number of COVID-19 cases in India reaches approximately 3 lakh (300,000), it becomes more important than ever to protect ourselves from the virus. Many states are also loosening the restrictions that were previously imposed to prevent community spread, so personal and individual hygiene is essential [31]. Some of the measures that can ensure personal hygiene and safety are:
1. Always sanitize your hands after coming in contact with anyone. This is crucial because many cases are asymptomatic and the virus can be transferred without being noticed.
3.
Avoid going to crowded places: Going to crowded places greatly increases the risk of getting infected with COVID-19 because social distancing is not followed in these areas. Always stay updated about the virus from trusted sources: Staying updated from trusted sources will help in maintaining your welfare and safety.
The different techniques and methodologies used for forecasting are given as a flow chart [38] in Fig. 3.
Case Fatality Rate (CFR) = (total deaths / total confirmed cases) × 100 …(1)
To control the epidemic, the value of CFR should be minimum [22].
Cumulative active cases = Aggregate confirmed cases - Aggregate deaths - Aggregate recovered cases
To control the pandemic, the cumulative active cases should be minimum, with cumulative death cases as zero and cumulative recovered cases as maximum [18].
Nonlinear regression is a type of regression analysis in which data are fitted to a model and then expressed as a mathematical function. Simple linear regression relates two variables (X and Y) with a straight line (y = mx + b), while nonlinear regression relates the two variables in a nonlinear (curved) relationship. The objective of the model is to make the sum of squares as small as possible. The sum of squares is a measure that tracks how far the Y observations vary from the nonlinear (curved) function that is used to predict Y.
The Box-Cox transformation is a particularly useful family of transformations [34] [35]. Theorem: Suppose a sample of n response values $y_1, y_2, \ldots, y_n$ and let $\lambda$ be the transformation parameter. The transformed values $y_i^{(\lambda)}$ are computed with respect to $\lambda$ as:
$$y_i^{(\lambda)} = \frac{y_i^{\lambda} - 1}{\lambda}, \qquad \lambda \neq 0$$
In the case of $\lambda = 0$, the natural log $\ln y_i$ is applied instead of the aforementioned formula. For a given $\lambda$, a measure of the normality of the resulting transformation is needed. The transformation is meant to bring a non-normal dependent variable into a normal shape. The measure used is the correlation coefficient of the normal probability plot: the correlation is computed between the variables of the probability plot and serves as a measure of the linearity of that plot.
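The Box-Cox family and the probability-plot correlation described here can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the plotting positions (i - 0.5)/n are one common convention among several, and the source does not say which software or convention produced its results.

```python
import math
from statistics import NormalDist


def box_cox(values, lam):
    """Box-Cox transform; the natural log is used when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def normal_ppcc(values):
    """Correlation between the sorted data and standard normal quantiles."""
    n = len(values)
    # Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return pearson(sorted(values), quantiles)


def best_lambda(values, grid):
    """Return the grid value of lambda whose transform looks most normal."""
    return max(grid, key=lambda lam: normal_ppcc(box_cox(values, lam)))
```

Evaluating `normal_ppcc(box_cox(values, lam))` over a grid of `lam` values gives the points of a Box-Cox normality plot; `best_lambda` simply picks the grid value with the highest correlation.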
The vertical axis shows the correlation coefficient of the normal probability plot and the horizontal axis shows the values of $\lambda$. This stationarity test is applied to both positive and negative values [36] [37].
The data under consideration are said to have autocorrelation whenever the response values in the time domain at times $t$ and $t + t'$ are found to be correlated, where $t'$ refers to the time increment to the upcoming events [10, 11, 14, 15]. In a long-memory process, the autocorrelation decays over time following a power-law trend. The observations should be uniformly sampled. Unlike cross-correlation, the autocorrelation function (ACF) yields a correlation coefficient that signifies the degree of resemblance of the same response variable at two times, $t$ and $t + t'$. The ACF is used to identify non-randomness in data and to select an appropriate time-series model when the data are not random. When the ACF is used to select an appropriate time-series model, it is plotted over a range of lags.
A unit root test is used to check stationarity, as unit roots can cause unpredictable results in the autoregressive models of time-series analysis. Time series differ from other predictive modeling problems. Modeling assumes that the summary statistics of the observations are consistent; in the context of time series, this expectation is referred to as the series being stationary [19, 20, 33]. A time series is taken to be stationary when it does not contain trend or seasonal effects, so that the summary statistics computed on it are consistent over time. Statistical modeling is therefore effective when the series is stationary [10, 12, 13]. In particular, the unit root test determines how strongly a time series is defined by a trend.
By grading the goodness of fit (GOF) of various distributions, one can judge which distributions are satisfactory and which are not.
From the cumulative distribution function (CDF), the histogram and the probability density function (PDF) are derived [24, 25, 26].
Theorem: The measure of discrepancy between observed and fitted values is called the deviance. For Poisson responses with observed values $y_i$ and fitted values $\hat{\mu}_i$, the deviance takes the form:
$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left(\frac{y_i}{\hat{\mu}_i}\right) - \left(y_i - \hat{\mu}_i\right) \right]$$
The first term is identical to the binomial deviance, representing "twice a sum of observed times log of observed over fitted". The second term, the sum of differences between observed and fitted values, is usually zero [16, 17, 21, 27]. For large samples, the deviance approximately follows a chi-squared distribution with n - p degrees of freedom, where n is the number of observations and p is the number of parameters. Thus, the deviance can be used directly to test the goodness of fit of the model.
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regressors. The definition of R-squared is fairly simple: it is the percentage of the variation in the response variable that is explained by the regression. R-squared lies between 0 and 1, where 0 indicates that the model explains none of the variability of the response data around its mean and 1 indicates that the model explains all of the variability of the response data around its mean [28, 29, 30, 32].
For long-term behavior, the data set for India from January 30, 2020 to October 15, 2020 is analyzed. The spread of COVID-19 in different states of India as on August 9, 2020 is shown in Fig. 4 [39]. Descriptive statistics for new cases; total cases; new deaths; total deaths; new recovered; total recovered; new active; total active and CFR are given in Table 1, with the details of correlation and coefficient of determination in Table-. The cross-correlation of total recovered cases with respect to date is shown in Fig. 6.
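The two goodness-of-fit measures defined in this section, the Poisson deviance and R-squared, can be computed directly from observed and fitted values. The following is a minimal sketch with hypothetical function names; it assumes the usual convention that the term y ln(y/fitted) is taken as zero when the observed count is zero, which the text does not state explicitly.

```python
import math


def poisson_deviance(observed, fitted):
    """Twice the sum of y*ln(y/mu) - (y - mu) over all observations."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        if mu <= 0:
            raise ValueError("fitted Poisson means must be positive")
        # Assumption: y * ln(y / mu) is treated as 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def r_squared(observed, fitted):
    """Share of the variation around the mean explained by the fit."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives a deviance of 0 and an R-squared of 1, while a fit that always predicts the mean of the observations gives an R-squared of 0.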
The details of the normality and white noise tests for date; total recovered and date/recovered cases are shown in Table 3 for different statistics: Box-Pierce for 6 and 12 degrees of freedom; Ljung-Box for 6 and 12 degrees of freedom; and McLeod-Li for 6 and 12 degrees of freedom. Fig. 7 and total error is depicted in Fig. 8. For total recovered cases using the Box-Cox transformation, with lambda as zero and differencing as zero, a polynomial regression is fitted with goodness-of-fit statistics R² of 0.691 and adjusted R² of 0.69. In the case of seasonal fitting, the goodness-of-fit statistics are R² of 0.002 and adjusted R² of -0.046. All trends are shown in Fig. 9 and Fig. 10. Fig. 11 shows the forecast and trend analysis using the ARIMA (2,1,0) model for total confirmed; total death; total recovered and total active cases, with detailed values in Table 4. Fig. 13.
From the figures and tables, it is observed that total confirmed, total death, total active and total recovered cases are highly correlated. For ARIMA (2,1,0), it is observed that the value of the constant is zero for all cases. Total confirmed cases; total death cases; total recovered cases and total active cases fit the ARIMA (2,1,0) forecast closely, but daily deaths initially show a perturbed or random pattern that is not perfectly fitted by the ARIMA (2,1,0) model. Later, however, they show a pattern similar to the one forecast using ARIMA (2,1,0). It is observed that the actual and forecasted values using ARIMA (2,1,0) from August 3, 2020 to August 11, 2020 give better results. It is concluded that the ARIMA (2,1,0) model gives the best fit for long-term and short-term behavior. Nonlinear regression and the Gaussian mixture model also show the same trend for total cases as forecast using ARIMA (2,1,0).
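To make the structure of an ARIMA (2,1,0) model with a zero constant concrete: the series is differenced once, and each difference is modelled as a linear combination of the two previous differences. The sketch below fits those two coefficients by conditional least squares in plain Python. This estimation method and the function names are assumptions made for illustration; the source does not state which estimation routine or software produced its fits.

```python
def difference(series):
    """First differences of a series (the d = 1 step of ARIMA)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar2(diffs):
    """Least-squares AR(2) coefficients without a constant term."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(diffs)):
        x1, x2, y = diffs[t - 1], diffs[t - 2], diffs[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("differences are collinear; cannot fit AR(2)")
    # Solve the 2x2 normal equations by Cramer's rule.
    phi1 = (b1 * s22 - b2 * s12) / det
    phi2 = (b2 * s11 - b1 * s12) / det
    return phi1, phi2


def forecast_arima_210(series, steps):
    """Forecast `steps` future levels from an ARIMA(2,1,0)-style fit."""
    diffs = difference(series)
    phi1, phi2 = fit_ar2(diffs)
    last, prev = diffs[-1], diffs[-2]
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        nxt = phi1 * last + phi2 * prev  # predicted next difference
        level += nxt                     # integrate back to the level
        forecasts.append(level)
        last, prev = nxt, last
    return forecasts
```

For a cumulative daily series ending on October 15, `forecast_arima_210(series, 16)` would extend it through October 31. A real analysis would also check residual diagnostics such as the white noise tests mentioned in the text.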
As the number of cases is increasing, it is advised that proper precautionary and health guidelines be strictly followed to fight the COVID-19 pandemic and to remain healthy and safe. Forecasting the prevalence of the COVID-19 pandemic in India plays an important role in helping policymakers and the health department focus on strengthening the surveillance system and reallocating resources. It is observed that the COVID-19 data follow quadratic behavior and that, in the long run, the spread shows a high peak roughly estimated in July 2020. Using nonlinear regression, it is also observed that the long-run trend follows a Gaussian mixture model. It is concluded that COVID-19 will show a secondary shock wave from the end of October 2020 to mid-November 2020. India is approaching herd immunity. It is also observed that the impact of the pandemic will last about 441 to 465 days. Thus, people should follow precautionary measures and maintain social distancing with all safety measures, as the pandemic is not under control owing to the non-availability of medicines.
The time-series model plays an important role in predicting and controlling the disease. The results of the study can help policymakers reallocate resources such as hospitals, staff and the facilities required for critically infected people. Cases are increasing every day in the country, and more attention must be paid to the use of the available resources. The analysis helps in understanding the complex nature of the spread of the disease. For further research, this method can be compared with other models such as neural networks and machine learning.
Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review
Ranked: Global Pandemic Preparedness by Country
Consensus Algorithm.
71 AN APPARATUS AND METHOD WITH IOT TO DETECT AND CONTROL TEMPERATURE CHANGE SIMULATION CASE Date of Indian Patent Publication
River water quality estimation through Artificial Intelligence conjuncted with Wavelet Decomposition. 979. Numerical Optimization in Engineering and Sciences
Neuro-Fuzzy Analysis of Demonetization on NSE. 816, Soft Computing for Problem Solving
Nonlinear Time Series Analysis of Environment Pollutants
Wavelets and Fractal Methods with environmental applications
Fractal Analysis of Indian Rhinoceros Poaching at Kaziranga
Time Series Analysis of Heat Stroke, JNANABHA, Vijnana Parishad of India
Data Driven Estimation of Novel COVID-19 Transmission Risks through Hybrid Soft-Computing Techniques. Chaos, Soliton and Fractals. 140; 110152
Convection dynamics of Nanofluids for temperature and magnetic field variations
Synchronization of two three-species food chain system with Beddington-DeAngelis functional response using active controllers based on the Lyapunov function
Development of a Recommender System Health Mudra Using Blockchain for Prevention of Diabetes
Development of Epidemiological Modeling RD_COVID-19 of Coronavirus Infectious Disease and its Numerical Simulation. Mathematical Modelling and Analysis of Infectious Disease Problems (COVID-19)
Optimization Techniques. Revista INGLOMAYOR Ingeniena Global Mayor. 18 (A)
Auto-Regressive Integrated Moving-Averages Model for Daily Rainfall Forecasting
Evolutionary Techniques for Optimizing Air Quality Model
Development of Model for Sustainable Nitrogen Dioxide Prediction Using Neuronal Networks
Variability analysis in PM2.5 monitoring. Data in Brief
Classification and Clustering of Time Series of Weather Data
Complexity Analysis of Pathogenesis of Coronavirus Epidemiology Spread in the China region
Water Quality Evaluation Using Soft Computing Method
How to Remain Composed During Pandemic COVID-19 18 (B)
DISPERSION ANALYSIS OF MONTHLY RAINFALL & TEMPERATURE TIME SERIES -1901-2015
Fuzziness-Randomness modeling of Plasma Disruption in First Wall of Fusion Reactor Using Type I Fuzzy Random Set
An Introduction to Fuzzy Sets
Evaluation of Statistical Bias Correction Methods for Numerical Weather Prediction Model (NWP) Forecasts of Maximum and Minimum Temperatures
Improving precipitation forecasts skill over India using a multi-model ensemble technique
Analysis and very short-range forecast of Cyclone "AILA" with radar data assimilation with rapid intermittent cycle using ARPS 3DVAR and cloud analysis techniques
Assimilation of Doppler weather Radar Data in WRF model for simulation of tropical cyclone Aila
Nonlinear Time series analysis of Pathogenesis of COVID-19 Epidemiology Spread in Saudi Arabia Computers
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Early Prediction of the 2019 Novel Coronavirus Outbreak in the Mainland China Based on Simple Mathematical Model
A pneumonia outbreak associated with a new coronavirus of probable bat origin