key: cord-303523-m16vlv1q authors: Ogundokun, R. O.; Awotunde, J. B. title: MACHINE LEARNING PREDICTION FOR COVID 19 PANDEMIC IN INDIA date: 2020-05-26 journal: nan DOI: 10.1101/2020.05.20.20107847 sha: doc_id: 303523 cord_uid: m16vlv1q Background: The coronavirus was first detected in December 2019 at a wholesale seafood market in Wuhan, China. The first case of COVID-19 in India was reported on 30 January 2020 and originated from China. As of 25 April 2020, the Ministry of Health and Family Welfare had confirmed a total of 24,942 cases, 5,210 recoveries (including 1 migration), and 779 deaths in the country. Objective: The objective of the paper is to formulate a simple average aggregated machine learning method to predict the number, size, and duration of COVID-19 cases and the end period of the outbreak across India. Method: This study examined the datasets using the Autoregressive Integrated Moving Average (ARIMA) model. The study also built a simple mean aggregated method based on the performance of three regression techniques: Support Vector Regression (SVR), Neural Network (NN), and Linear Regression (LR). Result: The results showed that COVID-19 cases can be predicted accurately. The results also suggest that COVID-19 could be transmitted through the environmental variables water and air, so preventive measures such as social distancing, wearing masks and gloves, and staying at home can help to prevent the spread of the disease, thereby reducing active cases and mortality. Conclusion: The proposed method outperformed previously available models in terms of prediction accuracy. Hence, putting preventive measures in place can effectively manage the spread of COVID-19, reduce the death rate, and eventually bring the outbreak to an end in India and other nations.
Coronaviruses are a large family of viruses, a few of which cause disease in humans, while the rest circulate among animals. Animal coronaviruses occasionally infect people and then spread between humans [1]. In recent years, zoonotic coronaviruses have emerged and caused human outbreaks, for instance coronavirus disease 2019 (COVID-19), severe acute respiratory syndrome (SARS), and Middle East respiratory syndrome (MERS). In humans the disease usually presents as a respiratory infection, or occasionally as a gastrointestinal infection. The clinical spectrum ranges from no symptoms or mild respiratory symptoms to severe, rapidly progressing pneumonia, acute respiratory distress syndrome, septic shock, or multi-organ failure resulting in death [2]. As of 25 April 2020, India had registered 24,942 confirmed cases of COVID-19. Around 5,210 individuals have recovered, whereas 779 cases have resulted in death. The number of people infected with the disease was rising in the South Asian nation, which prompted the government to take action to further curb the spread of the outbreak. As of mid-April 2020, more than two million cases of coronavirus had been identified worldwide. A Sikh preacher who returned from travel to Italy and Germany carrying the virus became a "super spreader". He attended a Sikh festival in Anandpur Sahib from 10 to 12 March 2020 [2, 3]. Twenty-seven cases of COVID-19 were traced back to him [4]. On 27 March, more than 40,000 residents in 20 villages in Punjab were quarantined to contain the spread of the disease [3, 5]. On 31 March, a Tablighi Jamaat religious congregation that took place in Delhi at the beginning of March emerged as a new hotspot of the disease after multiple cases around the world were traced back to the event [6].
More than 9,000 missionaries may have attended the congregation, with the majority coming from different states of India [7, 8] and 960 attendees from 40 foreign countries [9]. According to the Ministry of Health and Family Welfare, this event was linked to 4,291 of the 14,378 confirmed cases in 23 Indian states and union territories as of 18 April 2020 [10]. On 6 April 2020, 26 nurses and 3 physicians were found to be infected with the virus at Mumbai's Wockhardt Hospital. The hospital was shut down immediately and declared a containment zone. Negligence in health care was held responsible for the infections [11]. In India, the COVID-19 infection rate is estimated at 1.7, slightly lower than in the worst-affected nations [12]. The outbreak had been declared an epidemic in more than 12 states and union territories, where provisions of the Epidemic Diseases Act, 1897 were invoked, and public institutions and other business enterprises were at a standstill. India had suspended all tourist visas, since most reported cases were linked to other countries [13]. On 22 March 2020, at the request of Prime Minister Narendra Modi, India observed a 14-hour public curfew. The government followed this with lockdowns in 75 districts where COVID-19 cases had occurred, as well as in all major cities [14, 15]. Further, on 24 March the Prime Minister ordered a 21-day nationwide lockdown, affecting India's entire population of 1.3 billion [16, 17]. On 14 April, the Prime Minister extended the nationwide lockdown until 3 May [18]. Michael Ryan, Executive Director of the World Health Organization's Health Emergencies Programme, said that India has "great potential" to cope with the COVID-19 epidemic and that, as the second most populous nation, it would have a significant effect on the world's ability to cope with the disease [19].
Many analysts were concerned about the economic damage caused by the lockdown, which had severe consequences for migrant workers, large and small businesses, farmers, and the self-employed, who were left without income in the absence of transport and access to markets [20, 21]. Observers noted that the lockdown slowed the growth rate of the pandemic, to a doubling every six days by 6 April [22] and every eight days by 18 April [23]. Based on data from 73 countries, the Oxford COVID-19 Government Response Tracker (OxCGRT) reports that the Indian Government responded to the pandemic more stringently than other nations. It acknowledged the government's swift intervention, emergency policymaking, emergency healthcare spending, fiscal measures, investment in vaccine development, and active response to the crisis, and rated India "100" for its stringency [24, 25]. India recorded its first COVID-19 case in Kerala on 30 January, which rose to 3 cases by 3 February 2020; all were students who had returned from Wuhan, China [26, 27]. The rest of February showed no notable increase in cases. On 4 March 2020, 22 new cases were identified, including 14 infected members of a group of tourists from Italy [28]. Transmission intensified in mid-March, amid reports of multiple cases around the world, many of which were linked to people with a travel history to affected countries. On 12 March 2020, a 76-year-old man who had returned from Saudi Arabia became the country's first COVID-19 fatality [29]. Confirmed cases reached 100 on 15 March 2020 [30], 1,000 by 28 March [31], 5,000 by 7 April [32], and 10,000 by 14 April [33]. The death toll passed 50 by 1 April [34] and 100 by 5 April [35].
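The doubling times quoted above (every six days by 6 April, every eight days by 18 April) can be translated into an implied daily growth rate. The sketch below assumes constant exponential growth between doublings, which is a simplification that the source does not state; the function name `daily_growth_rate` is hypothetical and introduced only for illustration.

```python
def daily_growth_rate(doubling_time_days: float) -> float:
    """Return the constant daily growth rate implied by a doubling time.

    Assumes pure exponential growth: cases(t) = cases(0) * 2 ** (t / T),
    so the day-over-day factor is 2 ** (1 / T).
    """
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (1.0 / doubling_time_days) - 1.0


# Doubling times reported in the text for 6 April and 18 April 2020.
for days in (6, 8):
    rate = daily_growth_rate(days)
    print(f"doubling every {days} days -> about {rate:.1%} growth per day")
```

Under this assumption, moving from a six-day to an eight-day doubling time corresponds to a drop from roughly 12% to roughly 9% growth per day.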
No specific antiviral treatment is recommended for COVID-19. Infected patients should receive supportive care to help relieve symptoms. In severe cases, vital organ function should be supported [24]. No vaccine is currently available for SARS-CoV-2. Avoidance of exposure is the primary form of prevention. Numerous international collaborative efforts have quickly emerged to identify and evaluate the efficacy of antivirals, immunotherapies, monoclonal antibodies, and vaccines. Pharmacotherapy guidelines and reviews for COVID-19 have been published [25, 36, 37, 38]. The transmission of infectious disease is a dynamic process that takes place within a population. Models can be developed for this process to examine and test the transmission mechanism of infectious diseases [39], so that the future trend of infectious diseases can be forecast accurately [40]. Consequently, the study and evaluation of predictive models for infectious diseases, aimed at controlling or reducing their harm, has been a hot research topic [41]. This study therefore proposes a simple average aggregated scheme, built by combining three regression methods: Support Vector Regression, Linear Regression, and Artificial Neural Network. The variables used to formulate the aggregated method are the numbers of COVID-19 cases. The dataset was collected from Statistica.com and covers January to April 2020 at monthly intervals. The study made its predictions from the previous COVID-19 case counts and the values of environmental variables such as water and air. This preprint is made available under a CC-BY 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The authentic datasets of COVID-19 were gathered from https://www.mygov.in/ and https://www.pharmaceutical-technology.com/; the dataset on cases in India is publicly available from the first index case on January 30, 2020. The datasets were gathered in monthly form, from January 2020 to April 2020. Table 1 displays the COVID-19 situation in India from January 2020 to April 2020. As of 25 April 2020, the COVID-19 dataset includes a cumulative total of 408,658 samples, 24,942 confirmed cases, 5,209 recovered cases, 779 deaths, and 1 migration. (medRxiv preprint, this version posted May 26, 2020; doi: https://doi.org/10.1101/2020.05.20.20107847.)
There are numerous diverse methods for performing machine learning tasks, and each machine learning approach requires a certain type of algorithm. According to Dataquest, 2020, there are three types of machine learning algorithms:
i. Supervised learning algorithms, such as classification, regression, and ensemble methods
ii. Unsupervised learning algorithms, such as association, clustering, and dimensionality reduction
iii. Reinforcement learning
The most widely used ANNs in forecasting problems are multilayer perceptrons (MLPs), which employ a single-hidden-layer feed-forward network [43]. This model is characterized by a network of three layers. The nodes in the various layers are also known as processing elements. The output of the model is calculated using a mathematical expression over these nodes.
In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable and one or more independent variables [44, 45, 46, 47, 48].
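To make the linear regression relationship concrete, the following minimal sketch fits a simple (one-variable) linear regression by ordinary least squares. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the symbols (intercept and slope), and the toy data are all hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the pair (intercept, slope).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at a new value of the independent variable."""
    return intercept + slope * x_new


# Hypothetical toy data (time index vs. an outcome), for illustration only.
time_index = [1, 2, 3, 4]
outcome = [3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(time_index, outcome)
print(b0, b1, predict(b0, b1, 5))
```

With several independent variables the same least-squares idea applies, but the coefficients are usually obtained with matrix methods rather than the closed form used here.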
The case of one explanatory variable is called simple linear regression, while for more than one explanatory variable the procedure is called multiple linear regression. Support vector machines (SVM) are a supervised learning method that can be used for classification as well as regression problems (Adikari et al., 2012; Shanthi et al., 2012). The core idea of SVM, when applied to binary classification problems, is to find a separating hyperplane that best splits the two given classes of training samples. In cases where the data points are not linearly separable, a soft-margin hyperplane classifier is constructed. Support vector regression (SVR) follows the same formulation: whereas in the classification problem the target belongs to one of two classes, for instance A1 and A2, in the regression case the target is a real number, and the other variables are the same as for the classification problem [45]. Regression techniques are among the prevailing methods for prediction on specific datasets. In this study, the authors formulated a simple mean aggregated method by combining three popular regression models and predicted the number of COVID-19 cases in India. Authors such as [49] and [50] have applied regression methods mainly to infectious disease datasets and likewise proposed ensemble models, typically with three forecasting approaches. As stated earlier, a key challenge in prediction studies is forecasting the occurrence of a specific disease. 1.
Using the independent variables, regress on them to obtain the estimates with SVR, LR, and ANN. N is the total number of cases. This section describes the objective of finding the correlation between the numbers of COVID-19 cases and environmental variables such as water (sewage overflow) and air (air streams or wind). In this study, environmental variables play an important role in the spread of COVID-19. To accomplish this objective, the authors applied a statistical P-value test to determine the correlation between the number of COVID-19 cases and these variables. After examination by the P-value test, it was concluded that water is a prominent environmental variable for the incidence of COVID-19 in India. Many authors have established the correlation between infectious diseases and environmental variables using the P-value test [51, 52]. The P-value test indicates that environmental variables such as water and air had a positive correlation with the number of cases over the period since the disease started. Statistical significance was set at p < 0.05. The study rejects the null hypothesis. The correlation of COVID-19 cases with water and air is significant (Table 2). Tables 8-11 display the evaluation of prediction accuracy in terms of root mean square error, mean scaled error, and mean absolute error. To validate the proposed aggregated system, one real-time series dataset was used in this research: the number of COVID-19 cases in India, gathered from Statistica.com. The description of the time series dataset is given in Table 12.
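The text does not specify which P-value test was applied to relate case counts to the environmental variables. As one possible illustration, the sketch below computes a Pearson correlation and a two-sided permutation P-value, then compares it with the 0.05 threshold mentioned above. The choice of a permutation test, the function names, and all data values are assumptions made here for illustration; they are not taken from the paper.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have equal length of at least 3")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """Two-sided permutation P-value for the null hypothesis of no correlation.

    y is shuffled repeatedly; the P-value is the share of shuffles whose
    absolute correlation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical values: an environmental index and case counts (made up).
env_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cases = [12, 15, 21, 30, 38, 51, 60, 74]
r = pearson_r(env_index, cases)
p = permutation_p_value(env_index, cases, n_permutations=5000)
print(f"r = {r:.3f}, p = {p:.4f}, reject null at 0.05: {p < 0.05}")
```

A permutation test is used here only because it needs no distributional tables; a t-test on the correlation coefficient would be a common alternative.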
The values of all these errors are expected to be as low as possible for improved prediction accuracy.
Figure: Number of recorded COVID-19 cases from January 2020 to April 2020.
As an alternative to epidemiological transmission models, the study employed three aggregated methods, SVR, NN, and LR, to predict the real-time trend of the transmission dynamics and generate real-time forecasts of COVID-19 across the cities of India. The findings showed that the accuracy of estimation and of subsequent multi-step prediction was high. Prediction improved when the three regression models were aggregated compared with when they were used individually. The proposed system estimated the possible period when the growth of new confirmed cases across the cities would decline, the extent of COVID-19 across India, and the moment when the cumulative number of confirmed COVID-19 cases would reach its plateau. The core aim of this investigation is to formulate a simple mean aggregated method for the estimation of COVID-19 in India. To do so, the study first computed the regressed values of COVID-19 from each individual regression procedure and then combined them in an aggregated method. In the second stage, the aim is to decrease the prediction errors for COVID-19 in India; these errors include RMSE, MAE, and MAPE.
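The two stages described above, combining the individual regression outputs by a simple mean and then scoring the result with RMSE, MAE, and MAPE, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, and every number are hypothetical, and the per-model predictions are typed in by hand rather than produced by fitted SVR, NN, and LR models.

```python
import math


def simple_mean_ensemble(predictions_by_model):
    """Average the predictions of several models, point by point."""
    series = list(predictions_by_model.values())
    if not series:
        raise ValueError("at least one model is required")
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("all models must predict the same number of points")
    return [sum(values) / len(series) for values in zip(*series)]


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical observed counts and hand-written per-model predictions.
actual = [100, 200, 300]
predictions = {
    "SVR": [90, 210, 330],
    "NN": [110, 190, 300],
    "LR": [100, 215, 270],
}
ensemble = simple_mean_ensemble(predictions)
for name, pred in {**predictions, "Ensemble": ensemble}.items():
    print(f"{name:8s} RMSE={rmse(actual, pred):7.3f} "
          f"MAE={mae(actual, pred):7.3f} MAPE={mape(actual, pred):6.3f}%")
```

In this toy data the averaged forecast scores better than each individual model because the made-up errors partly cancel; this is a property of the chosen numbers and does not reproduce the results reported in the paper's tables.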
Whenever a method gives a greater prediction error than the alternative methods, its weight in the aggregate is decreased, and vice versa. This study combined the three methods and applied them to the COVID-19 data for India. The aggregated method yields a substantial improvement in the prediction of COVID-19 in India. The proposed aggregated method is compared with the separate predictions of the individual methods. The COVID-19 prediction accuracy attained by each approach is reported in Tables 8-10. The results for COVID-19 are shown in Table 13, which demonstrates that the proposed aggregated system delivered the lowest prediction error among all the individually fitted methods. The study thus achieved a substantial improvement in prediction accuracy for COVID-19 in India when the proposed aggregated system was employed. If the datasets are reliable and no second outbreak occurs, the aggregated methods predict that the COVID-19 outbreak in India could end by the end of May. The aggregated methods allow intervention information to be entered, so that the effect of interventions on the extent of the outbreak and on the end period of the COVID-19 epidemic can be examined. Using the assumed initial extent of the COVID-19 outbreak, the study used the aggregated methods (SVR, NN, and LR) with a known framework and model to evaluate the future extent of the outbreak and to simulate the effect of interventions on the magnitude and severity of the pandemic. Complementing the individually used methods for the transmission of COVID-19, the aggregated methods (SVR, NN, and LR) provide real-time forecasting tools for shaping and tracking COVID-19 in India,
estimating the COVID-19 burden, assessing COVID-19 severity, predicting the extent of the pandemic, and supporting government and health workers in making strategic and informed decisions toward the eradication of COVID-19 in India. Combining the predictions from diverse methods substantially decreases prediction errors and consequently provides higher accuracy. Over the past few decades, researchers have proposed several statistical methods. This study proposed a simple-mean aggregated method for the prediction of COVID-19 in India. The proposed aggregated approach examined the individual predictions, and the prediction of COVID-19 in India is therefore formulated from three well-known methods: Neural Network, Support Vector Regression, and Linear Regression. The results indicate that the proposed method outperformed currently existing applicable methods in terms of prediction accuracy for COVID-19 in India. In the future, the performance of the proposed method could be investigated further, and other regression models or algorithms could be used and evaluated. The authentic datasets of COVID-19 were gathered from https://www.mygov.in/ and https://www.pharmaceutical-technology.com/; the dataset on cases in India is publicly available from the first index case on January 30, 2020. This preprint was not certified by peer review.
Effectiveness of N95 respirators versus surgical masks in protecting health care workers from acute respiratory infection: a systematic review and meta-analysis Effectiveness of Surgical and Cotton Masks in Blocking SARS-CoV-2: A Controlled Comparison in 4 Patients Coronavirus disease 2019 (COVID-19) Situation Report -48. World Health Organization Novel Coronavirus (2019-nCoV) in the U.S. Centers for Disease Control and Prevention (CDC) Wuhan Virus: What Clinicians Need to Know US now has more coronavirus cases than either China or Italy MMWR Morb Mortal Wkly Rep Northern California coronavirus patient wasn't tested for days. The Washington Post Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster COVID-19): COVID-19 Situation Summary Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand CDC: First Person-to-Person Spread of Novel Coronavirus in US reports its first case of person-to-person transmission. The New York Times COVID-19): People at Higher Risk Novel Coronavirus The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention Lost Sense of Smell May Be Peculiar Clue to Coronavirus Infection.
The New York Times Epidemiological Characteristics of 2143 Pediatric Patients With 2019 Coronavirus Disease in China Clinical and epidemiological features of 36 children with coronavirus disease 2019 (COVID-19) in Zhejiang, China: an observational cohort study Interim Healthcare Infection Prevention and Control Recommendations for Patients Under Investigation for 2019 Novel Coronavirus Novel Coronavirus COVID-19 Treatment: A Review of Early and Emerging Options. Open Forum Infectious Diseases (OFID) Novel Coronavirus Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1 Stability of SARS-CoV-2 in different environmental conditions. The Lancet Microbe Virological assessment of hospitalized patients with COVID-2019 Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Viral dynamics in mild and severe cases of COVID-19 Presumed Asymptomatic Carrier Transmission of COVID-19 A familial cluster of infection associated with the 2019 novel coronavirus indicating potential person-to-person transmission during the incubation period SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients Respiratory virus shedding in exhaled breath and efficacy of face masks Infectious Diseases Society of America Guidelines on the Treatment and Management of Patients with COVID-19 Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19): A Review Review of Emerging Pharmacotherapy for the Treatment of Coronavirus Disease Analysis and projections of transmission dynamics of nCoV in Wuhan Reporting, epidemic growth, and reproduction numbers for the 2019 novel coronavirus (2019-nCoV) epidemic Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model Retrieved on 24 th Forecasting strong seasonal time series with artificial neural networks Dimensionality Reduction for Indexing Time Series Based on the Minimum Distance Work Watermarking MPEG-4 2D 
mesh animation with time-series analysis An evaluation of neural network ensembles and model selection for time series prediction Prediction of blood glucose concentration ahead of time with feature based neural network An Empirical Application of Linear Regression Method and Fir Individual versus super ensemble forecasts of seasonal influenza outbreaks in the United States Ensemble method for dengue prediction Classification of dengue illness based on readily available laboratory data Predicting the severity of dengue fever in children on admission based on clinical features and laboratory indicators: Application of classification tree analysis
The authors declared that they did not receive any funds from any organization. The authors declared that there is no conflict of interest.