key: cord-0859351-aidtsyo1 authors: Hierro-Recio, Luis Angel; Garzon-Gordon, Antonio Jose; Atienza-Montero, Pedro; Marquez, Jose Luis title: Predicting clinical needs derived from the COVID-19 pandemic: the case of Spain date: 2020-04-06 journal: nan DOI: 10.1101/2020.04.03.20051821 sha: 08826d0596a01a2a482eb6d80edb0e87cddc304e doc_id: 859351 cord_uid: aidtsyo1 Background The evolution of the pandemic caused by COVID-19, its high reproductive number and the associated clinical needs, is overwhelming national health systems. We propose a method for predicting both ICU requirements and the number of deaths, and which will enable the health authorities of the countries involved to plan the resources needed to face the pandemic as many days in advance as possible. Methods We use official Spanish data to predict ICU admissions and deaths based on the number of infections. We employ OLS to perform the econometric estimation, and through RMSE, MSE, MAPE, and SMAPE forecast performance measures we select the best lagged predictor of both dependent variables. Findings For Spain, our prediction shows that the best predictor of ICU admissions is the number of people infected eight days before, and that the best predictor of deaths is the number of people infected five days before. In the first case, we obtain a 98% coefficient of determination, and in the second a 97% coefficient. The estimated delayed elasticities find that a 1% increase in the number of cases today will imply a 0.72% increase in ICU patients eight days later and a 1.09% increase in the number of deaths five days later. Interpretation The model is not intended to analyse the epidemiology of COVID-19. Our objective is rather to estimate a leading indicator of clinical needs. Having a forecast model available several days in advance can enable governments to more effectively face the gap between needs and resources triggered by the outbreak and thus reduce the deaths caused by COVID-19. The global health crisis sparked by the COVID-19 pandemic has alerted governments worldwide. The speed of transmission caused by the high reproductive number of COVID-19 1 and the clinical needs associated with the clinical management of patients suffering from severe acute respiratory infection (SARI) 2 has led to a gap between resources and needs which may prove decisive to the survival of many patients. COVID-19 entails a great need for clinical treatment, particularly extracorporeal membrane oxygenation (EMO) to patients with acute respiratory distress syndrome (ARDS), and which is pushing existing ICU facilities to breaking point as the pandemic expands 3 . As contagion worsens in each country, the best short-term strategy for containing the spread and so preventing health services from collapsing is social distancing, principally through a quarantine imposed by confining the population, as more and more governments are gradually doing, following WHO recommendations 4 . The objective of the WHO and of national authorities is to flatten the epidemic curve 5 so that health systems retain the capacity to clinically attend to patients in hospital centres and ICUs, while the scientific community strives to develop an effective vaccine or a viable antiviral. Increasing health resources rapidly is both complex and costly. The first country to be affected, China, promoted the building of macro-hospitals. Spain and Italy, the two countries currently most affected, have tried to drastically reduce time by using existing macro-buildings such as conference and trade fair centres. This strategy reduces the cost and, particularly, the time needed to set up ICU beds, making the volume of resources required depend mainly on the supply of medical equipment and on the availability of healthcare workers (Spain set up a field hospital with hundreds of beds at the IFEMA fair and conference facility in under 24 hours, and has since continued to expand it to an expected 5.000 beds (https://elpais.com/sociedad/2020-03-21/llegan-los-primerospacientes-al-hospital-abierto-en-los-pabellones-de-ifema.html). Another issue that countries are having to face is how to deal with COVID-19 deaths. In countries where the mortality rate is higher, local funeral systems are unable to meet the cremation needs of COVID-19 deaths, posing saturation problems in the mortuaries. Italy has employed the army to distribute the corpses throughout the territory, moving them away from the most affected areas (http://www.rainews.it/dl/rainews/media/Coronavirustroppi-morti-a-Bergamo-esercito-porta-le-bare-in-altre-citta-9987ef6c-1056-47c4-862c-18e9255a541a.html#foto-1) while the authorities in Madrid have converted an ice rink into a makeshift mortuary (https://elpais.com/espana/madrid/2020-03-23/madridutilizara-como-morgue-las-instalaciones-del-palacio-de-hielo.html). In these circumstances, where time is key to determining how many lives might be saved, it is essential to have predictive models which can give health authorities a few days prior warning so as to expand their resources and adapt them to the needs they will soon be facing. Our work seeks to provide a predictive model that allows us to determine the needs of ICU beds as far in advance as possible, since admission to ICU offers around a 50% chance of survival for critically ill patients 9 . The model also helps to deal with the practical arrangements involved in dealing with so many deaths. Through its Ministry of Health website (https://www.mscbs.gob.es/), the Spanish government is providing daily data on the number of detected cases of infected people, deaths, and ICU admissions nationwide. Using officially published data, we econometrically estimate the following equations: where i=1, 2 …,10 are the number of delays of the explanatory variable, ℎ ) is the total number of deaths up to day t, ) is the number of cases detected up to date t, ) is the number of ICU admissions up to date t, and t=1, 2 …,T representing each day. In equation (2), variables are defined as log ( ℎ ) + 1) and log ( )23 + 1) to avoid missing observations from the first few days where the number of deaths was zero. For equation (1), the sample period spans from 08/03/2020 to 24/03/2020, while in equation (2) the period spans from 28/02/2020 to 24/03/2020. We left March 25 and 26 as out-of-sample observations in order to measure the forecast performance of the estimated model. The estimation sample is shorter for equation (1) due to the lack of observations. Estimation is performed through the OLS estimator. Given that our main goal is to predict ICU beds and the number of deaths using the lagged number of infected cases, after estimating equations using different lags, we select the one with the best forecast performance (which minimises forecasting errors). Using data from 25/03/2020 and 26/03/2020 as out-of-sample period, we calculated RMSE, MSE, MAPE, and SMAPE as indicators of predictive capacity. RMSE is defined as follows: where N=2 is the number of out-of-sample observations, which we use to estimate the forecast performance of our estimate, C is the estimated value of the dependent variable, and is the actual value. Secondly, MSE is defined as follows: MAPE is calculated using the following formula: All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is Finally, we select the estimate with the lagged explanatory variable that shows the lowest value in this set of indicators, and we make the corresponding prediction in total values. First, we select the model that has the best forecast performance. Table 1 shows RMSE, MSE, MAPE, and SMAPE values for the estimation of equation (1), using from 1 to 10 delays of the explanatory variable. We select the model including eight delays, since it shows the lowest value in the different indicators, which means it outperforms the other estimates in forecasting accuracy. The equation of the model that evidences the best forecast performance is the following: The coefficient = 0.7175 is what in economics is called elasticity; in this case delayed. We obtain it by deriving the function, and it represents the relationship between the variation of the dependent and independent variables: This delayed elasticity implies that a 1% increase in the number of infected cases predicts a 0.72% increase in the number of ICU patients eight days later. The estimate presents an R-square of 0.98. Our model allows us to predict the number of ICU beds to be used, taking the number of people infected eight days earlier as a reference. In this way, we can anticipate the needs in terms of the number of beds for the next few days. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is Table 2 shows the actual and estimated total number of ICU admissions for the period covered by the estimation, as well as the estimation error of the model. In addition, we include out-of-sample observations, which allows us to analyse the forecast performance of the model. To obtain the actual total number of ICU admissions, we perform the following transformation: That is, we convert the values in logs to total numbers. As the results show, the estimation error, measured as a percentage of total ICU admissions, decreases over the period, although in absolute terms it is greater. Figure 2 graphically shows these same results, including the ICU number estimated until 31/03/2020. As can be seen, the model allows us to very accurately predict the number of ICU patients. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is We proceed in the same way in the case of deaths caused by COVID-19. Table 3 shows the different forecast performance indicators calculated for the different models in equation (2) using up to 10 delays. According to the estimates, we select the model that includes the explanatory variable with five delays, since it produces the lowest values for the four different measures of forecast performance (RMSE, MSE, MAPE, and SMAPE). We select the five-day delayed model, since it shows the best forecast performance according to the indicators. The equation of the model that presents the best forecast performance is: log( ℎ ) + 1) V = −2.7157 + 1.0896 * log ( )2c + 1) + ) the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is The delayed elasticity function is as follows: showing that a 1% increase in the number of infected cases detected predicts a 1.09% increase in the number of deaths six days later. The estimate presents a high goodness of fit, with an R-square of 0.97. Figure 3 displays the estimate of the model with respect to the actual values, both in logs. As was the case with ICU numbers, we find that residuals decrease throughout the estimation period, so that the estimate values at the end of the sample are closer to the actual values. Table 4 shows the number of actual and estimated deaths, as well as the errors for each time period. To obtain the total number of deaths, we carry out the following transformation: Results show that the prediction error, in percentage terms, decreases over time. As for total values, it increases at the beginning of the period and then subsequently decreases. As regards the predicted out-of-sample values, we observe that the error is very small, the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is In Figure 4 , we see the trend in the actual number of deaths and their estimates in graphical terms. We also include the prediction of deaths obtained by the model up to 31/03/2020. The estimates show a high goodness of fit, presenting an R-square of 0.97, in addition to generating a prediction with a very low error, as shown in Table 3 . the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is To date, predictive models for COVID-19 have focused on predicting changes from exponential adjustments for the growth in the infected population, without a predefined contagion model 6, 7 or with a predefined contagion model 8 . These types of adjustments are highly dependent on the estimated equation, and lose their predictive accuracy when the growth rate of infections decreases. They also fail to determine the most statistically efficient maximum prediction time 6 . This makes them ineffective for prediction and prevents health authorities from obtaining reliable forecasts. Our strategy consists of performing simple delayed regressions of the number of patients affected in ICUs, and deaths and by selecting the equation corresponding to the delay that provides the most statistically efficient prediction. By foregoing any epidemiological, or indeed any other type of objective, and by avoiding predetermining the function which is to be estimated, we are able to focus solely and exclusively on finding the best predictor. The result of this simple strategy is a model with a very high predictive accuracy, offering an eight-day advance for ICU admissions and a five-day advance for deaths. This advanced forecast is what can allow authorities to improve resource planning to face the clinical needs-health resources gap caused by the epidemic. Clearly, the findings are valid for Spain but not for other countries. However, the method is replicable to any country, state, or region. In addition, it is applicable to any type of health needs, provided we have data on their consumption. It can be applied to anticipate the needs of health staff, hospital beds, medicine, equipment ..., and can also be applied to anticipate other clinical situations apart from deaths. The method does, nevertheless, evidence certain shortcomings that must be taken into account and that basically affect the available data. The method only uses one independent variable, namely those officially declared as being infected, and which thus relies on the possibility of actually applying detection tests. It also depends on health authorities being truthful in their information policy. In Spain, few detection tests were carried out in the early stages of the outbreak, such that patients displaying mild symptoms were not tested. Large-scale testing of mild cases began around 20/03/2020 (https://www.lavozdegalicia.es/noticia/pontevedra/2020/03/20/test-rapidos-salir-cocheinician-hoy-cita-previa/0003_202003P20C1992.htm), although not throughout the whole country. Obviously, if the number of tests for possible COVID-19 infected cases were to be increased, the model would tend to overestimate ICUs and deaths and would need to be recalibrated. In other words, any significant change in the number of tests would affect our independent variable, the number of those declared as being infected, and would have consequences for the predictive performance of both independent variables. Moreover, a second issue arises in one of the dependent variables. The number of patients in the UCIs represents the actual number of patients there, but not the number of patients who are in need of intensive care. If ICUs are swamped, medical staff will be forced to ethically screen critical patients 10 , such that data will cease to be representative of clinical needs and will become representative of the limited resources available. If we consider that this need to screen will increase the more the pandemic spreads and the greater the number of patients in need of ICU care, the model may be underestimating ICU needs as infection expands. Moreover, any improvement in clinical pharmacological treatment using antivirals in the pre-ICU admission stage will entail an overestimation of needs. A third problem concerns the communication of data. Given the high reproductive number of COVID-19, contagion is growing exponentially, and having accurate daily data available is essential. When 800 patients die every day, at a rate of 33 deaths per hour, reporting the number of deaths inaccurately can lead to fundamental errors in the estimates. The health authorities must establish a specific time to report the numbers and must keep this stable, added to which health centres must prepare the counts accurately and ensure that they reflect the established time periods. In Spain, it was decided to close the daily count at 9.00 pm, and this is a time which must remain fixed until the pandemic is over. Obviously, public information must be truthful and any cases of UCIs being overwhelmed must be reported. The effect of data issues can be corrected in two ways: by increasing the variables whose numbers are gathered and published (for example, by publishing daily figures of the tests carried out and of their results or through the numbers of patients not admitted to UCIs as a result of previous ethical screening), and by recalibrating estimates if they lose their predictive accuracy. The first alternative would enable us to replace the variables or to correct them prior to performing estimations. The second alternative would allow us to refine the predictions recursively when the model loses its predictability. Whatever the case, this latter alternative can be performed daily given the ease of the method and the small amount of additional information required. All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint (which was not peer-reviewed) is On a final note, it should be remembered that our work does not pursue any purpose regarding epidemiological analysis, but merely seeks to offer a predictive method which can enhance the planning capacity of health authorities, who are facing the greatest global health challenge the world has seen in the last 100 years. The reproductive number of COVID-19 is higher compared to SARS coronavirus Clinical management of severe acute respiratory infection (SARI) when COVID-19 disease is suspected: interim guidance (ncov)-infection-is-suspected Planning and provision of ECMO services for severe ARDS during the COVID-19 pandemic and other outbreaks of emerging infectious diseases Responding to community spread of COVID-19: interim guidance How will country-based mitigation measures influence the course of the COVID-19 epidemic? COVID-19 and Italy: what next? Real-time estimation of the risk of death from novel coronavirus (COVID-19) infection: Inference using exported cases Optimization Method for Forecasting Confirmed Cases of COVID-19 in China Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine 2020 Recomendaciones éticas para la toma de decisiones en la situación excepcional de crisis por pandemia covid-19 en las unidades de cuidados intensivos Predicting clinical needs derived from the COVID-19 pandemic: the case of Spain Version 1.0 -27th March 2020