key: cord-0707170-oqpb0c7o
authors: Tavakoli, Mahdieh; Tavakkoli-Moghaddam, Reza; Mesbahi, Reza; Ghanavati-Nejad, Mohssen; Tajally, Amirreza
title: Simulation of the COVID-19 patient flow and investigation of the future patient arrival using a time-series prediction model: a real-case study
date: 2022-02-12
journal: Med Biol Eng Comput
DOI: 10.1007/s11517-022-02525-z
sha: a930f949ace549a7648ef2bd71c570d8143090e7
doc_id: 707170
cord_uid: oqpb0c7o

COVID-19 appears to be the worst pandemic of recent decades in terms of the number of infected people, deaths, and the staggering demand for healthcare services, especially hospitals. The first and most important step is to identify the patient flow through a certain process. The second step is to predict future patient arrivals for planning, especially at the administrative level of a hospital. This study aims first to simulate the patient flow process and then to predict the future entry of patients in a case study hospital. Also, according to the system status, this study suggests some policies based on different probable scenarios and assesses the outcome of each decision to improve the policies. The simulation model is built in the Arena.15 software. The seasonal auto-regressive integrated moving average (SARIMA) model is used to predict patient arrivals within 30 days. Different scenarios are evaluated through a data envelopment analysis (DEA) method. The simulation model is run with the predicted patient arrivals for the least efficient scenario, and the outputs are compared with the base run scenario. Results show that, according to the predictions and simulation, the system collapses after 14 days and the bottleneck in the ICU and CCU departments becomes problematic. Hospitals can use simulation and prediction tools to plan for the future and avoid a crisis during the pandemic.

COVID-19 appears to be the worst pandemic of recent decades in terms of the number of infected people, deaths, and the staggering demand for healthcare services, especially hospitals. Based on Worldometer's COVID-19 data reports, the number of confirmed infected people is growing every day. As reported on 7 May 2021, the total number of worldwide coronavirus cases was 157,049,695. Of these, 3,274,689 (2%) people had died, 134,416,166 (85%) had recovered, and 19,358,840 (13%) were active cases. As COVID-19 is a global public health emergency, hospitals are the most important organizations that can help to avoid the destructive effects of this pandemic.

The first and most important step in facing this pandemic is to determine whether the suspected patients who enter the hospital are infected. To do this, all patients go through a certain process. If they are not infected, they do not enter the treatment system; if they are infected, they need to go through certain steps in the hospital, based on their symptom level. So, the COVID-19 patient journey in hospitals should be specified to find the bottlenecks in this process. Given the circumstances that arose during the pandemic, there is a critical need to change some resources (e.g., nurses, physicians, and facilities) or the planning of different hospital processes to serve patients in a stable situation [40]. Usually, understanding the entire process of a patient's journey is difficult. In this case, researchers mostly use queueing theory, Petri nets, and simulation to clarify the structure of patient processes [8, 17]. In this study, we use simulation to visualize the journey of suspected COVID-19 patients.

To improve the performance of hospitals as a system, it is necessary to have a dynamic understanding of it. To achieve such an understanding, simulation provides an ideal tool for determining and allocating the capacity needed to respond to demand in a timely manner and minimize delay. Also, simulation is a convenient alternative that requires less time and cost than most traditional statistical methods [7]. In general, discrete event simulation models that have been used in various studies to analyze health care delivery systems mainly focus on two areas: (a) optimizing the flow of patients in different departments and (b) allocating resources to improve services. In the first area, the goal is to improve patient throughput and reduce waiting times; in the second, the goal is to improve the use of resources and determine the amount of resources (physical and human) needed to provide quality services [5].

For the second step, there is a crucial need to predict the future trend in order to be ready for the challenges and to make preparations, especially at the administrative level [39]. Developing accurate models for predicting future infections can help decision-makers suggest appropriate policies. Besides, it is important to assess the effectiveness and impact of every policy before implementing it [35]. Predicting how many people will be infected in the future helps to estimate the pressure on the treatment process and to plan the number of nurses, physicians, beds, etc. so that overloading is avoided. However, statistics show a high degree of uncertainty in the behavior of COVID-19 infections [47]. Thus, advanced and accurate predictive models are essential [32]. Machine learning (ML) has recently been used for COVID-19 prediction models, with a capability and reliability that most researchers have acclaimed [9, 11, 19, 33]. Although some researchers used ML for earlier pandemics, such as H1N1 influenza, Ebola, dengue fever, and swine fever [2, 13, 21, 37], the use of ML for COVID-19 outbreak prediction is still rare, and the research area is not saturated. Since there is no further information about the parameters that probably affect patient arrival, the seasonal auto-regressive integrated moving average (SARIMA) time-series model is used to predict the number of COVID-19 patients who will arrive in the next few days [6, 18].

For this aim, in this study, we try to first predict the future entry of patients in the case study hospital. Then, according to the system status, we suggest some policies based on different probable scenarios and assess the outcome of each scenario to help the policy's decision-making. So, the current study has six steps as follows:

• Simulating the current process for COVID-19 suspected patient arrival in the case study hospital
• Investigating the outputs for the current process, such as total time, average waiting time, discharged patient ratio, and the cost
• Predicting the patient entry within two months later using machine learning algorithms
• Evaluating and comparing different scenarios
• Running the worst scenario by considering the predicted patient arrival to simulate the pessimistic situation in the future
• Analyzing the system in the pessimistic situation and suggesting some solutions

The rest of this paper is organized as follows. The related literature and studies are reviewed in Sect. 2. Methods and materials are explained in Sect. 3. The hospital case study is described completely in Sect. 4. Section 5 describes simulation models of the current process and scenarios as results. Evaluating the scenarios and simulation of predicted process are discussed in Sect. 6. Finally, the conclusion and future suggestions are discussed in Sect. 7.

This study includes two research streams. First, the reviewed studies focus on simulation models for different processes, especially in hospitals. Then, the prediction models for healthcare services in the literature are investigated. In the first stream, simulation techniques have been used in most papers on hospital processes because they are easy to set up and less costly, since they can optimize the process while reducing risks (Diaz & Dawson, 2020a). Also, simulation models can sometimes help decision-makers find the best scenario or policy for reaching optimum outputs under different goals, such as resource allocation, process sequence, and order of activities [45, 46]. For example, Azadeh et al. [4] used a simulation approach to find the optimum maintenance policy. They first simulated the maintenance system based on historical data and then used a Taguchi method to define different scenarios for the system and calculated the output values for each scenario. They evaluated the efficiency of each scenario through a data envelopment analysis (DEA) method and selected an optimal scenario. Pan et al. [26] simulated an ophthalmic specialist outpatient clinic in Singapore. They focused on patient and information flow. Finally, they proposed several improvement strategies to decrease turnaround time and analyzed the scenarios via the design of experiments (DOE) method. Diaz & Dawson (2020a) used simulation for a COVID-19 resuscitation process in a 47-bed pediatric emergency department over 2 weeks. They considered the arrival, resuscitation, and disposition of patients, as well as the facilities and staff, in their simulation model. By comparing the outputs before and after each change, they could understand which changes lead to a more efficient process. Finally, they determined the optimal room layout, amount of equipment, and number of staff.

Alban et al. [1] used stochastic process simulation for ICU capacity management during the COVID-19 pandemic. They simulated an ICU for both COVID-19 and non-COVID-19 patients. They assessed the increase in COVID-19 patient entry during the pandemic to help hospital managers make better management decisions. They defined a stochastic queuing model for patient flow. They finally investigated the impact of each decision, such as transferring a patient to another hospital or decreasing the bed occupancy rate, based on the service level output. Zeinalnezhad et al. [43] simulated a heart clinic during the COVID-19 pandemic. According to the bottlenecks in the base process, three scenarios were proposed for improving the waiting time of the process as the target variable. They used timed colored Petri nets for workflow simulation. Finally, they compared the waiting-time output of the three scenarios and chose the best strategy. Melman et al. [24] tried to balance hospital resources during the COVID-19 pandemic using a discrete-event simulation model. They used COVID-19 patient flow data from a hospital in the UK. They proposed three different resource allocation scenarios and evaluated them with the simulation run outputs.

Recently, papers have tried to combine other techniques with simulation tools to improve the outputs and bring them closer to reality. For example, Kovalchuk et al. [20] used simulation for patient flow in an acute coronary syndrome (ACS) unit. They also combined some machine learning approaches to identify the path class of each patient. By classifying the patients, they could improve the patients' length of stay. Ordu et al. [25] proposed an integrated forecasting-simulation-optimization approach in a hospital to help managers with their resource allocation problem. They first predicted the demand for each specialty in the hospital, then simulated the patient journey in a case study hospital, and finally developed an optimization model for bed and staff allocation based on the outputs of the two previous steps. They stated that their proposed model could serve as a decision support system. Sasanfar et al. [36] simulated the emergency department (ED) of a case study hospital in Iran to find the best resource allocation policy and could decrease waiting times by 23.1% (3.31 min) and 81.7% (10.58 min) for internal and emergent patients, respectively. Pereira et al. [29] evaluated the efficiency of a public hospital using a DEA with simulation. They used a Monte Carlo method to model the hospital supply chain, defined several providers, and then evaluated them using the DEA to find the target area. Finally, they could justify the target scenario for their case study hospital. Teberga Campos et al. [10] tried to simulate the pattern of COVID-19 infected patients to inform hospitals and applied their simulation results in a case study. They could improve patient waiting time, movement intensity, length of stay (LOS), and adoption rate.

In the second stream, some studies used prediction models such as time-series models and regression models. In this stream, Tomar & Gupta [39] used long short-term memory (LSTM) and curve fitting to predict the COVID-19 cases in India within 30 days based on the available data. The effectiveness and related results of solutions (e.g., isolation and lockdown) were investigated. Heo et al. [15] tried to predict and monitor patients for military hospitals in South Korea. They aimed to select the important patients because of a shortage of medical resources. An application gathers information about age; body temperature; pre-disease physical status; history of cardiovascular disease; hypertension; visit to a region with an outbreak; and symptoms of chills, feverishness, dyspnea, and lethargy. Important patients were then selected through prediction models. Ardabili et al. [3] used several machine learning and mathematical models (e.g., logistic, linear, logarithmic, quadratic, cubic, compound, power, and exponential) to predict the COVID-19 outbreak and compared them. Multi-layered perceptron (MLP), adaptive network-based fuzzy inference system (ANFIS), and finally, time series were employed for prediction as ML techniques. Their results showed that ML models have fewer prediction errors and are more powerful.

Besides, some studies focused on predicting future COVID-19 cases using the risk factors or the arrival history. They used different machine learning techniques to predict how many infected cases may occur in future days [16, 45, 46]. Several recent papers tried to predict new cases through time series. For this aim, Zeroual et al. [44] forecasted the COVID-19 patients based on time series. They tried to predict the new infected cases in the short term for use by resource managers. They used five methods for time-series forecasting considering recent historical data of infected and recovered cases in the USA, China, France, Spain, and Italy. Finally, they compared the error metrics of each model, such as RMSE and MAE. Maleki et al. [22] also predicted the new confirmed and recovered COVID-19 cases through time-series modeling. To this aim, they used autoregressive time-series models based on TP-SMN (two-piece scale mixture normal) distributions. Since the trend of infected cases and their subsequent entrance to a particular hospital are uncertain, some researchers tried to forecast new cases using time-series models that take uncertainty into account. In this regard, Ye & Yang [42] predicted the future cases of COVID-19 in China through time series in an uncertain environment. Based on their result, the prediction accuracy was greater than that of the classical time series. In addition to time-series models, other prediction models were used for new case prediction, such as the linear regression model. For instance, Rath et al. [31] used a multiple linear regression model to predict the new active cases of COVID-19 based on a WHO data set. They compared the results of their method with a simple linear regression. Roy et al. [34] used an additive regression model for infected cases all over the world based on global data for each country and compared the countries to support their economic evaluation based on their future infected people. Support vector machine (SVM) models were also used for forecasting. Parbat & Chakraborty [27] used an SVM model for COVID-19 case prediction based on the data of the most recent 2 months. They could develop a prediction model with 97% accuracy for all cases of infection, deaths, and recoveries.

Based on the reviewed literature, no study has used integrated simulation, DEA, and time-series models, especially for the COVID-19 patient process. For instance, Diaz & Dawson [14], Alban et al. [1], Melman et al. [24], and Teberga Campos et al. [10] used only simulation tools to find an optimal layout, without any formal approach for deciding optimality. Besides, Zeinalnezhad et al. [43] and Pan et al. [26] first simulated their case study processes and then used Petri nets and DOE for analyzing the identified scenarios. In the literature reviewed, only Azadeh et al. [4] and Pereira et al. [29] tried to evaluate the scenarios by the DEA based on the simulation outputs, as we do in our study. No study used a time-series prediction model to predict the input of the simulation model. So, the novelty of this study lies in its methodology and in the case study process, which is the most complete patient flow for COVID-19-suspected patients. The main contributions of this research can be summarized and highlighted below:

• Developing a simulation model of the COVID-19 patient flow in a hospital as a real-case study
• Predicting the different categories of patient (i.e., outpatient, emergency, and inpatient) arrival using the time-series model
• Proposing various scenarios based on different levels of input variables using the Taguchi method
• Evaluating the proposed scenarios based on their input and simulation output using the DEA method
• Predicting the bottlenecks of the patient flow process by simulating the worst scenario with predicted patient arrivals
• Suggesting some public health policies according to different scenarios and assessing the outcome of each scenario for the policy's decision-makers

A brief overview of the research methodology framework is shown in Fig. 1 . In the first stage, the data about the COVID-19 patient flow are collected. Then, the simulation model is designed based on the distribution functions for each parameter. In the base run of the simulation model, the current bottlenecks of the process are identified.

In the second stage, since the COVID-19 patient journey in a hospital during a pandemic is a complex process with different input parameters affecting the process output, different values can be considered for each of these parameters for each unit. Furthermore, various combinations of these parameters lead to several scenarios. The Taguchi method is used to define the combinations of the input variables to be examined. For each scenario proposed by this method, the simulation model is run and the output variables are measured.

In the third stage, the best scenario should be identified by comparing the outputs. It is unlikely that a single scenario has the best value for all three outputs. If there is a scenario whose outputs are all better than those of the other scenarios, it is considered the best scenario. Otherwise, the DEA method is used to evaluate the scenarios. Finally, in the last stage, the future patient entry is predicted, and the predicted values are used as the input of the best scenario simulation model. The future bottlenecks are then identified to inform the hospital managers. The tools and techniques used in this study are explained in detail below.

Arena is application software with high modeling capabilities and a powerful simulation tool that allows users to create and test a simulation model through an easy-to-use interface. Arena can simulate a discrete event system (DES), which accelerates the analysis of the behavior of a process or system over time. So, before implementing a business process in practice, it is best to first model and evaluate it so that we can better decide on changes to that process and improve it [12]. In this way, before changes become costly, we can identify the best version of that business process and make the best decisions. The capabilities of Arena are as follows:

• Showing a graphical representation of process flows for even the most complex business processes
• Monitoring, analyzing, and better understanding the behavior of workflows
• Guessing more accurately the efficiency, response time, and bottlenecks of a new system or design
• Evaluating the impact of error rates
• Changing or improving how the system is configured and tasks are performed
• Testing different ways to find the best solution for a topic
• Showing the results graphically and numerically to increase the acceptance and understanding of decisions [23]

Data envelopment analysis (DEA) is a mathematical programming model for evaluating the performance of decision-making units (DMUs) that have multiple inputs and multiple outputs. Charnes et al. (1978) proposed this method as the CCR model, named after the first letters of their names, for calculating the efficiency of each DMU by solving a nonlinear mathematical model as Eqs. (1)-(3):

where $x_{io}$ is the value of input $i$ and $y_{ro}$ is the value of output $r$ for the DMU under evaluation, $v_i$ is the weight of input $i$, $u_r$ is the weight of output $r$, $j$ indexes the DMUs, and $\varepsilon$ is a positive parameter.

However, this nonlinear programming can be transferred into linear as Eqs. 

In this method, an efficient boundary curve is created from a series of points that are determined by linear programming. The linear programming method determines whether the DMU is on the edge of efficiency or outside it. Thus, efficient and inefficient units are separated from each other based on Fig. 2 .
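For reference, the CCR model as it is usually stated in the DEA literature is sketched below. This is the textbook form, not a reproduction of the paper's own numbered equations; the index bounds $m$ (inputs), $s$ (outputs), and $n$ (DMUs) are assumed symbols introduced here for illustration. The fractional (nonlinear) form maximizes the ratio of weighted outputs to weighted inputs of the evaluated unit $o$:

```latex
\begin{aligned}
\max_{u,v}\quad & \theta_o = \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}} \\
\text{s.t.}\quad & \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1, \qquad j = 1,\dots,n, \\
& u_r \ge \varepsilon,\quad v_i \ge \varepsilon, \qquad r = 1,\dots,s,\; i = 1,\dots,m .
\end{aligned}
```

Fixing the weighted input of unit $o$ to one (the Charnes-Cooper transformation) gives the equivalent linear program:

```latex
\begin{aligned}
\max_{u,v}\quad & \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1,\dots,n, \\
& u_r \ge \varepsilon,\quad v_i \ge \varepsilon .
\end{aligned}
```

A unit whose optimal objective value equals one lies on the efficient boundary; a value below one marks it as inefficient relative to the other units.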

The parameters considered inputs are as follows:

• Number of physicians in the emergency COVID-19 special line
• Number of nurses in the emergency COVID-19 special line
• Number of physicians in ICU
• Number of nurses in ICU

To examine the different values of the input factors affecting the entire system, three levels were chosen for each of the eleven input parameters, as given in Table 1. In this table, the columns indicate the three levels and the inputs are in rows. These levels are defined based on the minimum, medium, and maximum values that are possible in the system given the hospital capacity and budget. A full factorial experiment for eleven inputs with three levels each would require 3^11 (i.e., 177,147) experiments. However, the Taguchi method reduces this to a much smaller number of experiments that can be investigated more easily (Davim, 2003). The steps of this method are as follows: (1) choose control factors, (2) choose suitable levels for the factors, (3) choose an orthogonal array that is appropriate for the control factors, (4) carry out the experiments, and (5) analyze the experiments and find the best combination of factor levels. Based on Taguchi's orthogonal arrays, 27 scenarios are suggested for eleven three-level factors, as given in Table 2. A factor value of 1 means the minimum value, 2 the medium value, and 3 the maximum value.
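The reduction from a full factorial design to 27 runs can be illustrated with a short sketch. The code below builds one 27-run, three-level orthogonal array from linear combinations over GF(3) and checks the defining balance property (every pair of columns contains each of the nine level pairs exactly three times). The function name `l27_orthogonal_array` and the column ordering are assumptions made for this illustration; the array is of the same family as a Taguchi L27 design, but its columns are not claimed to match the study's Table 2 entry by entry.

```python
from collections import Counter
from itertools import combinations, product


def l27_orthogonal_array(n_factors=11):
    """Return a 27 x n_factors array with levels 1..3 (n_factors <= 13)."""
    if not 1 <= n_factors <= 13:
        raise ValueError("a 27-run three-level array has at most 13 columns")
    # One generator per direction in GF(3)^3: nonzero vectors whose first
    # nonzero coordinate is 1 (there are exactly 13 of them).
    generators = [g for g in product(range(3), repeat=3)
                  if any(g) and next(x for x in g if x) == 1]
    generators = generators[:n_factors]
    rows = []
    for run in product(range(3), repeat=3):  # 3^3 = 27 runs
        # Each column is a dot product modulo 3, shifted to levels 1..3.
        rows.append([sum(a * b for a, b in zip(run, g)) % 3 + 1
                     for g in generators])
    return rows


def is_orthogonal(rows):
    """Check that every pair of columns is balanced over all level pairs."""
    n_cols = len(rows[0])
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != 9 or set(counts.values()) != {len(rows) // 9}:
            return False
    return True


full_factorial_runs = 3 ** 11           # every combination of 11 factors
scenarios = l27_orthogonal_array(11)    # reduced design
```

Each row of `scenarios` corresponds to one simulation scenario, and each entry selects the minimum, medium, or maximum level of one input parameter.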

For the prediction of patient arrivals based on their historical arrivals, the SARIMA(p, d, q)(P, D, Q)m model is used. This model is chosen for its simplicity and its suitability for predicting patient arrivals. It is also appropriate when the dataset has only two columns (i.e., date and event frequency) in non-linear cases that show seasonal behavior [28]. More complicated time-series models (e.g., LSTM) are used when other exogenous features affect the event frequency [41]. Since the data available in the case study include only the date and the patient arrivals, and no other information (e.g., the region population, the infection rate, the connection frequency, and some detailed features), we use the SARIMA model for time-series prediction. This model includes several parameters that can be tuned to achieve optimal performance. These parameters are the trend elements (p, the autoregressive order; d, the differencing order; and q, the moving-average order) and the seasonal elements (P, D, and Q, the seasonal counterparts of p, d, and q; and m, the number of time steps in one seasonal period). To get the best prediction, the values of SARIMA(p, d, q)(P, D, Q)m should be optimized. For this aim, we used a grid search to iteratively explore different combinations of these parameters. The evaluation metric used for the grid search is the Akaike information criterion (AIC) value. The AIC measures how well a model fits the data while considering the overall complexity of the model [30].
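The grid search over SARIMA orders can be sketched as follows. This is a minimal skeleton under stated assumptions, not the study's implementation: the paper does not name its fitting library, so the model-fitting step is passed in as a callable, and `toy_score` is a purely hypothetical stand-in that returns a made-up AIC so the selection logic can be run without data. In practice the callable might wrap a library fit (for example statsmodels' `SARIMAX(...).fit().aic`, which is an assumption about tooling).

```python
import math
from itertools import product


def sarima_grid(p_values, d_values, q_values, P_values, D_values, Q_values, m):
    """Yield (order, seasonal_order) candidates for SARIMA(p,d,q)(P,D,Q)m."""
    for p, d, q in product(p_values, d_values, q_values):
        for P, D, Q in product(P_values, D_values, Q_values):
            yield (p, d, q), (P, D, Q, m)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def grid_search(candidates, fit_and_score):
    """Return the candidate with the lowest score and that score."""
    best, best_score = None, math.inf
    for order, seasonal_order in candidates:
        try:
            score = fit_and_score(order, seasonal_order)
        except (ValueError, ArithmeticError):
            continue  # skip combinations the fitting routine rejects
        if score < best_score:
            best, best_score = (order, seasonal_order), score
    return best, best_score


def toy_score(order, seasonal_order):
    """Hypothetical stand-in for 'fit the model and return its AIC'."""
    p, d, q = order
    P, D, Q, m = seasonal_order
    # Made-up log-likelihood that happens to peak at p = q = D = 1.
    fake_loglik = -100.0 - 3 * (p - 1) ** 2 - 3 * (q - 1) ** 2 - 3 * (D - 1) ** 2
    return aic(fake_loglik, n_params=p + q + P + Q + 1)


# m = 7 assumes a weekly season in daily data (an assumption for this demo).
candidates = list(sarima_grid(range(2), range(2), range(2),
                              range(2), range(2), range(2), m=7))
best_orders, best_aic = grid_search(candidates, toy_score)
```

Keeping the scoring function separate from the search loop means the same loop can be reused with any fitting routine that returns an AIC.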

The augmented Dickey-Fuller (ADF) test is used to check stationarity. The ADF approach is essentially a statistical significance test that compares the p-value with a significance level (or the test statistic with the critical values) to test the hypothesis. Using this test, we can determine whether the processed data are stationary or not at different levels of confidence. If the p-value > 0.05, the null hypothesis of a unit root cannot be rejected, and the data are treated as non-stationary [38].

The metric used for evaluation is the root mean squared error (RMSE), given in Eq. (9):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - y_t^{\mathrm{predicted}}\right)^2} \qquad (9)$$

where $y_t$ is the actual patient entry on date $t$, $y_t^{\mathrm{predicted}}$ is the predicted value of the patient entry on date $t$, and $n$ is the number of test dates.
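As a small illustration of this metric, the sketch below computes the RMSE between two equally long sequences of actual and predicted daily arrivals. The function name `rmse` and its argument names are chosen here for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))
```

Because the errors are squared before averaging, a few days with large prediction errors raise the RMSE more than many days with small errors.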

This study was performed at a hospital in Iran. This hospital provides services to COVID-19 patients in four normal units with 104 beds and three intensive care units with 70 beds. The laboratory and CT scan units of this hospital are also active 24 h a day for outpatients, emergencies, and inpatients who have been hospitalized before for other reasons. The emergency unit of this hospital also has 10 beds for patients with severe symptoms, who stay in these beds until a bed in the inpatient unit becomes empty. The data for patient entry (i.e., outpatients, emergencies, and inpatients), symptoms (i.e., no symptoms, moderate, and severe), laboratory test results and CT-scan reports, and length of stay in normal and intensive units for all entered cases were collected from 3 May to 5 October 2020. Figure 3 shows the data related to patients collected from the case study. The information about the available beds, the human resources in each unit, the waiting times of patients in every stage, and the average time of each activity was obtained from the hospital information system (HIS) and the managers. The process for each category of patients (i.e., no symptoms, moderate, and severe) is explained in five different flows in the next section as the "base process."

In the patient flow of COVID-19-suspected patients in the case study hospital, three groups of patients enter the process: outpatients, emergencies, and inpatients. Each category is defined as follows:

• Outpatients are patients who go directly to a lab or CT scan based on their suspicion or some of the symptoms of COVID-19 disease visibility.
• Emergencies are patients who have called the emergency services due to the visibility of some symptoms and are delivered by ambulance to the emergency unit of the hospital.
• Inpatients include patients who have already been hospitalized for other reasons before and need to be tested for COVID-19 services due to the visibility of some symptoms or even before their surgery operations.

Table 1 Levels of the input parameters (rows 6-11)
No.  Input parameter                                      Level 1  Level 2  Level 3
6    Number of nurses in CCU                              6        7        8
7    Number of the service providers in CT-scan unit      4        8        12
8    Number of the service providers in Laboratory        14       18       22
9    Number of beds in ICU                                32       52       72
10   Number of beds in CCU                                10       18       38
11   Number of beds in emergency COVID-19 special line    5        10       15

Outpatients follow one of three paths based on the severity of their first symptoms. If they have no symptoms, they usually go to the lab. If they have moderate symptoms, they will have a CT scan for a chest x-ray. It is rare for outpatients to have severe symptoms, but if so, they go to the triage of the emergency unit of the COVID-19 line. Emergencies are delivered by ambulance to the triage of the hospital's emergency COVID-19 line. In this case, based on the diagnosis of the emergency unit triage, if their symptoms are not severe, they are sent to the laboratory. If they have moderate symptoms, they will have a CT scan for a chest x-ray. Most patients referred to the emergency department have severe symptoms, in which case, while staying on one of the beds in the hospital's emergency COVID-19 line, they go to the laboratory and CT-scan unit at the same time until their admission is done. Inpatients, as needed or sometimes after symptoms of the disease are observed, go to the laboratory and radiology to ensure that they are not infected with COVID-19 so that the treatment of their other disease can continue. In this case, they are sometimes confirmed to have COVID-19, which must be treated at the same time as their underlying disease and coronary heart disease. If inpatients have no symptoms but need a COVID-19 test, they are sent to the laboratory first. If they have moderate symptoms, they first go to the radiologist and then to the laboratory. In this case, these patients are in their non-coronary wards and are transferred for tests and returned to their wards. However, if they have severe symptoms, they are immediately isolated in their ward to prevent transmission to other patients in the non-coronary wards of the hospital and sent to the laboratory and radiology at the same time.
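The first routing decisions described in this section can be summarized as a lookup table. The sketch below is only an illustration of those rules: the category labels, symptom labels, station names, and the function `first_steps` are all hypothetical identifiers introduced here, and the case "emergency patient with no symptoms goes to the laboratory" is one reading of the phrase "if their symptoms are not severe."

```python
# (category, symptoms) -> ordered list of first stations; names are illustrative.
ROUTES = {
    ("outpatient", "none"): ["laboratory"],
    ("outpatient", "moderate"): ["ct_scan"],
    ("outpatient", "severe"): ["emergency_triage"],
    ("emergency", "none"): ["emergency_triage", "laboratory"],
    ("emergency", "moderate"): ["emergency_triage", "ct_scan"],
    # Severe emergencies stay on an emergency bed; lab and CT run in parallel.
    ("emergency", "severe"): ["emergency_triage", "emergency_bed",
                              "laboratory_and_ct_scan"],
    ("inpatient", "none"): ["laboratory"],
    ("inpatient", "moderate"): ["ct_scan", "laboratory"],
    # Severe inpatients are isolated in their ward; lab and CT run in parallel.
    ("inpatient", "severe"): ["isolation_in_ward", "laboratory_and_ct_scan"],
}


def first_steps(category, symptoms):
    """Return the first stations a suspected patient visits."""
    try:
        return list(ROUTES[(category, symptoms)])
    except KeyError:
        raise ValueError(f"unknown combination: {category!r}, {symptoms!r}") from None
```

In a simulation model, a table of this kind plays the role of the branching logic that sends each arriving entity down its own path.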

In general, in COVID-19 disease, observing the symptoms is the priority in making a patient decision. After that, the CT-scan results and finally the test result determine whether or not a patient is infected. There are five different flows of patients, which are explained as follows:

1. First flow: Outpatients and emergencies with no symptoms wait in the laboratory and are tested after a while. The test result is prepared after an average of 6 h. Ten percent of the time, it is necessary to repeat the test. If there is no need to repeat the test, then, based on the test results, patients follow one of three ways: (I) if their test is negative, they leave the hospital; (II) if the test is positive, some patients leave the hospital on their own to go home, some patients go home based on the physician's orders to be quarantined, and some patients prefer to go to another hospital; and (III) the rest of the patients go to radiology for a CT scan. In radiology, patients wait in a queue, and a CT scan is performed after an average of 25 min, with no waiting for emergency patients. The CT-scan result is ready after 2 h on average. Based on the CT-scan result along with the test result, the following conditions occur:

• If the CT scan is positive, the test result is positive, and the patient has no symptoms, he/she goes home based on the physician's orders; however, he/she must be quarantined at home and rest until complete recovery.

• If the CT-scan result is negative while the test result is positive and the patient has no specific symptoms, he/she goes home for quarantine based on the physician's orders.

Table 2 Scenarios suggested by the Taguchi orthogonal array (1 = minimum, 2 = medium, 3 = maximum level)
Scenario  Input 1  Input 2  Input 3  Input 4  Input 5  Input 6  Input 7  Input 8  Input 9  Input 10  Input 11
1         1        1        1        1        1        1        1        1        1        1         1
2         1        1        1        1        2        2        2        2        2        2         2
3         1        1        1        1        3        3        3        3        3        3         3
4         1        2        2        2        1        1        1        2        2        2         3
5         1        2        2        2        2        2        2        3        3        3         1
6         1        2        2        2        3        3        3        1        1        1         2
7         1        3        3        3        1        1        1        3        3        3         2
8         1        3        3        3        2        2        2        1        1        1         3
9         1        3        3        3        3        3        3        2        2        2         1
10        2        1        2        3        1        2        3        1        2        3         1
11        2        1        2        3        2        3        1        2        3        1         2
12        2        1        2        3        3        1        2        3        1        2         3
13        2        2        3        1        1        2        3        2        3        1         3
14        2        2        3        1        2        3        1        3        1        2         1
15        2        2        3        1        3        1        2        1        2        3         2
16        2        3        1        2        1        2        3        3        1        2         2
17        2        3        1        2        2        3        1        1        2        3         3
18        2        3        1        2        3        1        2        2        3        1         1
19        3        1        3        2        1        3        2        1        3        2         1
20        3        1        3        2        2        1        3        2        1        3         2
21        3        1        3        2        3        2        1        3        2        1         3
22        3        2        1        3        1        3        2        2        1        3         3
23        3        2        1        3        2        1        3        3        2        1         1
24        3        2        1        3        3        2        1        1        3        2         2
25        3        3        2        1        1        3        2        3        2        1         2
26        3        3        2        1        2        1        3        1        3        2         3
27        3        3        2        1        3        2        1        2        1        3         1

After an average of 5 min of waiting for their CT scan to be done, the CT-scan result is ready after 2 h on average. Based on the CT-scan result along with the test result, the following conditions occur:

• If the CT scan and the test are positive and the patient has no symptoms, the patient should be isolated in his/her unit.

• If the CT-scan result is negative while the test result is positive and the patient has no specific symptoms, the patient should remain in his/her unit.

5. Fifth flow: Inpatients with moderate and severe symptoms are first referred to radiology and then transferred to the laboratory. Because of their moderate and severe symptoms, these patients are isolated in their unit, regardless of their results, and wait for admission to the COVID-19 unit.

In the hospital's COVID-19 units, patients are assigned to normal units or intensive care units based on clinical diagnosis: a patient with a prior clinical disease is admitted to an intensive care unit, and all other patients are admitted to normal units. In intensive care units, after receiving the relevant treatments on the physician's order, patients are first transferred to normal units and complete their treatment there. Patients then leave the system in one of three ways: they are discharged on the physician's order owing to complete recovery or for continued treatment, they die, or they are transferred to another hospital to continue their treatment. The process explained above is depicted in Fig. 4 as the five flows described and a general flow, which is common to all categories of patients.

In this section, the base patient flow is simulated. To build the simulation model, data were collected from the case-study hospital. The Input Analyzer, one of the Arena software add-ins, was then used to convert the input data into distribution functions, and these functions are used in the process. The distribution functions of the variables are given in Section A of the Supplementary Materials.

Different modules are used in the discrete-event simulation performed with the Arena software. In this research, the "Create" module is used to create the entities of the studied process (i.e., patients). Patients are divided into three categories (outpatients, emergencies, and inpatients), each with its own flow. The "Process" module is used to perform the various activities of the process (e.g., laboratory test, CT scan, and treatment). The "Hold" module makes entities wait in the queue at the different workstations. In some parts of the simulation, the "Decide" module is used to split the entities across different paths. The "Assign" module is used to tag entities in different sections for accurate monitoring. The "Record" module is used to record data from different parts of the process. All entities are removed from the process by the "Dispose" module. Fig. 5 shows part of the simulation model of the patient flow in the hospital; the complete simulation model is shown in Section B of the Supplementary Materials.
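The roles of these modules can be illustrated outside Arena with a minimal discrete-event sketch in Python. This is not the authors' Arena model: the function name `simulate_station`, the single-server station, the exponential inter-arrival and service times, and all numeric values are assumptions chosen only for illustration (the 10% repeat probability echoes the repeat rate of the laboratory test in the first flow).

```python
import heapq
import random
from collections import deque


def simulate_station(n_patients, mean_interarrival, mean_service, p_repeat, seed=0):
    """Simulate one single-server station and return each patient's total time.

    Hypothetical illustration only; distributions and parameters are assumed.
    """
    rng = random.Random(seed)
    events = []      # heap of (time, sequence, kind, patient_id)
    sequence = 0     # tie-breaker so heap entries never compare beyond this field
    clock = 0.0
    arrival_time = {}

    # "Create": generate patient arrivals with exponential inter-arrival times.
    for pid in range(n_patients):
        clock += rng.expovariate(1.0 / mean_interarrival)
        arrival_time[pid] = clock
        heapq.heappush(events, (clock, sequence, "arrive", pid))
        sequence += 1

    waiting = deque()   # "Hold": patients queue here until the server is free
    busy = False
    total_times = []    # "Record": total time in system per patient

    while events:
        now, _, kind, pid = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(pid)
        else:  # service finished
            busy = False
            # "Decide": with probability p_repeat the activity is repeated.
            if rng.random() < p_repeat:
                waiting.append(pid)
            else:
                # "Record" the total time, then "Dispose" the entity.
                total_times.append(now - arrival_time[pid])
        # "Process": start serving the next waiting patient if the server is idle.
        if not busy and waiting:
            next_pid = waiting.popleft()
            busy = True
            finish = now + rng.expovariate(1.0 / mean_service)
            heapq.heappush(events, (finish, sequence, "done", next_pid))
            sequence += 1
    return total_times
```

For example, `simulate_station(50, 10.0, 5.0, 0.1, seed=1)` returns 50 total times in minutes; averaging them gives the kind of "average total time" output that the full model reports per patient category.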

According to the outputs of the base patient flow model, the total time of outpatients in the system ranges from 56 to 2998 min, depending on their conditions, with an average of 187 min. The waiting time of outpatients ranges from 42 to 258 min, with an average of 64 min. Emergency patients are in the system from 85 to 12,129 min, with an average of 172 min; their waiting time ranges from 61 to 364 min, with an average of 86 min. Inpatients stay in the system from 25 to 8753 min, with an average of 546 min. The average waiting time for inpatients is reported as 419 min, with a range of 13 to 293 min. The comparison of the average total time and waiting time of each category is shown in Fig. 6.

The 27 scenarios are defined in Table 2, with the three levels described in Table 1. Running the simulation model for each scenario gives the outputs of the model, which are given in Table 3.

Now that all the inputs and outputs are specified for all scenarios, the DEA can be carried out with eleven inputs and four outputs to evaluate and compare the scenarios, as shown in Fig. 7. Table 4 lists the efficiency score and rank of each scenario. As can be seen in Fig. 7, the least-efficient scenarios are scenarios 3, 7, 11, and 12, with efficiency scores of 0.998 or less. Scenarios 5, 15, and 24 have an efficiency score of 0.999. The base model has the best efficiency score; the other scenarios, which have more input resources, are not more efficient. The reason is the time needed for the CT-scan and laboratory results: the added resources (e.g., physicians, nurses, and beds) do not substantially affect the total time or the waiting time, and consequently the discharged patient ratio barely changes. Since more resources lead to higher costs in all scenarios without better outputs, the efficiency scores of all scenarios are lower than that of the base scenario. Fig. 8 shows the scenarios with the same efficiency score. Although they change the outputs by different amounts, their overall performance is the same. The decision-makers should select among them according to their priorities for total time, waiting time, and discharge ratio. In addition, the increase or decrease in each output for every scenario relative to the base scenario is given in Table 5.
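To make the notion of a DEA efficiency score concrete, the following sketch covers only the simplest special case: one input and one output under constant returns to scale, where the score of a unit reduces to its output/input ratio divided by the best ratio among all units. The study itself uses eleven inputs and four outputs, which requires solving a linear program per scenario; that is not shown here. The function name and the example numbers are hypothetical.

```python
def single_ratio_efficiency(inputs, outputs):
    """Efficiency scores for the one-input, one-output case.

    Each unit's score is (output / input) divided by the best such ratio,
    so the best unit scores 1.0. Simplified illustration only; the general
    multi-input, multi-output DEA needs a linear programming solver.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and equally long")
    if any(x <= 0 for x in inputs) or any(y < 0 for y in outputs):
        raise ValueError("inputs must be positive and outputs non-negative")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    if best == 0:
        raise ValueError("at least one unit must produce a positive output")
    return [r / best for r in ratios]


# Hypothetical units: resources used (input) and patients discharged (output).
scores = single_ratio_efficiency([2.0, 4.0, 5.0], [2.0, 2.0, 5.0])
```

In this toy case the second unit uses twice the input of the first for the same output, so its score is half that of the efficient units. This mirrors the argument above: adding resources without improving outputs lowers the efficiency score.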

Based on Tables 4 and 5, the worst scenario has to be identified. To find it, we do not rely on the DEA efficiency score alone but also consider the output values. On this basis, scenario 14 can be considered the worst in terms of its effects on the outputs, although it was efficient: it did not change the total time and increased the average waiting time at higher cost, while it improved the discharged patient ratio.

This study proposes a time-series framework for three categories of patient entries (i.e., outpatients, emergencies, and inpatients). The framework of the prediction models is illustrated in Fig. 9. First, the raw data are preprocessed and tested for stationarity. Then, the data are divided into training and test sets (70% train and 30% test) using the train_test_split function of the sklearn.model_selection module in the Python programming language. Finally, the SARIMA model is constructed, and the patient entry is predicted for the next 30 days. The accuracy of the model is verified by comparing the predicted data with the real data via the RMSE.
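The split and the accuracy measure can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the authors' code: it assumes an order-preserving (chronological) split, which is the usual choice for time series, whereas scikit-learn's `train_test_split` shuffles the rows unless `shuffle=False` is passed. The helper names are hypothetical.

```python
import math


def chronological_split(series, train_fraction=0.7):
    """Split a series into leading train and trailing test parts, keeping order."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and equally long")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily arrivals for 20 days: the first 14 train, the last 6 test.
arrivals = [12, 15, 11, 14, 13, 16, 18, 17, 15, 14, 19, 20, 18, 17, 21, 22, 20, 19, 23, 24]
train, test = chronological_split(arrivals)
```

A forecast for the test period would then be scored with `rmse(test, forecast)`; a lower value means the predictions are closer to the observed arrivals.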

For data preprocessing, missing values of daily patient arrivals are replaced with average values. To check the stationarity of the data, the rolling mean and standard deviation of each column are calculated with a window of 6 days for the mean and 24 days for the standard deviation. The daily patient entries of emergencies, outpatients, and inpatients are depicted in Fig. 10 based on the collected data. As can be seen in Fig. 11, the rolling mean itself has a trend component even though the rolling standard deviation is fairly constant with time. For a time series to be stationary, both rolling statistics (mean and standard deviation) must remain constant with time. Thus, the curves for both of them have to be parallel to the x-axis, which is not the case for the outpatient arrivals. Table 6 shows the ADF test results for the three types of patients.

To make the data stationary, detrending is applied as in Eq. (10), and the detrended patient arrivals are shown in Fig. 12.

y.detrend = (y − y.rolling(window = 6).mean()) / y.rolling(window = 24).std()    (10)

Fig. 11 Rolling mean and standard deviation of patient entry of emergency patients, outpatients, and inpatients

Finally, Table 7 shows the ADF test results for outpatients and emergencies after detrending. Now, the arrival series of all three types of patients are stationary and the prediction model can be developed.
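Eq. (10) is written in pandas-style notation. A dependency-free sketch of the same computation is given below, assuming trailing windows that include the current day and the sample standard deviation (the pandas defaults). Positions without a full window are returned as `None`, where pandas would give NaN; returning `None` for a zero standard deviation is a choice made here for the sketch. The function names are hypothetical.

```python
import statistics


def rolling(values, window, func):
    """Apply func to each trailing window; None until a full window exists."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(func(values[i + 1 - window:i + 1]))
    return result


def detrend(values, mean_window=6, std_window=24):
    """Subtract the rolling mean and divide by the rolling standard deviation."""
    means = rolling(values, mean_window, statistics.fmean)
    stds = rolling(values, std_window, statistics.stdev)  # sample std (n - 1)
    detrended = []
    for y, mean, std in zip(values, means, stds):
        if mean is None or std is None or std == 0:
            detrended.append(None)  # undefined until both windows are full
        else:
            detrended.append((y - mean) / std)
    return detrended
```

With the window lengths of Eq. (10), the first 23 values of the detrended series are undefined, so a series must be considerably longer than 24 days for the detrended data to be useful.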

For tuning the parameters of the SARIMA model, the grid search method is used, which is a parameter-tuning solution. Its key feature is that a model is constructed and evaluated for each possible combination of parameters in the grid; hence, the algorithm is an exhaustive search. We defined different ranges for the parameters and selected the AIC as the evaluation metric. As mentioned before, the best prediction model is the one with the lowest AIC value. Based on the results, SARIMA(1, 1, 1) × (0, 1, 1, 12) has the lowest AIC value for inpatient entry, and SARIMA(0, 1, 1) × (0, 1, 1, 12) has the lowest AIC value for outpatient and emergency entries.
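The search described above can be sketched as a loop over all parameter combinations that keeps the one with the lowest score. In this sketch, model fitting is delegated to a caller-supplied function `score(order, seasonal_order)` that is assumed to fit a SARIMA model and return its AIC (for instance, a thin wrapper around a library implementation such as the SARIMAX class in statsmodels). The function names and the candidate ranges are hypothetical, not the ranges used in the study.

```python
import itertools


def grid_search(score, p_values, d_values, q_values, seasonal_orders):
    """Return (best_score, order, seasonal_order) with the lowest score.

    `score` is a caller-supplied function (assumed here) that fits a model
    for the given orders and returns its AIC. Returns None for an empty grid.
    """
    best = None
    for order in itertools.product(p_values, d_values, q_values):
        for seasonal_order in seasonal_orders:
            value = score(order, seasonal_order)
            if best is None or value < best[0]:
                best = (value, order, seasonal_order)
    return best


# Candidate seasonal orders (P, D, Q, s) with a season length of 12.
seasonal_grid = [(P, D, Q, 12) for P in (0, 1) for D in (0, 1) for Q in (0, 1)]
```

With three binary ranges for (p, d, q) and the eight seasonal candidates above, the search evaluates 64 models; the cost grows multiplicatively with each range, which is why the ranges are usually kept small.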

Results for the next 30 days of patient arrival prediction for all patient categories are shown in Fig. 13. The RMSE values of the SARIMA model with a season length of 12 for inpatients, outpatients, and emergencies are 4.87, 27.54, and 3.06, respectively. The gray area above and below the orange line in this figure represents the 95% confidence interval. As with virtually all forecasting models, the further the predictions go into the future, the less confidence we have in the values. In this case, we are 95% confident that the actual patient arrivals will fall inside the range shown in Table 8 for each patient category.

Assume that the worst situation happens. The simulation model of scenario 14 is therefore run again with the upper bound of the time-series prediction of patient arrivals. This scenario had the worst outputs among the scenarios and can be considered the worst situation. Although all the scenarios could be run again, the aim here is to determine whether there are bottlenecks in the patient flow in a pessimistic situation. In the simulated system, the main bottlenecks are the ICU, the CCU, and the beds dedicated to COVID-19 patients. After these three bottlenecks, the number of nurses in the CCU and ICU wards and in the laboratories is the biggest challenge in the system. To evaluate the stability of the system against the number of patients and their deterioration, and its ability to respond to existing needs, the future of the system should be simulated. Therefore, scenario 14, which has the smallest increase in resources at these bottlenecks and does not improve significantly on the baseline, is selected, and the number of patients admitted to the system is predicted for all three types (outpatients, inpatients, and emergency patients) using the time-series machine learning method. Based on the simulation of scenario 14 with these predictions, it is concluded that the system will collapse after 14 days. This means that the bottleneck of the ICU and CCU becomes problematic. In this regard, the following solutions must be adopted for the system to continue operating:

• Creating more capacity for the hospitalization of COVID-19 patients in the studied hospital

• Creating the capacity in other hospitals to hospitalize COVID-19 patients referred to the studied hospital

• Establishing temporary capacities (i.e., hospitals) to which patients requiring admission can be transferred

• Transferring more patients to their homes and providing services remotely and in patients' homes

For practical use, hospital managers can analyze each decision they are going to make with the proposed model, as carried out above. It can thus serve as a decision support tool for evaluating every policy before implementation.
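The idea of finding the day on which a bottleneck is exceeded can be illustrated with a deliberately simplified occupancy calculation. This is not the Arena model used in the study: it assumes a fixed length of stay for every admitted patient and a single bed pool, and the function name and all numbers are hypothetical.

```python
def first_overflow_day(daily_admissions, length_of_stay, capacity):
    """Return the first day (1-based) on which occupied beds exceed capacity.

    Assumes every admitted patient occupies a bed for exactly
    `length_of_stay` days, including the admission day. Returns None if
    capacity is never exceeded within the forecast horizon.
    """
    for day in range(len(daily_admissions)):
        first_still_present = max(0, day - length_of_stay + 1)
        occupied = sum(daily_admissions[first_still_present:day + 1])
        if occupied > capacity:
            return day + 1
    return None


# Hypothetical upper-bound forecast: 3 admissions per day, 4-day stay, 10 beds.
overflow_day = first_overflow_day([3] * 10, length_of_stay=4, capacity=10)
```

In this toy example occupancy reaches 12 beds on day 4, which exceeds the 10 available. A manager could rerun such a check with a candidate policy (for example, more beds or a shorter stay through home care) to see whether the overflow is delayed or avoided.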

In this study, the COVID-19 patient flow in the case-study hospital was first investigated. Then, the process was simulated with detailed data and the outputs were obtained. Next, 27 scenarios based on 11 inputs were defined using the Taguchi method, and all scenarios were simulated. The DEA method was then used to calculate the efficiency score of each scenario. Finally, the worst scenario was simulated with the predicted patient arrivals, which were the output of the SARIMA time-series model, and the bottlenecks were identified. Moreover, we tried to highlight simulation tools as decision support systems for hospital managers who want to be more efficient and to rely on data as data-driven decision-makers. Since simulation can visualize the future and help managers in human resource planning, facilities procurement, and other strategic and tactical decisions, we demonstrated the proposed approach in the case study to help managers in decision-making.

Our study has some limitations. First, we only considered the patient arrival rate and not other features that could affect infection. Second, we did not consider the impact of workload on the capability of physicians and nurses, their possible infection, or their specialty level. Third, we did not consider that beds can be transferred from other units to the ICU, or the quality of such beds. Given these limitations, we should highlight that, although our proposed approach can serve as a decision support system, it does not guarantee optimal results; this could be addressed by future studies.

Furthermore, future studies can focus on different prediction models of patient arrivals that use other exogenous features, with other time-series prediction models such as LSTM and machine learning regression models. Also, other methods of investigating the effects of different inputs on the outputs (e.g., the system dynamics approach) can be used to analyze the results of different scenarios. Besides, other studies can address the same problem using process mining tools.

Author contribution All authors (Mahdieh Tavakoli, Reza Tavakkoli-Moghaddam, Reza Mesbahi, Mohssen Ghanavati-Nejad, Amirreza Tajally) contributed to all parts of this research, including conceptualization; formal analysis; resources; methodology; supervision; data collection and investigation; software; validation; and writing - review & editing. Reza Tavakkoli-Moghaddam also had the role of project administration.

Availability of data and material All data generated or analyzed during this research are included in this published article.

Code availability Not applicable.

Ethics approval The authors certify that they have no affiliation with or involvement with human participants or animals performed by any of the authors in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this paper.

Mohssen Ghanavati-Nejad is currently a PhD candidate in the School of Industrial Engineering at the University of Tehran, Iran. He obtained his BSc and MSc degrees in Industrial Engineering from Qom University of Technology and Tarbiat Modares University in Tehran in 2015 and 2017, respectively. He is very interested in digital transformation and lean approaches, business startup, business model development, and data-driven marketing.

Amirreza Tajally is currently an MSc student in the School of Industrial Engineering at the University of Tehran. He obtained his BSc degree in Industrial Engineering from the North Tehran Branch of Islamic Azad University. He has a strong background in data-driven optimization and machine learning and has more than 6 years of experience in teaching data science. He has also implemented data-driven tools and techniques for organizational problem-solving in different fields, such as automotive industries and healthcare systems. His main research interests are data-driven decision-making, customer behavior analytics, and statistical learning, with applications ranging from manufacturing, transportation, and retail to health management and cognitive computing.

The authors declare no competing interests.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mahdieh Tavakoli is currently a PhD candidate in the School of Industrial Engineering at the University of Tehran, Iran. She obtained her BSc and MSc degrees from Alzahra University and Tarbiat Modares University in Tehran in 2015 and 2017, respectively. She has applied industrial engineering methods in healthcare systems for 5 years and has worked on different hospital projects in the fields of process mining, simulation, data analysis, and system dynamics. Her areas of interest are optimization, data-driven decision-making, and process mining.